Bypassing ActiveResource objects in a MySQL query

So, over at Directed Edge we were having problems with some new goodies we’re working on that are trying to pull data out of a MySQL database and do some analysis on them in Ruby.

There seemed to be a few options:

  • Do the data analysis directly in SQL. SQL would have been sufficient for our needs, but it’s a bit cumbersome. After futzing with a couple of our critical analysis runs and trying to shoehorn them into SQL, after about an hour of trying (and getting close) it made it clear that we’d be stuck with SQL gymnastics for each and every analysis we needed to run.
  • Don’t do the analysis in Ruby at all. Write a simple plugin interface to drop down to C or C++ to do the analysis there.
  • Figure out the hottest part of the queries and see if we could get most of the benefit for the least amount of effort. Bonus points for being minimally invasive.

The last of those ended up being a working strategy, and surprisingly non-invasive.

We’re often pulling hundreds of thousands of rows across the ActiveRecord border, and doing that was dog slow. Most of the time was in ActiveRecord creating an object for each and every record. Working around that object creation turned out to be pretty easy. This is all the code that was needed:

module ActiveRecord
  module Structs
    def structs(&row_handler)
      results = connection.raw_connection.query(to_sql, :as => :array)
      klass = value_class(results.fields)
      results.each { |row| row_handler.call(klass.new(row)) }
    end

    private

    def value_class(fields)
      i = -1
      name = fields.map { |f| f.to_s.classify }.join + 'Values'
      constructor = 'def initialize(values) ; @values = values ; end'
      accessors = fields.map do |f|
        "def #{f} ; @values[#{i += 1}] ; end"
      end.join(" ; ")
      eval("class #{name} ; #{constructor} ; #{accessors} ; end ; #{name}")
    end
  end
end

What that does is drop down to the raw MySQL connection, which returns an array of values, and uses eval() to create a special purpose class with accessors in the same places as the ActiveRecord object would have. Since those methods are not dynamically invoked, they’re a good bit faster. Also since the object initialization is just one single reference copy (to the array of values), it’s pretty fast too.

All of the code that we had to change in the classes that were using those queries was from:

Foo.select(:bar, :baz).each { |f| ... }

To:

Foo.select(:bar, :baz).structs { |f| ... }

Using eval() in that way is a bit nasty, but for the 4x performance boost, and minimal invasiveness, we’ll take it!

Read More