A Smarter has_many :through?
At work, I’ve been expending a lot of effort on this complicated search functionality where you can enter a search phrase that will full-text search over one model’s fields (we’re using texticle [github], which is awesome) and limit the results by which other models are involved relationally. Sort of like searching Amazon for “green converse” and choosing the “shoes” category.
The object graph behind this is pretty complicated and it’s been a real education in SQL trying to make sure the query that gets generated is both reasonably speedy and right. Several times, I’ve gotten it “working” only to realize I was joining in some table more than once and so either returning some record twice or excluding it when I shouldn’t’ve or joining all rows against all rows and, thus, making everything pass all constraints. My SQL skill has leveled up several times throughout, though, which has been really awesome. This is mostly because I was hand-writing a lot of the join SQL with table aliases and whatnot.
Example
The other day, I realized that Rails 3 (or, anyway, the 3.1 release candidates, which is what this app is using) will let you do something that earlier versions would not: do a has_many :through
relation on another has_many :through
. Say you’ve got Departments composed of Employees. Employees work in groups to create Widgets and which, in turn, get Tags. You can do this number:
class Widget < AR::Base
has_many :tags
has_many :employees
has_many :departments, :through => :employees
end
class Tag < AR::Base
belongs_to :widget
has_many :employees, :through => :widget
has_many :departments, :through => :employees
end
class Employee < AR::Base
belongs_to :widget
belongs_to :department
has_many :tags, :through => :widget
end
class Department < AR::Base
has_many :employees
has_many :widgets, :through => :employees
has_many :tags, :through => :widgets
end
Which enables stuff like:
Department.joins(:tags).where(:tag => { :id => params[:tag_id] })
The SQL
So, though, since I was hand-writing my JOIN
statements before, I’m clearly concerned with what, exactly, it’s going to execute against the database. So I pulled out good ol’ ActiveRecord::Base#to_sql
to see. Here’s what I got (edited without all the quoting and with newlines):
SELECT departments.* FROM departments
INNER JOIN employees ON employees.department_id = departments.id
INNER JOIN widgets ON widgets.id = employees.widget_id
INNER JOIN tags ON tags.widget_id = widgets.id
WHERE tags.id = 3
Hopefully, that query is pretty straight forward and you can see how ActiveRecord has decided how to make all those joins. However, something struck me: I’m joining through with widgets
table, but both employees
and tags
already have widget_id
on them. I’d rather have seen something like:
SELECT departments.* FROM departments
INNER JOIN employees ON employees.department_id = departments.id
INNER JOIN tags ON tags.widget_id = employees.widget_id
WHERE tags.id = 3
The result set should be the same and it’s slightly faster. In this example, joining through the extra table wouldn’t be a big hit, probably, but if we’ve got more objects all related to Widgets and many are, like Departments, related through some other object, we might be (and in my case often are) joining many more tables, so if we can eliminate middle-man joins, it can have an appreciable effect on the query’s speed.
How We Do It
So it turns out you can make ActiveRecord generate the above SQL. You don’t want has_many :through
for the second association. If you do like this:
class Department < AR::Base
has_many :employees
has_many :widgets, :through => :employees
# has_many :tags, :through => :widgets
has_many :tags, :foreign_key => :widget_id, :primary_key => :widget_id
end
You can use the same ActiveRecord query syntax from above to generate the second SQL example. It’s a lot of typing, though, so I wondered: Wouldn’t it be awesome if ActiveRecord knew you when you had this matching-middle-man-foreign-key situation in a query and generated the leaner SQL?
I’m not sure if there are pitfalls to this I’m not seeing (especially related to
uses outside what I’m doing with it right now), but I’ve started digging around
in the Rails source to see where it’s thinking about these kinds of things (led
me to lib/active_record/associations/join_dependency/join_association.rb:72
so
far). I’d love some thoughts and feedback on these ideas or guidance in my
code-diving efforts. I expect I may end up in the Arel source at some point…
we’ll see where it takes me.
I don't have comments on this blog, but if you hit me up via one of the contact methods in the footer, I'm happy to talk about or amend these posts.