At work, I’ve been expending a lot of effort on this complicated search functionality where you can enter a search phrase that will full-text search over one model’s fields (we’re using texticle [github], which is awesome) and limit the results by which other models are involved relationally. Sort of like searching Amazon for “green converse” and choosing the “shoes” category.
The object graph behind this is pretty complicated and it’s been a real education in SQL trying to make sure the query that gets generated is both reasonably speedy and right. Several times, I’ve gotten it “working” only to realize I was joining in some table more than once and so either returning some record twice or excluding it when I shouldn’t’ve or joining all rows against all rows and, thus, making everything pass all constraints. My SQL skill has leveled up several times throughout, though, which has been really awesome. This is mostly because I was hand-writing a lot of the join SQL with table aliases and whatnot.
The other day, I realized that Rails 3 (or, anyway, the 3.1 release candidates, which is what this app is using) will let you do something that earlier versions would not: do a
has_many :through relation on another
has_many :through. Say you’ve got Departments composed of Employees. Employees work in groups to create Widgets and which, in turn, get Tags. You can do this number:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Which enables stuff like:
So, though, since I was hand-writing my
JOIN statements before, I’m clearly concerned with what, exactly, it’s going to execute against the database. So I pulled out good ol’
ActiveRecord::Base#to_sql to see. Here’s what I got (edited without all the quoting and with newlines):
1 2 3 4 5
Hopefully, that query is pretty straight forward and you can see how ActiveRecord has decided how to make all those joins. However, something struck me: I’m joining through with
widgets table, but both
tags already have
widget_id on them. I’d rather have seen something like:
1 2 3 4
The result set should be the same and it’s slightly faster. In this example, joining through the extra table wouldn’t be a big hit, probably, but if we’ve got more objects all related to Widgets and many are, like Departments, related through some other object, we might be (and in my case often are) joining many more tables, so if we can eliminate middle-man joins, it can have an appreciable effect on the query’s speed.
How We Do It
So it turns out you can make ActiveRecord generate the above SQL. You don’t want
has_many :through for the second association. If you do like this:
1 2 3 4 5 6 7
You can use the same ActiveRecord query syntax from above to generate the second SQL example. It’s a lot of typing, though, so I wondered: Wouldn’t it be awesome if ActiveRecord knew you when you had this matching-middle-man-foreign-key situation in a query and generated the leaner SQL?
I’m not sure if there are pitfalls to this I’m not seeing (especially related to
uses outside what I’m doing with it right now), but I’ve started digging around
in the Rails source to see where it’s thinking about these kinds of things (led
far). I’d love some thoughts and feedback on these ideas or guidance in my
code-diving efforts. I expect I may end up in the Arel source at some point…
we’ll see where it takes me.