Robot Has No Heart

Xavier Shay blogs here

A robot that does not have a heart

Logging SQL statistics in rails

When your sysadmin comes to you whinging with a valid concern that your app is reading 60 gazillion records from the DB, you kinda wish you had a bit more information than % time spent in the DB. So I wrote a plugin that counts both the number of selects/updates/inserts/deletes and also the number of records affected.

1
git clone git://github.com/xaviershay/sql-counter.git vendor/plugins/sql_counter

That does the counting, you need to decide how to log it. I am personally quite partial to adding it to the request log line, thus getting stats per request:

1
2
3
4
5
# vendor/rails/actionpack/lib/action_controller/benchmarking.rb:75
log_message << " | Select Records: #{ActiveRecord::Base.connection.select_record_count}"
log_message << " | Selects: #{ActiveRecord::Base.connection.select_count}"

ActiveRecord::Base.connection.reset_counters!

Don’t forget the last line, otherwise you get cumulative numbers. That may be handy, but I doubt it. We’re only logging selects because that’s all we care about at the moment. I am sure this will change in time.

UPDATE: Moved to github, bzr repo is no longer available

A Banana a Day

SQL Optimization

Consider the following join (a is a char(32), b is varchar):

1
select * from t1, t2 where instr(t1.a, t2.b) = 0

The actual code I worked with was a bit more complex, but essentially that is it. It works fine, however performance is rather lacking. This is because for each for every row in t2, the DBMS must perform the instr() function for every row in t1 to check if t1.a is in t2.b until it finds a match.

If a field is used in a function, any indexes on that field cannot be used

That’s bad. In this case, since a is fixed length, we can rewrite the query thusly:

1
select * from t1, t2 where substr(t1.a,0,32) = t2.b

This way, the substr is only performed once for each row of b, and the result can quickly be checked against an index on t2.b.

In the particular case I was working on (there were multiple joins to be rewritten), this cut execution time down from 2 minutes to just under a second.

Lonely Hammer Syndrome

When all you have is a hammer, every problem looks like a nail. After optimising the query referred to above, I found out its context. It’s a data collection problem consisting of a server log where each url contains a UID, a csv that contains details about each uid (page title, etc…), and the two needed to be collated. The original process was:

  1. Import both tables into an access db (the server log into a table with one field – whole_line)
  2. Using link tables and SQL string functions, convert the server log into a friendlier format in a table on a local oracle server
  3. Collate the two files using the above mentioned sql
  4. Export collated data to csv (from access)
  5. Append to master file

Basically, we’re going from csv, to access, to oracle, back through access out to csv again. Additionally, many of these steps required manual intervention. This process has to be done every week – how can it be optimised? Of course there are many ways – I looked into Java and JDBC to cut access out of the loop, but thought if I’m going to the effort I may as well cut out oracle as well. Perl is reknowned for its log-munging ability, so I put together a script (rather quickly – thank you regular expressions!) which can now automatically give formatted data in barely a few minutes, compared to up to 60 minutes of manual labour using the old method.

The moral – expose yourself to as many tools as possible. You don’t need to be an expert in them (I had to google pretty much every perl command :S), but if you know the pros and cons of each you can cut down both development and operational time substantially.

A pretty flower Another pretty flower