Steve Meyers: February 2013

Sometimes, despite all your efforts to optimize your queries, you have one that just takes a while. It requires a full table scan, and there's just no getting around it. In this case, your best option may be to look for ways to use a cache to speed up your query. There are two kinds of caches that are most useful for these situations: a result cache, and a data cache.

A result cache is fairly simple to implement, and is most often used for reporting. When the exact same query is run over and over again, then it might make sense to store the result in a cache table. This is especially true when up-to-the-second accuracy isn't important. Every company I've worked for has had marketing or other administrative reports that were important to the business, but took a long time to run. The people viewing the reports only cared about results up to yesterday, so the report really only needed to be run once a day. Running a 5-minute query once a day isn't really a big deal, especially if you can schedule it to run when your site isn't very busy.

A data cache is a little more complicated, and depends entirely on the nature of your data and the queries you run. At each company I've worked for, we've had fairly large tables that were queried often in ways that defied indexing. The specifics were very different in each case, but our solutions held to a common pattern.

First, we were able to significantly limit the data set based on a criteria generally involving what constituted an "active" record. Second, we were able to de-normalize the data so that JOINs weren't necessary. Third, we were sometimes able to shard the cached data, since the lookups would only happen across some subset of the data. Last, we were often able to add strategic indexes that were not possible or feasible on the full data set. This included geospatial indexes that are only possible with MyISAM tables, while our main data set used InnoDB.

Once you determine your data caching schema, the next big question is how often to cache it? This is something that you'll need to determine based on (a) how long it takes to rebuild the cache, and (b) how fresh your data needs to be. The first is a technical constraint, while the latter is a business constraint. If you keep track of updates, deletes, and inserts to your main data set, then you may be able to just update your cache instead of completely rebuilding it. I'd still recommend completely rebuilding it occasionally, since it is possible that some edge case was missed when determining what has been updated.

One trick that may help with maintaining your cached data is the RENAME TABLES command. You can atomically rename multiple tables with a command such as "RENAME TABLES table to table_new, table_old to table". This allows you to build (or update) a new cache table without impacting the current cache table.

Steve Meyers has 15 experience as a MySQL DBA, web software architect, and system administrator, with an emphasis on scalability

If all else fails, cache it