Steve Meyers: Query in a loop?

I ran across this gem recently on StackOverflow:

$queryxyzzy12=("SELECT * FROM visitorcookiesbrowsing ORDER by id ASC");
$resultxyzzy23=mysql_query($queryxyzzy12) or die(mysql_error());

while($ckxf = mysql_fetch_assoc($resultxyzzy23)){
    $querycrtx=("SELECT * FROM cart WHERE userkey='$ckxf[usercookie]' ORDER by datebegan  DESC");
    $resultcrtx=mysql_query($querycrtx) or die(mysql_error());
    ------ snip ------
}

Besides the interesting variable names used, it's also doing a query inside a loop, which is very inefficient. This would be better written as a single JOIN query:

SELECT 
  v.*, 
  c.* 
FROM 
  visitorcookiesbrowsing v 
  LEFT JOIN cart c ON c.userkey=v.usercookie 
ORDER BY 
  v.id ASC, 
  c.datebegan DESC

The details of JOIN vs. LEFT JOIN depend on the actual intent of the code, which wasn't apparent from the snippet.

If at all possible, never do a query inside a loop. The overhead of doing multiple queries is generally far greater than any possible benefit. If you're using an ORM framework, you need to be aware of how it is loading data from the database. If you are instantiating objects in a loop, you may really be executing a query (or sometimes more!) for each object you instantiate.

This advice applies to update queries as well. It's generally best to create an array of the values that need updating, and then update them all at once. With MySQL, this is fairly easy for INSERT and DELETE queries.

UPDATE queries can be a bit trickier, since you can't easily specify different updates for each value. I've solved this a few different ways, depending on circumstances. If you know that the rows you are updating exist, then you can cheat and do an INSERT ... ON DUPLICATE KEY UPDATE. If you aren't 100% sure the rows exist, then it may be best to create a temporary table, insert all of the update values into it (with a single query), and then do a multi-table UPDATE query.

If you're running MariaDB or Percona Server, you could also try performing your update over the HandlerSocket interface. This would eliminate much of the overhead of running multiple queries, even though you are, in effect, running one query per update.

As always, the performance of these different ways of doing things depends greatly on your data, and what you're doing with it. Always run tests to see how these different methods work with your data, server, and application.

2 comments :

AnonymousApril 24, 2013 at 1:22 AM
The fact that MySQL uses nested loop joins seems lost on many developers. I think they don't deeply understand the join mechanism even if they understand the concept. Of course ORMs complicate matters inordinately. I find that for best performance, it is usually best to forgo using an ORM in any batch processing because it almost invariably involves instantiating many objects in loops. Batch processing should be set oriented rather than row/record oriented (which of course ORMs are.)
UnknownApril 24, 2013 at 7:47 AM
Although I'm not personally an ORM fan, almost all ORMs these days do support batch loading mechanisms. The problem is that ORMs tend to make developers lazy, and they don't consider the underlying consequences of how they use their ORM.