The pitfalls of reusing TYPO3 QueryBuilder: Analyzing a performance bottleneck
TL;DR: Do never reuse an instance of the TYPO3 QueryBuilder
for queries, even if the query is
the same but with different parameters, since this causes a significant performance decreasement when
processing larger amount of records.
I was recently involved in refactoring an older codebase in a TYPO3 project. One part of that codebase was a
data import which was created using a Symfony console command. The command basically fetched data from an
external data source and imported/updated existing records in the TYPO3 database. During execution, I noticed
that it was quite slow. What’s more, it appeared that the routine slowed down as more records were processed,
indicating a bottleneck in the code. After some investigation, it became evident that the slow performance was
caused by the TYPO3 QueryBuilder
, which was instantiated globally in the Symfony command and then reused in
several functions.
The TYPO3 documentation advises against reusing the QueryBuilder
, emphasizing that it holds an internal state.
The notice does however not describe the consequences of reusing a QueryBuilder
instance. Since it caused
a significant performance decreasement in the code I was working on, I was curious and digged deeper into the
topic to analyze, why QueryBuilder
slowed down so much when reused for multiple queries.
Update 04.10.2023: Note, that the shown notice on the screenshot above already has been updated in the TYPO3 documentation.
Performance analysis
To highlight the performance decrease, I created a simple Symfony command that:
- Selected 1000 records from the
tx_extensionmanager_domain_model_extension
table. - Iterated through the query results.
- Updated a field for each of the 1000 records.
The command is executed once using a global instance of the QueryBuilder
for all queries and once using a
new instance of the QueryBuilder
for each query.
The code for updating looked like this:
$queryBuilder
->update('tx_extensionmanager_domain_model_extension')
->set('last_updated', (string)time())
->where(
$this->queryBuilder->expr()->in(
'uid',
$this->queryBuilder->createNamedParameter($uid, Connection::PARAM_INT)
)
)
->executeStatement();
Note, that the QueryBuilder
object in the code above should be assumed as either a global or a new instance.
Command execution time
- Global
QueryBuilder
instance for all queries: 7.8010120391846 seconds - New
QueryBuilder
instance per query: 2.3586421012878 seconds
This clearly demonstrates that reusing an instance of the QueryBuilder
for multiple queries results in a
significant performance decrease.
Profiling with xdebug and IntelliJ IDEA
To gain insights into the code execution, I used the Xdebug profiler to save profiling information to a file. Next, I used the “Xdebug profile analyzer” tool in IntelliJ (see link) to visualize the profiling results.
The screenshot reveals, that the execution of Doctrine\DBAL\Connection->expandArrayParameters
and
Doctrine\DBAL\SQL\Parser->parse
consume a significant percentage of the overall processing time and that memory
usage is notably high.
When navigating through the call tree of the process, it became evident that the initial update statement performed well (8 ms execution time):
However, the last update statement was considerably slower, taking 81 ms compared to the first one:
So as a result, the profiling uncovered, that the Doctrine\DBAL\Connection->expandArrayParameters
calls should
be observed. To do so, I set breakpoints for the first and last iterations and analyzed the data being processed.
The result for the first update call aligned with expectations:
Here we have the SQL for the query and the 2 parameters which are used to replace the placeholder values in the query.
However, the result for the last update call finally revealed why reusing the QueryBuilder
for multiple queries
is a bad idea:
When QueryBuilder
is reused, the first update query is extended with each loop iteration, resulting in an
excessively large SQL query containing 2000 parameters to parse and process. This explains the progressively slower
performance with each iteration and the high memory consumption. Doctrine internally uses the PHP function
preg_match
to replace placeholder values, and as the number of placeholders increases, the replacement process
becomes slower due to the overhead of parsing and processing a large number of patterns.
Conclusion
As a result of the analysis, I think it may be good to change the notice in the documentation to a warning
and outline, that you must never reuse an instance of the QueryBuilder
.
Update 04.10.2023: Thanks a lot to the TYPO3 documentation team, who already updated
the QueryBuilder
documentation
as suggested!