PHP is a handy tool for writing web pages and application suites for the web. Over the years, PHP has gained a reputation for being quite easy to learn and quick for generating pages; the flip side of that reputation is the view that it is a 'toy' language and that PHP scripts, modules and programs are inherently bad code. Of course, there's lots of really bad PHP code out there, just as there is lots of bad C, FORTRAN, Java, etc. code out there. Perhaps there is more bad PHP because inexperienced web designers are drawn to it for some quick-and-dirty scripting, but that's hardly PHP's fault.
Once you start to use PHP for serious applications (even a relatively small application suite like Bumblebee has over 18,000 lines of code), you start to wonder about bottlenecks in your code and consider optimising it. For a good overview of what to look for in your PHP code, and some comparisons of the speed of local variables vs object members and function calls vs method calls, have a look at John Lim's Optimizing PHP page.
Profiling PHP
While John Lim's Optimizing PHP page has a good summary of what to look for when you're going through the code, it's also nice to know how to get some metrics on your own code. The Advanced PHP Debugger (APD, which is actually a profiler) provides you with some nice tools to do this. APD can be used with KCacheGrind to get some nice visualisations of what is happening in your application and will help to identify bottlenecks.
If you're running Debian and want to profile PHP4 scripts, then all you need to do to pick up the required tools is:
apt-get install php4-cgi php4-apd kcachegrind
To profile your script, you'll need to run it using the CGI version of PHP, not the Apache/IIS module. The CGI and module versions can co-exist quite nicely within your Apache installation. Add the following to your server's httpd.conf (where the other AddHandler definitions are placed) and get Apache to reload its configuration.
AddHandler php-script .pcgi
Action php-script /cgi-bin/php4
If you get errors about Action being unknown, then make sure you have the following line in either your httpd.conf or modules.conf, depending on the style of your configuration.
LoadModule action_module /usr/lib/apache/1.3/mod_actions.so
If your application normally runs through index.php then accessing it through index.pcgi will run it using the CGI version of PHP. You might need to do some reconfiguration of your application so that any links now point to index.pcgi, not index.php, but you've already got that as a single configuration option in a text file, don't you?
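One way to keep the entry point switchable is sketched below. The names `$config['entry_script']` and `app_url()` are hypothetical and not taken from any particular application; the only point is that every internal link is built by one helper that reads one setting.

```php
<?php
// Hypothetical configuration: in a real application this value would be
// read from a configuration file rather than hard-coded here.
$config = array('entry_script' => 'index.pcgi');

/**
 * Build an internal link through the configured entry script.
 * app_url() is an illustrative helper, not part of PHP.
 */
function app_url(array $config, array $params = array())
{
    $query = http_build_query($params);
    return $config['entry_script'] . ($query === '' ? '' : '?' . $query);
}

// Every link in the application would be generated like this.
echo app_url($config, array('view' => 'foo')), "\n";
```

Switching between the module and CGI versions is then a one-line change to the setting.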
To actually load in the profiling tools, you can either copy your index.php over to index.pcgi and make some alterations, or you can do some clever PHP monkeying as described below.
One of the things that becomes evident when you use tools like KCacheGrind to view your profile output is that if you don't have some sort of main() function inside your script that wraps around everything in the script, then you'll get weird looking results that are harder to understand without going through your code in some detail at the same time as looking at your profile.
The following code listing for index.pcgi solves the index.php/index.pcgi problem and provides you with a main() quite nicely.
<?php
// index.pcgi, a loader for APD
ob_start("ob_gzhandler");
ini_set('apd.statement_tracing', 1);
apd_set_pprof_trace();

function main() {
    include 'index.php';
}

main();
?>
There are a few things to note about this setup:
- Everything is run through main(), which means that all your code will have a common ancestor when viewing it through KCacheGrind and you'll be able to see what's happening more easily.
- Output buffering is turned on and gzip compression is used. If you don't have output buffering turned on then it will look like each include and require statement is taking a lot of time, as they will cause PHP to flush the output to the browser and skew the time your actual algorithm is taking. Similarly, functions with echo/print would receive a penalty if they fill PHP's normal output buffer and cause TCP/IP traffic, meaning that some echo statements will look like they are taking a lot longer than others (and not in a reproducible way).
- Statement tracing is turned on to improve the accuracy of the profiling. This feature is currently only useful if you use KCacheGrind, but it appears that this means that time will be allocated against the actual statements that are taking time rather than just on function exit.
- If your code allows, you could run main() inside a loop multiple times to make it easier to take some averages of the execution time. Note, however, that you may end up with redefinition errors for classes and functions that way (unless you use include_once, not include), and that the call trace will get exceedingly large and pprof2calltree might choke on it.
- This gives a lot of information about the entire process, which is good for identifying the areas you want to optimise, but you will probably need to write a test harness for just those components when you get to actually optimising them.
- You can conditionally turn on tracing for just some IPs or just some call functions (e.g. you could test if $_GET['view'] == 'foo' if you only want to profile what happens when the user asks for a 'foo').
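The conditional tracing mentioned in the last point could look something like the sketch below, which would replace the unconditional ini_set()/apd_set_pprof_trace() calls in the loader. The helper `should_profile()`, its parameters and the allowed address 127.0.0.1 are assumptions made for illustration; the `function_exists()` guard is also an addition so that the loader still runs where APD is not installed.

```php
<?php
// Sketch: only profile selected requests.

/**
 * Decide whether this request should be profiled.
 * should_profile() is an illustrative helper, not part of PHP or APD.
 */
function should_profile(array $get, $remoteAddr, array $allowedIps, $view)
{
    $fromAllowedIp = in_array($remoteAddr, $allowedIps, true);
    $wantedView = isset($get['view']) && $get['view'] === $view;
    return $fromAllowedIp && $wantedView;
}

$remoteAddr = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';

// Assumption: only requests from the local machine for the 'foo' view
// are of interest; adjust both to suit your application.
if (should_profile($_GET, $remoteAddr, array('127.0.0.1'), 'foo')
    && function_exists('apd_set_pprof_trace')) {
    ini_set('apd.statement_tracing', '1');
    apd_set_pprof_trace();
}
```

Here both conditions must hold before tracing starts; testing only the IP or only the request parameter works just as well if that is all you need.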
The output of APD is quite cryptic but for most cases you don't really need to look at it; LJ posted an overview of PHP profiling which covers this in more detail.
If you are going to use KCacheGrind to view the profile (and it's a great tool for it... highly recommended!) then the basic steps you'll follow are as below. (You might like this script to automate steps 2-4.)
1. visit your .pcgi page with profiling turned on (You are running with a snapshot of some real world data, aren't you? Databases full of asdfasdf tend not to be that good for profiling.)
2. find the profile on your server's filespace (under Debian it's in /var/log/php4-apd/); it will have a filename like pprof.XXXXX
3. convert the profile to a calltree:
   pprof2calltree -f /var/log/php4-apd/pprof.XXXXX
   (If you get out of memory errors from this PHP script, then increase the memory allowed for the command-line version of PHP in /etc/php4/cli/php.ini; a moderately complicated calltree can take 128MB of memory to parse. You might want to increase the execution time too.)
4. view the calltree using KCacheGrind:
   kcachegrind cachegrind.out.pprof.XXXXX
5. look for where the most time is being spent (can you improve that algorithm?), what functions are being called lots (are they necessary?), etc.
6. be careful about reading too much into one run... especially the first run (Was the database swapped out of memory? The first database call will look like it took a lot of time.)
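The find, convert and view steps above are mechanical enough to script. The following is a minimal sketch of a command-line PHP helper, not the script mentioned earlier: `newest_profile()` and `profile_commands()` are hypothetical names, and it assumes that pprof2calltree writes cachegrind.out.pprof.XXXXX into the current directory, as the commands above suggest. It prints the commands rather than running them.

```php
<?php
// Sketch of a command-line helper for finding, converting and viewing a profile.

/**
 * Return the most recently modified pprof.* file in $dir, or null if none.
 */
function newest_profile($dir)
{
    $files = glob(rtrim($dir, '/') . '/pprof.*');
    if (!$files) {
        return null;   // no profiles found, or the directory is unreadable
    }
    $newest = null;
    $newestTime = -1;
    foreach ($files as $file) {
        $time = filemtime($file);
        if ($time > $newestTime) {
            $newest = $file;
            $newestTime = $time;
        }
    }
    return $newest;
}

/**
 * Build the convert and view commands for one profile file.
 */
function profile_commands($profile)
{
    return array(
        'pprof2calltree -f ' . escapeshellarg($profile),
        'kcachegrind ' . escapeshellarg('cachegrind.out.' . basename($profile)),
    );
}

$profile = newest_profile('/var/log/php4-apd');
if ($profile !== null) {
    foreach (profile_commands($profile) as $command) {
        echo $command, "\n";   // printed only; nothing is executed here
    }
}
```

Printing the commands keeps the helper harmless; once the output looks right it can be piped to a shell.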
Improving performance
You might be surprised at where your bottlenecks are. In an object-oriented application, you might find that one of your really small classes is thrashing a lot (a method calls a method, which calls another method, and so on), and optimising just that component can make a big difference to your performance. Meanwhile, the big scary algorithm that you were worried about because it's O(n^2) may not be much of a problem after all, because it's only called once and there are other limits on how many records it will ever have to parse.
Calls like count($array) are quite expensive, particularly if you put them in the condition of a loop, where they are evaluated again on every iteration. If you aren't going to change $array within the loop, then don't use this:
for ($i = 0; $i < count($array); $i++) {
    // .... do something
}
Instead, use this construct and save a lot of time.
$array_length = count($array);
for ($i = 0; $i < $array_length; $i++) {
    // .... do something
}
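To check a claim like this on your own installation, a small test harness is enough. The sketch below is one possible shape for it: `time_it()` and the two loop functions are illustrative names, the array size and run count are arbitrary, and the timings it prints will vary with the PHP version and machine (on some setups the difference may be small).

```php
<?php
// Minimal timing harness (sketch): average wall-clock time of a callable.

/**
 * Call $fn $runs times and return the mean duration in seconds.
 * time_it() is an illustrative helper, not part of PHP or APD.
 */
function time_it($fn, $runs = 20)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($fn);
    }
    return (microtime(true) - $start) / $runs;
}

function loop_with_count_in_condition()
{
    $array = range(1, 100000);
    $sum = 0;
    for ($i = 0; $i < count($array); $i++) {   // count() called every pass
        $sum += $array[$i];
    }
    return $sum;
}

function loop_with_cached_length()
{
    $array = range(1, 100000);
    $sum = 0;
    $array_length = count($array);             // count() called once
    for ($i = 0; $i < $array_length; $i++) {
        $sum += $array[$i];
    }
    return $sum;
}

printf("count() in condition: %.6f s\n", time_it('loop_with_count_in_condition'));
printf("cached length:        %.6f s\n", time_it('loop_with_cached_length'));
```

Both functions build the same array, so the difference between the two averages reflects only the loop condition.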
Happy optimising!
Last edited: Wednesday May 31, 2006
Copyright © 1996-2014 Stuart Prescott