We have a very busy server with ~3K concurrent connections. It has 32 GB of RAM and 2 CPUs. We are having Service Unavailable errors. The server does not respond with a 500 error, and the error log shows hundreds or thousands of lines like this:
[warn] mod_fcgid: can't apply process slot for /var/www/fcgi-bin.d/php5-default/php-fcgi-wrapper
We think it could be a configuration error or a database connection/query error. A PHP process updates a cache that holds the result of a very complex query. There are 3 separate queries, each run twice a day. I have enabled the slow query log. I suspect the query may exceed the PHP run time limit, which is 20 seconds in our case (set in the following files). Any help is appreciated.
We use Apache with the worker MPM and mod_fcgid.
Here is the fcgid.conf file:
<IfModule mod_fcgid.c>
AddHandler fcgid-script .fcgi
SocketPath /var/lib/apache2/fcgid/sock
# Communication timeout: Default value is 20 seconds
IPCCommTimeout 20
# Connection timeout: Default value is 3 seconds
IPCConnectTimeout 3
</IfModule>
And the /etc/apache2/conf.d/php-fcgid.conf file:
<IfModule !mod_php4.c>
# Path to php.ini - defaults to /etc/phpX/cgi
DefaultInitEnv PHPRC=/etc/php5/cgi
# Number of PHP children that will be launched. Leave undefined to let PHP decide.
# DefaultInitEnv PHP_FCGI_CHILDREN 8
# Maximum requests before a process is stopped and a new one is launched
DefaultInitEnv PHP_FCGI_MAX_REQUESTS 5000
# Maximum requests a process handles before it is terminated
MaxRequestsPerProcess 1500
# Maximum number of PHP processes.
MaxProcessCount 45
# Define a new handler "php-fcgi" for ".php" files, plus the action that must follow
AddHandler php-fcgi .php
Action php-fcgi /fcgi-bin/php-fcgi-wrapper
# Define the MIME-Type for ".php" files
AddType application/x-httpd-php .php
# Define alias "/fcgi-bin/". The action above is using this value, which means that
# you could run another "php5-cgi" command by just changing this alias
Alias /fcgi-bin/ /var/www/fcgi-bin.d/php5-default/
# Turn on the fcgid-script handler for all files within the alias "/fcgi-bin/"
<Location /fcgi-bin/>
SetHandler fcgid-script
Options +ExecCGI
</Location>
</IfModule>
Apache2 worker MPM config:
<IfModule mpm_worker_module>
StartServers 10
MaxClients 2048
ServerLimit 2048
MinSpareThreads 30
MaxSpareThreads 100
ThreadsPerChild 64
ThreadLimit 100
MaxRequestsPerChild 5000
</IfModule>
We followed the instructions on this web page and used its high-load server config: http://2bits.com/articles/apache-fcgid-acceptable-performance-and-better-resource-utilization.html
Update: I just checked when the long-running cache update queries ran, and those times did not coincide with the server crash times.
Update 2: Found the cause of the problem. The ad server was down, and our server hangs waiting on it even though it seems we set the socket timeouts properly. Is there any way to test timeout behaviour?
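One possible way to test timeout behaviour is to point the client at a server that accepts connections but never replies, which is roughly what a hung ad server looks like from the outside. Below is a minimal Python sketch of that idea. The names start_silent_server and timed_request are made up for this example, and it only demonstrates the technique; it does not exercise the actual PHP code, which is not shown here. Keep in mind that a connect timeout and a read timeout are often separate settings, so a client can connect quickly and still block on the read.

```python
import socket
import threading
import time


def start_silent_server(host="127.0.0.1"):
    """Listen on a free port, accept connections, and never send a reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_forever():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            held.append(conn)

    # Daemon thread: it ends when the main program exits.
    threading.Thread(target=accept_forever, daemon=True).start()
    return listener, listener.getsockname()[1]


def timed_request(host, port, timeout):
    """Send a request and return (outcome, elapsed seconds)."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            # Blocks until data arrives, the peer closes, or the timeout fires.
            data = sock.recv(1024)
            outcome = "reply" if data else "closed"
    except socket.timeout:
        outcome = "timeout"
    return outcome, time.monotonic() - started


listener, port = start_silent_server()
outcome, elapsed = timed_request("127.0.0.1", port, timeout=0.5)
print(outcome, round(elapsed, 1))
```

If the timeout is honoured, the request should give up after roughly half a second instead of hanging. To test the real application the same way, the ad server hostname could be pointed at such a silent listener on a test machine, and the time each PHP request takes could then be measured.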
Your problem is pretty well covered on Google. It looks like you will have to tune the configuration a bit (options like MaxProcessCount).
I'd advise replacing Apache with nginx. I experienced better performance with it, and nginx uses a lot less memory than Apache. I'm using php-fpm for FastCGI.
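To see why MaxProcessCount matters here, a rough back-of-the-envelope calculation may help. The sketch below assumes each mod_fcgid PHP process serves one request at a time and that a hung request holds its slot for the full IPCCommTimeout. The 0.5 second "healthy" request time and the 50 requests per second are hypothetical numbers, not measurements from the question.

```python
import math


def max_sustainable_rate(process_slots, seconds_per_request):
    """Requests per second the pool can absorb before every slot is busy."""
    return process_slots / seconds_per_request


def slots_needed(requests_per_second, seconds_per_request):
    """Little's law: average busy slots = arrival rate * time per request."""
    return math.ceil(requests_per_second * seconds_per_request)


MAX_PROCESS_COUNT = 45  # MaxProcessCount from php-fcgid.conf
IPC_COMM_TIMEOUT = 20   # IPCCommTimeout from fcgid.conf, in seconds
NORMAL_SECONDS = 0.5    # hypothetical duration of a healthy PHP request
PHP_RATE = 50           # hypothetical PHP requests per second

print(max_sustainable_rate(MAX_PROCESS_COUNT, NORMAL_SECONDS))    # 90.0
print(max_sustainable_rate(MAX_PROCESS_COUNT, IPC_COMM_TIMEOUT))  # 2.25
print(slots_needed(PHP_RATE, IPC_COMM_TIMEOUT))                   # 1000
```

Under these assumptions, 45 slots are plenty while requests finish quickly, but once each request hangs for 20 seconds the pool saturates at about 2 requests per second, and further requests find no free process slot. Raising MaxProcessCount only moves that limit; shortening how long a request can hang is what brings the number of needed slots back down.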