nginx php5-fpm upstream timed out (110: Connection timed out) while connecting to upstream

faraklit · Apr 8, 2011

We have a web server running an nginx + php5-fpm + APC setup. Recently we have been seeing upstream connection timeout errors and slowdowns during page rendering. A quick php5-fpm restart fixes the problem, but we could not find the cause.

We have another web server running apache2 under another subdomain, connecting to the same database and doing exactly the same job, but the slowdowns occur only on the nginx/php5-fpm server. I think php5-fpm or APC may be causing the problem.

The logs show various connection timeouts:

upstream timed out (110: Connection timed out) while connecting to upstream bla bla bla

The php5-fpm log does not show anything unusual, just children starting and exiting:

Apr 07 22:37:27.562177 [NOTICE] [pool www] child 29122 started
Apr 07 22:41:47.962883 [NOTICE] [pool www] child 28346 exited with code 0 after 2132.076556 seconds from start
Apr 07 22:41:47.963408 [NOTICE] [pool www] child 29172 started
Apr 07 22:43:57.235164 [NOTICE] [pool www] child 28372 exited with code 0 after 2129.135717 seconds from start

The server was not under load when the errors occurred; the load average was just 2 (2 CPUs, 16 cores) and the php5-fpm processes seemed to be working fine.

nginx conf:

user www-data;
worker_processes 14;
pid /var/run/nginx.pid;
# set open fd limit to 30000
worker_rlimit_nofile 30000;

events {
        worker_connections 768;
        # multi_accept on;
}

http {

        ##
        # Basic Settings
        ##

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        # server_tokens off;

        # server_names_hash_bucket_size 64;
        # server_name_in_redirect off;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;
        gzip_disable "msie6";

        # gzip_vary on;
        # gzip_proxied any;
        # gzip_comp_level 6;
        # gzip_buffers 16 8k;
        # gzip_http_version 1.1;
        # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        ##
        # Virtual Host Configs
        ##

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
}

nginx enabled site conf:

    location ~* \.php$ {
        fastcgi_split_path_info ^(.+\.php)(.*)$;
        fastcgi_pass   backend;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
        include fastcgi_params;
        fastcgi_param  QUERY_STRING     $query_string;
        fastcgi_param  REQUEST_METHOD   $request_method;
        fastcgi_param  CONTENT_TYPE     $content_type;
        fastcgi_param  CONTENT_LENGTH   $content_length;
        fastcgi_intercept_errors        off;
        fastcgi_ignore_client_abort     off;
        fastcgi_connect_timeout 20;
        fastcgi_send_timeout 20;
        fastcgi_read_timeout 180;
        fastcgi_buffer_size 128k;
        fastcgi_buffers 4 256k;
        fastcgi_busy_buffers_size 256k;
        fastcgi_temp_file_write_size 256k;
    }

## Disable viewing .htaccess & .htpasswd
    location ~ /\.ht {
        deny  all;
    }
}
upstream backend {
        server 127.0.0.1:9000;
}

fpm conf:

pm.max_children = 500
pm.start_servers = 100
pm.min_spare_servers = 50
pm.max_spare_servers = 100
pm.max_requests = 10000

There are also emergency restart settings in the fpm conf file. I do not know whether they would help us fix the issue:

emergency_restart_interval = 0

Answer

Phillip B Oldham · Apr 8, 2011

Firstly, reduce the PHP-FPM pm.max_requests to 100; you want PHP worker processes to be recycled much sooner than every 10,000 requests.
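In the pool section of your fpm conf that is a one-line change (keeping the other pm.* values you already have):

pm.max_requests = 100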

Secondly, you've only got one PHP-FPM master process running with lots of children. This is fine for development, but in production you want more PHP-FPM masters, each with fewer children, so that if one of them goes down for any reason the others can take up the slack. So, rather than a ratio of 1:50 as you have now, go for something like 10:5. This will be much more stable.

To achieve this you may want to look at something like supervisor to manage your PHP processes. We use this in production and it has really helped increase our uptime and reduce the amount of time we spend managing/monitoring the servers. Here's an example of our config:

/etc/php5/php-fpm.conf:

[global]
daemonize = no

[www]
listen = /tmp/php.socket

/etc/supervisor.d/php-fpm.conf:

[program:php]
user=root
command=/usr/sbin/php-fpm -c /etc/php5/php.ini -y /etc/php5/php-fpm.conf
numprocs=10
; %(process_num)s must appear in process_name when numprocs > 1
process_name=%(program_name)s_%(process_num)02d

/etc/nginx/conf/php.backend:

upstream backend {
    server unix:/tmp/php.socket;
}
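After adding that, reloading supervisor should pick up the new program group (assuming a standard supervisord/supervisorctl install):

supervisorctl reread
supervisorctl update
supervisorctl status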

EDIT:

As with all server set-ups, don't rely on guesswork to track down where your issues are. I recommend installing Munin along with the various PHP(-FPM) and nginx plugins; these will help you track hard statistics on requests, response times, memory usage, disk access, thread/process levels... all essential when tracking down where the issues are.
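Most of those Munin plugins scrape a status endpoint, so you generally need to expose one for both nginx and PHP-FPM. A minimal sketch, assuming you keep the 127.0.0.1:9000 backend and only let localhost read the status pages (the /status and /nginx-status paths are just examples):

; php-fpm pool config: enable the built-in status page
pm.status_path = /status

# nginx server block: nginx and PHP-FPM status, localhost only
location = /nginx-status {
    stub_status on;
    access_log  off;
    allow 127.0.0.1;
    deny all;
}

location = /status {
    access_log off;
    allow 127.0.0.1;
    deny all;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $fastcgi_script_name;
    fastcgi_pass backend;
}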

In addition, as I mentioned in a comment below, adding both server- and client-side caching to your set-up, even at a modest level, can help provide a better experience for users, whether that's nginx's native caching support or something more dedicated like Varnish. Even the most dynamic sites/apps have many static elements which can be held in memory and served faster. Serving these from cache helps reduce the overall load and ensures that the elements which absolutely need to be dynamic have all the resources they need when they need them.
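For the nginx route, a minimal sketch of a short-lived FastCGI cache (the zone name, cache path and lifetimes below are illustrative assumptions, not tuned values):

# in the http block: define the cache storage and key
fastcgi_cache_path /var/cache/nginx/fcgi levels=1:2 keys_zone=PHPCACHE:10m max_size=256m inactive=10m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";

# inside the existing "location ~* \.php$" block
fastcgi_cache PHPCACHE;
fastcgi_cache_valid 200 301 10s;                 # cache successful responses briefly
fastcgi_cache_use_stale error timeout updating;  # serve stale content if the backend stalls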