I maintain a custom built CMS-like application.
Whenever a document is submitted, several tasks are performed that can be roughly grouped into the following categories:
Category 1 includes updates to various MySQL tables relating to a document's content.
Category 2 includes parsing of HTML content stored in MySQL LONGTEXT fields to perform some automatic anchor tag transformations. I suspect that a great deal of computation time is spent in this task.
Category 3 includes updates to a simple MySQL-based search index using just a handful of fields corresponding to the document.
All of these tasks need to complete for the document submission to be considered complete.
The machine that hosts this application has dual quad-core Xeon processors (a total of 8 cores). However, whenever a document is submitted, all of the PHP code that executes is constrained to a single process running on one of the cores.
My question:
What schemes, if any, have you used to split up your PHP/MySQL web application processing load among multiple CPU cores? My ideal solution would basically spawn a few processes, let them execute in parallel on several cores, and then block until all of the processes are done.
Related question:
What is your favorite PHP performance profiling tool?
PHP supports multi-threading through the pThreads extension (on a thread-safe build), and you can take advantage of it in many ways. I have been able to demonstrate this multi-threading ability in different examples:
A quick search will turn up additional resources.
MySQL is fully multi-threaded and will make use of multiple CPUs, provided that the operating system supports them. It will also maximize system resources if properly configured for performance.
A typical setting in my.ini that affects thread performance is:
thread_cache_size = 8
thread_cache_size can be increased to improve performance if you have a lot of new connections. Normally this does not provide a notable performance improvement if you have a good thread implementation. However, if your server sees hundreds of connections per second, you should set thread_cache_size high enough so that most new connections use cached threads.
If you are using Solaris, you can use
thread_concurrency = 8
thread_concurrency enables applications to give the threads system a hint about the desired number of threads that should be run at the same time.
This variable is deprecated as of MySQL 5.6.1 and is removed in MySQL 5.7. You should remove it from MySQL configuration files whenever you see it, unless the configuration is for Solaris 8 or earlier.
InnoDB:
You don't have such limitations if you are using InnoDB as the storage engine, because it fully supports thread concurrency:
innodb_thread_concurrency // common heuristic: 2 * (number of CPUs + number of disks)
You can also look at innodb_read_io_threads and innodb_write_io_threads, where the default is 4; they can be increased to as high as 64, depending on the hardware.
Others:
Other configuration options to look at include key_buffer_size, table_open_cache, sort_buffer_size, etc., all of which can result in better performance.
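Collected into a single my.cnf fragment, the settings discussed above might look like this (the values are illustrative starting points, not recommendations; tune them against your own workload):

```ini
[mysqld]
# Thread settings
thread_cache_size         = 8

# InnoDB concurrency and I/O threads
innodb_thread_concurrency = 0      # 0 = no limit on modern versions
innodb_read_io_threads    = 8
innodb_write_io_threads   = 8

# General caches/buffers worth reviewing
key_buffer_size           = 256M
table_open_cache          = 2048
sort_buffer_size          = 2M
```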
PHP:
In pure PHP, you can create a MySQL worker where each query is executed in a separate PHP thread:
$sql = new SQLWorker($host, $user, $pass, $db);
$sql->start();
$sql->stack($q1 = new SQLQuery("One long Query"));
$sql->stack($q2 = new SQLQuery("Another long Query"));
$q1->wait();
$q2->wait();
// Do Something Useful
Here is a Full Working Example of SQLWorker
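SQLWorker comes from the pthreads examples; if the pthreads extension is not available, the same stack-and-wait pattern can be approximated with plain child processes. A minimal sketch (the "long queries" are simulated with usleep(); swap in real work):

```php
<?php
// Process-based stand-in for the stack()/wait() pattern above: spawn
// child PHP processes in parallel, then block until each one finishes.

function spawn(string $code): array {
    $proc = proc_open(
        [PHP_BINARY, '-r', $code],   // array command form needs PHP 7.4+
        [1 => ['pipe', 'w']],        // capture the child's stdout
        $pipes
    );
    return [$proc, $pipes[1]];
}

$jobs = [];
foreach (['one', 'two'] as $name) {
    // Each child simulates a long-running query, then prints its result.
    $jobs[$name] = spawn("usleep(200000); echo 'result-$name';");
}

// Equivalent of $q1->wait(); $q2->wait(); - reading to EOF blocks until
// the child exits, so after this loop every job is done.
$results = [];
foreach ($jobs as $name => [$proc, $stdout]) {
    $results[$name] = stream_get_contents($stdout);
    fclose($stdout);
    proc_close($proc);
}

print_r($results);
```

Both children run concurrently, so the elapsed time is roughly one usleep(), not two.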
I suspect that a great deal of computation time is spent in this task.
If you already know the problem, then it is easier to solve via event loops, a job queue, or threads.
Working on one document at a time can be a very, very slow, painful process. @ka once hacked his way out using Ajax to make multiple requests. Some creative minds would just fork the process using pcntl_fork, but if you are using Windows you cannot take advantage of pcntl.
With pThreads supporting both Windows and Unix systems, you don't have such a limitation. It's as easy as... need to parse 100 documents? Spawn 100 threads... Simple!
HTML Scanning
// Recursively scan a directory tree (adjust the path to your documents)
$path = "/path/to/documents";
$dir = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
$dir = new RecursiveIteratorIterator($dir);
// Allowed Extension
$ext = array(
"html",
"htm"
);
// Threads Array
$ts = array();
// Simple Storage
$s = new Sink();
// Start Timer
$time = microtime(true);
$count = 0;
// Parse All HTML
foreach($dir as $html) {
if ($html->isFile() && in_array($html->getExtension(), $ext)) {
$count ++;
$ts[] = new LinkParser("$html", $s);
}
}
// Wait for all Threads to finish
foreach($ts as $t) {
$t->join();
}
// Put The Output
printf("Total Files:\t\t%s \n", number_format($count, 0));
printf("Total Links:\t\t%s \n", number_format($t = count($s), 0));
printf("Finished:\t\t%0.4f sec \n", $tm = microtime(true) - $time);
printf("AvgSpeed:\t\t%0.4f sec per link\n", $tm / $t);
printf("File P/S:\t\t%d file per sec\n", $count / $tm);
printf("Link P/S:\t\t%d links per sec\n", $t / $tm);
Output
Total Files: 8,714
Total Links: 105,109
Finished: 108.3460 sec
AvgSpeed: 0.0010 sec per link
File P/S: 80 file per sec
Link P/S: 907 links per sec
Class Used
Sink
class Sink extends Stackable {
public function run() {
}
}
LinkParser
class LinkParser extends Thread {
public function __construct($file, $sink) {
$this->file = $file;
$this->sink = $sink;
$this->start();
}
public function run() {
$dom = new DOMDocument();
@$dom->loadHTML(file_get_contents($this->file));
foreach($dom->getElementsByTagName('a') as $links) {
$this->sink[] = $links->getAttribute('href');
}
}
}
Experiment
Try parsing the same 8,714 files containing 105,109 links without threads and see how long it takes.
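For comparison, the single-threaded equivalent of LinkParser::run() is just a plain loop. A self-contained sketch, shown on an inline HTML string for brevity (a real baseline would loop it over the same 8,714 files):

```php
<?php
// Single-threaded baseline: the same DOM link extraction that
// LinkParser::run() performs, without any threads.

function extractLinks(string $html): array {
    $dom = new DOMDocument();
    @$dom->loadHTML($html);     // suppress warnings from sloppy HTML
    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {     // skip anchors without an href
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<html><body>
  <a href="/docs/1.html">One</a>
  <a href="/docs/2.html">Two</a>
  <a name="no-href">Anchor without href</a>
</body></html>';

print_r(extractLinks($html));
```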
Better Architecture
Spawning too many threads is not a clever thing to do in production. A better approach would be to use pooling: have a pool of defined Workers, then stack them with Tasks.
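With pthreads this means a Pool of Workers; the same bounded-concurrency idea can be sketched without any extension by keeping at most N child processes alive and starting a new job only when a slot frees up (the jobs here are simulated with usleep()):

```php
<?php
// Bounded "pool" sketch with plain processes: at most $poolSize children
// run at once; a new job starts only when a previous one has finished.

$poolSize = 2;
$jobs = range(1, 5);        // pretend these are documents to process
$running = [];
$done = [];

while ($jobs || $running) {
    // Fill free slots in the pool.
    while ($jobs && count($running) < $poolSize) {
        $id = array_shift($jobs);
        $proc = proc_open([PHP_BINARY, '-r', "usleep(100000); echo $id;"],
                          [1 => ['pipe', 'w']], $pipes);
        $running[$id] = [$proc, $pipes[1]];
    }
    // Reap any finished children, freeing their slots.
    foreach ($running as $id => [$proc, $out]) {
        $status = proc_get_status($proc);
        if (!$status['running']) {
            $done[] = (int) stream_get_contents($out);
            fclose($out);
            proc_close($proc);
            unset($running[$id]);
        }
    }
    usleep(10000);          // avoid a busy loop while waiting
}

sort($done);
print_r($done);
```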
Performance Improvement
Fine, the example above can still be improved. Instead of waiting in a single thread for the whole file system scan, you can use multiple threads to scan for files, then stack the data onto Workers for processing.
This has pretty much been answered by the first answer, but there are many more ways to improve performance. Have you ever considered an event-based approach?
@rdlowrey Quote 1:
Well think of it like this. Imagine you need to serve 10,000 simultaneously connected clients in your web application. Traditional thread-per-request or process-per-request servers aren't an option because no matter how lightweight your threads are you still can't hold 10,000 of them open at a time.
@rdlowrey Quote 2:
On the other hand, if you keep all the sockets in a single process and listen for those sockets to become readable or writable you can put your entire server inside a single event loop and operate on each socket only when there's something to read/write.
Why don't you experiment with an event-driven, non-blocking I/O approach to your problem? PHP has libevent to supercharge your application.
I know this question is all about multi-threading, but if you have some time you can look at this Nuclear Reactor written in PHP by @igorw.
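The core of such an event loop can be sketched with PHP's built-in stream_select(), no extension required. One process watches several streams and reacts only when one becomes readable; the three child processes below stand in for connected clients:

```php
<?php
// Miniature event loop using stream_select(): the same idea that
// libevent/React build on, demonstrated with child-process pipes.

$children = [];
$buffers = [];
foreach ([300000, 100000, 200000] as $i => $delay) {
    $proc = proc_open([PHP_BINARY, '-r', "usleep($delay); echo 'job-$i';"],
                      [1 => ['pipe', 'w']], $pipes);
    stream_set_blocking($pipes[1], false);
    $children[$i] = [$proc, $pipes[1]];
}

while ($children) {
    $read = [];
    foreach ($children as [, $out]) {
        $read[] = $out;
    }
    $write = $except = null;
    // Sleep until at least one stream has data - no polling, no threads.
    if (stream_select($read, $write, $except, 5) > 0) {
        foreach ($read as $stream) {
            foreach ($children as $i => [$proc, $out]) {
                if ($out !== $stream) {
                    continue;
                }
                $buffers[$i] = ($buffers[$i] ?? '') . fread($out, 8192);
                if (feof($out)) {           // child exited: reap it
                    fclose($out);
                    proc_close($proc);
                    unset($children[$i]);
                }
            }
        }
    }
}

print_r($buffers);   // each child's output, gathered by one event loop
```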
I think you should consider using a cache and a job queue for some of your tasks. You can easily have a message saying:
Document uploaded for processing ..... 5% - Done
Then do all the time-wasting tasks in the background. Please look at Making a large processing job smaller for a similar case study.
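The queue itself does not need to be fancy. A minimal file-based sketch of the enqueue/dequeue split (the directory name and job fields are made up for illustration; a real setup would use a MySQL table, Redis, or a proper queue server):

```php
<?php
// Minimal file-based job queue: the web request only enqueues, which is
// cheap; a background worker dequeues and does the slow work later.

$queueDir = sys_get_temp_dir() . '/jobq_' . getmypid();
@mkdir($queueDir);

function enqueue(string $dir, array $job): string {
    $id = uniqid('job', true);
    file_put_contents("$dir/$id.json", json_encode($job));
    return $id;
}

function dequeue(string $dir): ?array {
    foreach (glob("$dir/*.json") as $file) {
        $job = json_decode(file_get_contents($file), true);
        unlink($file);          // claim the job so no one else takes it
        return $job;
    }
    return null;                // queue is empty
}

// Web request side: returns immediately after enqueueing.
enqueue($queueDir, ['task' => 'parse_links', 'doc' => 42]);

// Worker side (normally a separate long-running process or cron job).
$job = dequeue($queueDir);
print_r($job);
```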
Profiling tool? There is no single profiling tool for a web application; everything from Xdebug to YSlow can be useful, depending on the case. For example, Xdebug is not useful when it comes to threads, because threads are not supported.
I don't have a favorite.