Every user will be able to upload 100 TIFF (black and white) images.
The process requires:
- Convert tif to jpg.
- Resize image to xx.
- Crop image to 200px.
- Add a text watermark.
Here is my PHP code:
// Move the uploaded TIFF into the destination folder.
move_uploaded_file($image_temp, $destination_folder.$image_name);

$image_name_only = strtolower($image_info["filename"]);
$name  = $destination_folder.$image_name_only.".jpg";
$thumb = $destination_folder."thumb_".$image_name_only.".jpg";

// 1. Convert the TIFF to JPEG.
$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$destination_folder.$image_name.' '.$name.' 2>&1';
exec($exec, $exec_output, $exec_retval);

// 2. Resize the JPEG to a width of 1024 pixels.
$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$name.' -resize 1024x '.$name;
exec($exec, $exec_output, $exec_retval);

// 3. Create a 200x200 thumbnail.
$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$name.' -thumbnail 200x200! '.$thumb;
exec($exec, $exec_output, $exec_retval);

// 4. Append a text watermark label below the image.
$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" '.$name." -background White label:ش.پ12355 -append ".$name;
exec($exec, $exec_output, $exec_retval);
This code works. But the average processing time for every image is 1 second. So for 100 images it will probably take around 100 seconds.
How can I speed up this whole process (convert, resize, crop, watermark)?
EDIT
I have a Server G8 with 32 GB RAM and an Intel Xeon E5-2650 CPU (4 processors).
Version: ImageMagick 6.9.0-3 Q16 x64
Features: OpenMP
convert logo: -resize 500% -bench 10 1.png
Performance[1]: 10i 0.770ips 1.000e 28.735u 0:12.992
Performance[2]: 10i 0.893ips 0.537e 26.848u 0:11.198
Performance[3]: 10i 0.851ips 0.525e 27.285u 0:11.756
Performance[4]: 10i 0.914ips 0.543e 26.489u 0:10.941
Performance[5]: 10i 0.967ips 0.557e 25.803u 0:10.341
Performance[6]: 10i 0.797ips 0.509e 27.737u 0:12.554
Performance[7]: 10i 0.963ips 0.556e 25.912u 0:10.389
Performance[8]: 10i 0.863ips 0.529e 26.707u 0:11.586
Resource limits:
Width: 100MP; Height: 100MP; Area: 17.16GP; Memory: 7.9908GiB; Map: 15.982GiB; Disk: unlimited; File: 1536; Thread: 8; Throttle: 0; Time: unlimited
Basically, this challenge can be tackled in two different ways, or a combination of the two:
1. Tune your ImageMagick installation and settings (quantum depth, OpenCL/OpenMP support, resource limits, pixel cache formats).
2. Optimize the processing itself, by packing the convert, resize, crop and watermark steps into a single command pipeline.
The next few sections discuss both approaches.
First, check your exact ImageMagick version by running:
convert -version
If your ImageMagick has Q16 (or even Q32 or Q64, which is possible, but overkill!) in its version string, it means that all of ImageMagick's internal functions treat all images as having 16-bit (or 32- or 64-bit) channel depths. This gives you better quality in image processing, but it also requires twice the memory compared to Q8, so at the same time it means a performance degradation.
Hence, you could test what performance benefits you'll achieve by switching to a Q8 build. (The Q is the symbol for the 'quantum depth' supported by an ImageMagick build.) You'll pay for any Q8 performance gains with quality loss, though. Just check what speedup you achieve with Q8 over Q16, and what quality losses you suffer. Then decide whether you can live with the drawbacks or not...
In any case, Q16 will use twice as much RAM per processed image, and Q32 will again use twice the amount of Q16. This is independent of the actual bits per pixel seen in the input files. 16-bit image files, when saved, will also consume more disk space than 8-bit ones.
With Q16 or Q32 requiring more memory, you always have to ensure that you have enough of it, because exceeding your physical memory would be very bad news. If a larger Q makes a process swap to disk, performance will plummet.
A 1024 x 768 pixel image (width x height) will require the following amounts of virtual memory, depending on the quantum depth:

Quantum    Virtual Memory
Depth      (consumed by 1 image 1024x768)
-------    ------------------------------
   8          3,840 KiB (=~  3.75 MiB)
  16          7,680 KiB (=~  7.50 MiB)
  32         15,360 KiB (=~ 15.00 MiB)
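To see where these numbers come from: they are consistent with the pixel cache keeping roughly five quantum-sized channels per pixel (red, green, blue, alpha, plus an index channel). That per-pixel layout is an assumption on my part, but a quick sketch reproduces the table:

// Assumption: 5 channels of (quantum depth / 8) bytes per pixel.
$width = 1024; $height = 768;
foreach (array(8, 16, 32) as $q) {
    $kib = $width * $height * 5 * ($q / 8) / 1024;
    printf("Q%-2d: %s KiB\n", $q, number_format($kib));
}
// Prints 3,840 KiB, 7,680 KiB and 15,360 KiB respectively.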
Also keep in mind that some 'optimized' processing pipelines (see below) need to keep several copies of an image in virtual memory! Once virtual memory demand can no longer be satisfied by available RAM, the system will start to swap and claim "memory" from the disk. In that case, all the clever command pipeline optimization is of course gone, and may even turn into the very opposite.
ImageMagick's birthday was in the era when CPUs could handle only 1 bit at a time. That was decades ago. Since then, CPU architecture has changed a lot. 16-bit operations used to take twice as long as 8-bit operations, or even longer. Then 16-bit processors arrived and 16-bit ops became standard. CPUs were optimised for 16-bit: suddenly some 8-bit operations could take even longer than their 16-bit equivalents.
Nowadays, 64-bit CPUs are common. So the Q8 vs. Q16 vs. Q32 argument in real terms may even be void. Who knows? I'm not aware of any serious benchmarking about this. It would be interesting if someone (with really deep know-how about CPUs and about benchmarking real-world programs) would take on such a project one day.
Yes, I see you are using Q16 on Windows. But I still wanted to mention it, for completeness' sake... In the future there will be other users reading this question and the answers given. Very likely, since your input TIFFs are black and white only, the image quality output of a Q8 build will be good enough for your workflow. (I just don't know if it would also be significantly faster: that largely depends on the hardware resources you are running this on...)
In addition, if your installation has HDRI support (high dynamic-range imaging), this may also cause some speed penalty. Who knows? So building IM with the configure options --disable-hdri --quantum-depth 8 may or may not lead to speed improvements. Nobody has ever tested this in a serious way... The only thing we know about it: these options will decrease image quality. However, most people will not even notice this, unless they take really close looks and make direct image-by-image comparisons...
Next, check if your ImageMagick installation comes with OpenCL and/or OpenMP support:
convert -list configure | grep FEATURES
If it does (like mine), you should see something like this:
FEATURES DPC HDRI OpenCL OpenMP Modules
OpenCL (for "Open Computing Language") utilizes ImageMagick's parallel computing features (if compiled in). This will make use of your computer's GPU in addition to the CPU for image processing operations.
OpenMP (for "Open Multi-Processing") does something similar: it allows ImageMagick to execute in parallel on all the cores of your system. So if you have a quad-core system and resize an image, the resizing happens on 4 cores (or even 8 if you have hyperthreading).
The command convert -version prints some basic info about supported features. If OpenCL/OpenMP are available, you will see one of them (or both) in the output. If neither of the two shows up, look into getting the most recent version of ImageMagick that has OpenCL and/or OpenMP support compiled in.
If you build the package yourself from the sources, make sure OpenCL/OpenMP are used. Do this by including the appropriate parameters into your 'configure' step:
./configure [...other options...] --enable-openmp --enable-opencl
ImageMagick's documentation about OpenMP and OpenCL can be found on the project's website. Note that only a subset of ImageMagick's operations is OpenCL-enabled; -resize is one of them. Hints and instructions to build ImageMagick from source and to configure the build, explaining the various options, are available there as well; that documentation also includes a short discussion of the --with-quantum-depth configure option.
You can now also use the built-in -bench option to make ImageMagick run a benchmark for your command.
For example:
convert logo: -resize 500% -bench 10 logo.png
[....]
Performance[4]: 10i 1.489ips 1.000e 6.420u 0:06.510
The above command with -resize 500% tells ImageMagick to run the convert command and scale the built-in IM logo: image by 500% in each direction. The -bench 10 part tells it to run that same command 10 times in a loop and then print the performance results.
The meaning of the output fields:
- Performance[4]: the number in square brackets is the number of threads used for that run; with OpenMP enabled, the benchmark is repeated for 1, 2, ... up to your maximum number of threads, one result line each.
- 10i: the command was run for 10 iterations.
- 1.489ips: on average, 1.489 iterations per second were achieved.
If your result includes Performance[1]: and only one line, then your system does not have OpenMP enabled. (You may be able to switch it on, if your build does support it: run convert -limit thread 2.)
Find out how your system's ImageMagick is set up regarding resource limits. Use this command:
identify -list resource

File       Area      Memory    Map       Disk       Thread    Time
--------------------------------------------------------------------
384        8.590GB   4GiB      8GiB      unlimited  4         unlimited
Above shows my current system's settings (not the defaults -- I did tweak them in the past).
The numbers are the maximum amount of each resource ImageMagick will use.
You can use each of the keywords in the column headers to pimp your system. For this, use convert -limit <resource> <number> to set it to a new limit.
Maybe your result looks more like this:
identify -list resource

File       Area      Memory    Map       Disk       Thread    Time
--------------------------------------------------------------------
192        4.295GB   2GiB      4GiB      unlimited  1         unlimited
- file defines the maximum number of concurrently opened files which ImageMagick can use.
- The memory, map, area and disk resource limits are defined in bytes. To set them to different values you can use SI prefixes (e.g. 500MB).
When you do have OpenMP for ImageMagick on your system, you can run:
convert -limit thread 2
This enables 2 parallel threads as a first step. Then re-run the benchmark and see if it really makes a difference, and if so how much. After that you could set the limit to 4 or even 8 and repeat the exercise...
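If you want to try this from within your PHP script, one option (a sketch only, assuming your ImageMagick honours its documented MAGICK_THREAD_LIMIT environment variable) is to set that variable before calling exec(), so every convert you launch inherits the limit:

// Sketch: cap ImageMagick at 2 threads for all commands started below,
// then run the same benchmark as above and compare the timings.
putenv('MAGICK_THREAD_LIMIT=2');
$exec = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe" logo: -resize 500% -bench 10 logo.png 2>&1';
exec($exec, $exec_output, $exec_retval);
echo implode("\n", $exec_output);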
Finally, you can experiment with a special internal format of ImageMagick's pixel cache. This format is called MPC (Magick Pixel Cache) and it primarily exists in memory: when an MPC is created, the processed input image is kept in RAM as an uncompressed raster. So basically, MPC is ImageMagick's native in-memory, uncompressed file format. Written out, it is simply a direct memory dump to disk; a read is a fast memory map from disk to memory as needed (similar to memory page swapping), and no image decoding is required.
(More technical details: MPC as a format is not portable. It also isn't suitable as a long-term archive format. Its only suitability is as an intermediate format for high-performance image processing. It requires two files to represent one image, which you should be aware of if you do save it to disk.)
Its main advantage is experienced when the same image is read and processed many times: MPC was designed especially for workflow patterns which match the criterion "read many times, write once".
Some people say that for such operations the performance improves here, but I have no personal experience with it.
Convert your base picture to MPC first:
convert input.jpeg input.mpc
and only then run:
convert input.mpc [...your long-long-long list of crops and operations...]
Then see if this saves you significantly on time.
Most likely you can use this MPC format even "inline" (using the special mpc: notation, see below).
The MPR format (memory persistent register) does something similar. It reads the image into a named memory register. Your process pipeline can also read the image again from that register, should it need to access it multiple times. The image persists in the register until the current command pipeline exits. But I've never applied this technique to a real-world problem, so I can't say how it works out in real life.
As you describe it, your process is composed of 4 distinct steps. Please tell me if I understand your intentions correctly from reading your code snippet:
1. Convert the uploaded TIFF to JPEG.
2. Resize the JPEG to a width of 1024 pixels.
3. Create a 200x200 pixel thumbnail.
4. Add a text label (watermark) to the image.
Basically, each step uses its own command -- 4 different commands in total. This can be sped up considerably by using a single command pipeline which performs all the steps on its own.
Moreover, you do not seem to really need to keep the unlabelled JPEG as an end result -- yet your commands generate it as an intermediate temporary file and save it to disk. We can try to skip this step altogether and achieve the final result without this extra write to disk.
There are different approaches possible for this change. I'll show you (and other readers) only one for now -- and only for the CLI, not for PHP. I'm not a PHP guy -- it's your own job to 'translate' my CLI method into appropriate PHP calls. (But by all means: please test with my commands first, really using the CLI, to see if the effort is worthwhile before translating the approach to PHP!)
But please first make sure that you really understand the architecture and structure of more complex ImageMagick command lines! For this, please refer to my other answer about ImageMagick command-line architecture.
Your 4 steps translate into the following individual ImageMagick commands:
convert image.tiff image.jpg
convert image.jpg -resize 1024x image-1024.jpg
convert image-1024.jpg -thumbnail 200x200 image-thumb.jpg
convert -background white image-1024.jpg label:12345 -append image-labelled.jpg
Now to transform this workflow into one single pipeline command... The following command does this. It should execute faster (regardless of what your results are when following my above steps 0.--4.):
convert image.tiff \
-respect-parentheses \
+write mpr:XY \
\( mpr:XY -resize 1024x +write image-1024.jpg \) \
\( mpr:XY -thumbnail 200x200 +write image-thumb.jpg \) \
\( mpr:XY -resize 1024x -background white label:12345 -append +write image-labelled.jpg \) \
null:
Explanations:
- -respect-parentheses: required to really make the sub-commands executed inside the \( ... \) parentheses independent from each other.
- +write mpr:XY: used to write the input file to an MPR memory register. XY is just a label (you can use anything), needed to later re-call the same image.
- +write image-1024.jpg: writes the result of the subcommand executed inside the first pair of parentheses to disk.
- +write image-thumb.jpg: writes the result of the subcommand executed inside the second pair of parentheses to disk.
- +write image-labelled.jpg: writes the result of the subcommand executed inside the third pair of parentheses to disk.
- null:: terminates the command pipeline. It is required because otherwise we would end with the last subcommand's closing parenthesis.

In order to get a rough feeling for my suggestion, I ran the commands below.
The first one runs the sequence of the 4 individual commands 100 times (and saves all resulting images under different file names).
time for i in $(seq -w 1 100); do
convert image.tiff \
image-indiv-run-${i}.jpg
convert image-indiv-run-${i}.jpg -sample 1024x \
image-1024-indiv-run-${i}.jpg
convert image-1024-indiv-run-${i}.jpg -thumbnail 200x200 \
image-thumb-indiv-run-${i}.jpg
convert -background white image-1024-indiv-run-${i}.jpg label:12345 -append \
image-labelled-indiv-run-${i}.jpg
echo "DONE: run indiv $i ..."
done
My result for 4 individual commands (repeated 100 times!) is this:
real 0m49.165s
user 0m39.004s
sys 0m6.661s
The second command times the single pipeline:
time for i in $(seq -w 1 100); do
convert image.tiff \
-respect-parentheses \
+write mpr:XY \
\( mpr:XY -resize 1024x \
+write image-1024-pipel-run-${i}.jpg \) \
\( mpr:XY -thumbnail 200x200 \
+write image-thumb-pipel-run-${i}.jpg \) \
\( mpr:XY -resize 1024x \
-background white label:12345 -append \
+write image-labelled-pipel-run-${i}.jpg \) \
null:
echo "DONE: run pipeline $i ..."
done
The result for the single pipeline (repeated 100 times!) is this:
real 0m29.128s
user 0m28.450s
sys 0m2.897s
As you can see, the single pipeline is about 40% faster than the 4 individual commands!
Now you can also invest in hardware -- more CPUs, more RAM, fast SSDs -- to speed things up even more :-)
But first translate this CLI approach into PHP code...
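As a starting point for that translation, here is how the single-pipeline call might look from PHP. This is only a rough, untested sketch: it reuses the variable names from your own snippet ($destination_folder, $image_name, $image_name_only, $name, $thumb), introduces a hypothetical $labelled path for the watermarked output, and assumes that on Windows the parentheses do not need the backslash escaping used in the Unix shell examples above:

// One single convert process instead of four: read the TIFF once into the
// MPR register "XY", then write the resized, thumbnail and labelled JPEGs.
$im = '"C:\Program Files\ImageMagick-6.9.0-Q16\convert.exe"';
$labelled = $destination_folder.'labelled_'.$image_name_only.'.jpg';
$exec = $im.' '.$destination_folder.$image_name
      .' -respect-parentheses'
      .' +write mpr:XY'
      .' ( mpr:XY -resize 1024x +write '.$name.' )'
      .' ( mpr:XY -thumbnail 200x200 +write '.$thumb.' )'
      .' ( mpr:XY -resize 1024x -background white label:ش.پ12355 -append +write '.$labelled.' )'
      .' null: 2>&1';
exec($exec, $exec_output, $exec_retval);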
There are a few more things to be said about this topic. But my time runs out for now. I'll probably return to this answer in a few days and update it some more...
Update: I had to update this answer with new numbers for the benchmarking: initially I had forgotten to include the -resize 1024x operation (stupid me!) in the pipelined version. Having included it, the performance gain is still there, but not as big any more.
Use -clone 0 to copy the image within memory
Here is another alternative to try instead of the mpr: approach with a named memory register, as suggested above. It uses (again within 'side processing inside parentheses') the -clone 0 operation.
The way this works is this:
- convert reads the input TIFF from disk once and loads it into memory.
- Each -clone 0 operator makes a copy of the first loaded image (because it has index 0 in the current image stack).
- Each +write operation saves the respective result to disk.
So here is the command to benchmark this:
time for i in $(seq -w 1 100); do
convert image.tiff \
-respect-parentheses \
\( -clone 0 -thumbnail 200x200 \
+write image-thumb-pipel-run-${i}.jpg \) \
\( -clone 0 -resize 1024x \
-background white label:12345 -append \
+write image-labelled-pipel-run-${i}.jpg \) \
null:
echo "DONE: run pipeline $i ..."
done
My result:
real 0m19.432s
user 0m18.214s
sys 0m1.897s
To my surprise, this is faster than the version which used mpr:!
Use -scale or -sample instead of -resize
This alternative will most likely speed up your resizing sub-operation. But it will likely lead to somewhat worse image quality (you'll have to verify whether this difference is noticeable).
For some background info about the difference between -resize, -sample and -scale, see the following answer:
I tried it too:
time for i in $(seq -w 1 100); do
convert image.tiff \
-respect-parentheses \
\( -clone 0 -thumbnail 200x200 \
+write image-thumb-pipel-run-${i}.jpg \) \
\( -clone 0 -scale 1024x \
-background white label:12345 -append \
+write image-labelled-pipel-run-${i}.jpg \) \
null:
echo "DONE: run pipeline $i ..."
done
My result:
real 0m16.551s
user 0m16.124s
sys 0m1.567s
This is the fastest result so far (I combined it with the -clone 0 variant). Of course, this modification can also be applied to your initial method of running 4 different commands.
Emulate a Q8 build by adding -depth 8 to the commands
I did not actually run and measure this, but the complete command would be:
time for i in $(seq -w 1 100); do
convert image.tiff \
-respect-parentheses \
\( -clone 0 -thumbnail 200x200 -depth 8 \
+write d08-image-thumb-pipel-run-${i}.jpg \) \
\( -clone 0 -scale 1024x -depth 8 \
-background white label:12345 -append \
+write d08-image-labelled-pipel-run-${i}.jpg \) \
null:
echo "DONE: run pipeline $i ..."
done
This modification is also applicable to your initial "I run 4 different commands"-method.
Use GNU parallel, as suggested by Mark Setchell
This of course is only applicable and reasonable for you if your overall work process allows for such parallelization. For my little benchmark testing it is applicable. For your web service, it may be that you only know of one job at a time...
time for i in $(seq -w 1 100); do \
cat <<EOF
convert image.tiff \
\( -clone 0 -scale 1024x -depth 8 \
-background white label:12345 -append \
+write d08-image-labelled-pipel-run-${i}.jpg \) \
\( -clone 0 -thumbnail 200x200 -depth 8 \
+write d08-image-thumb-pipel-run-${i}.jpg \) \
null:
echo "DONE: run pipeline $i ..."
EOF
done | parallel --will-cite
Results:
real 0m6.806s
user 0m37.582s
sys 0m6.642s
The apparent contradiction between the user and real times can be explained: the user time represents the sum of all time ticks which were clocked on 8 different CPU cores. From the point of view of the user looking at his watch, it was much faster: less than 10 seconds.
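GNU parallel is a Unix tool and is not normally available on the OP's Windows server. If you want a comparable effect directly from PHP, you could start several convert processes at once and wait for them to finish. The following is only a rough, untested sketch; the buildConvertCommand() helper is hypothetical and stands for whichever single-pipeline command you settled on above:

// Sketch: run up to $maxParallel convert processes at the same time.
function runInParallel(array $commands, $maxParallel = 4)
{
    $running = array();
    // Discard the child processes' output so their pipes cannot fill up.
    $devnull = array(1 => array('file', 'NUL', 'w'),   // use '/dev/null' on Unix
                     2 => array('file', 'NUL', 'w'));
    while ($commands || $running) {
        // Start new processes while there is free capacity.
        while ($commands && count($running) < $maxParallel) {
            $proc = proc_open(array_shift($commands), $devnull, $pipes);
            if ($proc !== false) {
                $running[] = $proc;
            }
        }
        // Reap finished processes.
        foreach ($running as $i => $proc) {
            $status = proc_get_status($proc);
            if (!$status['running']) {
                proc_close($proc);
                unset($running[$i]);
            }
        }
        usleep(50000); // avoid busy-waiting
    }
}

// $tiff_files would hold the paths of the 100 uploaded TIFFs.
$commands = array_map('buildConvertCommand', $tiff_files);
runInParallel($commands, 4);

With four workers you would expect a noticeably shorter wall-clock time than running the commands one after another, at the cost of higher total CPU load -- roughly the same trade-off as shown by the GNU parallel benchmark above.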
Pick your own preferences -- combine the different methods:
Some speedup can be gained (with image quality identical to what you have now) by constructing a more clever command pipeline:
- Avoid running several separate commands (where each convert leads to a new process and has to read its input from disk). Pack all image manipulations into one single process.
- Make use of the "parenthesized side processing".
- Make use of -clone, mpr: or mpc:, or even combine these.
Some speedups can additionally be gained by trading image quality for performance. Some of your choices are:
- -depth 8 (has to be declared explicitly on the OP's system) vs. -depth 16 (the default on the OP's system)
- -resize 1024x vs. -sample 1024x vs. -scale 1024x
Finally, make use of GNU parallel if your workflow permits it.