I have a lot of scans of text pages (black text on a white background).
My usual approach is to clean them in GIMP with the Curves dialog, using a pretty simple curve with only four points: 0,0 - 63,0 - 224,255 - 255,255
This makes all the greyish text pitch black, sharpens the text, and turns most of the whitish pixels pure white.
How can I achieve the same effect in a script using ImageMagick or some other Linux tool that runs completely from the command line?
-normalize or -contrast-stretch don't work because they operate on pixel counts. I need an operator that maps the grey values 0-63 to pitch black and everything above 224 to pure white, and that stretches the values in between linearly.
The Color Modifications page of the ImageMagick documentation describes many color manipulation operators.
In this specific case, two algorithms are interesting:
-level gives you perfect black/white pixels near the ends of the curve and a linear distribution in between.
The sigmoidal option (-sigmoidal-contrast) creates a smoother curve between the extremes, which works better for color photos.
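As a rough sketch of the -level variant for the curve in the question: the GIMP points 63 and 224 are converted to percentages of 255 and passed as the black and white points. The helper names to_percent and level_scan and the CONVERT override are assumptions made for illustration, not part of ImageMagick; the command is convert on ImageMagick 6 and usually magick on version 7.

```shell
#!/bin/sh
# Sketch: map GIMP curve points (0-255) to an ImageMagick -level call.

# Convert a 0-255 grey value to a percentage with one decimal place.
to_percent() {
    LC_ALL=C awk -v v="$1" 'BEGIN { printf "%.1f", v * 100 / 255 }'
}

# Curve points from the question: values up to 63 become black,
# values from 224 upwards become white, linear in between.
black=$(to_percent 63)
white=$(to_percent 224)

# level_scan INPUT OUTPUT (hypothetical helper name).
# CONVERT is an assumed override, e.g. CONVERT=magick on ImageMagick 7.
level_scan() {
    "${CONVERT:-convert}" "$1" -level "${black}%,${white}%" "$2"
}
```

Calling level_scan scan.png cleaned.png would then run convert with -level 24.7%,87.8%, which should be close to the four-point curve from GIMP.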
To get a result similar to GIMP's, you can try applying one after the other (to make text and black areas really black).
In all cases, you will want to run -normalize first (or even -contrast-stretch, to merge most of the noise) to make sure no black/white levels are wasted. Without this, the darkest color could be lighter than rgb(0,0,0) and the brightest color could be below pure white.
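Putting the pieces together might look like the sketch below: stretch the histogram first, then apply the sigmoidal curve, then clip with -level. The order of the last two steps and the values 5x50% and 25%,88% are illustrative assumptions that will likely need tuning per scan batch; clean_scan and the CONVERT override are hypothetical names.

```shell
#!/bin/sh
# Sketch: stretch the histogram first, then apply both contrast operators.

# clean_scan INPUT OUTPUT (hypothetical helper name).
# CONVERT is an assumed override, e.g. CONVERT=magick on ImageMagick 7.
clean_scan() {
    "${CONVERT:-convert}" "$1" \
        -normalize \
        -sigmoidal-contrast 5x50% \
        -level 25%,88% \
        "$2"
}
```

For a whole directory, something like for f in *.png; do clean_scan "$f" "cleaned-$f"; done should work from the command line.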