Cleaning scanned grayscale images with ImageMagick

Aaron Digulla · Mar 7, 2012 · Viewed 10.3k times

I have a lot of scans of text pages (black text on a white background).

My usual approach is to clean them in GIMP with the Curves dialog, using a simple curve with only four points: (0,0), (63,0), (224,255), (255,255).

This turns all the greyish text pitch black, makes the text sharper, and turns most of the whitish pixels pure white.

How can I achieve the same effect in a script using ImageMagick or some other Linux tool that runs completely from the command line?

-normalize and -contrast-stretch don't work because they operate on pixel counts. I need an operator that makes the values 0-63 (grayscale) pitch black and everything from 224 up pure white, and stretches the values in between linearly.

Answer

Aaron Digulla · Mar 12, 2012

ImageMagick's Color Modifications page describes many of its color manipulation operators.

In this specific case, two algorithms are interesting:

-level maps everything below a black point to pure black and everything above a white point to pure white, with a linear distribution in between.

-sigmoidal-contrast creates a smoother curve between the extremes, which works better for color photos.
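As a rough sketch of how the two operators might be invoked: -level takes a black point and a white point, and -sigmoidal-contrast takes a contrast strength and a midpoint. The percentages below are simply the GIMP curve points 63 and 224 rescaled from the 0-255 range. The contrast value 10, the 50% midpoint, the function names and the file names are illustrative assumptions, not values from this answer. With ImageMagick 7, `magick` takes the place of `convert`.

```shell
#!/bin/sh

# Linear stretch: 63/255 is about 24.7% and 224/255 is about 87.8%.
# Percentages are used so the result does not depend on the quantum depth
# (8-bit or 16-bit) that ImageMagick was built with.
level_scan() {
    convert "$1" -level 24.7%,87.8% "$2"
}

# S-shaped curve: contrast strength 10 around a 50% midpoint.
# Both numbers are assumed starting points to tune by eye.
sigmoid_scan() {
    convert "$1" -sigmoidal-contrast 10x50% "$2"
}
```

For example, `level_scan page.png page-level.png` would write the linearly stretched version of a hypothetical page.png, and `sigmoid_scan page.png page-sigmoid.png` the S-curve version, so the two can be compared side by side.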

To get a result similar to GIMP's, you can try applying one after the other (to make text and black areas really black).

In all cases, you will want to run -normalize first (or even -contrast-stretch, to merge most of the noise) so that no black/white levels are wasted. Without this, the darkest color could be lighter than rgb(0,0,0) and the brightest color could be darker than pure white.
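The ordering described above could be sketched as a single command per scan, plus a loop for a whole directory. This is only one way to chain the operators: the -level percentages are the GIMP curve points 63 and 224 rescaled from 0-255, while the 5x50% setting, the function names and the directory layout are assumptions made for illustration.

```shell
#!/bin/sh

# Stretch the histogram first, then clip at the GIMP curve points
# (63 and 224 on a 0-255 scale, written as percentages), then apply a
# mild S-curve. The 5x50% setting is an assumed starting value.
clean_scan() {
    convert "$1" -normalize -level 24.7%,87.8% -sigmoidal-contrast 5x50% "$2"
}

# Clean every PNG in directory $1 and write the results to directory $2.
clean_dir() {
    mkdir -p "$2" || return 1
    for f in "$1"/*.png; do
        [ -e "$f" ] || continue   # no matches: the pattern stays unexpanded
        clean_scan "$f" "$2/$(basename "$f")"
    done
}
```

Calling `clean_dir scans cleaned` would then process every PNG in a hypothetical scans directory and leave the originals untouched. If the result looks too harsh, dropping the -sigmoidal-contrast step leaves the plain linear stretch.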