How to remove a black background from PDF text before printing

wgpubs · Sep 28, 2009

I have a PDF with a black background and white/yellow text.

How can I remove the black background before printing and invert the color of the text?

Answer

Chris Dolan · Oct 2, 2009

This is likely to be non-trivial to solve in general, but if you have a predictable collection of PDFs (say, all from the same source), then you may be able to hack together a quick solution like so:

  • install CAM::PDF from CPAN
  • run "getpdfpage.pl my.pdf 1 > page1.txt" to get the graphic codes for page 1
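  • search for " rg" to find where the RGB fill color is set (the fill color paints text and filled shapes, such as a background rectangle); "RG" sets the RGB stroke color, "g"/"G" set grayscale, "k"/"K" set CMYK, and "sc"/"SC" set colors in special colorspaces (in each pair, lowercase is fill and uppercase is stroke)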
  • edit page1.txt to set the colors you like
  • run "setpdfpage.pl my.pdf 1 page1.txt out.pdf"
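
The hand edit in the fourth step can also be scripted. The following is a minimal sketch, not part of CAM::PDF: recolor_text and its mapping table are hypothetical names, and it assumes the color directives in page1.txt are plain numeric operands followed by rg, RG, g, G, k or K, separated by whitespace. It rewrites all directives in a single pass, so swapping two colors does not undo itself.

```perl
use strict;
use warnings;

# A numeric operand, and the color operators this sketch understands.
my $NUM = qr/[-+]?(?:\d+\.?\d*|\.\d+)/;
my $OPS = qr/rg|RG|g|G|k|K/;

# Replace color directives in content-stream text according to %$map.
# Keys and values are normalized directives such as '1 1 0 rg'; operands
# like "1.0" are normalized to "1" before the lookup.
sub recolor_text {
    my ($text, $map) = @_;
    $text =~ s{(?<!\S)((?:$NUM\s+){1,4})($OPS)(?!\S)}{
        my ($args, $op) = ($1, $2);
        my $key = join ' ', (map { $_ + 0 } split ' ', $args), $op;
        exists $map->{$key} ? $map->{$key} : "$args$op";
    }gex;
    return $text;
}

# Example mapping (an assumption about what the page uses):
# black fill becomes white, white and yellow fills become black.
my %map = (
    '0 0 0 rg' => '1 1 1 rg',
    '1 1 1 rg' => '0 0 0 rg',
    '1 1 0 rg' => '0 0 0 rg',
);

my $sample = "0 0 0 rg\n0 0 612 792 re\nf\nBT\n1 1 0 rg\n(Hi) Tj\nET\n";
print recolor_text($sample, \%map);
```

To apply this to a real page, read page1.txt into a string, pass it through recolor_text with a mapping that matches the directives found in that file, and write the result back before running setpdfpage.pl.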

All of this can be done programmatically instead of via command line tools too. getpdfpage.pl and setpdfpage.pl are simple little wrappers around the CAM::PDF API.
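
As a rough illustration of the programmatic route, the sketch below does the getpdfpage.pl / setpdfpage.pl round trip in one process. It assumes CAM::PDF is installed; edit_page_content and its callback argument are hypothetical names introduced here, while new, getPageContent, setPageContent and cleanoutput are CAM::PDF methods.

```perl
use strict;
use warnings;

# Read one page's content stream, pass it through $edit, and write a new PDF.
# $edit is a code reference that takes the content text and returns new text.
sub edit_page_content {
    my ($in_file, $page_num, $out_file, $edit) = @_;
    require CAM::PDF;    # loaded lazily, only when the sub is actually called
    my $pdf = CAM::PDF->new($in_file)
        or die "Cannot open $in_file as a PDF\n";
    my $content = $pdf->getPageContent($page_num);
    $pdf->setPageContent($page_num, $edit->($content));
    $pdf->cleanoutput($out_file);
    return;
}
```

A call such as edit_page_content('my.pdf', 1, 'out.pdf', sub { ... }) would then replace the manual editing of page1.txt, with the anonymous sub doing whatever color substitution the page needs.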

A general solution would be to use getPageContentTree() to parse the PDF page syntax, search for the color-changing operators, and alter them. But if your PDF uses a custom color space ("sc"), this can be tricky, because the meaning of the operands depends on whichever color space is currently selected. Searching for the operator that does the full-page black fill could be hard too, depending on the geometry.
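
To make the last point concrete, here is a minimal sketch of one heuristic for spotting a full-page fill in the tree returned by getPageContentTree(). It relies only on the node fields (type, name, args, value) that the recolor script in this answer uses. find_page_fills is a hypothetical name, and the sketch assumes that the caller knows the page size, that the background is drawn as a rectangle ("re") immediately followed by a fill operator, and that no coordinate transform changes the rectangle's size. PDFs that draw the background some other way will not be detected.

```perl
use strict;
use warnings;

# Operators that fill the current path (optionally stroking it as well).
my %FILLOPS = map { $_ => 1 } qw(f F f* B B* b b*);

# Return the 're' (rectangle) nodes that span roughly the whole page and are
# filled by the very next operator. $blocks is a list of nodes as found in
# $page->{blocks}; $page_w and $page_h are the page size in PDF units.
sub find_page_fills {
    my ($blocks, $page_w, $page_h, $found) = @_;
    $found ||= [];
    my $pending;    # page-sized rectangle waiting for its paint operator
    for my $op (@{$blocks}) {
        if ($op->{type} eq 'block') {
            # Recurse into nested blocks, as the traverse() sub does.
            find_page_fills($op->{value}, $page_w, $page_h, $found);
            $pending = undef;
        }
        elsif ($op->{name} eq 're' && @{$op->{args}} == 4) {
            my (undef, undef, $w, $h) = map { $_->{value} } @{$op->{args}};
            my $covers = abs($w) >= 0.95 * $page_w
                      && abs($h) >= 0.95 * $page_h;
            $pending = $covers ? $op : undef;
        }
        else {
            push @{$found}, $pending
                if $pending && $FILLOPS{$op->{name}};
            $pending = undef;    # any other operator ends the candidate
        }
    }
    return $found;
}
```

For a US Letter page this might be called as find_page_fills($page->{blocks}, 612, 792). The color operator that precedes a returned node is then the likely candidate for the background color.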

If you provide a URL for a sample PDF, I could provide some more specific advice.

UPDATE: on a whim, I wrote a rudimentary color changer script that may work for some PDFs. To use it, run it as in this example, which turns every red-filled element green:

perl recolor.pl input.pdf '1 0 0 rg' '0 1 0 rg' out.pdf

This requires you to know the PDF syntax of the color directives you're trying to change, so it may still require something like the getpdfpage.pl steps recommended above.
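
For the inversion asked about in the question, the replacement directive can be derived from the original one. The sketch below is a hypothetical helper (invert_directive is not part of CAM::PDF or recolor.pl). It assumes RGB or grayscale directives whose components lie between 0 and 1, and it does not attempt CMYK or special colorspaces, where a simple per-component complement may not give the intended result.

```perl
use strict;
use warnings;

# Return the complementary color directive, e.g. '1 1 0 rg' -> '0 0 1 rg'.
# Only rg/RG (RGB) and g/G (grayscale) are handled; anything else dies.
sub invert_directive {
    my ($directive) = @_;
    my @parts = split ' ', $directive;
    my $op    = pop @parts;
    my %arity = (rg => 3, RG => 3, g => 1, G => 1);
    die "Unsupported color directive: $directive\n"
        if !defined $op || !exists $arity{$op} || @parts != $arity{$op};
    return join ' ', (map { sprintf '%g', 1 - $_ } @parts), $op;
}

print invert_directive('1 1 0 rg'), "\n";    # yellow fill -> blue fill
print invert_directive('1 g'), "\n";         # white fill  -> black fill
```

Each original directive and its inverted form can then be passed to recolor.pl as an old/new pair. A full inversion also needs the black background directive mapped to white in the same run.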

And the source code for recolor.pl:

#!/usr/bin/perl -w

use strict;
use CAM::PDF;
use CAM::PDF::Content;

my %COLOROPS = map {$_ => 1} qw(rg RG g G k K sc SC);

my $pdf = CAM::PDF->new(shift) || die $CAM::PDF::errstr;
my @oldcolors;
my @newcolors;
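# The remaining arguments are old/new color pairs, optionally followed by an output file
while (@ARGV >= 2) {
   push @oldcolors, parseColor(shift);
   push @newcolors, parseColor(shift);
}
# With no output file given, write the PDF to stdout ('-')
my $out = shift || '-';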

for my $p (1 .. $pdf->numPages) {
   my $page = $pdf->getPageContentTree($p);
   traverse($page->{blocks});
   $pdf->setPageContent($p, $page->toString());
}
$pdf->cleanoutput($out);

sub parseColor {
   my ($in) = @_;
   my $ops = CAM::PDF::Content->new($in);
   die 'Invalid color syntax in ' . $in if !$ops->validate();
   my @blocks = @{$ops->{blocks}};
   die 'Expected one color operator in ' . $in if @blocks != 1;
   my $color = $blocks[0];
   die 'Not a color operator in ' . $in if !exists $COLOROPS{$color->{name}};
   return $color;
}

sub traverse {
   my ($blocks) = @_;
   for my $op (@{$blocks}) {
      if ($op->{type} eq 'block') {
         traverse($op->{value});
      } elsif (exists $COLOROPS{$op->{name}}) {
       COLOR:
         for (my $i=0; $i < @oldcolors; ++$i) {
            my $old = $oldcolors[$i];
            if ($old->{name} eq $op->{name} && @{$old->{args}} == @{$op->{args}}) {
               for (my $v=0; $v < @{$op->{args}}; ++$v) {
                  next COLOR if $old->{args}->[$v]->{value} != $op->{args}->[$v]->{value};
               }
               # match! so we will replace
               $op->{name} = $newcolors[$i]->{name};
               @{$op->{args}} = @{$newcolors[$i]->{args}};
               last COLOR;
            }
         }
      }
   }
}