OCR: Image to text?

Question 1

OCR: Image to text?

ios ocr xcode4.5 tesseract leptonica

The iOSDev · Nov 6, 2012 · Viewed 11.6k times · Source

Answer

Answer

Hi all Thanks for your replies, from all of that replies I am able to get this conclusion as below:

I need to get the only one cropped image block with number plate contained in it.
From that plate need to find out the portion of the number portion using the data I got using the method provided here.
Then converting the image data to almost black and white using the RGB data found through the above method.
Then the data is converted to the Image using the method provided here.

Above 4 steps are combined in to one method like this as below :

-(void)getRGBAsFromImage:(UIImage*)image
{
    NSInteger count = (image.size.width * image.size.height);
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now your rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0 ; ii < count ; ++ii)
    {
        CGFloat red   = (rawData[byteIndex]     * 1.0) ;
        CGFloat green = (rawData[byteIndex + 1] * 1.0) ;
        CGFloat blue  = (rawData[byteIndex + 2] * 1.0) ;
        CGFloat alpha = (rawData[byteIndex + 3] * 1.0) ;

        NSLog(@"red %f \t green %f \t blue %f \t alpha %f rawData [%d] %d",red,green,blue,alpha,ii,rawData[ii]);
        if(red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue)//all values are between 0 to 255
        {
            red = 255.0;
            green = 255.0;
            blue = 255.0;
            alpha = 255.0;
            // all value set to 255 to get white background.
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;

        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
                                                       rawData,
                                                       width,
                                                       height,
                                                       8, // bitsPerComponent
                                                       4*width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);

    CFRelease(colorSpace);

    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);

    UIImage *img = [UIImage imageWithCGImage:cgImage];

    //use the img for further use of ocr

    free(rawData);
}

Note:

The only drawback of this method is the time consumed and the RGB value to convert to white and other to black.

UPDATE :

    CGImageRef imageRef = [plate CGImage];
    CIContext *context = [CIContext contextWithOptions:nil]; // 1
    CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
    CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:@"inputImage", ciImage, @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f], @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
    CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
    CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
    UIImage *img = [UIImage imageWithCGImage:cgImage];

Just replace the above method's(getRGBAsFromImage:) code with this one and the result is same but the time taken is just 0.1 to 0.3 second only.

Question 2

Before mark as copy or repeat question, please read the whole question first.

I am able to do at pressent is as below:

To get image and crop the desired part for OCR.
Process the image using tesseract and leptonica.
When the applied document is cropped in chunks ie 1 character per image it provides 96% of accuracy.
If I don't do that and the document background is in white color and text is in black color it gives almost same accuracy.

For example if the input is as this photo :

Photo start

enter image description here

Photo end

What I want is to able to get the same accuracy for this photo enter image description here
without generating blocks.

The code I used to init tesseract and extract text from image is as below:

For init of tesseract

in .h file

tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;

in .m file

tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word ", "1");
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");

For get text from image

- (void)processOcrAt:(UIImage *)image
{
    [self setTesseractImage:image];

    tesseract->Recognize(NULL);
    char* utf8Text = tesseract->GetUTF8Text();
    int conf = tesseract->MeanTextConf();

    NSArray *arr = [[NSArray alloc]initWithObjects:[NSString stringWithUTF8String:utf8Text],[NSString stringWithFormat:@"%d%@",conf,@"%"], nil];

    [self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
                           withObject:arr
                        waitUntilDone:YES];
    free(utf8Text);
}

- (void)ocrProcessingFinished0:(NSArray *)result
{
    UIAlertView *alt = [[UIAlertView alloc]initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
   [alt show];
}

But I don't get proper output for the number plate image either it is null or it gives some garbage data for the image.

And if I use the image which is the first one ie white background with text as black then the output is 89 to 95% accurate.

Please help me out.

Any suggestion will be appreciated.

Update

Thanks to @jcesar for providing the link and also to @konstantin pribluda to provide valuable information and guide.

I am able to convert images in to proper black and white form (almost). and so the recognition is better for all images :)

Need help with proper binarization of images. Any Idea will be appreciated

OCR: Image to text?

Answer

Related questions