Get correct image orientation by Google Cloud Vision api (TEXT_DETECTION)

Jack Fan picture Jack Fan · Dec 22, 2016 · Viewed 10.2k times · Source

I tried Google Cloud Vision api (TEXT_DETECTION) on 90 degrees rotated image. It still can return recognized text correctly. (see image below)

That means the engine can recognize text even the image is 90, 180, 270 degrees rotated.

However the response result doesn't include information of correct image orientation. (document: EntityAnnotation)

Is there anyway to not only get recognized text but also get the orientation?
Could Google support it similar to (FaceAnnotation: getRollAngle)

enter image description here

Answer

faaez picture faaez · Jun 13, 2018

You can leverage the fact that we know the sequence of characters in a word to infer the orientation of a word as follows (obviously slightly different logic for non-LTR languages):

for page in annotation:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                if len(word.symbols) < MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE:
                    continue
                first_char = word.symbols[0]
                last_char = word.symbols[-1]
                first_char_center = (np.mean([v.x for v in first_char.bounding_box.vertices]),np.mean([v.y for v in first_char.bounding_box.vertices]))
                last_char_center = (np.mean([v.x for v in last_char.bounding_box.vertices]),np.mean([v.y for v in last_char.bounding_box.vertices]))

                #upright or upside down
                if np.abs(first_char_center[1] - last_char_center[1]) < np.abs(top_right.y - bottom_right.y): 
                    if first_char_center[0] <= last_char_center[0]: #upright
                        print 0
                    else: #updside down
                        print 180
                else: #sideways
                    if first_char_center[1] <= last_char_center[1]:
                        print 90
                    else:
                        print 270

Then you can use the orientation of individual words to infer the orientation of the document overall.