I need to determine whether the text in my image is upside down. Examples of my images:

(Example images 1 to 5)

I currently do this by comparing the confidence scores of the original image and the image rotated by 180 degrees, but this sometimes gives the wrong result, so I am looking for an alternative method to use as an additional, independent check.
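For reference, the comparison described above can be sketched roughly as follows. This is a minimal sketch under assumptions: `score_fn` is a hypothetical placeholder for whatever call returns a confidence score for an image, and `min_gap` is an arbitrary illustrative threshold below which the result is reported as uncertain instead of forcing a decision.

```python
import numpy as np

def compare_orientations(image, score_fn, min_gap=5.0):
    """Compare the confidence of an image with that of its 180-degree rotation.

    score_fn is a hypothetical callable returning a confidence score
    (higher means more confident). min_gap is an assumed threshold.
    """
    upright = float(score_fn(image))
    flipped = float(score_fn(np.rot90(image, 2)))  # two 90-degree turns = 180 degrees
    gap = upright - flipped
    if abs(gap) < min_gap:
        return "uncertain", gap
    return ("upright" if gap > 0 else "upside-down"), gap
```

Returning "uncertain" for small gaps makes it explicit which images actually need the second, independent check.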

I tried counting the black pixels above and below the middle line, but this approach sometimes fails, even for a perfect image without any defects.

~ $ python3 Upside5.py
Image is upside-down
above_middle
7750
below_middle
9112

Perfect image

Is it possible to fix this approach? Could you please advise me on an alternative way to establish whether text is upside down?

Here is my code for establishing whether text is upside down by counting the black pixels above and below the middle line:

import cv2
import numpy as np

def process_image_and_draw_lines(image_path):
    def count_black_pixels_from_bottom(binary_img, N):
        height, width = binary_img.shape
        if N >= height:
            raise ValueError("N is out of range")
        return np.sum(binary_img[height - 1 - N, :] == 0)

    def count_black_pixels_from_top(binary_img, K):
        height, width = binary_img.shape
        if K >= height:
            raise ValueError("K is out of range")
        return np.sum(binary_img[K, :] == 0)

    def draw_line_on_row(img, row_number, color=(0, 0, 255), thickness=1):
        img[row_number:row_number+thickness, :] = color
        return img

    def find_highest_difference_row(binary_img, from_bottom=True):
        height = binary_img.shape[0]
        max_diff = 0
        max_diff_index = 0

        if from_bottom:
            for N in range(height - 1):
                black_pixels_current = count_black_pixels_from_bottom(binary_img, N)
                black_pixels_next = count_black_pixels_from_bottom(binary_img, N + 1)
                difference = black_pixels_next - black_pixels_current
                if difference > max_diff:
                    max_diff = difference
                    max_diff_index = N + 1
            return height - 1 - max_diff_index
        else:
            for K in range(height - 1):
                black_pixels_current = count_black_pixels_from_top(binary_img, K)
                black_pixels_next = count_black_pixels_from_top(binary_img, K + 1)
                difference = black_pixels_next - black_pixels_current
                if difference > max_diff:
                    max_diff = difference
                    max_diff_index = K + 1
            return max_diff_index

    def process_image(image_path):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            print(f"Error loading image {image_path}")
            return
        
        _, binary_img = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
        top_line = find_highest_difference_row(binary_img, from_bottom=False)
        bottom_line = find_highest_difference_row(binary_img, from_bottom=True)

        if bottom_line > top_line:
            top_line, bottom_line = bottom_line, top_line

        middle_line = (top_line + bottom_line) // 2

        img_copy = cv2.cvtColor(binary_img, cv2.COLOR_GRAY2BGR)
        img_copy = draw_line_on_row(img_copy, top_line, color=(0, 0, 255), thickness=1)
        img_copy = draw_line_on_row(img_copy, bottom_line, color=(0, 0, 255), thickness=1)
        img_copy = draw_line_on_row(img_copy, middle_line, color=(0, 255, 0), thickness=1)

        above_middle = np.sum(binary_img[:middle_line, :] == 0)
        below_middle = np.sum(binary_img[middle_line:, :] == 0)

        if above_middle > below_middle:
            print("above_middle")
            print(above_middle)
            print("below_middle")
            print(below_middle)
            print("Image is not upside-down.")
        elif below_middle > above_middle:
            print("Image is upside-down.")
            print("above_middle")
            print(above_middle)
            print("below_middle")
            print(below_middle)
        else:
            print("Number of black pixels above and below the middle line are equal.")

        cv2.imshow('Image with Lines', img_copy)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    process_image(image_path)

process_image_and_draw_lines('image_5.png')
  • Yes, for now, OCR is the only way to check whether text is upside down, but I am looking for an alternative way to always do an additional check.
    – user176953
    Commented Jul 3 at 19:31
  • I would probably try to train a dedicated neural network for the rotation check, but this approach is extremely hard. Maybe I can check the orientation by working with the image directly? I mean, there must be certain simple patterns in the image (such as the number of pixels above and below the midline) that make it possible to check whether it is upside down.
    – user176953
    Commented Jul 3 at 19:39

1 Answer

When trying to determine whether text is upside down, be aware that no generally valid approach exists: for example, for "OOO000" and "000OOO", or "MM996699WW" and "MM669966WW", it is impossible to know which orientation is the right one without further hints about the image orientation.
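To make the ambiguity concrete, here is a small sketch that computes how a string reads after a 180-degree turn. The `ROTATED` table is an illustrative assumption: which characters remain readable when rotated depends on the font, and the table is not exhaustive.

```python
# Characters that still read as a valid character after a 180-degree turn.
# Illustrative and font-dependent, not exhaustive.
ROTATED = {"O": "O", "0": "0", "H": "H", "I": "I", "N": "N", "S": "S",
           "X": "X", "Z": "Z", "8": "8", "M": "W", "W": "M", "6": "9", "9": "6"}

def rotate_180(text):
    """Return how `text` reads after a 180-degree rotation, or None if some
    character has no readable rotated counterpart in ROTATED."""
    if any(ch not in ROTATED for ch in text):
        return None
    # Rotation reverses the reading order and turns each character over.
    return "".join(ROTATED[ch] for ch in reversed(text))

print(rotate_180("OOO000"))  # 000OOO
print(rotate_180("609"))     # 609
print(rotate_180("PLATE"))   # None
```

A string for which `rotate_180` returns a valid result cannot be oriented from its shape alone. A string containing a character such as "P" or "L" can, which is what the rest of this answer relies on.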

Yes, for now, OCR is the only way to check whether text is upside down, but I am looking for an alternative way to verify the result with an additional check.

A possible alternative for verifying the OCR results (the comparison of the upside-down image against the original orientation), without the effort of training a neural network model or relying on LLMs, is to extract the individual character objects from the image and then check for characteristic properties of "A", "Y", "U", "V" or "C", "E", "F", "K", "L", "P", as a human does when deciding on the orientation.

Upside-down "C", "E", "F", "K", "L", "P" have most of their pixels on the right side, whereas properly oriented ones have them on the left. In other words, the left and right "margin" areas of each character's bounding box are tested for pixel occupancy, and bounding boxes with a "heavy" right margin hint at upside-down text. This assumes the image contains only capital letters and digits; "3" and "9" need special handling, and "1" can be excluded by the width of its bounding box.
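A minimal sketch of the margin test follows. It assumes the characters have already been cut out as tight boolean crops, for example with something like `cv2.connectedComponentsWithStats`; that step is not shown. The function names and the `margin_frac` and `ratio` values are illustrative assumptions rather than tuned settings, and the special handling of "3", "9" and "1" is left out.

```python
import numpy as np

def margin_weights(glyph, margin_frac=0.25):
    """Return (left_ink, right_ink) for one character crop.

    glyph: 2-D boolean array, True where a pixel belongs to the character,
    cropped tightly to the character's bounding box.
    """
    _, width = glyph.shape
    m = max(1, int(width * margin_frac))  # margin width in columns
    left = int(glyph[:, :m].sum())
    right = int(glyph[:, width - m:].sum())
    return left, right

def looks_upside_down(glyphs, margin_frac=0.25, ratio=1.5):
    """Vote over all crops. Returns True (upside down), False, or None if tied."""
    right_heavy = left_heavy = 0
    for glyph in glyphs:
        left, right = margin_weights(glyph, margin_frac)
        if right > ratio * max(left, 1):
            right_heavy += 1
        elif left > ratio * max(right, 1):
            left_heavy += 1
        # Roughly symmetric characters ("O", "H", "8", ...) cast no vote.
    if right_heavy == left_heavy:
        return None
    return right_heavy > left_heavy

# Synthetic "L": full left column plus full bottom row.
letter_l = np.zeros((8, 8), dtype=bool)
letter_l[:, 0] = True
letter_l[-1, :] = True

print(looks_upside_down([letter_l]))               # False
print(looks_upside_down([np.rot90(letter_l, 2)]))  # True
```

Because symmetric characters abstain, the vote is decided only by characters that carry orientation information, and a `None` result signals that the image contains too few of them to decide.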
