Detect the orientation angle of an image based on text direction

Date: 2022-11-11

Problem description

I am working on an OCR task to extract information from multiple ID proof documents. One challenge is the orientation of the scanned image: I need to correct the orientation of scanned images of PAN cards, Aadhaar cards, driving licenses, or any other ID proof.

I have already tried the approaches suggested on Stack Overflow and other forums, such as OpenCV minAreaRect, Hough line transforms, FFT, homography, and Tesseract OSD with psm 0. None of them work.

The logic should return the angle of the text direction: 0, 90, or 270 degrees. Attached are images at 0, 90, and 270 degrees. This is not about determining skew.

Solution

Here's an approach based on the assumption that most of the text is concentrated on one side of the image. The idea is that we can determine the angle from where the main text region is located.

  • Convert the image to grayscale and apply a Gaussian blur
  • Apply an adaptive threshold to get a binary image
  • Find contours and filter them by contour area
  • Draw the filtered contours onto a mask
  • Split the mask in half, vertically or horizontally depending on its orientation
  • Count the number of non-zero pixels in each half

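After converting to grayscale and applying a Gaussian blur, we use adaptive thresholding to obtain a binary image. Because the threshold is inverted (THRESH_BINARY_INV), the dark text becomes white, so the text is what gets counted later.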
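The following minimal NumPy sketch illustrates what the thresholding step does. It is an approximation and not OpenCV's implementation. It uses a plain box mean over each pixel's neighbourhood, which is the behaviour of `ADAPTIVE_THRESH_MEAN_C`, instead of the Gaussian-weighted mean that `ADAPTIVE_THRESH_GAUSSIAN_C` uses. The helper name `adaptive_threshold_inv_mean` is hypothetical. The block size of 15 and the constant of 4 are taken from the answer's code.

```python
import numpy as np

def adaptive_threshold_inv_mean(gray, block_size=15, c=4, max_value=255):
    """Hypothetical helper: inverted adaptive threshold with a box (mean) window.

    Approximates cv2.adaptiveThreshold(..., ADAPTIVE_THRESH_MEAN_C, THRESH_BINARY_INV, ...):
    dst = 0 where src > local_mean - c, else max_value.
    """
    if block_size % 2 == 0 or block_size < 3:
        raise ValueError("block_size must be odd and >= 3")
    r = block_size // 2
    g = gray.astype(np.float64)
    # Replicate edge pixels so border pixels still get a full window
    padded = np.pad(g, r, mode="edge")
    # Integral image with a leading row and column of zeros
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = g.shape
    # Sum over each block_size x block_size window via four integral-image lookups
    window_sum = (integral[block_size:block_size + h, block_size:block_size + w]
                  - integral[:h, block_size:block_size + w]
                  - integral[block_size:block_size + h, :w]
                  + integral[:h, :w])
    threshold = window_sum / (block_size * block_size) - c
    # Inverted binary: pixels darker than their local threshold become foreground (255)
    return np.where(g > threshold, 0, max_value).astype(np.uint8)
```

The comparison is made against a local mean rather than a single global value. As a result, uneven lighting across a scan shifts the threshold along with it, which is the usual reason to prefer an adaptive threshold for scanned or photographed documents.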

From here we find contours and filter them by contour area, which removes small noise particles and the large card border. Any contour that passes this filter is drawn onto a mask.
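The area filter keeps only contours whose area is strictly between 20 and 45000. `cv2.contourArea` computes a contour's area from its vertices using Green's formula. For a closed polygon, that is the shoelace formula. The pure-Python sketch below applies the same band-pass filter to contours given as lists of (x, y) vertices. The names `polygon_area` and `filter_by_area` are hypothetical. The two bounds come from the answer's code and were presumably tuned for the asker's scans.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple closed polygon given as (x, y) vertices."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def filter_by_area(contours, min_area=20, max_area=45000):
    """Keep contours whose area lies strictly between min_area and max_area."""
    return [c for c in contours if min_area < polygon_area(c) < max_area]

speck = [(0, 0), (3, 0), (3, 3), (0, 3)]            # area 9: dropped as noise
glyph = [(0, 0), (10, 0), (10, 20), (0, 20)]        # area 200: kept
border = [(0, 0), (600, 0), (600, 400), (0, 400)]   # area 240000: dropped as the border
print(filter_by_area([speck, glyph, border]) == [glyph])  # True
```

The area is measured on the vertex coordinates, so it can be slightly smaller than the number of pixels in the blob the contour outlines.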

To determine the angle, we split the image in half based on its dimensions. If width > height, we treat it as a horizontal (landscape) image and split it vertically into left and right halves. Otherwise (height >= width), we treat it as a vertical (portrait) image and split it horizontally into top and bottom halves.
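The split-and-count step is just array slicing plus a count of non-zero pixels. Below is a NumPy-only sketch. On a single-channel mask, `np.count_nonzero` gives the same count as `cv2.countNonZero`, and it is used here in its place. `split_counts` is a hypothetical name.

```python
import numpy as np

def split_counts(mask):
    """Return ("horizontal", left, right) or ("vertical", top, bottom) non-zero counts."""
    h, w = mask.shape
    if w > h:
        # Landscape: split vertically into left and right halves
        return "horizontal", np.count_nonzero(mask[:, :w // 2]), np.count_nonzero(mask[:, w // 2:])
    # Portrait (or square): split horizontally into top and bottom halves
    return "vertical", np.count_nonzero(mask[:h // 2, :]), np.count_nonzero(mask[h // 2:, :])

mask = np.zeros((4, 8), dtype=np.uint8)
mask[1:3, 0:3] = 255  # six "text" pixels in the left half
mask[1, 6] = 255      # one pixel in the right half
print(split_counts(mask))             # ('horizontal', 6, 1)
print(split_counts(np.rot90(mask)))   # ('vertical', 1, 6)
```

Rotating the toy mask 90 degrees counterclockwise with `np.rot90` moves its left half to the bottom. This is why the second call reports more pixels in the bottom half.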

Now that we have two halves, we can use cv2.countNonZero() to count the white pixels in each half. Here's the logic to determine the angle:

if horizontal
    if left >= right 
        degree -> 0
    else 
        degree -> 180
if vertical
    if bottom >= top
        degree -> 90
    else
        degree -> 270
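These rules can be written as a small pure function and checked against the pixel counts the answer reports for its sample images. The function name `angle_from_counts` and its argument order are hypothetical. Ties go to 0 and 90, matching the `>=` comparisons in the answer's code.

```python
def angle_from_counts(orientation, first, second):
    """Map half-image pixel counts to an angle, mirroring the answer's rules.

    orientation: "horizontal" (first=left, second=right) or
                 "vertical" (first=top, second=bottom).
    """
    if orientation == "horizontal":
        return 0 if first >= second else 180
    if orientation == "vertical":
        return 90 if second >= first else 270
    raise ValueError(f"unknown orientation: {orientation!r}")

# Counts reported in the answer
print(angle_from_counts("horizontal", 9703, 3975))  # 0
print(angle_from_counts("horizontal", 3975, 9703))  # 180
print(angle_from_counts("vertical", 3947, 9550))    # 90
```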

left 9703

right 3975

Therefore the image is at 0 degrees. Here are the results for another orientation:

left 3975

right 9703

We can conclude that the image is rotated by 180 degrees.

Here are the results for a vertical image. Note that since it's a vertical image, we split it horizontally:

top 3947

bottom 9550

Therefore the result is 90 degrees

import cv2
import numpy as np

def detect_angle(image):
    # Single-channel mask that collects the filtered text regions
    mask = np.zeros(image.shape[:2], dtype=np.uint8)

    # Grayscale + light blur, then an inverted adaptive threshold (dark text -> white)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3, 3), 0)
    adaptive = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY_INV, 15, 4)

    # findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x
    cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]

    # Drop tiny noise specks and the large card border; thresholds are tuned for these scans
    for c in cnts:
        area = cv2.contourArea(c)
        if 20 < area < 45000:
            cv2.drawContours(mask, [c], -1, 255, -1)

    h, w = mask.shape

    # Horizontal: compare the left and right halves
    if w > h:
        left = mask[:, :w // 2]
        right = mask[:, w // 2:]
        left_pixels = cv2.countNonZero(left)
        right_pixels = cv2.countNonZero(right)
        return 0 if left_pixels >= right_pixels else 180
    # Vertical: compare the top and bottom halves
    else:
        top = mask[:h // 2, :]
        bottom = mask[h // 2:, :]
        top_pixels = cv2.countNonZero(top)
        bottom_pixels = cv2.countNonZero(bottom)
        return 90 if bottom_pixels >= top_pixels else 270

if __name__ == '__main__':
    image = cv2.imread('1.png')
    if image is None:
        raise SystemExit("Could not read 1.png")
    angle = detect_angle(image)
    print(angle)
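The answer stops at reporting the angle, but the question asks to fix the orientation. The answer does not state whether 90 means clockwise or counterclockwise. The mapping below is derived from the code's own rules, assuming that an upright landscape card has most of its text in the left half. Rotating such a card 90 degrees counterclockwise moves the left half to the bottom, which the code reports as 90. Rotating it 90 degrees clockwise moves the left half to the top, which the code reports as 270. The sketch uses NumPy's `np.rot90`, where a positive `k` rotates counterclockwise for an image array with row 0 at the top. `cv2.rotate` with `cv2.ROTATE_90_CLOCKWISE`, `cv2.ROTATE_90_COUNTERCLOCKWISE`, or `cv2.ROTATE_180` performs the same quarter turns. `correct_orientation` and `UNDO_QUARTER_TURNS` are hypothetical names.

```python
import numpy as np

# Counterclockwise quarter turns that undo each detected angle.
# Assumes 90 = content rotated 90 degrees counterclockwise, 270 = 90 degrees clockwise.
UNDO_QUARTER_TURNS = {0: 0, 90: -1, 180: 2, 270: 1}

def correct_orientation(image, angle):
    """Rotate an H x W or H x W x C image array back to the upright position."""
    if angle not in UNDO_QUARTER_TURNS:
        raise ValueError(f"unsupported angle: {angle}")
    # np.rot90 returns a view; make it contiguous so OpenCV functions accept it
    return np.ascontiguousarray(np.rot90(image, k=UNDO_QUARTER_TURNS[angle]))
```

With the answer's function, usage would look like `fixed = correct_orientation(image, detect_angle(image))`. If your scans put most of the text on the right of an upright card, the 0/180 and 90/270 entries would swap.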
