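Creating a PIL image object from a canvas tag with Selenium

I want to create a PIL image object from a canvas tag that I extract with Selenium from this website. The goal is to run pytesseract on it and read the captcha text. My code raises no errors, but the image it creates is completely black.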

My code so far:

import base64
import re
from io import BytesIO

import pytesseract
from PIL import Image

# `driver` is an already-running Selenium WebDriver with the page loaded.
# Run JS code to get data URI
png_url = driver.execute_script(
    'return document.getElementsByTagName("canvas")[0].toDataURL("image/png");')
# Parse the URI to get only the base64 part
str_base64 = re.search(r'base64,(.*)', png_url).group(1)
# Convert it to binary
str_decoded = base64.b64decode(str_base64)
# Create and show Image object
image = Image.open(BytesIO(str_decoded))
image.show()
# Read image with pytesseract
recaptcha = pytesseract.image_to_string(image)

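I don't know why the image is completely black. My code is based on this tutorial, which saves the image to disk. I don't want to save the image; I want it to stay in memory only.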

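Edit:

I saved the image to the file system and it saves correctly, but it has a transparent background, which is why it appears black when shown. How can I make the background white?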

Answers

All I needed to do was replace the transparent background, following this answer:

from PIL import Image


def remove_transparency(im, bg_colour=(255, 255, 255)):
    """Return `im` composited onto a solid background if it has transparency."""

    # Only process if image has transparency (https://stackoverflow.com/a/1963146)
    if im.mode in ('RGBA', 'LA') or (im.mode == 'P' and 'transparency' in im.info):

        # Need to convert to RGBA if LA format due to a bug in PIL (https://stackoverflow.com/a/1963146)
        alpha = im.convert('RGBA').split()[-1]

        # Create a new background image of our matte colour.
        # Must be RGBA because paste requires both images have the same format
        # (https://stackoverflow.com/a/8720632 and https://stackoverflow.com/a/9459208)
        bg = Image.new("RGBA", im.size, bg_colour + (255,))
        bg.paste(im, mask=alpha)
        return bg

    else:
        return im

The complete code, then:

import base64
import re
from io import BytesIO

import pytesseract
from PIL import Image

png_url = driver.execute_script(
    'return document.getElementsByTagName("canvas")[0].toDataURL("image/png");')
str_base64 = re.search(r'base64,(.*)', png_url).group(1)
# Convert it to binary
str_decoded = base64.b64decode(str_base64)
image = Image.open(BytesIO(str_decoded))
# Replace the transparent background with white before running OCR
image = remove_transparency(image)
recaptcha = pytesseract.image_to_string(image).replace(" ", "")
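
The decoding step can be tried without a browser. What follows is only a minimal sketch, assuming Pillow is installed: the helper name `image_from_data_uri` and the 4x4 test image are made up for illustration and stand in for the real canvas output. Canvas pixels that were never drawn on are transparent black, so discarding the alpha channel leaves plain black, which is probably what happened in the question.

```python
import base64
import re
from io import BytesIO

from PIL import Image


def image_from_data_uri(data_uri):
    """Decode a base64 data URI into a PIL image, entirely in memory."""
    str_base64 = re.search(r'base64,(.*)', data_uri).group(1)
    return Image.open(BytesIO(base64.b64decode(str_base64)))


# Stand-in for the canvas output (hypothetical test data): a fully
# transparent 4x4 PNG with a single opaque black pixel, as a data URI.
canvas = Image.new("RGBA", (4, 4), (0, 0, 0, 0))
canvas.putpixel((1, 1), (0, 0, 0, 255))
buf = BytesIO()
canvas.save(buf, format="PNG")
data_uri = ("data:image/png;base64,"
            + base64.b64encode(buf.getvalue()).decode("ascii"))

image = image_from_data_uri(data_uri)
print(image.mode)              # RGBA
print(image.getpixel((0, 0)))  # (0, 0, 0, 0): transparent black

# Dropping the alpha channel turns the transparent background black.
naive = image.convert("RGB")
print(naive.getpixel((0, 0)))  # (0, 0, 0)
```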

You should create a white RGB image and paste your RGBA image onto it. The solution might be this, but there are other ways too. I would suggest numpy and opencv.
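
The white-background idea can be sketched with Pillow alone, without numpy or opencv. This is an illustration under assumptions rather than the linked solution: the function name `flatten_on_white` and the two-pixel sample image are hypothetical.

```python
from PIL import Image


def flatten_on_white(im):
    """Return an RGB copy of `im` with transparent areas filled white."""
    rgba = im.convert("RGBA")
    background = Image.new("RGB", rgba.size, (255, 255, 255))
    # The alpha band is the paste mask: opaque pixels are copied over,
    # transparent pixels leave the white background visible.
    background.paste(rgba, mask=rgba.split()[3])
    return background


# Hypothetical sample: one transparent pixel and one opaque black pixel.
sample = Image.new("RGBA", (2, 1), (0, 0, 0, 0))
sample.putpixel((1, 0), (0, 0, 0, 255))

flat = flatten_on_white(sample)
print(flat.mode)              # RGB
print(flat.getpixel((0, 0)))  # (255, 255, 255)
print(flat.getpixel((1, 0)))  # (0, 0, 0)
```

The result is a plain RGB image, so it can be passed directly to `pytesseract.image_to_string`.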