2017-06-06 105 views
2

Tesseract: IndexOutOfBoundsException in the OCR method

I am using Tesseract for OCR in a Spring MVC application. I get an IndexOutOfBoundsException for the file I pass in. Any ideas?

Error log:

net.sourceforge.tess4j.TesseractException: java.lang.IndexOutOfBoundsException
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215) 
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196) 
    at com.tooltank.spring.service.GroupAttachmentsServiceImpl.testOcr(GroupAttachmentsServiceImpl.java:839) 
    at com.tooltank.spring.service.GroupAttachmentsServiceImpl.lambda$addAttachment$0(GroupAttachmentsServiceImpl.java:447) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.IndexOutOfBoundsException 
    at javax.imageio.stream.FileCacheImageOutputStream.seek(FileCacheImageOutputStream.java:170) 
    at net.sourceforge.tess4j.util.ImageIOHelper.getImageByteBuffer(ImageIOHelper.java:297) 
    at net.sourceforge.tess4j.Tesseract.setImage(Tesseract.java:397) 
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:290) 
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212) 
    ... 4 more 

Code:

private String testOcr(String fileLocation, int attachId) {
    try {
        File imageFile = new File(fileLocation);
        BufferedImage img = ImageIO.read(imageFile);
        // Redraw the source image as a 1-bit black-and-white image
        BufferedImage blackNWhite = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D graphics = blackNWhite.createGraphics();
        graphics.drawImage(img, 0, 0, null);
        // Write the converted image to a temporary PNG with a random name
        String identifier = String.valueOf(new BigInteger(130, random).toString(32));
        String blackAndWhiteImage = previewPath + identifier + ".png";
        File outputfile = new File(blackAndWhiteImage);
        ImageIO.write(blackNWhite, "png", outputfile);

        ITesseract instance = new Tesseract();
        // Point to one folder above tessdata directory, must contain training data
        instance.setDatapath("/usr/share/tesseract-ocr/");
        // ISO 639-3 standard
        instance.setLanguage("deu");
        String result = instance.doOCR(outputfile);
        // Keep only letters, digits, German umlauts, '@' and whitespace
        result = result.replaceAll("[^a-zA-Z0-9öÖäÄüÜß@\\s]", "");
        Files.delete(new File(blackAndWhiteImage).toPath());
        GroupAttachments groupAttachments = this.groupAttachmentsDAO.getAttachmenById(attachId);
        System.out.println("OCR Result is " + result);
        if (groupAttachments != null) {
            saveIndexes(result, groupAttachments.getFileName(), null, groupAttachments.getGroupId(), false, attachId);
        }
        return result;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}

Thanks.

Answers

1

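Because of a bug in Java Image IO (fixed in Java 9), the current version of the Java Tesseract wrapper (3.4.0 at the time this answer was written) does not work with Java versions below 9. To use it with an older Java version, you can try the following fix to the Tesseract ImageIOHelper class. Make a copy of the class in your project and apply the changes below; it should then handle both files and BufferedImages.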

Note: this version does not use the TIFF optimization found in the original class; you can add it back if your project needs it.

public static ByteBuffer getImageByteBuffer(RenderedImage image) throws IOException {
    // A BufferedImage can be converted directly
    if (image instanceof BufferedImage) {
        return convertImageData((BufferedImage) image);
    }
    // Otherwise copy the RenderedImage into a new BufferedImage first
    ColorModel cm = image.getColorModel();
    int width = image.getWidth();
    int height = image.getHeight();
    WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
    boolean isAlphaPremultiplied = cm.isAlphaPremultiplied();
    // Carry over any properties attached to the source image
    Hashtable<String, Object> properties = new Hashtable<>();
    String[] keys = image.getPropertyNames();
    if (keys != null) {
        for (String key : keys) {
            properties.put(key, image.getProperty(key));
        }
    }
    BufferedImage result = new BufferedImage(cm, raster, isAlphaPremultiplied, properties);
    image.copyData(raster);
    return convertImageData(result);
}
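
Since the answer ties the problem to Java versions below 9, it can help to confirm which version the application actually runs on. The following is only a minimal sketch: the class name `JavaVersionCheck` and the helper `majorVersion` are hypothetical and not part of tess4j. It relies on the standard `java.specification.version` system property, which reads "1.8" on Java 8 and "9", "11", and so on from Java 9 onward.

```java
public class JavaVersionCheck {

    /** Parses a specification version string: "1.8" gives 8, "9" gives 9, "11" gives 11. */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        // Before Java 9 the specification version had the form "1.x"
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        if (major < 9) {
            System.out.println("Running on Java " + major + ", below 9: the range the answer describes as affected");
        } else {
            System.out.println("Running on Java " + major + ": 9 or later");
        }
    }
}
```

Running this inside the same JVM as the web application (for example from a startup log statement) is more reliable than checking `java -version` in a shell, because the application server may use a different JDK.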

So I should replace the getImageByteBuffer method in ImageIOHelper with the code you provided. How do I then call the OCR method? Thanks. –


Just add the fixed copy to your classpath and call Tesseract the usual way; your fixed copy will be used ahead of the library's copy. – ruhsuzbaykus


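Sorry, that did not work; same exception. I put the file in a different package and then added that package under Module Settings -> Modules -> Dependencies in IntelliJ 13. –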
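
One possible reason the override had no effect: the JVM identifies a class by its fully qualified name, so a copy placed in a different package is a separate class that the library never references. For shadowing to work, the copy would presumably have to keep the original package, which per the stack trace is `net.sourceforge.tess4j.util`, and sit ahead of the tess4j jar on the classpath. The sketch below is a hedged diagnostic, not part of tess4j: the class name `ClassOrigin` and the helper `originOf` are hypothetical. It prints where a class was loaded from, using only the standard `ProtectionDomain` and `CodeSource` APIs.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, if the JVM reports one. */
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader usually report no code source
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; another name can be passed as an argument
        String name = args.length > 0 ? args[0] : "net.sourceforge.tess4j.util.ImageIOHelper";
        try {
            // Load without initializing, so no static initializers run
            Class<?> type = Class.forName(name, false, ClassOrigin.class.getClassLoader());
            System.out.println(name + " loaded from " + originOf(type));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the printed location is the tess4j jar rather than the project's own output directory, the library copy is still the one in use.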

0

Try upgrading to tess4j version 3.4.1. That solved the problem for me.