How do I skip an image in Tesseract?

I have more than 50k images in a folder. This is the code I wrote.

public static File folder = new File("D:\\image\\");
public static File[] listofFiles = folder.listFiles();
private static int counter;

public static void main(String[] args) {

    Tesseract tesseract = new Tesseract();
    try {
        tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
        for (int i = 0; i < listofFiles.length; i++) {
            String text = tesseract.doOCR(new File("D:\\image\\"+listofFiles[i].getName()));
            counter++;
            System.out.println("Image Number: "+counter+"  "+text);
        }


    }catch (TesseractException e) {
        e.printStackTrace();
        System.out.println("TESSERACT ERROR");
    }

}

Every now and then it runs into the following error:

Cannot convert RAW image to Pix with bpp = 64
Please call SetImage before attempting recognition.net.sourceforge.tess4j.TesseractException: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
at com.krissemicolon.Main.main(Main.java:23)
Caused by: java.lang.NullPointerException
at net.sourceforge.tess4j.Tesseract.getOCRText(Unknown Source)
at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)
... 3 more

How can I skip the image that causes this error and move on to the next one?

Comments
  • wnihil

    Just move the try-catch inside the for loop. According to the Tesseract.html documentation, the setDatapath() method doesn't throw any exception; only the doOCR() method does.

        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
        for (int i = 0; i < listofFiles.length; i++) {
            try {
                String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
                counter++;
                System.out.println("Image Number: " + counter + "  " + text);
            } catch (TesseractException e) {
                e.printStackTrace();
                System.out.println("TESSERACT ERROR");
            }
        }
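
    With 50k images it may also help to remember which files were skipped, so they can be inspected afterwards. Below is a minimal sketch of that idea. The Ocr interface, the processAll method and the fake OCR in main are hypothetical names made up for illustration, so the example runs without Tess4J; with Tess4J one would presumably pass tesseract::doOCR instead. It catches Exception rather than only TesseractException, which is a broader choice than the code above makes.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SkipFailedImages {

    /** Hypothetical stand-in for an OCR call such as Tess4J's doOCR(File). */
    @FunctionalInterface
    public interface Ocr {
        String recognize(File image) throws Exception;
    }

    /**
     * Runs the OCR call on every file and returns the files that failed.
     * A failure on one file never stops the loop.
     */
    public static List<File> processAll(File[] files, Ocr ocr) {
        List<File> skipped = new ArrayList<>();
        if (files == null) { // File.listFiles() returns null if the folder cannot be listed
            return skipped;
        }
        int counter = 0;
        for (File file : files) {
            try {
                String text = ocr.recognize(file);
                counter++;
                System.out.println("Image Number: " + counter + "  " + text);
            } catch (Exception e) {
                // Remember the failing file and continue with the next one.
                skipped.add(file);
                System.out.println("Skipping " + file.getName() + ": " + e.getMessage());
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Fake OCR for demonstration only: fails on any file whose name contains "bad".
        Ocr fake = f -> {
            if (f.getName().contains("bad")) {
                throw new IllegalStateException("cannot read image");
            }
            return "text of " + f.getName();
        };
        File[] files = { new File("a.png"), new File("bad.png"), new File("c.png") };
        List<File> skipped = processAll(files, fake);
        System.out.println("Skipped: " + skipped);
    }
}
```

    The returned list can be written to a log file or used to move the problem images into a separate folder once the run is finished.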
    
  • 一条龙

    Just add another try-catch:

    public static File folder = new File("D:\\image\\");
    public static File[] listofFiles = folder.listFiles();
    private static int counter;

    public static void main(String[] args) {

        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:\\Users\\zirpm\\Documents\\Coden\\Libaries\\Tess4J\\tessdata");
        for (int i = 0; i < listofFiles.length; i++) {
            try {
                String text = tesseract.doOCR(new File("D:\\image\\" + listofFiles[i].getName()));
                // Count and print only after a successful OCR call, while text is in scope.
                counter++;
                System.out.println("Image Number: " + counter + "  " + text);
            } catch (TesseractException e) {
                // Report the failing file and continue with the next one.
                System.out.println("Skipping " + listofFiles[i].getName());
            }
        }
    }

    If a TesseractException occurs, the loop prints the name of the failing file and moves on to the next one.

    The outer try-catch block from the question is removed here. Once doOCR() is handled inside the loop, nothing left in the outer block throws TesseractException, and javac rejects a catch clause for a checked exception that the try block cannot throw.
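
    The error message mentions bpp = 64, which suggests (this is a guess, nothing in this thread confirms it) that the failing files decode to an unusual pixel depth. If that is the case, such files could be filtered out before OCR is attempted. The sketch below uses only the standard ImageIO and BufferedImage classes. The class and method names are hypothetical, and the set of accepted depths (1, 8, 24, 32 bits per pixel) is an assumption that should be checked against the Tesseract version in use.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class PixelDepthCheck {

    /** Assumption: only these bits-per-pixel values are safe to hand to the OCR engine. */
    public static boolean hasSupportedDepth(BufferedImage image) {
        int bpp = image.getColorModel().getPixelSize();
        return bpp == 1 || bpp == 8 || bpp == 24 || bpp == 32;
    }

    /** Returns false for files that cannot be decoded or that have an unusual pixel depth. */
    public static boolean looksReadable(File file) {
        try {
            BufferedImage image = ImageIO.read(file); // null if no registered reader understands the file
            return image != null && hasSupportedDepth(image);
        } catch (IOException e) {
            return false; // unreadable or missing file: treat it as "skip"
        }
    }

    public static void main(String[] args) {
        BufferedImage rgb = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        BufferedImage gray16 = new BufferedImage(2, 2, BufferedImage.TYPE_USHORT_GRAY);
        System.out.println("24-bit RGB accepted: " + hasSupportedDepth(rgb));
        System.out.println("16-bit gray accepted: " + hasSupportedDepth(gray16));
    }
}
```

    Inside the loop, a call such as if (!PixelDepthCheck.looksReadable(file)) continue; placed before doOCR() would skip those files. Note that this decodes every image an extra time, which adds up over 50k files, so the try-catch inside the loop remains the simpler safety net.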