Making an OCR app for Android using Tesseract.



Recently I was playing with an OCR library by Google called “Tesseract” (cool name for a library!).

App in action.

It was a fun experience. This post shows how you can make a simple OCR app in Android using Tesseract.

We will be using Tess-Two, a fork of Tesseract that comes with some additional tools such as Leptonica, an image processing library.

If you want an even easier way to get started with OCR on Android, you can try Easy OCR Library, which I built. Usage instructions are in its ReadMe.md file.

Anyway, moving on: I am using Android Studio on a 64-bit Ubuntu machine here.

Step 1 : 

Clone the library Tess-Two.


git clone https://github.com/rmtheis/tess-two.git tess

Step 2 :

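Now we need to build the library.

For this we need the Android NDK, which provides ndk-build. The last two commands also need the Android SDK tools (for the android command) and Apache Ant (for ant).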


cd tess
cd tess-two
ndk-build
android update project --path .
ant release

Building may take some time, so be patient. Don’t press Ctrl+C too soon 😛.

Step 3 :

Yay! Time to use the library in an Android project.

Copy the tess/tess-two folder into the root folder of your application project.

Step 4 : 

In the tess-two folder you just pasted, add a build.gradle file, since Android Studio uses the Gradle build system.

Add the following Gradle script to the file.


buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:1.2.3'
    }
}

apply plugin: 'com.android.library'

android {
    compileSdkVersion 22
    buildToolsVersion "22.0.1"

    defaultConfig {
        minSdkVersion 8
        targetSdkVersion 22
    }

    sourceSets.main {
        manifest.srcFile 'AndroidManifest.xml'
        java.srcDirs = ['src']
        resources.srcDirs = ['src']
        res.srcDirs = ['res']
        jniLibs.srcDirs = ['libs']
    }
}

Step 5 : 

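Add the following line to the settings.gradle file in the project root.

include ':tess-two'

Then add compile project(':tess-two') to the dependencies block of your app module's build.gradle, so that the app can actually use the library.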

Step 6 :

Now we have successfully included the Tess-Two library in our project and we are ready to use it.

First we need to capture the picture itself. You can use something like this code sample taken from Easy OCR Library.


public void takePicture() {
    // Ask the camera app to take a photo and save it at filePathOriginal.
    // MediaStore here is android.provider.MediaStore.
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    this.filePathOriginal = FileUtils.getDirectory(this.directoryPathOriginal) + File.separator + Calendar.getInstance().getTimeInMillis() + ".jpg";
    intent.putExtra(MediaStore.EXTRA_OUTPUT, Uri.fromFile(new File(this.filePathOriginal)));

    startActivity(intent);
}

Or you can find the code here.

We will also downscale the image a little so that the recognition is fast.

You can use the following code sample, again from Easy OCR Library.


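private Bitmap getBitmapFromPath() {
    BitmapFactory.Options bmOptions = new BitmapFactory.Options();
    // inSampleSize = 4 decodes the image at 1/4 of its width and height (1/16 of the pixels).
    bmOptions.inSampleSize = 4;
    // decodeFile() returns null if the file cannot be decoded.
    return BitmapFactory.decodeFile(this.filePath, bmOptions);
}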
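The sample size of 4 is hard-coded above. If you would rather derive it from the size of the photo, here is a minimal sketch in plain Java. The class name SampleSizeSketch, the method chooseSampleSize and the 800x600 minimum are my own assumptions for illustration; they are not part of Tess-Two or Easy OCR Library. It sticks to powers of two because BitmapFactory rounds any other inSampleSize down to the nearest power of two anyway.

```java
/** Sketch: pick a power-of-two inSampleSize from the source image size. */
final class SampleSizeSketch {

    /**
     * Returns the largest power of two for which the decoded width and height
     * both stay at or above the requested minimums (hypothetical helper).
     */
    static int chooseSampleSize(int srcWidth, int srcHeight, int minWidth, int minHeight) {
        if (minWidth < 1 || minHeight < 1) {
            throw new IllegalArgumentException("minimum dimensions must be at least 1");
        }
        int sampleSize = 1;
        // Keep doubling while the next step would still satisfy both minimums.
        while (srcWidth / (sampleSize * 2) >= minWidth
                && srcHeight / (sampleSize * 2) >= minHeight) {
            sampleSize *= 2;
        }
        return sampleSize;
    }

    public static void main(String[] args) {
        // A 3264x2448 photo, keeping at least 800x600 for recognition: prints 4.
        System.out.println(chooseSampleSize(3264, 2448, 800, 600));
    }
}
```

On Android you could get the source dimensions without loading the whole image by first decoding with inJustDecodeBounds set to true, reading outWidth and outHeight from the options, and then assigning the result to bmOptions.inSampleSize.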

Step 7 :

Final step. Recognize the text using the library API.


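private String scanImage() {
    TessBaseAPI baseApi = new TessBaseAPI();
    Log.d(Config.TAG, "Data path : " + FileUtils.getDirectory(this.directoryPath));
    // The first argument is the parent directory of the "tessdata" folder and must end with "/".
    // trainedDataCode is the language code, e.g. "eng" for tessdata/eng.traineddata.
    baseApi.init(FileUtils.getDirectory(this.directoryPath) + "/", this.trainedDataCode);
    baseApi.setImage(this.mBitmap);
    String recognizedText = baseApi.getUTF8Text();
    // Release the native resources held by the engine.
    baseApi.end();

    return recognizedText;
}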
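init() expects a particular folder layout: the path you pass in must contain a tessdata subfolder, and that subfolder must hold the trained data file for your language (for example eng.traineddata). Checking this before calling init() can make setup problems easier to track down. The sketch below is plain Java; the class TessDataCheck and its methods are hypothetical names of mine, and the messages are only illustrative, not the ones Tess-Two itself produces.

```java
import java.io.File;

/** Sketch: check the folder layout that init() expects before calling it. */
final class TessDataCheck {

    /** Where the trained data for a language should live under the data path. */
    static File trainedDataFile(String dataPath, String languageCode) {
        File tessdataDir = new File(dataPath, "tessdata");
        return new File(tessdataDir, languageCode + ".traineddata");
    }

    /** Returns null when the layout looks right, otherwise a note on what is missing. */
    static String describeProblem(String dataPath, String languageCode) {
        File root = new File(dataPath);
        if (!root.isDirectory()) {
            return "Data path does not exist: " + root.getPath();
        }
        if (!new File(root, "tessdata").isDirectory()) {
            return "Missing tessdata subfolder in: " + root.getPath();
        }
        File trained = trainedDataFile(dataPath, languageCode);
        if (!trained.isFile()) {
            return "Missing trained data file: " + trained.getPath();
        }
        return null;
    }
}
```

Note that the value passed to init() is the folder that contains tessdata, not the tessdata folder or the .traineddata file itself.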


Again, I would recommend using the Easy OCR Library if you are facing any problems.

That library has several features:

  1. Very easy setup.
  2. Handles all the image processing in a background thread.
  3. Provides a very simple interface with relevant callbacks for the functions of the library.
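To give an idea of what "background thread plus callbacks" means in practice, here is a rough sketch in plain Java. It is not the Easy OCR Library's actual code: BackgroundOcrSketch, Recognizer and OcrCallback are names I made up for illustration, and the recognizer is just a stand-in for whatever slow OCR call you have.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: run a slow recognition step off the calling thread and report back. */
final class BackgroundOcrSketch {

    /** Hypothetical stand-in for the slow OCR call. */
    interface Recognizer {
        String recognize();
    }

    /** Hypothetical callback that receives the recognized text. */
    interface OcrCallback {
        void onOcrComplete(String recognizedText);
    }

    // A single worker thread so that scans run one at a time.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues the recognition; the callback is invoked on the worker thread. */
    void recognizeAsync(final Recognizer recognizer, final OcrCallback callback) {
        worker.execute(new Runnable() {
            @Override
            public void run() {
                callback.onOcrComplete(recognizer.recognize());
            }
        });
    }

    /** Stops the worker once no more scans are needed. */
    void shutdown() {
        worker.shutdown();
    }
}
```

In an Android app the callback here runs on the worker thread, so anything that touches views would need to be handed back to the main thread, for example with Activity.runOnUiThread.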

 

18 responses to “Making an OCR app for Android using Tesseract.”

  1. Truly an EASY OCR scanner. Saves the hassle of the NDK build. The program works great, but when I tried to make some additions to it, there was an error, as shown below. Please help me solve the issue. It is unable to find “libpngt.so”. [couldn’t find “libpngt.so”]

    Caused by: java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file “/data/app/com.wordpress.priyankvex.easyocrscannerdemo-1/base.apk”],nativeLibraryDirectories=[/data/app/com.wordpress.priyankvex.easyocrscannerdemo-1/lib/arm, /vendor/lib, /system/lib]]] couldn’t find “libpngt.so”

    Thanks

  2. How do I run the ant release command from the command prompt? I get this message when I go to that path and run the command: “-bash: /Users/Vishal/Library/Android/sdk/tools/ant: is a directory”

  3. I am getting this error. Please help me

    @Below Line
    baseApi.init(“/mnt/sdcard/tesseract/tessdata/eng.trainedata”, “eng”);

    Error : Caused by: java.lang.IllegalArgumentException: Data path does not exist!

  4. This works like a charm, but the OCR is not accurate at all. Why is that? Even apps like Scanbot use the same Tesseract engine. Any insight?

    1. They use other image optimisation techniques, such as CV (computer vision), that enhance the image and increase its readability.

      1. Is there any way a beginner like me can explore this? Do you have any suggested documents where I can start?

  5. Hi Priyank
    I used your code for OCR in Android Studio, but the output shows nothing and I don't know why. Maybe it's a trained data issue; I use eng.traineddata.
    Please suggest a solution.

    1. Is the image you are capturing clear? It could be a training data issue too.

  6. Yes, the captured image is clear, but it does not show the OCR result. Please guide me to a solution.

  7. Hi, I am trying to use your easy-ocr-lib. I am getting the following exception
    “java.lang.UnsatisfiedLinkError: Couldn’t load pngt from loader”

  8. Hi
    Please reply and tell me what the issue is. I can't figure it out.

  9. Hi
    Did anybody solve the missing libpngt.so?
    I’m trying the “easy” way by reusing the existing library, but it seems not so straightforward…

    Write permission missing in manifest… Of course a simple working project would be of great help. But an answer or comment on the missing .so file would certainly be helpful.

    Thanks !
    Alex

  10. Idowu Akinyemi

    Hello Priyank, can you send me your email address. I have a project to discuss with you if you are available.
    Thanks

  11. Sir, I used tess-two for my project. tess-two cloned successfully, but ndk-build is giving me an error.

    Android NDK: android-9 is unsupported. Using minimum supported version android-16.
    Android NDK: WARNING: APP_PLATFORM android-16 is higher than android:minSdkVersion 9 in ./AndroidManifest.xml. NDK binaries will *not* be compatible with devices older than android-16. See https://android.googlesource.com/platform/ndk/+/master/docs/user/common_problems.md for more information.
    C:/sdk/ndk-bundle/build//../build/core/add-application.mk:178: *** Android NDK: APP_STL gnustl_static is no longer supported. Please switch to either c++_static or c++_shared. See https://developer.android.com/ndk/guides/cpp-support.html for more information. . Stop.

    Please help me resolve it.

  12. I am using tess-two in my project to read MRZ codes. It works fine at the start, but after a few API calls the response becomes very slow. After some time it goes back to normal scanning, with making any change. I can’t figure out the issue, please help.

    1. Without making any change.*
