Making an OCR app for Android using Tesseract.


Recently I was playing with an OCR library by Google called “Tesseract” (cool name for a library!).

App in action (screenshot).

It was a fun experience. This post shows how you can make a simple OCR app in Android using Tesseract.

We will be using Tess-Two, a fork of Tesseract that comes with some additional tools such as Leptonica, an image processing library.

If you want an even easier way to get started with OCR on Android, you can try Easy OCR Library, a library I built. Usage instructions are in its ReadMe.md file.

Anyway, moving on: I am using Android Studio on a 64-bit Ubuntu machine here.

Step 1 : 

Clone the library Tess-Two.


git clone git://github.com/rmtheis/tess-two tess

Step 2 :

Now we need to build the library.

Building requires the Android NDK.


cd tess
cd tess-two
ndk-build
android update project --path .
ant release

Building may take some time, so be patient. Don’t press Ctrl+C too soon 😛.

Step 3 :

Yay! Time to use the library in an Android project.

Copy the tess/tess-two folder into the root folder of your application project.

Step 4 : 

In the tess-two folder you just pasted, add a build.gradle file, since Android Studio uses the Gradle build system.

Add the following Gradle script to the file.


buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath 'com.android.tools.build:gradle:1.2.3'
    }
}

apply plugin: 'android-library'

android {
    compileSdkVersion 22
    buildToolsVersion "22.0.1"

    defaultConfig {
        minSdkVersion 8
        targetSdkVersion 22
    }

    sourceSets.main {
        manifest.srcFile 'AndroidManifest.xml'
        java.srcDirs = ['src']
        resources.srcDirs = ['src']
        res.srcDirs = ['res']
        jniLibs.srcDirs = ['libs']
    }
}

Step 5 : 

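Add the following line to the settings.gradle file of your project.

include ':tess-two'

Your app module also has to depend on the library: add compile project(':tess-two') to the dependencies block of the app module's build.gradle.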

Step 6 :

Now we have successfully included the Tess-Two library in our project and we are ready to use it.

First we need to capture the picture itself. You can use something like this code sample taken from Easy OCR Library.


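public void takePicture() {
    // Ask the device's camera app to take a picture (MediaStore is android.provider.MediaStore).
    Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
    // Save the full-size image under a timestamped name in our own directory.
    // If that directory is on external storage, the manifest needs the WRITE_EXTERNAL_STORAGE permission.
    this.filePathOriginal = FileUtils.getDirectory(this.directoryPathOriginal) + File.separator + Calendar.getInstance().getTimeInMillis() + ".jpg";
    intent.putExtra(MediaStore.EXTRA_OUTPUT, Uri.fromFile(new File(this.filePathOriginal)));

    startActivity(intent);
}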

Or you can find the code here.

We will also downscale the image a little so that recognition is faster.

You can use the following code sample, again from Easy OCR Library.


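private Bitmap getBitmapFromPath() {
    BitmapFactory.Options bmOptions = new BitmapFactory.Options();
    // Decode at 1/4 of the width and height (1/16 of the pixels) to save memory and time.
    bmOptions.inSampleSize = 4;
    // decodeFile returns null if the file is missing or cannot be decoded.
    return BitmapFactory.decodeFile(this.filePath, bmOptions);
}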
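
The sample above hard-codes a factor of 4. If you would rather derive the factor from the size of the photo, here is a minimal sketch of how that could be done. The class and method names are made up for illustration and are not part of Tess-Two or Easy OCR Library; the sketch only assumes that you already know the image dimensions and the smallest size you are willing to accept.

```java
// Hypothetical helper for illustration; not part of Tess-Two or Easy OCR Library.
final class SampleSizeCalculator {

    private SampleSizeCalculator() {}

    /**
     * Returns the largest power of two that, used as inSampleSize, keeps
     * both decoded dimensions at or above the requested minimum size.
     */
    static int calculateInSampleSize(int width, int height, int reqWidth, int reqHeight) {
        if (reqWidth <= 0 || reqHeight <= 0) {
            throw new IllegalArgumentException("Requested size must be positive");
        }
        int inSampleSize = 1;
        // Keep doubling while the next factor still satisfies the request.
        while (width / (inSampleSize * 2) >= reqWidth
                && height / (inSampleSize * 2) >= reqHeight) {
            inSampleSize *= 2;
        }
        return inSampleSize;
    }
}
```

On Android you can read the dimensions without loading the pixels by decoding once with bmOptions.inJustDecodeBounds = true, then pass bmOptions.outWidth and bmOptions.outHeight to a helper like this one. Keep in mind that very small text may stop being readable if you downscale too far, so choose the minimum size generously.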

Step 7 :

Final step. Recognize the text using the library API.


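private String scanImage() {
    TessBaseAPI baseApi = new TessBaseAPI();
    // The data path must be the parent of a "tessdata" folder that holds
    // <language>.traineddata, e.g. tessdata/eng.traineddata for "eng".
    String dataPath = FileUtils.getDirectory(this.directoryPath) + "/";
    Log.d(Config.TAG, "Data path : " + dataPath);
    baseApi.init(dataPath, this.trainedDataCode);
    baseApi.setImage(this.mBitmap);
    String recognizedText = baseApi.getUTF8Text();
    // Free the native memory held by the engine.
    baseApi.end();

    return recognizedText;
}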
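
A common stumbling block with init() is the data path: it should point at the folder that contains tessdata, not at tessdata itself and not at the .traineddata file. Below is a small sketch of a check you could run before calling init(). The class and method names are hypothetical and exist only for this example; it uses nothing but java.io.File.

```java
import java.io.File;

// Hypothetical helper for illustration; not part of Tess-Two or Easy OCR Library.
final class TessDataCheck {

    private TessDataCheck() {}

    /** Location where the language file is expected for a given data path. */
    static File trainedDataFile(String dataPath, String languageCode) {
        return new File(new File(dataPath, "tessdata"), languageCode + ".traineddata");
    }

    /** True if dataPath/tessdata/<languageCode>.traineddata exists as a file. */
    static boolean isDataPathReady(String dataPath, String languageCode) {
        return trainedDataFile(dataPath, languageCode).isFile();
    }
}
```

If isDataPathReady(dataPath, "eng") returns false, copy eng.traineddata into dataPath/tessdata first (for example from your app's assets) and only then call baseApi.init(dataPath, "eng").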


Again, I would recommend using the Easy OCR Library if you are facing any problems.

That library has several features:

  1. Very easy setup.
  2. Handles all the image processing in a background thread.
  3. Provides a simple interface with relevant callbacks for the functions of the library.
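
If you stay with plain Tess-Two, you can get a similar effect yourself, because recognition is slow enough that it should not run on the main thread. The sketch below shows the general idea with a single worker thread and a callback. It is not how Easy OCR Library is implemented; the BackgroundOcr class and its Callback interface are invented for this example, and the OCR step is passed in as a Callable<String> so that the code stays independent of Android.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; names are invented and this is not Easy OCR Library's API.
final class BackgroundOcr {

    /** Receives the outcome of one recognition run. */
    interface Callback {
        void onSuccess(String text);
        void onError(Exception error);
    }

    // One worker thread, so recognition jobs run one after another.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Runs ocrTask on the worker thread and reports the result to callback. */
    void recognize(final Callable<String> ocrTask, final Callback callback) {
        executor.execute(new Runnable() {
            @Override
            public void run() {
                String text;
                try {
                    text = ocrTask.call();
                } catch (Exception e) {
                    callback.onError(e);
                    return;
                }
                callback.onSuccess(text);
            }
        });
    }

    /** Stops accepting new work; call this when the screen goes away. */
    void shutdown() {
        executor.shutdown();
    }
}
```

In an app, the Callable would wrap a method like scanImage() from Step 7. The callback is invoked on the worker thread, so switch back to the main thread (for example with runOnUiThread) before touching any views.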

 

13 thoughts on “Making an OCR app for Android using Tesseract.”

  1. Truly an EASY OCR scanner. Saves the hazel of NDK build. The program works great but when I tried to make some additions to it, there was an error as shows below. Please help me solve the issue. It is unable to find “libpngt.so”. [couldn’t find “libpngt.so”]

    Caused by: java.lang.UnsatisfiedLinkError: dalvik.system.PathClassLoader[DexPathList[[zip file “/data/app/com.wordpress.priyankvex.easyocrscannerdemo-1/base.apk”],nativeLibraryDirectories=[/data/app/com.wordpress.priyankvex.easyocrscannerdemo-1/lib/arm, /vendor/lib, /system/lib]]] couldn’t find “libpngt.so”

    Thanks

  2. I am getting this error. Please help me

    @Below Line
    baseApi.init(“/mnt/sdcard/tesseract/tessdata/eng.trainedata”, “eng”);

    Error : Caused by: java.lang.IllegalArgumentException: Data path does not exist!

  3. This work like a charm but the OCR Capability is not accurate at all why is that… even the app like scanbot uses the same tessaract engine… Any insight???

      1. Is there any way a beginner like me can explore on it .. do you have any suggested documents where i can start exploring.

  4. Hi priyank
    I use your code for OCR in android studio but the output shows nothing , i dont know why? may be its datatrained issue i use the eng.traineddata
    plzzz suggest me solution

  5. Hi, I am trying to use your easy-ocr-lib. I am getting the following exception
    “java.lang.UnsatisfiedLinkError: Couldn’t load pngt from loader”

  6. Hi
    did anybody solve the missing libpngt.so ?
    i’m trying the “easy” way by reusing the existing library, but it seems not so straightforward…

    write permission missing in manifest … of course a working simple project would be of great help. But answer/comment on those missing .so file would be certainly helpful

    Thanks !
    Alex

  7. Hello Priyank, can you send me your email address. I have a project to discuss with you if you are available.
    Thanks
