Handwritten Multi-digit String Segmentation and Recognition using Deep Learning


Recognizing digits that are already segmented is usually an easier task than segmenting and recognizing a multi-digit string; single-digit recognition is often considered the "Hello World!" example of machine learning. In this post, we will learn how to develop an application that segments a handwritten multi-digit string image and recognizes the segmented digits.
The handwritten digit recognition process passes through three steps: preprocessing, segmentation of the image into individual digits, and recognition of each digit.

Multi-digit Recognition Diagram

The preprocessing step includes conversion to grayscale, binarization, inversion, and dilation.

Preprocessing Steps


To follow along, you will need:

  1. Java Development Kit (JDK).
  2. NetBeans.
  3. Deeplearning4j, open-source deep-learning software for Java and Scala.
  4. JavaCV, a Java wrapper for OpenCV, FFmpeg, and many more.
  5. Source code, available at https://github.com/tahaemara/multi-digit-segmentation-and-recognition.


  1. Load image.

    /*Load image in grayscale mode*/
    IplImage image = cvLoadImage(IMAGEPATH, 0);
    /*imwrite("samples/gray.jpg", new Mat(image)); // Save gray version of image*/

  2. Convert image to grayscale.
    Each pixel is assigned a value from a range of monochromatic shades, from black to white, that represents its amount of light. The cvLoadImage call above already does this, because passing 0 as its second argument loads the image in grayscale mode.

    Grayscale Image
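
To make the idea concrete, here is a minimal pure-Java sketch of a grayscale conversion for a single pixel. It does not use JavaCV. The class name GrayscaleSketch and the method toGray are hypothetical, and the 0.299/0.587/0.114 weights are the common luma weights, assumed here for illustration.

```java
class GrayscaleSketch {
    /** Converts a packed 0xRRGGBB pixel to one luminance value in 0..255. */
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        // Weighted sum: green contributes most, blue least (assumed luma weights).
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    public static void main(String[] args) {
        System.out.println("white -> " + toGray(0xFFFFFF));
        System.out.println("red   -> " + toGray(0xFF0000));
        System.out.println("black -> " + toGray(0x000000));
    }
}
```

A full image is converted by applying toGray to every pixel.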

  3. Binarise image.
    Each pixel is assigned one of only two possible values, typically black or white. With CV_THRESH_OTSU, the threshold is chosen automatically by Otsu's method, so the 0 passed as the threshold argument is ignored.

    /*Binarising Image*/
    IplImage binimg = cvCreateImage(cvGetSize(image), IPL_DEPTH_8U, 1);
    cvThreshold(image, binimg, 0, 255, CV_THRESH_OTSU);
    /*imwrite("samples/binarise.jpg", new Mat(binimg)); // Save binarised version of image*/

    Binarized Image
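
For readers curious about what Otsu's method does, the following is a rough pure-Java sketch, not OpenCV's implementation. It picks the threshold that maximises the between-class variance of the gray-level histogram and then binarises with it. The names OtsuSketch, otsuThreshold and binarise are hypothetical, and pixels are assumed to be gray values in 0..255.

```java
class OtsuSketch {
    /** Returns the threshold in 0..255 that maximises between-class variance. */
    static int otsuThreshold(int[] pixels) {
        int[] hist = new int[256];
        for (int p : pixels) hist[p]++;

        long total = pixels.length;
        long sumAll = 0;
        for (int v = 0; v < 256; v++) sumAll += (long) v * hist[v];

        long weightBg = 0;   // number of pixels with value <= t
        long sumBg = 0;      // sum of those pixel values
        double bestVar = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            sumBg += (long) t * hist[t];
            if (weightBg == 0) continue;      // no background pixels yet
            long weightFg = total - weightBg;
            if (weightFg == 0) break;         // no foreground pixels left
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            double between = (double) weightBg * weightFg * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Pixels above the threshold become 255 (white), the rest 0 (black). */
    static int[] binarise(int[] pixels, int threshold) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = pixels[i] > threshold ? 255 : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pixels = {10, 10, 10, 200, 200, 200};
        int t = otsuThreshold(pixels);
        System.out.println(java.util.Arrays.toString(binarise(pixels, t)));
    }
}
```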

  4. Invert image color.
    Each pixel is replaced by the result of applying the bitwise NOT operator to its value, so that dark areas in the input image become light and light areas become dark. For dark ink on a light background, this leaves white digits on a black background.

    /*Invert image*/
    Mat inverted = new Mat();
    bitwise_not(new Mat(binimg), inverted);
    IplImage inverimg = new IplImage(inverted);
    /*imwrite("samples/invert.jpg", new Mat(inverimg)); // Save inverted version of image*/

    Inverted Image
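
As a small illustration of what the NOT operator does to 8-bit pixels, here is a pure-Java sketch. The class InvertSketch and its method invert are hypothetical names, and pixels are assumed to be values in 0..255.

```java
class InvertSketch {
    /** Flips every bit of each 8-bit pixel, which is the same as 255 - value. */
    static int[] invert(int[] pixels) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = ~pixels[i] & 0xFF; // keep only the low 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(invert(new int[]{0, 100, 255})));
    }
}
```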

  5. Dilate image.
    Each output pixel is assigned the maximum value of all the pixels in the input pixel's neighborhood. In a binary image, if any of the pixels in the neighborhood is set to 1, the output pixel is set to 1. This increases the thickness of the digits.

    /*Dilate image to increase the thickness of each digit*/
    IplImage dilated = cvCreateImage(cvGetSize(inverimg), IPL_DEPTH_8U, 1);
    opencv_imgproc.cvDilate(inverimg, dilated, null, 1);
    /*imwrite("samples/dilated.jpg", new Mat(dilated)); // Save dilated version of image*/

    Dilated Image
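
The neighborhood maximum described above can be sketched in a few lines of plain Java. This is an illustration only, assuming a 3x3 square neighborhood and a single pass; DilationSketch and dilate3x3 are hypothetical names.

```java
class DilationSketch {
    /** Sets each output pixel to the maximum of its 3x3 input neighborhood. */
    static int[][] dilate3x3(int[][] src) {
        int h = src.length, w = src[0].length;
        int[][] dst = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int max = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        // Neighbors outside the image are skipped.
                        if (ny < 0 || nx < 0 || ny >= h || nx >= w) continue;
                        max = Math.max(max, src[ny][nx]);
                    }
                }
                dst[y][x] = max;
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] img = new int[5][5];
        img[2][2] = 255; // a single white pixel in the center
        for (int[] row : dilate3x3(img)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

A single white pixel grows into a 3x3 white block, which is why thin pen strokes become thicker.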

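  6. Segment string image into individual digits.

    CvMemStorage storage = cvCreateMemStorage(0);
    CvSeq contours = new CvSeq();
    /*CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE and cvPoint(0, 0) are assumed values for the trailing arguments*/
    cvFindContours(dilated.clone(), storage, contours, Loader.sizeof(CvContour.class),
            CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));
    CvSeq ptr = new CvSeq();
    List<Rect> rects = new ArrayList<>();
    for (ptr = contours; ptr != null; ptr = ptr.h_next()) {
        /*Bounding box of the current contour*/
        CvRect boundbox = cvBoundingRect(ptr, 1);
        Rect rect = new Rect(boundbox.x(), boundbox.y(), boundbox.width(), boundbox.height());
        /*Keep the box for the sorting and recognition steps*/
        rects.add(rect);
        /*Draw the box on the grayscale image*/
        cvRectangle(image, cvPoint(boundbox.x(), boundbox.y()),
                cvPoint(boundbox.x() + boundbox.width(), boundbox.y() + boundbox.height()),
                CV_RGB(0, 0, 0), 2, 0, 0);
    }

    Segmented Image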
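
To show what "one bounding box per digit" means without OpenCV, here is a pure-Java sketch. Note that it swaps in a different technique: a simple flood fill over 8-connected non-zero pixels instead of OpenCV's contour tracing. The names BoundingBoxSketch and boundingBoxes are hypothetical, and each box is returned as {x, y, width, height}.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BoundingBoxSketch {
    /** Returns one {x, y, width, height} box per 8-connected blob of non-zero pixels. */
    static List<int[]> boundingBoxes(int[][] img) {
        int h = img.length, w = img[0].length;
        boolean[][] seen = new boolean[h][w];
        List<int[]> boxes = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (img[y][x] == 0 || seen[y][x]) continue;
                // New blob found: flood fill it while tracking its extent.
                int minX = x, maxX = x, minY = y, maxY = y;
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[]{x, y});
                seen[y][x] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    minX = Math.min(minX, p[0]);
                    maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]);
                    maxY = Math.max(maxY, p[1]);
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx, ny = p[1] + dy;
                            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                            if (img[ny][nx] == 0 || seen[ny][nx]) continue;
                            seen[ny][nx] = true;
                            stack.push(new int[]{nx, ny});
                        }
                    }
                }
                boxes.add(new int[]{minX, minY, maxX - minX + 1, maxY - minY + 1});
            }
        }
        return boxes;
    }

    public static void main(String[] args) {
        int[][] img = {
            {0, 0, 0, 0, 0, 0, 0},
            {0, 255, 255, 0, 0, 255, 0},
            {0, 255, 255, 0, 0, 255, 0},
            {0, 0, 0, 0, 0, 0, 0}
        };
        for (int[] box : boundingBoxes(img)) {
            System.out.println(java.util.Arrays.toString(box));
        }
    }
}
```

This also hints at why the dilation step matters: a digit drawn with a broken stroke would otherwise split into several blobs and produce several boxes.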

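  7. Sort digits (contours) from left to right.
    The contours are not guaranteed to come back in reading order, so the bounding boxes are sorted by their x coordinate.

    Mat result = new Mat(image);
    Collections.sort(rects, new RectComparator());

    RectComparator Class
    import java.util.Comparator;
    import org.bytedeco.javacpp.opencv_core;

    public class RectComparator implements Comparator<opencv_core.Rect> {
        /*Order bounding boxes by the x coordinate of their left edge*/
        @Override
        public int compare(opencv_core.Rect t1, opencv_core.Rect t2) {
            return Integer.compare(t1.x(), t2.x());
        }
    }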
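
The same left-to-right ordering can be tried out without JavaCV. The sketch below is a plain-Java stand-in in which each box is an int array {x, y, width, height}; SortSketch and sortLeftToRight are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortSketch {
    /** Sorts boxes in place by the x coordinate of their left edge. */
    static void sortLeftToRight(List<int[]> boxes) {
        boxes.sort(Comparator.comparingInt((int[] box) -> box[0]));
    }

    public static void main(String[] args) {
        List<int[]> boxes = new ArrayList<>();
        boxes.add(new int[]{40, 3, 12, 20}); // found first, but rightmost
        boxes.add(new int[]{5, 2, 10, 22});
        boxes.add(new int[]{22, 4, 11, 19});
        sortLeftToRight(boxes);
        for (int[] box : boxes) {
            System.out.println("x = " + box[0]);
        }
    }
}
```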

  8. Add black border to each digit; this increases the accuracy of classification.

    Before adding border
    After adding border
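
A likely reason the border helps is that MNIST digits sit inside a 20x20 box centered in a 28x28 image, so a tightly cropped digit looks unlike the training data. The plain-Java sketch below shows constant black padding on a 2D array; BorderSketch and addBorder are hypothetical names, and the 10-pixel pad mirrors the value used in the recognition code.

```java
class BorderSketch {
    /** Returns a copy of src surrounded by pad black (zero) pixels on every side. */
    static int[][] addBorder(int[][] src, int pad) {
        int h = src.length, w = src[0].length;
        // New arrays are zero-filled in Java, so the border is already black.
        int[][] dst = new int[h + 2 * pad][w + 2 * pad];
        for (int y = 0; y < h; y++) {
            System.arraycopy(src[y], 0, dst[y + pad], pad, w);
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] digit = {{255, 255, 255}, {255, 255, 255}};
        int[][] padded = addBorder(digit, 10);
        System.out.println(padded.length + " x " + padded[0].length);
    }
}
```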

  9. Recognize each digit.
    Each digit is recognized by a model built with the LeNet architecture and trained on the MNIST dataset. This model is generated by the class NetworkTrainer in the GitHub repository.

    /*restored is the trained network, loaded before this loop*/
    for (int i = 0; i < rects.size(); i++) {
        Rect rect = rects.get(i);
        /*Crop the digit from the dilated image*/
        Mat digit = new Mat(dilated).apply(rect);
        /*Add a 10-pixel black border, then scale to the 28x28 MNIST input size*/
        copyMakeBorder(digit, digit, 10, 10, 10, 10, BORDER_CONSTANT, new Scalar(0, 0, 0, 0));
        resize(digit, digit, new Size(28, 28));
        NativeImageLoader loader = new NativeImageLoader(28, 28, 1);
        INDArray dig = loader.asMatrix(digit);
        /*Flatten the 28x28 image into a single row of 784 values*/
        INDArray flatten = dig.reshape(new int[]{1, 784});
        INDArray output = restored.output(flatten);
        /*for (int j = 0; j < 10; j++) {
            System.out.println("Probability of being " + j + " is " + output.getFloat(j));
        }*/
        /*Index of the highest output probability*/
        int idx = Nd4j.getExecutioner().execAndReturn(new IAMax(output)).getFinalResult();
        System.out.println("Best Result is : " + DIGITS[idx]);
        /*Print result above every digit*/
        opencv_imgproc.putText(result, DIGITS[idx] + "", new Point(rect.x(), rect.y()), 0, 1.0, new Scalar(0, 0, 0, 0));
        /*imwrite("samples/digit" + i + ".jpg", digit); // Save digit images*/
    }

    Result Image
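
The last part of the recognition loop, flattening the image and picking the most probable class, can be illustrated in plain Java without Deeplearning4j. The names ArgMaxSketch, flatten and argMax are hypothetical, and the scores in main are made-up numbers used only to exercise the code, not real model output.

```java
class ArgMaxSketch {
    /** Flattens a 2D image row by row, e.g. 28x28 into 784 values. */
    static float[] flatten(int[][] img) {
        int h = img.length, w = img[0].length;
        float[] flat = new float[h * w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                flat[y * w + x] = img[y][x];
            }
        }
        return flat;
    }

    /** Returns the index of the largest score; for digits this index is the label. */
    static int argMax(float[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("flattened length: " + flatten(new int[28][28]).length);
        // Made-up scores for the ten digit classes.
        float[] scores = {0.01f, 0.02f, 0.05f, 0.80f, 0.02f, 0.02f, 0.02f, 0.02f, 0.02f, 0.02f};
        System.out.println("predicted digit: " + argMax(scores));
    }
}
```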


You can build many real-world apps on top of this code, such as "Mobile Scratch Card Digits Recognition".

Mobile Scratch Card Digits Recognition

