How to read text written in images or text visible in your laptop's web camera using Python?
Using the Pytesseract library.
Optical Character Recognition, or OCR, is a high-value technique with many applications: reading vehicle number plates at the roadside, automatically scanning text from identity cards, passports, and the like, and scanning books to convert them into PDFs.
What you will learn in this post:
Read text from images.
Capture input images from your laptop's web camera.
Save the image, with the text in the picture converted into characters, as a PDF.
To begin with, let us look up the Git repository of Tesseract. It is at https://github.com/tesseract-ocr/tesseract.
Documentation of Tesseract is available here
Tesseract is an OCR (Optical Character Recognition) library. Here is its short history:
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol UK and at Hewlett-Packard Co, Greeley Colorado USA between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google.
Python-tesseract or Pytesseract is a python wrapper for Google's Tesseract-OCR.
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in an image. Its official documentation is at
In this post we will be using two methods from the Pytesseract library.
pytesseract.image_to_string(image, timeout=seconds)  # image can be a PIL Image or a file path
pytesseract.image_to_pdf_or_hocr(image, extension='pdf')  # extension is 'pdf' or 'hocr'
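As a rough sketch of how these two calls fit together (the file names here are placeholders; this assumes the tesseract-ocr engine and pytesseract are installed, as we do below):

```python
from PIL import Image

def ocr_to_text_and_pdf(imgpath, timeout=10):
    """Read the text in an image and also save a searchable PDF of it."""
    import pytesseract  # imported lazily so the sketch can be read without it installed

    try:
        # timeout is in seconds; pytesseract raises RuntimeError when it expires
        text = pytesseract.image_to_string(Image.open(imgpath), timeout=timeout)
    except RuntimeError:
        text = ''

    # extension='pdf' makes this return the PDF as raw bytes
    pdf = pytesseract.image_to_pdf_or_hocr(imgpath, extension='pdf')
    with open('extracteddata.pdf', 'w+b') as f:
        f.write(pdf)
    return text
```

Calling ocr_to_text_and_pdf('photo.jpg') would return the recognized text and leave extracteddata.pdf on disk.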
Let us start coding now. We will use Colab for coding as we have done in the past.
Colab is available here.
You might want to check our Colab beginners guide here.
To begin with we will install the libraries.
!sudo apt install tesseract-ocr  # sudo runs the command as the superuser
# apt (Advanced Package Tool) installs packages from the distribution's repositories
!pip install pytesseract
# Our application requires pytesseract which has tesseract-ocr as a dependency
Run this block before you move to the next section. You might have to restart the runtime to ensure that the libraries are available for use.
Here is the output after running the program.
The next step is importing the libraries.
import pytesseract
from PIL import Image
PIL documentation is available here
We will use the Image.open(picturepath) function.
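A minimal, camera-free illustration of what Image.open gives you (the image is generated on the fly, so the file name test.jpg is just a throwaway):

```python
from PIL import Image

# Create a small white test image and save it to disk.
Image.new('RGB', (200, 80), color='white').save('test.jpg')

# Image.open reads it back; size is (width, height).
img = Image.open('test.jpg')
print(img.size)   # (200, 80)
print(img.mode)   # RGB
```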
The next step is the code for starting the web camera on your laptop and taking a picture.
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def clickPic(filename='photo.jpg', quality=0.8):
    js = Javascript('''
      async function clickPicJS(quality) {
        const div = document.createElement('div');
        const capture = document.createElement('button');
        capture.textContent = 'Press to Capture';
        div.appendChild(capture);
        const video = document.createElement('video');
        video.style.display = 'block';
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        document.body.appendChild(div);
        div.appendChild(video);
        video.srcObject = stream;
        await video.play();
        // Resize the output to fit the video element.
        google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
        // Wait for Capture to be clicked.
        await new Promise((resolve) => capture.onclick = resolve);
        const canvas = document.createElement('canvas');
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        canvas.getContext('2d').drawImage(video, 0, 0);
        stream.getVideoTracks()[0].stop();
        div.remove();
        return canvas.toDataURL('image/jpeg', quality);
      }
    ''')
    display(js)
    data = eval_js('clickPicJS({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
We create a JavaScript function called clickPicJS which starts the camera, takes a picture, and returns it as a base64-encoded data URL. Here is the full function.
async function clickPicJS(quality) {
  const div = document.createElement('div');
  const capture = document.createElement('button');
  capture.textContent = 'Press to Capture';
  div.appendChild(capture);
  const video = document.createElement('video');
  video.style.display = 'block';
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  document.body.appendChild(div);
  div.appendChild(video);
  video.srcObject = stream;
  await video.play();
  // Resize the output to fit the video element.
  google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
  // Wait for Capture to be clicked.
  await new Promise((resolve) => capture.onclick = resolve);
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  stream.getVideoTracks()[0].stop();
  div.remove();
  return canvas.toDataURL('image/jpeg', quality);
}
This is put inside a Python function called
def clickPic(filename='photo.jpg', quality=0.8):
Here is the full code.
def clickPic(filename='photo.jpg', quality=0.8):
    js = Javascript('''
      async function clickPicJS(quality) {
        const div = document.createElement('div');
        const capture = document.createElement('button');
        capture.textContent = 'Press to Capture';
        div.appendChild(capture);
        const video = document.createElement('video');
        video.style.display = 'block';
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        document.body.appendChild(div);
        div.appendChild(video);
        video.srcObject = stream;
        await video.play();
        // Resize the output to fit the video element.
        google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
        // Wait for Capture to be clicked.
        await new Promise((resolve) => capture.onclick = resolve);
        const canvas = document.createElement('canvas');
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        canvas.getContext('2d').drawImage(video, 0, 0);
        stream.getVideoTracks()[0].stop();
        div.remove();
        return canvas.toDataURL('image/jpeg', quality);
      }
    ''')
    display(js)
    data = eval_js('clickPicJS({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
The complete JavaScript code is passed as a string to the Javascript constructor, which has the format
Javascript(JavaScript code as string)
js = Javascript('''JavaScript code''')
This js is then executed using eval_js:
data = eval_js('clickPicJS({})'.format(quality))
The eval_js function is used to execute JavaScript code from Python. It comes from the google.colab.output library and is imported as
from google.colab.output import eval_js
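clickPicJS returns a data URL of the form data:image/jpeg;base64,&lt;payload&gt;, which is why clickPic splits on the comma and base64-decodes the remainder. A camera-free sketch of just that decoding step:

```python
from base64 import b64decode, b64encode

# Stand-in for what canvas.toDataURL would hand back: a data URL wrapping bytes.
raw = b'\xff\xd8\xff\xe0 pretend these are JPEG bytes'
data = 'data:image/jpeg;base64,' + b64encode(raw).decode('ascii')

# This mirrors the decoding line inside clickPic().
binary = b64decode(data.split(',')[1])
assert binary == raw  # the round trip recovers the original bytes exactly
```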
Here is a picture taken from my web camera.
Next, we read the text in this picture.
imgpath = clickPic()  # take a picture and save it; clickPic returns the file name
print(imgpath)
readtext = pytesseract.image_to_string(Image.open(imgpath))
print(readtext)
pdf = pytesseract.image_to_pdf_or_hocr(imgpath, extension='pdf')
with open('extracteddata.pdf', 'w+b') as f:
    f.write(pdf)  # image_to_pdf_or_hocr returns bytes by default
The text it read is printed on the screen.
It is not fully accurate, but you have to make allowances for the not-so-great quality of my web camera.
Here is the pdf that was created.
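Note that image_to_pdf_or_hocr returns raw bytes, which is why the file is opened in binary mode ('w+b'). A tesseract-free sketch of that write (the bytes here are a stand-in):

```python
# Stand-in for the bytes that image_to_pdf_or_hocr(..., extension='pdf') returns.
pdf = b'%PDF-1.5 pretend pdf content'

# 'w+b' opens the file for writing in binary mode.
with open('extracteddata.pdf', 'w+b') as f:
    f.write(pdf)

# Reading it back in binary mode shows the bytes were written verbatim.
with open('extracteddata.pdf', 'rb') as f:
    assert f.read() == pdf
```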
We will input a file from the laptop now.
Run this code.
from google.colab import files

uploaded = files.upload()
print(uploaded)
keys = list(uploaded.keys())
print(keys[0])
imgpath = keys[0]
Press the Choose Files button
File input done
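files.upload() returns a dict mapping each uploaded file name to its raw bytes, which is why the code above takes the first key as the image path. A Colab-free sketch using a plain dict of the same shape (the file name is made up):

```python
# Shape of what google.colab.files.upload() returns: {filename: file bytes}.
uploaded = {'receipt.jpg': b'\xff\xd8\xff\xe0 pretend image bytes'}

keys = list(uploaded.keys())
imgpath = keys[0]            # the first (here, only) uploaded file name
content = uploaded[imgpath]  # the file's bytes, if you need them directly
print(imgpath)   # receipt.jpg
```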
Here is the text that it reads.
It's perfect this time.
Here is the pdf that got created.
Here is the complete code on Colab.
The link on GitHub
Please run the code as you proceed and do send in your comments.