How to read text written in images or text visible in your laptop's web camera using Python?
Using the Pytesseract library.
Optical Character Recognition, or OCR, is a high-value technique with many applications: reading vehicle number plates at the roadside, automatically scanning text from identity cards, passports, and the like, and scanning books to convert them into PDFs.
What you will learn in this post:
Read text from images.
Capture input images from your laptop's web camera.
Save the image, with the text in the picture converted into characters, as a PDF.
To begin with, let us look up the Git repository of Tesseract. It is at https://github.com/tesseract-ocr/tesseract.
Documentation of Tesseract is available here
Tesseract is an OCR (Optical Character Recognition) library. Here is its short history:
Tesseract was originally developed at Hewlett-Packard Laboratories Bristol UK and at Hewlett-Packard Co, Greeley Colorado USA between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. In 2005 Tesseract was open sourced by HP. From 2006 until November 2018 it was developed by Google.
Python-tesseract or Pytesseract is a python wrapper for Google's Tesseract-OCR.
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in an image. Its official documentation is at
In this post we will be using two methods from the Pytesseract library.
pytesseract.image_to_string(image, timeout=seconds)  # image can be a PIL Image or a file path
pytesseract.image_to_pdf_or_hocr(image, extension='pdf')  # extension is 'pdf' or 'hocr'
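As a rough sketch of how these two calls fit together (the file names here are placeholders; this assumes the tesseract-ocr engine and pytesseract are installed, as we do below):

```python
from PIL import Image

def ocr_to_text_and_pdf(imgpath, timeout=10):
    """Read the text in an image and also save a searchable PDF of it."""
    import pytesseract  # imported lazily so the sketch can be read without it installed

    try:
        # timeout is in seconds; pytesseract raises RuntimeError when it expires
        text = pytesseract.image_to_string(Image.open(imgpath), timeout=timeout)
    except RuntimeError:
        text = ''

    # extension='pdf' makes this return the PDF as raw bytes
    pdf = pytesseract.image_to_pdf_or_hocr(imgpath, extension='pdf')
    with open('extracteddata.pdf', 'w+b') as f:
        f.write(pdf)
    return text
```

Calling ocr_to_text_and_pdf('photo.jpg') would return the recognized text and leave extracteddata.pdf on disk.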
Let us start coding now. We will use Colab for coding as we have done in the past.
Colab is available here.
You might want to check our Colab beginners guide here.
To begin with we will install the libraries.
!sudo apt install tesseract-ocr  # sudo runs the command as the superuser
# apt (Advanced Package Tool) installs packages from the distribution's repositories
!pip install pytesseract
# Our application requires pytesseract which has tesseract-ocr as a dependency
Run this block before you move to the next section. You might have to restart the runtime to ensure that the libraries are available for use.
Here is the output after running the program.
The next step is importing the libraries.
import pytesseract
from PIL import Image
PIL documentation is available here
We will use the Image.open(picturepath) function.
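A minimal, camera-free illustration of what Image.open gives you (the image is generated on the fly, so the file name test.jpg is just a throwaway):

```python
from PIL import Image

# Create a small white test image and save it to disk.
Image.new('RGB', (200, 80), color='white').save('test.jpg')

# Image.open reads it back; size is (width, height).
img = Image.open('test.jpg')
print(img.size)   # (200, 80)
print(img.mode)   # RGB
```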
The next step is the code for starting the web camera on your laptop and taking a picture.
from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def clickPic(filename='photo.jpg', quality=0.8):
    js = Javascript('''
      async function clickPicJS(quality) {
        const div = document.createElement('div');
        const capture = document.createElement('button');
        capture.textContent = 'Press to Capture';
        div.appendChild(capture);
        const video = document.createElement('video');
        video.style.display = 'block';
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        document.body.appendChild(div);
        div.appendChild(video);
        video.srcObject = stream;
        await video.play();
        // Resize the output to fit the video element.
        google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
        // Wait for Capture to be clicked.
        await new Promise((resolve) => capture.onclick = resolve);
        const canvas = document.createElement('canvas');
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        canvas.getContext('2d').drawImage(video, 0, 0);
        stream.getVideoTracks()[0].stop();
        div.remove();
        return canvas.toDataURL('image/jpeg', quality);
      }
    ''')
    display(js)
    data = eval_js('clickPicJS({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
We create a JavaScript function called clickPicJS which starts the camera, takes a picture, and returns it as a base64-encoded data URL. Here is the full function.
async function clickPicJS(quality) {
  const div = document.createElement('div');
  const capture = document.createElement('button');
  capture.textContent = 'Press to Capture';
  div.appendChild(capture);
  const video = document.createElement('video');
  video.style.display = 'block';
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  document.body.appendChild(div);
  div.appendChild(video);
  video.srcObject = stream;
  await video.play();
  // Resize the output to fit the video element.
  google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
  // Wait for Capture to be clicked.
  await new Promise((resolve) => capture.onclick = resolve);
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  stream.getVideoTracks()[0].stop();
  div.remove();
  return canvas.toDataURL('image/jpeg', quality);
}
This is put inside a Python function called
def clickPic(filename='photo.jpg', quality=0.8):
Here is the full code.
def clickPic(filename='photo.jpg', quality=0.8):
    js = Javascript('''
      async function clickPicJS(quality) {
        const div = document.createElement('div');
        const capture = document.createElement('button');
        capture.textContent = 'Press to Capture';
        div.appendChild(capture);
        const video = document.createElement('video');
        video.style.display = 'block';
        const stream = await navigator.mediaDevices.getUserMedia({video: true});
        document.body.appendChild(div);
        div.appendChild(video);
        video.srcObject = stream;
        await video.play();
        // Resize the output to fit the video element.
        google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);
        // Wait for Capture to be clicked.
        await new Promise((resolve) => capture.onclick = resolve);
        const canvas = document.createElement('canvas');
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        canvas.getContext('2d').drawImage(video, 0, 0);
        stream.getVideoTracks()[0].stop();
        div.remove();
        return canvas.toDataURL('image/jpeg', quality);
      }
    ''')
    display(js)
    data = eval_js('clickPicJS({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    with open(filename, 'wb') as f:
        f.write(binary)
    return filename
The complete JavaScript code is passed as a string to the Javascript constructor, which has the format
Javascript(JavaScript code as string)
js = Javascript('''JavaScript code''')
This js is then executed using eval_js:
data = eval_js('clickPicJS({})'.format(quality))
The eval_js function is used to execute JavaScript code from Python. It comes from the google.colab.output library and is imported as
from google.colab.output import eval_js
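clickPicJS returns a data URL of the form data:image/jpeg;base64,&lt;payload&gt;, which is why clickPic splits on the comma and base64-decodes the remainder. A camera-free sketch of just that decoding step:

```python
from base64 import b64decode, b64encode

# Stand-in for what canvas.toDataURL would hand back: a data URL wrapping bytes.
raw = b'\xff\xd8\xff\xe0 pretend these are JPEG bytes'
data = 'data:image/jpeg;base64,' + b64encode(raw).decode('ascii')

# This mirrors the decoding line inside clickPic().
binary = b64decode(data.split(',')[1])
assert binary == raw  # the round trip recovers the original bytes exactly
```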
Here is a picture taken from my web camera.
Next, we read the text in this picture.
imgpath = clickPic()  # take a picture and save it; clickPic returns the file name
print(imgpath)
readtext = pytesseract.image_to_string(Image.open(imgpath))
print(readtext)
pdf = pytesseract.image_to_pdf_or_hocr(imgpath, extension='pdf')
with open('extracteddata.pdf', 'w+b') as f:
    f.write(pdf)  # image_to_pdf_or_hocr returns bytes by default
The text it read is printed on the screen.
It is not fully accurate, but you have to make allowances for the not-so-great quality of my web camera.
Here is the pdf that was created.
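Note that image_to_pdf_or_hocr returns raw bytes, which is why the file is opened in binary mode ('w+b'). A tesseract-free sketch of that write (the bytes here are a stand-in):

```python
# Stand-in for the bytes that image_to_pdf_or_hocr(..., extension='pdf') returns.
pdf = b'%PDF-1.5 pretend pdf content'

# 'w+b' opens the file for writing in binary mode.
with open('extracteddata.pdf', 'w+b') as f:
    f.write(pdf)

# Reading it back in binary mode shows the bytes were written verbatim.
with open('extracteddata.pdf', 'rb') as f:
    assert f.read() == pdf
```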
We will input a file from the laptop now.
Run this code.
from google.colab import files

uploaded = files.upload()
print(uploaded)
keys = list(uploaded.keys())
print(keys[0])
imgpath = keys[0]
Press the Choose Files button
File input done
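files.upload() returns a dict mapping each uploaded file name to its raw bytes, which is why the code above takes the first key as the image path. A Colab-free sketch using a plain dict of the same shape (the file name is made up):

```python
# Shape of what google.colab.files.upload() returns: {filename: file bytes}.
uploaded = {'receipt.jpg': b'\xff\xd8\xff\xe0 pretend image bytes'}

keys = list(uploaded.keys())
imgpath = keys[0]            # the first (here, only) uploaded file name
content = uploaded[imgpath]  # the file's bytes, if you need them directly
print(imgpath)   # receipt.jpg
```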
Here is the text that it reads.
It's perfect this time.
Here is the pdf that got created.
Here is the complete code on Colab.
The link on GitHub
Please run the code as you proceed and do send in your comments.