1. INTRODUCTION
Of the 7.4 billion people on our planet, 285 million are visually impaired; of these, 39 million are completely blind, i.e. have no vision at all, and 246 million have mild to severe visual impairment (WHO, 2011). It has been predicted that by the year 2020 these numbers will rise to 75 million blind people and 200 million people with visual impairment [5]. Since reading is of prime importance in daily life (text is present everywhere, from newspapers and commercial products to sign-boards and digital screens), visually impaired people face many difficulties. Our device assists the visually impaired by reading text out to them. There have been numerous advances in this area to help the visually impaired read without much difficulty. Existing technologies use an approach similar to the one described in this paper, but they have certain drawbacks. Firstly, the input images used in previous works have no complex background, i.e. the test inputs are printed on a plain white sheet. It is easy to convert such images to text without pre-processing, but this approach is not useful in a real-time system [1][2][3]. Also, in methods that segment characters for recognition, the characters are read out as individual letters rather than as complete words, which gives an undesirable audio output to the user. For our project, we wanted the device to detect text against any complex background and read it efficiently. Inspired by the methodology used by apps such as “CamScanner”, we assume that against a complex background the text will most likely be enclosed in a box, e.g. on billboards or screens. When a region enclosed by four corner points is detected, we take it to be the region containing the text, and extract it by warping and cropping. The resulting image then undergoes edge detection, and a boundary is drawn over the letters to give them more definition.
The image is then processed by the OCR and TTS engines to give an audio output.
2. MOTIVATION
Our device is designed for people with mild or moderate visual impairment, giving them the ability to listen to printed text. It can also act as a learning aid for people with dyslexia or other learning disabilities that involve difficulty in reading or interpreting words and letters. We wish to enable these people to be independent and self-reliant, as they will no longer need assistance to understand printed text; with constant access to information, they need never feel at a disadvantage. We believe the development and adoption of such a system would be a substantial boon to those who currently depend on others to read for them.
3. PROBLEM STATEMENT
Visually impaired people use the braille system for writing and reading. A visually impaired person feels the arrangement of raised dots, which conveys the information, and learning and using braille is difficult and time-consuming. Keeping this in mind, we have designed our system so that reading any book becomes easier for blind people: converting text to audio is much faster and easier than reading braille.
4. OBJECTIVES
The objectives of our project are:
- To extract text from an image, convert it into digital form and then recite it aloud.
- To serve as an effective medium for communication.
5. LITERATURE REVIEW
Visual impairment, or vision loss, is a decreased ability to see clearly that cannot be fixed with glasses. Blindness is the term used for complete vision loss. The common causes of vision loss are uncorrected refractive errors, cataracts and glaucoma. People with visual impairment face a number of difficulties in normal daily activities such as walking, driving and reading. [6]
Braille
Braille is a writing and reading system used by people who have visual impairment. Braille is written on embossed paper. Braille characters are small rectangular blocks called cells that contain bumps called raised dots. The visually impaired person feels the arrangement of the raised dots, which conveys the information. [7]
Although braille readers, keyboards and monitors exist, they are not accessible to rural communities, and braille material is not easily or abundantly available. [8]
Raspberry pi
The Raspberry Pi is a small, low-cost computer which can be used with a monitor, keyboard and mouse as an efficient, full-fledged desktop machine [9]. We chose the Raspberry Pi for our project because, firstly, it is an easily available, low-cost device. The RPi uses software that is either free or open source, which also makes it cost-effective. The Raspberry Pi uses an SD card for storage, and its small size gives the advantage of portability. [10]
As part of the software development, the OpenCV (Open Source Computer Vision) libraries are used for image processing. Each function and data structure was designed with the image-processing developer in mind. [11]

Existing systems & their limitations
One of the biggest advantages of barcode readers is portability, so they can be used by the visually impaired to identify different products. An extensive database is created containing all the information about each product; the user simply scans the barcode and the product details are presented through e-braille readers. The disadvantage of this approach is that the user may not be able to point the barcode reader in the correct direction. [2]
Another approach is optical enhancement, such as an optical zooming device that enlarges braille characters; however, not all visually impaired people know the braille language [4]. Some methods aim at converting text to speech, using a scanner, speakers and a computer. This approach is efficient only with simple scanned documents; it cannot extract text from an image with a complex background. [4]

Fig. 1: Block diagram of the proposed system
6.2. HARDWARE SPECIFICATIONS
Raspberry pi
The Raspberry Pi is a system on a chip (SoC): a device that integrates several important functions on a single chip. The Raspberry Pi 3 uses the Broadcom BCM2837 multimedia processor, whose CPU is a quad-core ARM Cortex-A53 running at 1.2 GHz. It has 1 GB of LPDDR2 RAM (900 MHz) as internal memory, and external storage can be extended to 64 GB. The two main new features of the Raspberry Pi 3 are wireless internet (802.11n) and Bluetooth 4.1 Classic. It has 40 GPIO pins. The Raspberry Pi camera is 5 MP with a resolution of 2592×1944. The Raspberry Pi has a 3.5 mm audio port, so earphones or a speaker can easily be connected to it to hear audio.

Fig. 2: Schematic diagram of the Raspberry Pi
Camera Module
The Raspberry Pi camera module is a 25 mm square board with a 5 MP sensor, much smaller than the Raspberry Pi computer itself, to which it connects by a flat flex cable (FFC: 1 mm pitch, 15 conductors, type B).
Fig. 3: Raspberry Pi camera module
The Raspberry Pi camera module offers a unique new capability for optical instrumentation, with key specifications as follows:
- 1080p video recording to SD flash memory cards, with simultaneous output of 1080p live video via HDMI while recording
- Sensor type: OmniVision OV5647 colour CMOS QSXGA (5 megapixel)
- Sensor size: 3.67 × 2.74 mm
- Pixel count: 2592 × 1944
- Pixel size: 1.4 × 1.4 µm
- Lens: f = 3.6 mm, f/2.9
- Angle of view: 54 × 41 degrees
- Field of view: 2.0 × 1.33 m at 2 m
- Full-frame SLR lens equivalent: 35 mm
- Fixed focus: 1 m to infinity
- Removable lens, with adapters for M12, C-mount, Canon EF and Nikon F mount lens interchange
- In-camera image mirroring
6.3. SOFTWARE SPECIFICATIONS
Raspbian is a free operating system based on Debian and optimized for the Raspberry Pi hardware. Raspbian Jessie is the version used as the RPi's operating system in our project. Our code is written in Python (version 2.7.13) and calls functions from OpenCV. OpenCV (Open Source Computer Vision) is a library of functions used for real-time applications such as image processing [14]. Currently, OpenCV supports a wide variety of programming languages (C++, Python, Java etc.) and is available on different platforms including Windows, Linux, OS X, Android and iOS. The version used in our project is opencv-3.0.0. OpenCV's application areas include facial recognition, gesture recognition, human–computer interaction (HCI), mobile robotics, motion understanding, object identification, segmentation and recognition, motion tracking, augmented reality and many more.

For the OCR and TTS operations we install the Tesseract OCR and Festival software. Tesseract is an open-source Optical Character Recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly or, for programmers, through an API to extract typed, handwritten or printed text from images, and it supports a wide variety of languages. The package is generally called 'tesseract' or 'tesseract-ocr'. Festival TTS was developed by The Centre for Speech Technology Research, UK. It is open-source software that provides a framework for building efficient speech synthesis systems. It is multi-lingual (supporting British English, American English and Spanish). As Festival is available through the Raspberry Pi's package manager, it is easy to install.
Image Processing
Books and papers contain letters. Our aim is to extract these letters, convert them into digital form and then recite them aloud. Image processing is used to obtain the letters: it is essentially a set of functions applied to an image to deduce information from it. The input is an image, while the output can be an image or a set of parameters obtained from the image. Once the image is loaded, we convert it into a gray-scale image, whose pixels lie within a specific intensity range. This range is used to locate the letters: in gray scale the content is close to either black or white, and the white regions are mostly the spacing between words or blank space.
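As a small, self-contained illustration of the gray-scale step, the standard luminance-weighted RGB-to-gray conversion (the BT.601 weights, the same convention OpenCV's cvtColor uses) can be written as:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to gray scale using the
    ITU-R BT.601 luma weights: Y = 0.299 R + 0.587 G + 0.114 B."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights
```

A pure-white pixel (255, 255, 255) maps to 255, while a pure-red pixel maps to a much darker value, which is why coloured backgrounds separate cleanly from black text after thresholding.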
Feature Extraction
In this stage we gather the essential features of the image, called feature maps. One such method is to detect the edges in the image, as they will contain the required text. For this we can use various edge-detection operators such as Sobel, Kirsch, Canny and Prewitt. The most accurate in finding the four directional axes (horizontal, vertical, right diagonal and left diagonal) is the Kirsch detector, which uses the eight-point neighbourhood of each pixel.
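A minimal NumPy sketch of the Kirsch detector described above: the standard eight 3×3 compass masks are applied to each pixel's eight-point neighbourhood, and the edge magnitude is the maximum response over all eight directions.

```python
import numpy as np

# The eight 3x3 Kirsch compass masks, one per 45-degree direction.
KIRSCH_MASKS = [np.array(m, dtype=np.float64) for m in [
    [[ 5,  5,  5], [-3, 0, -3], [-3, -3, -3]],   # N
    [[ 5,  5, -3], [ 5, 0, -3], [-3, -3, -3]],   # NW
    [[ 5, -3, -3], [ 5, 0, -3], [ 5, -3, -3]],   # W
    [[-3, -3, -3], [ 5, 0, -3], [ 5,  5, -3]],   # SW
    [[-3, -3, -3], [-3, 0, -3], [ 5,  5,  5]],   # S
    [[-3, -3, -3], [-3, 0,  5], [-3,  5,  5]],   # SE
    [[-3, -3,  5], [-3, 0,  5], [-3, -3,  5]],   # E
    [[-3,  5,  5], [-3, 0,  5], [-3, -3, -3]],   # NE
]]

def kirsch_edges(gray):
    """Maximum response over the eight Kirsch masks at each pixel."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    best = np.zeros((h, w))
    for mask in KIRSCH_MASKS:
        resp = np.zeros((h, w))
        # Correlate the mask with each pixel's 3x3 neighbourhood.
        for dy in range(3):
            for dx in range(3):
                resp += mask[dy, dx] * g[dy:dy + h, dx:dx + w]
        best = np.maximum(best, resp)
    return best
```

Each mask's coefficients sum to zero, so flat regions give no response, while a step edge produces a large response in the mask aligned with it.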
Optical Character Recognition
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used as a form of data entry from some sort of original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is crucial to the computerization of printed texts so that they can be electronically searched, stored more compactly, displayed on-line and used in machine processes such as machine translation, text-to-speech and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
Tesseract
Tesseract is a free-software optical character recognition engine for various operating systems. It is considered one of the most accurate free-software OCR engines currently available, and it runs on Linux, Windows and Mac OS.
An image containing text is given as input to the Tesseract engine, which is a command-line tool. The tesseract command takes two arguments: the first is the name of the image file containing the text, and the second is the output text file in which the extracted text is stored. Tesseract adds the .txt extension to the output itself, so there is no need to specify a file extension when giving the output file name as the second argument. After processing completes, the extracted content is present in the .txt file. For simple images, with or without colour (gray scale), Tesseract gives results with close to 100% accuracy. For more complex images, Tesseract gives better accuracy if the images are in gray-scale mode rather than colour. Although Tesseract is a command-line tool, it is open source and also available as a Dynamic Link Library, so it can easily be used from a graphical application.
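The two-argument command convention described above can be wrapped in a short Python helper. This is a sketch: `run_tesseract` is an illustrative name, and running it assumes the tesseract binary is installed on the system.

```python
import subprocess

def tesseract_args(image_file, out_base):
    """Build the CLI invocation `tesseract <image> <output-base>`.
    The second argument carries no extension: Tesseract appends
    ".txt" to the output base name itself."""
    assert not out_base.endswith(".txt")
    return ["tesseract", image_file, out_base]

def run_tesseract(image_file, out_base="out"):
    """Run the OCR and return the extracted text."""
    subprocess.check_call(tesseract_args(image_file, out_base))
    with open(out_base + ".txt") as f:
        return f.read()
```

For example, `run_tesseract("page.png")` leaves the recognised text in `out.txt` and returns it as a string.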
7. METHODOLOGY
Fig. 4: Basic methodology of the project
Image acquisition: In this step the camera captures an image of the text. The quality of the captured image depends on the camera used; we use the Raspberry Pi camera, a 5 MP camera with a resolution of 2592×1944.
Image pre-processing: This step consists of colour-to-gray-scale conversion, noise removal, edge detection, warping and cropping, and thresholding. The image is converted to gray scale because many OpenCV functions require a gray-scale image as input. Noise removal is done with a bilateral filter. Canny edge detection is performed on the gray-scale image for better detection of the contours. The warping and cropping of the image are performed according to the contours; this lets us detect and extract only the region that contains text and remove the unwanted background. Finally, thresholding is applied so that the image looks like a scanned document, which allows the OCR to convert the image to text efficiently.
Fig. 5: Image pre-processing
Image to text conversion: The diagram above (Fig. 5) shows the flow of text-to-speech. The first block comprises the image pre-processing modules and the OCR, which converts the pre-processed image (a .png file) to a .txt file. We use the Tesseract OCR.
Text to speech conversion: The second block is the voice processing module, which converts the .txt file to an audio output. Here the text is converted to speech using a speech synthesizer called Festival TTS. The Raspberry Pi has an on-board audio jack; the on-board audio is generated by a PWM output.
7.1. Algorithm
Step 1: Start with initial values.
Step 2: Import subprocess and initialize the GPIO pins.
Step 3: If the button is pressed:
i. capture an image with the camera
ii. pre-process and threshold the image
iii. perform Tesseract OCR and save the result into a text file
iv. run the Festival software for text-to-speech
Step 4: Repeat Step 3.
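The steps above can be sketched as the following loop body. Names are illustrative; the capture step is injected as a callable so the control flow can be exercised without camera or GPIO hardware attached. `festival --tts <file>` reads the file as text and speaks it.

```python
import subprocess

def reader_pipeline(image_path="capture.png", text_base="out"):
    """The external commands run for one button press, in order."""
    return [
        ["tesseract", image_path, text_base],        # OCR -> out.txt
        ["festival", "--tts", text_base + ".txt"],   # speak the text
    ]

def run_once(capture, image_path="capture.png", text_base="out"):
    """One iteration of Step 3: capture, then OCR, then TTS.
    `capture` would be a picamera still-capture call on the Pi."""
    capture(image_path)
    for cmd in reader_pipeline(image_path, text_base):
        subprocess.check_call(cmd)

# On the Pi itself, run_once would be triggered from a loop that
# waits on the GPIO push-button (e.g. with RPi.GPIO's wait_for_edge),
# implementing the "repeat Step 3" behaviour.
```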
7.2. Flowchart

8. CODE

9. YOUTUBE LINK
10. CONCLUSION
By implementing this system, the visually impaired can easily listen to whatever text they want to hear. With the help of translation tools, the user can convert the text into a desired language and then, using the Google speech recognition tool, convert that translated text into voice; in this way they can be independent. The system also costs less than comparable implementations. The text-to-speech device can change a text image input into sound with sufficiently high performance, a readability tolerance of less than 2%, and an average processing time of less than three minutes for an A4 page. This portable device does not require an internet connection and can be used independently. Through this method, we can also make the editing process of books or web pages easier.