ABSTRACT
Scientific research has produced many technologies that make daily life easier.
Nothing comes with only positive sides, however, and accidents caused by drowsiness
remain a serious problem. This device aims to detect the drowsiness and yawning of a
person. It is composed of a Raspberry Pi 4 and a Pi camera module. The main part of
the project is carried out by image processing, for which real-time video is acquired
continuously from the Pi camera module. Using a Haar Cascade Classifier, the face of
the person is detected along with both eyes and the lips, and with the help of dlib the
coordinates of the eyes and lips are extracted. Once the eyes and lips are detected, the
ratio between the upper and lower eyelids is calculated, which is called the Eye Aspect
Ratio (EAR). The ratio between the upper and lower lips is calculated as well. The
threshold values for the EAR, the consecutive EAR count and the lips ratio are
predefined. With each blink of the eyes, a counter is incremented. If the EAR obtained
from the person’s face stays below the predefined EAR threshold for 2 seconds or more,
or if the counter exceeds the consecutive EAR threshold, drowsiness is detected. If the
lips ratio stays above the predefined lips threshold for 3 seconds or more, a yawn is
detected. To alert the person in each case, a specific text message is converted into
speech with the help of eSpeak. The sensitivity, specificity and accuracy for eye
detection were found to be 97.87%, 92.45% and 95% respectively. Similarly, for lips
detection, the sensitivity, specificity and accuracy were found to be 97.82%, 90.74%
and 94% respectively. In this way, the project aims to detect the drowsiness and
yawning of a person and alert him/her in time.
INTRODUCTION
Background
According to the Nepal Police, 10,178 road crashes occurred in the fiscal year 2016/17, resulting in 2,384 deaths, 4,250 serious injuries, 8,290 minor injuries, and 7,708 occurrences of vehicle damage. One of the major causes of road traffic accidents (RTAs) is drowsiness and fatigue. Drowsiness is a state of strong desire for sleep; in other words, a tired state between sleeping and being awake. The persons most likely to become drowsy are drivers who do not get enough sleep, commercial drivers who operate vehicles such as tow trucks, tractor trailers, and buses, shift workers (who work the night shift or long shifts), persons with untreated sleep disorders such as sleep apnoea, where breathing repeatedly stops and starts, and persons who use medications that make them sleepy [1]. Alcohol consumption has also been found to be one of the major causes of drowsiness during work or driving. These factors cause frequent yawning, nodding off, difficulty concentrating and an inability to remember things, and as a result many accidents occur on roads, in industries and in other sectors [2].
Exhausted drivers who doze off at the wheel are responsible for about 40% of road accidents, according to a study by the Central Road Research Institute (CRRI) on the 300-km Agra-Lucknow Expressway [3]. Sleepiness can result in crashes at any time of the day or night, but three factors are most commonly associated with drowsy-driving crashes. Determining a precise number of drowsy-driving crashes, injuries, and fatalities is not yet possible [4]. In Nepal, on the morning of October 21, 2020, six people died and 16 others were injured when a passenger bus met with an accident near Arunkhola along the East-West Highway in Nawalparasi (East). According to the District Traffic Police Office, the crash occurred because the driver dozed off while driving. The speeding 40-seater night bus was carrying 52 passengers [5].
Drowsiness causes accidents not only on the roads but also in many workplaces and industries. Overly sleepy employees are 70% more likely to be involved in workplace accidents than colleagues who are not sleep-deprived. These workplace accidents can have dire consequences. In a Swedish study of over 50,000 workers, those who self-reported disturbed sleep were twice as likely to die in a workplace accident [5].
Fatigue and drowsiness have played a vital role in some of the world’s biggest disasters, one of which is the Chernobyl disaster. Long hours, tight deadlines, and night work were found to be among the causes. Investigators concluded that fatigue due to 13-hour shifts was a leading contributor to the human error that led to the explosion. Reactor Four experienced a power increase resulting in a radioactive explosion that killed two workers that night and directly killed 28 people in the four months after the accident (due to severe radiation poisoning). The disaster occurred in a small Ukrainian town located along the Pripyat River, just 16 km from the Belarus border. Radiation spread across Eastern Europe and the USSR, and it still lingers across Ukraine today. A 2,600 km² exclusion zone is still enforced, three decades later [6].
Besides these accidents and disasters, there are many more that are caused by the
drowsiness of workers or drivers. It is very difficult to control and prevent all such
accidents and disasters manually by enforcing rules and regulations alone.
This is where technology comes in. Many devices have been invented around the
world to address this problem, but most of them are very expensive or not available
in the local market. This report discusses a system that uses computer vision and
image processing to reduce the damage caused by the above
stated phenomenon.
Objectives
To detect the drowsiness of the person by calculating the EAR.
To detect the yawning by calculating the Lips Ratio.
Scope and Applications
The system will detect the drowsiness and yawning of the person and alert him/her.
The system will reduce the risk of accidents and improve safety.
LITERATURE REVIEW
Literature Review
Projects addressing similar problems have already been carried out at the national
and international level, each offering different features. In this project, the
distinctive features of several such projects are combined into a single system.
“A review of state-of-art techniques” [7], reviews driver drowsiness detection
based on behavioral measures using machine learning techniques. Faces contain
information that can be used to interpret levels of drowsiness. These include eye blinks,
head movements and yawning. The recent rise of deep learning requires that these
algorithms be revisited to evaluate their accuracy in detection of drowsiness. As a
result, this paper reviews machine learning techniques which include support vector
machines, convolutional neural networks and hidden Markov models in the context of
drowsiness detection. Furthermore, a meta-analysis is conducted on 25 papers that use
machine learning techniques for drowsiness detection. Finally, this paper lists publicly
available datasets that can be used as benchmarks for drowsiness detection.
“Driver Drowsiness Detection Using Eye-Closeness Detection” [8], attempted to
address the issue by creating an experiment in order to calculate the level of drowsiness.
A requirement for this paper was the utilization of a Raspberry Pi Camera and
Raspberry Pi 3 module, which were able to calculate the level of drowsiness in drivers.
The frequency of head tilting and blinking of the eyes was used to determine whether
or not a driver felt drowsy.
“Detecting Driver Drowsiness Based on Sensors” [9], have attempted to determine
driver drowsiness using the following measures: (1) vehicle-based measures; (2)
behavioral measures and (3) physiological measures. A detailed review on these
measures will provide insight on the present systems, issues associated with them and
the enhancements that need to be done to make a robust system. In this paper, we review
these three measures as to the sensors used and discuss the advantages and limitations
of each. The various ways through which drowsiness has been experimentally
manipulated are also discussed. We conclude that by designing a hybrid drowsiness
detection system that combines non-intrusive physiological measures with other
measures one would accurately determine the drowsiness level of a driver.
“Driver Drowsiness Detection Model Using Convolutional Neural Networks
Techniques for Android Application” [10], focuses on the detection of such micro
sleep and drowsiness using neural network-based methodologies. Our previous work
in this field involved using machine learning with multi-layer perceptron to detect the
same. In this paper, accuracy was increased by utilizing facial landmarks which are
detected by the camera and that is passed to a Convolutional Neural Network (CNN)
to classify drowsiness. The achievement with this work is the capability to provide a
lightweight alternative to heavier classification models, with more than 88% accuracy
for the category without glasses and more than 85% for the category night without
glasses. On average, more than 83% accuracy was achieved in all categories.
“Driver drowsiness detection system” [11], a module for Advanced Driver
Assistance System (ADAS) is presented to reduce the number of accidents due to
driver fatigue and hence increase transportation safety; this system deals with
automatic driver drowsiness detection based on visual information and Artificial
Intelligence. We propose an algorithm to locate, track, and analyse both the driver’s face
and eyes to measure PERCLOS, a scientifically supported measure of drowsiness
associated with slow eye closure.
“Tracker for sleepy drivers at the wheel” [12], in this paper a tracker has been
created which plans to evaluate driver’s fatigue, exhaustion, and diversion throughout
driving. The framework composed is a non-intrusive constant checking framework and
it consists of camera which keeps a vigilant eye on driver’s movements to detect
drowsiness. The system deals with detecting eyes in an extracted image from video
input. All the possible actions have been considered and output is generated
accordingly. Drowsiness is determined by observing the eye blinking patterns of the
driver. If eyes are found to be closed for a particular time period given by threshold
value, the framework reaches the determination that the driver is nodding off and issues
a notice flag. The system is implemented using a Haar cascade object detector from
OpenCV (Open-Source Computer Vision Library), which detects eyes from the input
image.
“Driver drowsiness detection using face expression recognition” [13], unlike
conventional drowsiness detection methods, which are based on the eye states alone,
we used facial expressions to detect drowsiness. There are many challenges involving
drowsiness detection systems. Among the important aspects are: change of intensity
due to lighting conditions, the presence of glasses and beard on the face of the person.
In this project, we propose and implement a hardware system which is based on infrared
light and can be used in resolving these problems.
METHODOLOGY
SYSTEM BLOCK DIAGRAM
The block diagram of the overall system is given below. The Pi camera module gathers
the data, which is then manipulated and processed by the microcomputer. After
processing, the microcomputer delivers the output.
The Raspberry Pi used is the brain of the project. It is responsible for acquiring,
processing, storing and communicating the information from sensor and module and
then executing the events respectively. From the block diagram, we can see that, the Pi
camera module is connected to the input terminal of the Raspberry Pi. When the
microcomputer receives the data from the input terminal, it starts processing the data.
For the drowsiness and yawn detection, image processing is performed.
Looking at the overall working structure of the system, the first phase is the
acquisition of real-time video from the Raspberry Pi camera module, which is
connected to the Raspberry Pi. After the video stream is started, the system searches
for the face and detects it. The face is detected using facial landmark prediction.
Facial landmark prediction is the process of localizing key facial structures on a face,
including the eyes, eyebrows, nose, mouth, and jawline. Since the project is based on
drowsiness and yawn detection, only the eye and lip regions are required. Once the
eye and lip regions are detected, the Eye Aspect Ratio (EAR) and the ratio of upper and
lower lips are calculated. They are then compared with the predefined threshold
values of EAR and lips ratio. In this project, the EAR threshold is set to 0.3 and the
lips ratio threshold is set to 25. The consecutive EAR threshold is set to 20. The
consecutive EAR represents the number of times a person blinks his/her eyes.
The system continuously detects the face, calculates the respective EAR and lips
ratio in real time and compares them with the predefined threshold values. If at any
instant the EAR obtained from the face is less than 0.3, the value of count is
increased. If the value of count is greater than the consecutive EAR threshold, the
system considers it drowsiness and alerts the person through the text “Please wake up”,
which is converted into speech with the help of eSpeak. The system also considers it
drowsiness if the calculated EAR stays below the EAR threshold for 2 sec or more.
eSpeak is a command line program which converts text from a file into speech/audio.
Since the system constantly monitors the lips as well, if at any instant the person
opens the mouth wide for 3 sec or more such that the lips ratio obtained is greater
than the threshold lips ratio of 25, the system considers that the person is yawning.
It then alerts the person in real time with the text “You can take some fresh air”,
spoken with the help of eSpeak as before.
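The decision rules described above can be sketched as a small per-frame state tracker. This is only an illustrative sketch: the class name `DrowsinessMonitor`, its fields and the `update` method are hypothetical, and resetting the counter when the eyes reopen is an assumption, since the report does not say when the count is cleared. The thresholds (0.3, 20, 25, 2 sec and 3 sec) are the values stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold values as stated in the report.
EAR_THRESHOLD = 0.3         # eyes are treated as closed below this EAR
EAR_CONSEC_THRESHOLD = 20   # count limit before drowsiness is flagged
LIPS_THRESHOLD = 25         # mouth is treated as wide open above this ratio
EYES_CLOSED_SECONDS = 2.0   # eye closure duration that signals drowsiness
YAWN_SECONDS = 3.0          # mouth-open duration that signals a yawn


@dataclass
class DrowsinessMonitor:
    """Hypothetical helper that tracks EAR and lips ratio readings over time."""
    count: int = 0
    eyes_closed_since: Optional[float] = None
    mouth_open_since: Optional[float] = None

    def update(self, ear: float, lips_ratio: float, now: float) -> List[str]:
        """Process one frame and return the alert messages it triggers."""
        alerts = []

        if ear < EAR_THRESHOLD:
            self.count += 1
            if self.eyes_closed_since is None:
                self.eyes_closed_since = now
            closed_for = now - self.eyes_closed_since
            if self.count > EAR_CONSEC_THRESHOLD or closed_for >= EYES_CLOSED_SECONDS:
                alerts.append("Please wake up")
        else:
            # Assumption: counter and timer restart once the eyes reopen.
            self.count = 0
            self.eyes_closed_since = None

        if lips_ratio > LIPS_THRESHOLD:
            if self.mouth_open_since is None:
                self.mouth_open_since = now
            if now - self.mouth_open_since >= YAWN_SECONDS:
                alerts.append("You can take some fresh air")
        else:
            self.mouth_open_since = None

        return alerts
```

In a real loop, `update` would be called once per captured frame with the measured EAR, the lips ratio and a timestamp in seconds, and each returned message would be passed on to the speech output.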
The system runs continuously unless the power supply to the system is cut off. The
Raspberry Pi is powered through a USB Type-C port, and the Raspberry Pi itself
provides enough power for the Pi camera to operate.
Image Processing
Digital image processing is the use of a digital computer to process digital images
through an algorithm. A computer sees an image as a matrix of numbers between 0 and
255. For a colored image there are three channels, R, G and B, and there is a matrix
associated with each of these channels. Each element of a matrix represents the
brightness intensity of that pixel. The channels have separate matrices, which are
stacked on one another to create a 3D matrix. So, a computer interprets a colored
image as a 3D matrix. If a colored image has a size of 700 × 700, then there are 700 rows, 700 columns and 3 channels. Hence the total number of values in the image is 700 × 700 × 3. For a grayscale or black and white image there is
only 1 channel. All the images are first converted to NumPy arrays. The basic image
processing operation of this project is shown in the figure below.
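As a small illustration of the matrix view described above, the following sketch builds a synthetic 700 × 700 colored image as a NumPy array and inspects its shape. The all-zero image and the simple channel average used for the grayscale version are assumptions made for illustration only; they are not the project's actual conversion step.

```python
import numpy as np

# Synthetic 700 x 700 colored image: rows, columns, channels, values 0..255.
color = np.zeros((700, 700, 3), dtype=np.uint8)

# Simple grayscale version obtained by averaging the three channels
# (illustrative only; libraries such as OpenCV use a weighted conversion).
gray = color.mean(axis=2).astype(np.uint8)

print(color.shape, color.size)  # (700, 700, 3) 1470000
print(gray.shape, gray.size)    # (700, 700) 490000
```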
Haar Cascade Classifier
Haar Cascade Classifier is one of the few object detection methods with the ability to
detect faces. It offers high-speed computation that depends on the number of pixels
inside the rectangle feature rather than on each pixel value of the image. This method
has three steps for detecting an object, namely Haar-like features, the integral image
and the cascade classifier. For the detection of the face, Haar features are the main
part of the Haar Cascade Classifier. The Haar features are used to detect the presence
of a feature in a given image. Each feature results in a single value, which is
calculated by subtracting the sum of pixels under the white rectangle from the sum of
pixels under the black rectangle. The Haar-like feature is a rectangular feature
providing a specific indication to an image for rapid face detection [14].
To obtain the object detection value, the Haar-like feature value is calculated using
the integral image. The detector starts scanning the image from the top left corner
and ends the face detection process at the bottom right of the image. The integral
image allows values to be calculated accurately and relatively quickly by creating a
new representation of the image from the values of regions previously scanned by a
specific Haar-like feature. For detection of the eyes and lips from the face, key
points are selected. The figure below shows the key points or index of the
human face [15].
Shape predictors, also called landmark predictors, are used to predict key (x, y)-
coordinates of a given “shape”. The most common, well-known shape predictor is
dlib’s facial landmark predictor used to localize individual facial structures, including
the:
➢ Eyes
➢ Eyebrows
➢ Nose
➢ Lips/mouth
➢ Jawline
Facial landmarks are used for face alignment (a method to improve face recognition
accuracy) and for building a “drowsiness detector” to detect a tired, sleepy person at
work or behind the wheel.
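Once a shape predictor has returned the landmark coordinates, the eye and lip regions can be picked out by index. The sketch below assumes the widely used 68-point annotation scheme with 0-based indices; the report does not state which landmark model was used, so the index ranges, the dictionary `FACIAL_LANDMARK_RANGES` and the helper `select_region` are illustrative assumptions, and the landmark list is a placeholder rather than real predictor output.

```python
# Index ranges (start inclusive, end exclusive) in the 68-point scheme.
# Assumption: the predictor returns 68 (x, y) points in this order.
FACIAL_LANDMARK_RANGES = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),
}


def select_region(landmarks, region):
    """Return the (x, y) points that belong to the named facial region."""
    start, end = FACIAL_LANDMARK_RANGES[region]
    return landmarks[start:end]


# Placeholder landmarks standing in for the output of a shape predictor.
landmarks = [(i, i) for i in range(68)]
left_eye = select_region(landmarks, "left_eye")
mouth = select_region(landmarks, "mouth")
```

Keeping the ranges in one table means only the eye and mouth entries need to be read for drowsiness and yawn detection, while the other regions are simply ignored.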
The value at any point (x, y) in the summed-area table is the sum of all the pixels
above and to the left of (x, y), inclusive, as shown in Equation (1) below [16].

$$I(x, y) = \sum_{\substack{x' \le x \\ y' \le y}} i(x', y') \tag{1}$$

where i(x', y') is the value of the pixel at (x', y') and I(x, y) is the value of the
integral image at (x, y). The value of the integral image, I(x, y), is obtained by
accumulating the values at previous indices, starting from the top left until the
bottom right. Moreover, the summed-area table can be computed efficiently in a single
pass over the image, as the value in the summed-area table at (x, y) is given by
Equation (2).

$$I(x, y) = i(x, y) + I(x, y - 1) + I(x - 1, y) - I(x - 1, y - 1) \tag{2}$$
Figure 3.6 (a) Input Image (b) Integral Image (c) Using Integral Image to
calculate the sum over rectangle D [17]
Once the summed-area table has been computed, evaluating the sum of intensities over
any rectangular area requires exactly four array references regardless of the area size.
Equation (3) shows the sum of i(x, y) over the rectangle spanned by A, B, C and D
[18].

$$\sum_{\substack{x_0 < x \le x_1 \\ y_0 < y \le y_1}} i(x, y) = I(D) + I(A) - I(B) - I(C) \tag{3}$$
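The summed-area table and the four-reference rectangle sum can be illustrated with a short pure-Python sketch. The function names `integral_image` and `rect_sum` are hypothetical, and the corner convention (top-left corner at (x0, y0), bottom-right corner at (x1, y1), with the top and left edges excluded from the sum) is an assumption chosen for this example.

```python
def integral_image(img):
    """Build the summed-area table of a 2D list in a single pass."""
    height, width = len(img), len(img[0])
    table = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            above = table[y - 1][x] if y > 0 else 0
            left = table[y][x - 1] if x > 0 else 0
            diagonal = table[y - 1][x - 1] if x > 0 and y > 0 else 0
            # Current pixel plus the sums above and to the left, minus the
            # overlap that would otherwise be counted twice.
            table[y][x] = img[y][x] + above + left - diagonal
    return table


def rect_sum(table, x0, y0, x1, y1):
    """Sum of pixels with x0 < x <= x1 and y0 < y <= y1 using four lookups.

    Pass x0 = -1 or y0 = -1 for rectangles that touch the image border.
    """
    def at(x, y):
        return table[y][x] if x >= 0 and y >= 0 else 0

    return at(x1, y1) + at(x0, y0) - at(x1, y0) - at(x0, y1)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(rect_sum(table, 0, 0, 2, 2))    # 28, i.e. 5 + 6 + 8 + 9
print(rect_sum(table, -1, -1, 2, 2))  # 45, the whole image
```

However large the rectangle is, `rect_sum` reads only four table entries, which is what makes evaluating many Haar-like features per window affordable.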
The value calculated using the integral image is then compared with the threshold
value of specific features provided by AdaBoost. This is done to find the useful
features, because not all features are relevant for detecting a specific object.
AdaBoost combines these potential features, called weak classifiers, into a strong
classifier. The cascade classifier can thus be divided into two kinds of classifier:
weak and strong. A weak classifier gives a less accurate or even irrelevant
prediction, and a strong classifier gives a more accurate or relevant prediction. The
strong classifier made by AdaBoost can detect the object level by level on a cascade.
Eye Aspect Ratio (EAR) and Lips Ratio
After detecting the face of the driver, the drowsiness and yawn level of the driver is
calculated from the eye blink rate and the movement of the lips. The Eye Aspect Ratio
(EAR) formula is able to detect the eye blink, and the Lips Ratio is calculated as a
scalar value. For instance, if the driver blinks more frequently, it means that the
driver is in a state of drowsiness. Thus, it is necessary to detect the eye shape
accurately in order to calculate the eye blink frequency. Also, if the driver opens
and closes the mouth frequently, it means that the driver is yawning and may enter a
drowsy state. From the landmarks detected in the image of the face, the EAR is used as
an estimate of the eye openness state and the Lips Ratio is used as an estimate of the
mouth openness state.
For every video frame, the eye and lip landmarks are detected, and the ratio between
the height and width of the eyes and of the opening between the upper and lower lips
is computed. The eye aspect ratio can be defined by the equation below:

$$\mathrm{EAR} = \frac{\lVert P_2 - P_6 \rVert + \lVert P_3 - P_5 \rVert}{2\,\lVert P_1 - P_4 \rVert}$$

The Lips Ratio can be calculated as shown below:

$$\text{Lips Ratio} = \frac{\lVert Q_2 - Q_8 \rVert + \lVert Q_3 - Q_7 \rVert + \lVert Q_4 - Q_6 \rVert}{2\,\lVert Q_1 - Q_5 \rVert}$$
The EAR equation shows the eye aspect ratio formula, where P1 to P6 are the 2D
landmark locations. P2, P3, P5 and P6 are used to measure the height, whereas P1
and P4 are used to measure the width of the eyes in pixels, as shown in Figure 3.1.1.2.
The eye aspect ratio is roughly constant when the eye is open, but rapidly falls
to approximately 0 when the eye is closed, as shown in Figure 3.7.
The Lips Ratio equation shows the lips ratio formula, where Q1 to Q8 are the 2D
landmark locations. Q2, Q3, Q4, Q6, Q7 and Q8 are used to measure the height,
whereas Q1 and Q5 are used to measure the width of the lips in pixels, as shown in
Figure 3.8. The Lips Ratio is roughly constant when the mouth is wide open and is
approximately zero when the mouth is completely closed, because of the minimal
distance between the upper and lower lips.
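The two ratios can be written directly from the formulas above. The following is a minimal sketch in which the function names `eye_aspect_ratio` and `lips_ratio` and the sample coordinates are made up for illustration; the points are assumed to be given in the order P1 to P6 and Q1 to Q8 used in the text, and the numeric scale of the synthetic example is not meant to match the report's thresholds.

```python
import math


def eye_aspect_ratio(points):
    """EAR from six (x, y) eye landmarks ordered P1..P6."""
    p1, p2, p3, p4, p5, p6 = points
    height = math.dist(p2, p6) + math.dist(p3, p5)
    width = math.dist(p1, p4)
    return height / (2.0 * width)


def lips_ratio(points):
    """Lips ratio from eight (x, y) lip landmarks ordered Q1..Q8."""
    q1, q2, q3, q4, q5, q6, q7, q8 = points
    height = math.dist(q2, q8) + math.dist(q3, q7) + math.dist(q4, q6)
    width = math.dist(q1, q5)
    return height / (2.0 * width)


# Synthetic landmark sets (hypothetical pixel coordinates).
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (1, 0)]
open_mouth = [(0, 0), (1, 1), (2, 1), (3, 1), (4, 0), (3, -1), (2, -1), (1, -1)]

print(eye_aspect_ratio(open_eye))    # about 0.667
print(eye_aspect_ratio(closed_eye))  # 0.0
print(lips_ratio(open_mouth))        # 0.75
```

Because both ratios divide a height by a width, they change little when the face moves closer to or farther from the camera, which is why a fixed threshold can be applied to them.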
FLOW CHART
Hardware Description
Raspberry Pi 4
The Raspberry Pi is a low cost, credit-card sized computer that plugs into a computer
monitor or TV, and uses a standard keyboard and mouse. It is a capable little device
that enables people of all ages to explore computing, and to learn how to program in
languages like Scratch and Python. It’s capable of doing everything you’d expect a
desktop computer to do, from browsing the internet and playing high-definition video,
to making spreadsheets, word-processing, and playing games. The Raspberry Pi has
the ability to interact with the outside world, and has been used in a wide array of digital
maker projects, from music machines and parent detectors to weather stations and
tweeting birdhouses with infra-red cameras. The Raspberry Pi 4 is the most powerful
and feature-rich Raspberry Pi to be released, an incredible improvement on previous
boards.
Raspberry Pi camera module
The Pi camera module is a portable, lightweight camera that supports the Raspberry Pi.
It communicates with the Pi using the MIPI camera serial interface protocol. The
Raspberry Pi Camera Module v2 replaced the original Camera Module in April 2016. The
v2 Camera Module has a Sony IMX219 8-megapixel sensor. The Camera Module can be
used to take high-definition video, as well as still photographs.
Specification
Resolution: 8 Megapixels
Video modes: 1080p30, 720p60 and 640 × 480p60/90
Sensor: Sony IMX219
Picture formats: JPEG (accelerated), JPEG + RAW, GIF, BMP, PNG, RGB888
Video formats: raw h.264 (accelerated)
Triggers: Keypress, UNIX signal, timeout
Software Description
Python Programming Language
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python’s simple, easy to learn syntax emphasizes readability and
therefore reduces the cost of program maintenance. Python supports modules and
packages, which encourages program modularity and code reuse. The Python
interpreter and the extensive standard library are available in source or binary form
without charge for all major platforms, and can be freely distributed.
OpenCV
OpenCV is a huge open-source library for computer vision, machine learning, and
image processing, and it plays a major role in real-time operation. By using it, one
can process images and videos to identify objects, faces, or even human handwriting.
When it is integrated with other libraries, such as NumPy, Python is capable of
processing the OpenCV array structure for analysis. To identify an image pattern and
its various features, vector space is used and mathematical operations are performed
on these features. It has C++, C, Python and Java interfaces and supports Windows,
Linux, Mac OS, iOS and Android. OpenCV was designed with a strong focus on real-time
applications and computational efficiency. It is written in optimized C/C++ to
take advantage of multi-core processing.
dlib
dlib is a modern C++ toolkit containing machine learning algorithms and tools for
creating complex software in C++ to solve real world problems. It is used in both
industry and academia in a wide range of domains including robotics, embedded
devices, mobile phones, and large high-performance computing environments.
eSpeak: Speech Synthesizer
eSpeak is a compact, open-source software speech synthesizer for Linux,
Windows, and other platforms. It uses a formant synthesis method, providing many
languages in a small size. Much of the programming for eSpeak’s language support is
done using rule files with feedback from native speakers.
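One simple way to trigger a spoken alert from Python is to call the eSpeak command-line program as a subprocess. The sketch below is an assumption about how this could be wired up, not the project's actual code: the helper names `build_espeak_command` and `speak` are hypothetical, the message is passed as a command-line argument rather than read from a file, and the call is skipped when the `espeak` executable is not installed.

```python
import shutil
import subprocess


def build_espeak_command(message):
    """Return the argument list that asks espeak to speak the message."""
    return ["espeak", message]


def speak(message):
    """Speak the message if espeak is available; return True if it was run."""
    if shutil.which("espeak") is None:
        # espeak is not installed on this machine, so nothing is spoken.
        return False
    subprocess.run(build_espeak_command(message), check=False)
    return True


wake_up_command = build_espeak_command("Please wake up")
```

Calling `speak("Please wake up")` would block until the phrase has been spoken, so a real-time loop may prefer to start it in the background so that frame capture is not delayed.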
CODING (FILE)
DESIGN AND IMPLEMENTATION
Pi Camera
Requirement Analysis
The standard Camera Module is green. They are installed and work in the same way.
Install the Raspberry Pi Camera module by inserting the cable into the Raspberry Pi
camera port. The cable slots into the connector situated between the USB and micro-
HDMI ports, with the silver connectors facing the micro-HDMI ports.
RESULT AND ANALYSIS
After the complete connection of the modules and sensors with the
microcomputer, and implementation of the code, the system was ready for testing.
Some of the outcomes of this project are shown in the figure below.
The values of EAR were recorded for both open and closed eyes and the
average EAR was calculated. The average EAR when the eyes were open was found
to be 0.336 and the average EAR for closed eyes was found to be 0.156.
From the above table it can be seen that when the eyes are closed, the EAR is
never equal to 0 or less than 0. It is always found to be greater than 0.
The values of Lips Ratio were recorded when the mouth was wide open and
closed. The average value for both cases is calculated as shown in the figure.
From the table it is seen that the average lips ratio for a closed mouth is 11.18
and for a wide open mouth is 28.104.
From the above information, we can detect whether the eyes are open or closed
and also whether the mouth is open or closed. From the behavioral study of
drowsiness, the system considers the closing of eyes as drowsiness if the eyes
are closed with EAR less than the EAR threshold for 2 secs or more. Likewise,
from the behavioral study of yawning, the system considers the opening of the
mouth as yawning if the mouth is wide open with Lips Ratio greater than the
Lips Ratio threshold for 3 secs or more.
In this project, 50 observations were taken for each case, of which 30
observations were taken without glasses and 20 were taken with
glasses.
The following assumptions were made to calculate the accuracy and sensitivity of
eye detection.
Positive = closed eyes
Negative = open eyes
True Positive (TP) = Detect closed eyes when eyes are closed
False Positive (FP) = Detect open eyes when eyes are closed
True Negative (TN) = Detect open eyes when eyes are open
False Negative (FN) = Detect closed eyes when eyes are open
The following outputs were observed out of 50 observations:
TP = 46
FP = 4
TN = 49
FN = 1
Performance parameters are calculated as follows:
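• $\text{Sensitivity (True Positive Rate)} = \dfrac{TP}{TP + FN} = \dfrac{46}{46 + 1} = 97.87\%$
• $\text{Specificity (True Negative Rate)} = \dfrac{TN}{TN + FP} = \dfrac{49}{49 + 4} = 92.45\%$
• $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN} = \dfrac{46 + 49}{46 + 49 + 4 + 1} = 95\%$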
From the observation, the sensitivity obtained is 97.87% which shows that the
system correctly identifies the closed eyes when eyes are closed at the rate of
97.87%.
Also, specificity obtained is 92.45% which denotes that the system can correctly
identify the open eyes when eyes are open at the rate of 92.45%.
The accuracy obtained is 95% which denotes that the system can detect the
closing and opening of eyes with 95% correctness.
Also, to determine the performance parameters for Lips Ratio, 50 observations
were made with following assumptions.
Positive = Open mouth
Negative = Closed mouth
True Positive (TP) = Detect open mouth when mouth is open
False Positive (FP) = Detect closed mouth when mouth is open
True Negative (TN) = Detect closed mouth when mouth is closed
False Negative (FN) = Detect open mouth when mouth is closed
Following outputs were observed from 50 observations taken:
TP = 45
FP = 5
TN = 49
FN = 1
Performance parameters are calculated as follows:
• $\text{Sensitivity (True Positive Rate)} = \dfrac{45}{45 + 1} = 97.8\%$
• $\text{Specificity (True Negative Rate)} = \dfrac{49}{49 + 5} = 90.74\%$
• $\text{Accuracy} = \dfrac{45 + 49}{45 + 49 + 5 + 1} = 94\%$
From the observation, the sensitivity obtained is 97.8% which shows that the
system correctly identifies the open mouth when mouth is open at the rate of
97.8%.
Also, specificity obtained is 90.74% which denotes that the system can correctly
identify the closed mouth when mouth is closed at the rate of 90.74%.
The accuracy obtained is 94% which denotes that the system can detect the
closing and opening of mouth with 94% correctness.
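The performance parameters can be recomputed from the recorded counts with a few lines of Python. This is a small sketch using the sensitivity, specificity and accuracy formulas as written in this report and the TP, FP, TN and FN values listed above; the function names and the `counts` dictionary are illustrative.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Share of all observations that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts as reported for eye detection and lips detection.
counts = {
    "eyes": {"tp": 46, "fp": 4, "tn": 49, "fn": 1},
    "lips": {"tp": 45, "fp": 5, "tn": 49, "fn": 1},
}

for name, c in counts.items():
    print(
        name,
        f"sensitivity={sensitivity(c['tp'], c['fn']):.4f}",
        f"specificity={specificity(c['tn'], c['fp']):.4f}",
        f"accuracy={accuracy(c['tp'], c['tn'], c['fp'], c['fn']):.4f}",
    )
```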
Limitations
Though the desired outcomes were obtained, there are some limitations of this
project. Some of the limitations are:
➢ Since the system developed is based on the Pi camera module, it is found
that the image capturing rate of the Pi camera is slower than that of a
webcam. Hence, processing and detection of the face is slower.
➢ Although the alcohol is detected, this project doesn’t provide any
methods to prevent the driver from starting the engine.
➢ Raspberry Pi camera module doesn’t operate well at night.
➢ For a person with small eyes, the threshold EAR value may have
to be changed.
DISCUSSION AND CONCLUSION
Discussion and Conclusion
Finally, it is concluded that the system can detect the face landmarks with the help of
the dlib predictor and detector and locate the eye and lip coordinates from the face of
the person. Blinking of the eyes and yawning are highly related to the symptoms of
drowsiness. For eye detection, the sensitivity was found to be 97.87%, the specificity
92.45% and the accuracy 95%. Similarly, for lips detection, the sensitivity was found
to be 97.82%, the specificity 90.74% and the accuracy 94%. With these detection
performance parameters, EAR and Lips Ratio were calculated. The Pi camera
continuously transmits the video stream to the Raspberry Pi. If the calculated EAR is
less than the EAR threshold for 2 secs or more, drowsiness is detected by the system.
Also, if the calculated Lips Ratio is greater than the Lips Ratio threshold, which is
25, for 3 secs or more, the system detects yawning of the person. Since the eyes and
mouth of a drowsy person are strongly affected, with continuous blinking and
continuous yawning, this system is capable of detecting the movement of the eyes and
lips, making it possible to detect the drowsiness of the person.
Future Enhancement
This system is applicable not only to the transportation sector; it can also be
implemented in various sectors which require constant human monitoring. Here,
however, the main concern is to detect the drowsiness and yawning of the driver so as
to prevent any kind of minor or major accident. Further enhancements which can be
carried out may be:
➢ This system can be used to activate the auto-pilot mode in the upcoming
generations of transportation as soon as the system detects drowsiness or
yawning of the driver.
➢ This system can also be implemented to detect the drowsiness of security guards
for proper monitoring.
➢ Raspberry Pi camera module can be replaced by the night vision camera so that
the system can function properly both at day and night.
➢ This system can also be implemented in the factories to constantly monitor the
machine operator
APPENDIX
PROJECT BY:
Nirmal Adhikari
Pradip Shrestha (https://www.facebook.com/pradip.shrestha.1420)
Sandesh Sigdel
Sudip Poudel