We successively consider the state of drowsiness, its indicators, and the related sensors.
Description Johns (88) appears to have given the earliest accurate definition of drowsiness, i.e., the state of being drowsy. Massoz (137) provides useful, recent information about this state. Drowsiness is an intermediate arousal state between wakefulness and sleep, i.e., between being awake and being asleep; it thus refers to a state just before potential sleep. A drowsy person has both difficulty staying awake and a strong inclination to sleep. It is a continuous, fluctuating state (1) of reduced awareness of the “here and now” (89) and (2) of impaired cognitive and/or psychomotor performance. It is often the result of a monotonous activity, such as a long drive on a monotonous road. It can have a detrimental effect on the safety of driving. For example, in the USA, in 2018, there were 785 fatal crashes due to drowsiness out of a total of 36,835 people killed in motor-vehicle crashes, and, in 2019, these numbers were 697 and 36,096, respectively (155). Drowsiness can be viewed as a state of basic physiological need, like hunger and thirst, i.e., as an indication that one needs to sleep. It can be considered to be synonymous with sleepiness, somnolence, and sleepening, the latter being a less common term meaning “entry into sleep” (36).
Drowsiness is, however, not synonymous with fatigue. These are two distinct physiological states that are often confused, even in the scientific literature. Fatigue corresponds to the feeling of being tired or exhausted as a result of long periods of physical and/or cognitive activity. It is characterized by an increasing difficulty in accomplishing the effort required by a task. It can be considered to be synonymous with tiredness. Discussing fatigue helps one to further narrow down what drowsiness is and is not.
May and Baldwin (139) claim that, for driving, one should distinguish between sleep-related (SR) fatigue and task-related (TR) fatigue based on their causal factors. SR fatigue can be caused by sleep deprivation, long wakefulness, and the time of day (reflecting the circadian rhythm), while TR fatigue can be caused by certain characteristics of driving, like task demand and duration, even in the absence of SR fatigue. These suggested subcategories of fatigue clearly intersect with drowsiness, but it is difficult to say exactly how.
Fatigue can be alleviated by taking a break (without necessarily sleeping), while drowsiness can be alleviated by sleeping, even briefly via a nap or a power nap. One can be drowsy without being fatigued and vice versa, and one can be both. Fatigue and drowsiness both lead to decrements in performance. In practice, it is difficult to distinguish between them, and even more so to quantify how much of a decrement is due to each of them individually, especially in real time and non-invasively. Their indicators appear to be mostly the same. In the driving context, one focuses on monitoring drowsiness, with the main goal of preventing the driver from falling asleep at the wheel.
There are many publications about the various ways of characterizing drowsiness (47; 55; 90; 137) and apparently fewer for fatigue (1). Very few papers tackle both phenomena (199).
Indicators We start with the driver-based indicators, divided into the three categories of physiological, behavioral, and subjective indicators.
The most substantial changes in physiology associated with changes in the level of drowsiness (LoD) lie in the brain activity as measured by the electroencephalogram (EEG). Tantisatirapong et al. (214) model EEG signals using the fractional Brownian motion (fBm) random process. They carried out experiments in a driving simulator and considered the three time periods of before, during, and after sleep, where sleep was mimicked by asking the driver to close his/her eyes and pretend to try to fall asleep. They observed corresponding changes in the computed fractal dimension (related, for self-similar processes, to the Hurst exponent), which allowed them to classify the driver as alert or drowsy. They conclude that the fractal dimension of an EEG signal is a promising indicator of drowsiness. Changes in physiology also manifest themselves in the heart activity, as measured by the electrocardiogram (ECG). Indeed, as drowsiness increases, the heart rate (HR) decreases and the heart rate variability (HRV) increases (224). However, HRV data vary both between individuals and over time for each individual, depending on both internal and external factors. Therefore, the many confounding factors that also influence HRV must be accounted for in order to use HRV as an indicator of drowsiness (172). The breathing activity is an indicator of drowsiness, as changes in breathing rate or inspiration-to-expiration ratio occur during the transition from wakefulness to drowsiness (100). Drowsiness also leads to changes in electrodermal activity (EDA), also called skin conductance or galvanic skin response (GSR), which relates to the electrical resistance measured via electrodes on the surface of the skin. The skin resistance fluctuates with sweating, the level of which is controlled by the sympathetic nervous system, which is also involved in the autonomic regulation of states such as drowsiness (145). Pupil-diameter instability has also been linked to drowsiness.
Indeed, several studies found that the pupil diameter fluctuates at a low frequency and with a high amplitude whenever a subject reports being drowsy (127; 159; 235).
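To illustrate how HRV-based indicators are typically derived, the sketch below computes two standard HRV metrics, SDNN and RMSSD, from a list of RR intervals (the times between successive heartbeats, e.g., extracted from an ECG). The function name and interface are illustrative, not taken from the cited works.

```python
import math

def hrv_metrics(rr_ms):
    """Compute the heart rate and two common HRV metrics from a list of
    RR intervals in milliseconds: SDNN (sample standard deviation of the
    intervals) and RMSSD (root mean square of successive differences)."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: overall variability of the RR intervals.
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))
    # RMSSD: short-term, beat-to-beat variability.
    diffs = [rr_ms[i + 1] - rr_ms[i] for i in range(n - 1)]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    heart_rate = 60000.0 / mean_rr  # beats per minute
    return heart_rate, sdnn, rmssd
```

In practice, as noted above, such metrics must be interpreted with care, since many confounding factors besides drowsiness influence HRV.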
Eye behavior is a good indicator of drowsiness. In a clinical setting, one traditionally characterizes this behavior by electrooculography (EOG) (26), which implies the use of electrodes. But, to do this non-invasively, one generally uses video sequences of the eye(s) and applies image-analysis methods to them. The dynamics of eye closures (in particular, long and slow eye closures) are recognized as a strong and reliable indicator of drowsiness (193). The most standard indicator of spontaneous eye closure is the percentage of closure (PERCLOS) (41; 42; 234). It is usually defined as the proportion of time (over a given time window) that the eyelids cover at least 70% (or 80%) of the pupils. As the LoD increases, the eye closures become slower and longer, and the upper eyelid droops; all of this contributes to an increase in PERCLOS. Other reliable, standard indicators include the mean blink duration (10; 193), the mean blink frequency or interval (124; 193), and the eye closing and reopening speeds (193). Recently, Hultman et al. (83) used electrophysiological data obtained by EOG and EEG to detect drowsiness with deep neural networks, and found that, for driver-drowsiness classification, the EOG data (and, more precisely, the related blink data) are more informative than the EEG data.
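The PERCLOS definition above translates directly into code. The minimal sketch below assumes a per-frame eyelid-closure fraction is already available from image analysis; the function name and interface are illustrative.

```python
def perclos(closure_fraction, threshold=0.8):
    """PERCLOS over a time window: the proportion of frames in which the
    eyelids cover at least `threshold` (typically 70% or 80%) of the
    pupil.  `closure_fraction` is a per-frame value in [0, 1], where 0 is
    a fully open eye and 1 a fully closed one."""
    closed = sum(1 for c in closure_fraction if c >= threshold)
    return closed / len(closure_fraction)
```

For example, a window in which 3 frames out of 5 exceed the 80% closure threshold yields a PERCLOS of 0.6.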
All the above elements constitute objective indicators of drowsiness. Besides these, there are subjective indicators, consisting of questionnaires and self-reports. While they are not suitable for real-time characterization of drowsiness, they can be used to validate other indicators, as ground truth to train models, and/or to evaluate the performance of systems. These subjective indicators include the Karolinska sleepiness scale (KSS) (5), the Stanford sleepiness scale (SSS) (80), and the visual analog scale (VAS) (147).
The above information allows one to fill the cells of Table 4 at the intersection of the “Drowsiness” column and the “Driver” megarow. The latter lists a total of fourteen indicators. We stress that these may or may not be relevant for each of the five states.
A cell (at the lowest level) in the heart of Table 4 is either empty or filled with one or more related reference(s). For example, this table shows that we found three significant references about “pupil diameter” as an indicator of drowsiness, i.e., (127; 159; 235), while we found no significant reference about “gaze parameters” as an indicator of drowsiness. But the table shows that we found references reporting that this last indicator is useful for the “Emotions” state (discussed later).
Below, as we progressively fill Table 4 and Table 5, we simply indicate which cell(s) is/are concerned. As we progress, the discussion in the last two paragraphs remains valid, after proper adaptation.
As should be clear from this discussion, the finer hierarchical partitioning of Tables 4 and 5 into the lowest-level columns and rows is progressively obtained from the developments in Sections 3 to 10.
We now consider the vehicle-based indicators. In the literature, they are often called measures of driving performance, the latter being known to degrade with increasing drowsiness (53; 102; 233). These indicators characterize the driving behavior. Common indicators include speed, lateral control (or lane discipline), braking behavior, and wheel steering. They are found in the central part of Table 4, next to the “Vehicle” header.
The main vehicle-based indicator of drowsiness is the standard deviation of lane position (SDLP) (66; 121; 125; 222). As the term suggests, SDLP quantifies the variability of the vehicle’s lateral position within its lane, and thus the driver’s ability to stay centered in it. Drowsiness can also produce greater variability in driving speed (12). Another important vehicle-based indicator is the steering wheel movement (SWM) (121). It has been shown that a drowsy driver makes fewer small SWMs and more large ones. When a driver loses concentration, the vehicle begins to drift away from the center of the lane; when the driver notices the drift, he/she compensates with large SWMs toward the lane center (217).
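A minimal sketch of the SDLP computation, assuming a sequence of lateral offsets (in meters) from the lane center is already available from a lane tracker; the function name is illustrative.

```python
import math

def sdlp(lateral_positions_m):
    """Standard deviation of lane position (SDLP), in meters, computed
    as the sample standard deviation of the vehicle's lateral offsets
    from the lane center over a time window."""
    n = len(lateral_positions_m)
    mean = sum(lateral_positions_m) / n
    return math.sqrt(
        sum((x - mean) ** 2 for x in lateral_positions_m) / (n - 1)
    )
```

A larger SDLP over successive windows indicates degrading lane discipline, consistent with increasing drowsiness.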
Jacobé de Naurois et al. (86) conducted a study in a driving simulator, using different artificial neural networks (ANNs) based on various data, to detect drowsiness and predict when a driver will reach a given LoD. The data used are either (1) driver-based, physiological indicators (HR, breathing rate) and behavioral indicators (blinks, PERCLOS, head pose), or (2) vehicle-based indicators (lane deviation, steering wheel angle, acceleration, speed). The results of the study show that the best performance is obtained with behavioral data, followed by physiological data and then by vehicle data, for both detection and prediction.
Most real-time, drowsiness-monitoring systems characterize the LoD at the “present” time using sensor data from a sliding time window ending at this present time. This LoD thus corresponds, not to the present, but roughly to the center of the window, i.e., to several seconds, or tens of seconds, in the past. If this “present” LoD is above a dangerous level, it may be too late for the driver or the vehicle to take proper action. Given that, at 100 km/h, it takes about 2 sec to drift out of the lane (and then possibly hit an obstacle), predictions just 10 to 20 sec into the future would already help. It is thus crucial to be able to predict (1) the future evolution of the LoD and (2) the associated risks.
Ebrahimbabaie (47) and Ebrahimbabaie and Verly (48) developed and tested a prediction system that (1) takes as input a discrete-time, validated LoD signal consisting of the past LoD values produced at regular intervals, up to just before the present time, as in (56; 55) (discussed later), and (2) produces as output several types of predictions. Treating the LoD signal as a realization of an underlying random process (RP), the authors investigate the use of the RPs called “autoregressive (integrated) moving average (AR(I)MA)” (from time-series analysis) and “geometric Brownian motion (GBM)” (found almost exclusively in finance). They show that the LoD signal can generally be modeled as AR(I)MA or GBM within each position of the sliding window (thus locally); they estimate the parameters of the model for each position of the window, and use them to make predictions of one or more of the following three types: future values of the LoD signal, first hitting time (of a critical LoD threshold), and survival probability.
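The cited works fit full AR(I)MA and GBM models within each window position. As a simplified, purely illustrative stand-in, the sketch below fits an AR(1) model (the simplest autoregressive member of the ARMA family) to a window of LoD values by least squares, and iterates the forecast forward to estimate a first hitting time of a critical threshold. All names and the threshold semantics are assumptions for illustration.

```python
def ar1_fit(x):
    """Least-squares fit of the AR(1) model x[t] = c + phi * x[t-1]."""
    xs, ys = x[:-1], x[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    phi = cov / var
    c = my - phi * mx
    return c, phi

def first_hitting_time(x, threshold, horizon=60):
    """Iterate the fitted AR(1) forecast forward from the last observed
    LoD value and return the first future step at which the predicted
    LoD reaches `threshold` (None if not reached within `horizon`)."""
    c, phi = ar1_fit(x)
    level = x[-1]
    for step in range(1, horizon + 1):
        level = c + phi * level
        if level >= threshold:
            return step
    return None
```

For a steadily rising LoD window, the forecast extrapolates the trend and returns the number of steps until the critical threshold is predicted to be crossed, which is the essence of a first-hitting-time prediction.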
We emphasize that “to predict” means “to tell beforehand”, and thus, in the present context, to use past data to compute now a quantity that describes some future situation. In the literature, this “future situation” often turns out to be a “present situation”, so that no prediction is performed.
The above information allows one to fill, in Table 4, the relevant cells of the “Drowsiness” column and the “Vehicle” megarow.
Note that there are no entries in the “Environment” megarow of the “Drowsiness” column: we did not find any significant technique that uses one or more indicators related to one of the three parts of the environment listed in Section 5.3 (i.e., outside, inside, and contextual) to determine the level of drowsiness of the driver. Some papers use the time of day to try to capture the moments of the day when drowsiness tends to peak. While the monotony of a road is known to increase driver drowsiness, we have not found any paper using environment-based indicators of road monotony (e.g., road geometry or traffic density) and describing a way to assign values to such indicators based upon available data. As an aside, studies of drowsiness in a driving simulator often use night driving and monotonous conditions to place the driver in a situation conducive to drowsiness.
Sensors Similarly to the indicators, we first address the driver (D)-centric sensors.
In a vehicle, the HR can be monitored using electrodes that can be placed at various locations, including the steering wheel (conductive electrodes (204)) and the seat (capacitive electrodes (113)). ECG monitoring using steering-wheel-based approaches is a feasible option for HR tracking, but requires both hands to touch two different conductive parts of the steering wheel.
Ballistocardiography (BCG) also allows for monitoring the cardiac activity unobtrusively. The underlying sensing concept uses strain-gauge BCG sensors in the seat or in the safety belt to detect both the cardiac activity and the respiratory activity of the driver (239). However, the car vibrations make it difficult to use this sensor in real driving conditions.
Information about the cardiac activity can be obtained using a camera looking at the driver, in particular using photoplethysmography (PPG) imaging (250).
Radar-based methods mainly provide information about movement, which can of course be caused by both the cardiac activity and the respiratory activity. Various sensor locations are possible, including integration into the safety belt, the steering wheel, and the backrest of the seat (85; 192).
Thermal imaging is a tool for analyzing respiration (or breathing) non-intrusively. Kiashari et al. (100) present a method for the evaluation of driver drowsiness based on thermal imaging of the face. Indeed, temperature changes in the region below the nose and nostrils, caused by inspiration and expiration, can be detected by this imaging modality. The procedure (1) uses a sequence of infrared (IR) images to produce a corresponding discrete-time respiration signal, and (2) extracts respiration information from it. The value of each successive signal sample is the mean of the pixels in a rectangular window of fixed size, representing the respiration region in the corresponding IR image, adjusted frame-to-frame using a tracker. The initial respiration region is determined based on the temporal variations over the first few seconds of the sequence, and the region is tracked from frame to frame by using the technique of “spatio-temporal context learning” (249), which is based on a Bayesian framework and models the statistical correlation between (1) the target (i.e., the tracked region) and (2) its surrounding regions, based on the low-level characteristics of the image (i.e., the intensity and position of each pixel). The extracted information consists of the respiration rate and the inspiration-to-expiration ratio. A classifier uses this rate and ratio to classify the driver as awake or drowsy. Both a support vector machine (SVM) classifier and a k-nearest-neighbors (KNN) classifier were tested, with the SVM yielding the better performance.
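One simple way to extract a respiration rate from such a discrete-time respiration signal (a rough sketch, not the exact method of (100)) is to count the upward zero crossings of the mean-subtracted signal, each crossing marking the start of one breath cycle.

```python
def respiration_rate(signal, fps):
    """Estimate the respiration rate (breaths per minute) from a
    respiration signal sampled at `fps` frames per second, by counting
    the upward zero crossings of the mean-subtracted signal."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    # Count transitions from negative to non-negative values.
    crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if a < 0 <= b
    )
    duration_min = len(signal) / fps / 60.0
    return crossings / duration_min
```

Real signals would first require smoothing to suppress noise and tracker jitter; a spectral estimate (e.g., the dominant frequency of the signal) is a common alternative to crossing counts.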
François (55) and François et al. (56) describe a photooculographic (POG) system that illuminates one eye with eye-safe IR light and uses as input a sequence of images of this eye acquired by a monochrome camera that is also sensitive in this IR range, and is head-mounted or dashboard-mounted. A large number of ocular parameters, linked to the movements of the eyelids (including blinks) and eyeball (including saccades), are extracted from each video frame and combined into an LoD value, thus producing an LoD signal. The output was validated using EEG, EOG, EMG, and reaction times. The head-mounted system is available commercially as the Drowsimeter R100.
Also using a camera, Massoz et al. (138) characterize drowsiness by using a multi-timescale system that is both accurate and responsive. The system extracts, via convolutional neural networks (CNNs), features related to eye-closure dynamics at four timescales, i.e., using four time windows of four different lengths. Accuracy is achieved at the longest timescales, whereas responsiveness is achieved at the shortest ones. The system produces, from any 1-min sequence of face images, four binary LoDs with diverse trade-offs between accuracy and responsiveness. Massoz et al. (138) also investigate the combination of these four LoDs into a single LoD, which is more convenient for operational use.
Zin et al. (254) classify driver drowsiness by using a feature extraction method, the PERCLOS parameter, and an SVM classifier.
EDA is measured through electrodes placed on the skin of a person. It can thus be measured through a wearable such as a smartwatch. Concerning the other relevant physiological driver-based indicators, (1) it is challenging to obtain the pupil diameter in real conditions because of issues with illumination conditions and camera resolution, among other reasons, and (2) it is nearly impossible, as of this writing, to characterize the brain activity in real time in a non-intrusive, reliable way.
Teyeb et al. (215) measure vigilance based on a video approach calculating eye-closure duration and estimating head posture. Teyeb et al. (216) monitor drowsiness by analyzing, via pressure sensors installed in the driver seat, the changes in pressure distribution resulting from the driver’s body moving about in this seat. The authors claim that the techniques of these two papers can be usefully combined into a multi-parameter system.
Bergasa et al. (21) present a system to characterize drowsiness in real time using images of the driver and extracting from them the six visual parameters of PERCLOS, eye-closure duration, blink frequency, nodding frequency, fixed gaze, and face pose. Using a camera, Baccour et al. (15) and Dreißig et al. (45) monitor driver drowsiness based on eye blinks and head movements.
Vehicle-based indicators can be collected in two main ways. Standard indicators, such as speed, acceleration, and steering wheel angle, can be extracted from CAN-bus data (116; 60). The CAN bus enables intra-vehicle communications, linking the car sensors, warning lights, and electronic control units (ECUs). More advanced indicators can be obtained in appropriately-equipped vehicles (27; 60). For example, speed and acceleration can be obtained via an inertial measurement unit (IMU), and the following distance via a forward-looking radar.
Since SDLP is considered to be a vehicle-based indicator of driver drowsiness, one can quantify this indicator by examining the lane discipline, i.e., the behavior of the vehicle in its lane. This is traditionally done by using cameras (mounted inside, behind the windshield, typically integrated beside the rear-view mirror) (11) and/or laser sensors (mounted at the front of the vehicle) to track the lane-delimiting lines when present. But one can also use the rumble strips (also called sleeper lines, audible lines, or alert strips) when present. While these are designed to produce an audible, acoustic signal intended to be sensed directly by the driver (as an urgent warning or wake-up call), one could imagine using microphones and/or vibration sensors to transform this acoustic/mechanical signal into an electrical signal that is then analyzed via signal processing.
Bakker et al. (17) describe a video-based system for detecting drowsiness in real time. It uses computer vision and machine learning (ML), and is developed and evaluated on naturalistic-driving data. It has two stages. The first extracts, using data from the last 5 minutes, (1) driver-based indicators (e.g., blink duration, PERCLOS, gaze direction, head pose, facial expressions) using an IR camera looking at the driver face, and (2) vehicle-based indicators (e.g., lane positions, lane departures, lane changes) using an IR camera looking at the scene ahead. This stage mostly uses pre-trained, deep-neural-network (DNN) models. All indicators (also called deep features in DNNs) are inputs to the second stage, which outputs an LoD, either binary (alert or drowsy) or regression-like. This stage uses one KNN classifier, trained and validated using KSS ratings as ground truth for the LoD, and personalized for each driver by weighting his/her data more heavily during training, thereby leading to higher performance during operation.
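To make the KNN-based second stage concrete, the sketch below shows a bare-bones k-nearest-neighbors classifier over indicator vectors. The feature values, labels, and interface are hypothetical; the actual system's per-driver personalization by sample weighting is omitted.

```python
import math

def knn_classify(train, query, k=3):
    """Classify a feature vector (e.g., normalized blink duration,
    PERCLOS, lane deviation) as 'alert' or 'drowsy' by majority vote
    among its k nearest training examples (Euclidean distance).
    `train` is a list of (feature_vector, label) pairs."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)
```

In a real system, the features would be standardized to comparable scales before computing distances, and k would be selected by cross-validation against the KSS ground truth.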
The above information allows one to fill the relevant cells of Table 5.