This section describes our survey of the literature on DM and DMSs. The subsections below successively describe (1) our strategy for building an initial set of references, (2) some conclusions drawn from these references, (3) the design of a table for organizing them, (4) comments about the content of this table, and (5, 6) trends observable in this table or in individual references. The analysis performed here guides the developments in subsequent sections.
To build an initial set of relevant references, we used an approach inspired by Gutiérrez et al. (70). The block (or flow) diagram of Figure 1 describes it.
Our search focused on surveys, reviews, or similar studies about DM and DMSs. We independently performed two searches during February 2021. The first focused on publications from IEEE, Sensors, and ScienceDirect, and the second on publications from ResearchGate; these four databases appeared well suited for providing a useful set of initial references. We used the search engine specific to each database and a Boolean query equivalent to (“survey” OR “review”) AND (“driver” OR “driving”) AND (“detection” OR “detecting” OR “behavior” OR “state” OR “monitoring”). We limited the search to publications in English, and did not place any constraint on the dates of publication. The two searches yielded 124 and 30 items, respectively. After removing 16 duplicates, we obtained a set of 138 references. We manually screened these, and kept only the ones satisfying the two criteria of (1) being in scientific journals or conference proceedings, and (2) providing a survey, review, or similar study of one or more aspects of the domain of interest. This screening led to 56 references, which appear in the first column of Table 2, and in the References section, which also contains additional references cited later.
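The query and the bookkeeping of the two searches can be sketched as follows. This is only an illustration, assuming whole-word, case-insensitive matching on titles; the search engines of the four databases may match differently (e.g., with stemming, or on abstracts). The function names and the sample titles in the comments are hypothetical and are not taken from the surveyed databases.

```python
import re

# The three AND-ed groups of OR-ed terms from the query described above.
QUERY_GROUPS = [
    ("survey", "review"),
    ("driver", "driving"),
    ("detection", "detecting", "behavior", "state", "monitoring"),
]

def matches_query(text: str) -> bool:
    """Return True if every group has at least one term present as a whole word.

    Assumption: whole-word, case-insensitive matching.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return all(any(term in words for term in group) for group in QUERY_GROUPS)

def merge_results(first: list[str], second: list[str]) -> list[str]:
    """Merge two result lists, dropping duplicates by normalized title.

    Assumption: two items are duplicates when their titles are equal after
    lowercasing and removing punctuation and extra whitespace.
    """
    seen = set()
    merged = []
    for title in [*first, *second]:
        key = " ".join(re.findall(r"[a-z0-9]+", title.lower()))
        if key not in seen:
            seen.add(key)
            merged.append(title)
    return merged

# Counts reported in the text: two searches, then duplicate removal.
FIRST_SEARCH, SECOND_SEARCH, DUPLICATES = 124, 30, 16
AFTER_DEDUPLICATION = FIRST_SEARCH + SECOND_SEARCH - DUPLICATES  # 138

# Hypothetical titles, for illustration only:
# matches_query("A review of driver state monitoring")  -> True
# matches_query("Driver monitoring with cameras")        -> False (no "survey"/"review")
```

The manual screening step (journal or conference venue, and survey-like content) is a human judgment and is deliberately not modeled here.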
The preliminary analysis of the 56 initial references led to the following high-level conclusions:
These conclusions guided the structuring and writing of the bulk of the paper.
When the context is clear, we use “state” for the global state and each of the five substates. The plural “states” and the phrase “state i” each imply that one is talking about one or more substates.
We used the above conclusions to design the structure of a table, namely Table 2, that organizes the 56 initial references in a useful way, in particular for the later synthesis in this paper.
References | States: Drowsiness | States: Mental workload | States: Distraction | States: Emotions | States: Under the influence | Indicators, driver: Physiological | Indicators, driver: Behavioral | Indicators, driver: Subjective | Indicators: Vehicle | Indicators: Environment | Sensors: Driver | Sensors: Vehicle | Sensors: Environment | Tests
Ahir and Gohokar (2) | V |
|
| HR, brain | gaze, blink, PERCLOS, facial, body | wheel, lane, speed | cam, elec | ext cam | real, sim |
|||||
Alluhaibi et al. (6) | V |
| V | ang |
| speech | wheel, lane, brake, speed | cam*, mic* | V* |
|
||||
Arun et al. (13) |
| vis, cog |
| HR, brain, EDA, pupil | gaze, blink, body | V | wheel, lane, brake, speed | cam, wea d, eye t | V | sim |
||||
Balandong et al. (18) | V |
|
| HR, brain | gaze, blink, PERCLOS, body | V | wheel, lane, brake, speed | elec | sim |
|||||
Begum (20) | V |
| V | stress |
| HR, brain | seat, ste w, saf b, wea d | real, sim |
||||||
Chacon-Murguia and Prieto-Resendiz (28) | V |
|
| HR, brain, EDA | gaze, blink, body | wheel, lane, brake, speed | ste w, cam | radar | real |
|||||
Chan et al. (29) | V |
|
| HR, brain | blink, PERCLOS, facial, body | wheel, brake, speed | cam*, mic* | real |
||||||
Chhabra et al. (31) | V |
| V | alc | breath | gaze, PERCLOS, facial, body | wheel | road | seat, cam*, mic* | V* | real, sim |
|||
Chowdhury et al. (32) | V |
|
| HR, brain, EDA | blink, PERCLOS | sim |
||||||||
Chung et al. (34) |
| stress |
| HR, breath, brain, EDA, pupil | speech | V | wheel, lane, brake, speed | cam, wea d | V | real, sim |
||||
Coetzer and Hancke (35) | V |
|
| brain | gaze, PERCLOS, facial, body | wheel, lane, speed | cam | V | real, sim |
|||||
Dababneh and El-Gindy (37) | V |
|
| brain, EDA, pupil | blink, PERCLOS, body | wheel, lane, speed | road | cam, wea d | radar | real, sim |
||||
Dahiphale and Rao (38) | V |
| V |
| gaze, blink, facial, body | wheel | cam | real |
||||||
Dong et al. (44) | V |
| V |
| HR, brain, pupil | gaze, blink, PERCLOS, facial, body | V | wheel, lane, speed | road, wea | cam | V | real |
||
El Khatib et al. (51) | V |
| man, vis, cog |
| HR, breath, brain, EDA, pupil | gaze, blink, PERCLOS, facial, body, hands | wheel, lane, speed | cam | V* | ext cam, radar | real, sim |
|||
Ghandour et al. (65) |
| man, vis, aud, cog | stress |
| HR, breath, brain, EDA | gaze, facial, body, speech | V | wheel, brake, speed | cam, wea d | real, sim |
||||
Hecht et al. (78) | V | V | V |
| HR, brain, EDA, pupil | gaze, blink, PERCLOS, facial, body | V | elec, eye t | real, sim |
|||||
Kang (95) | V |
| V |
| HR, breath, brain, EDA | gaze, blink, facial, body | wheel, lane, brake, speed | seat, ste w, cam | V | real, sim |
||||
Kaplan et al. (96) | V |
| V |
| HR, brain | gaze, blink, PERCLOS, facial, body, speech | wheel, lane, brake, speed | ste w, cam*, mic*, wea d | V | real, sim |
||||
Kaye et al. (98) | V |
| stress |
| HR, breath, brain, EDA | V | real, sim |
|||||||
Khan and Lee (99) | V |
| man, vis, aud, cog |
| HR, brain, EDA | gaze, PERCLOS, body | wheel, lane, brake, speed | wea d | real |
|||||
Kumari and Kumar (107) | V |
|
| HR, brain | gaze, blink, PERCLOS, body | V | wheel, lane | cam |
|
|||||
Lal and Craig (108) | V |
|
| HR, brain, EDA | PERCLOS, facial | cam | sim |
|||||||
Laouz et al. (109) | V |
|
| HR, brain, EDA | blink, PERCLOS, facial, body | V | wheel, speed | seat, cam, wea d | ext cam | real |
||||
Leonhardt et al. (114) |
|
| HR, breath | seat, ste w, saf b, cam | real |
|||||||||
Liu et al. (126) | V |
|
| HR, brain, pupil | gaze, blink, PERCLOS, body | wheel, lane, speed | cam | V | real |
|||||
Marquart et al. (132) | V |
| pupil | gaze, blink, PERCLOS | V | eye t | real, sim |
|||||||
Marina Martinez et al. (130) |
| ang |
| brake, speed | V* |
|
||||||||
Mashko (134) | V |
|
| HR, brain, EDA | gaze, blink, body | wheel, lane, brake, speed | cam, wea d | V | ext cam, radar | real, sim |
||||
Mashru and Gandhi (135) | V |
|
| HR, breath | blink, PERCLOS, facial, body | V | wheel, lane | seat, ste w, cam, wea d | sim |
|||||
Melnicuk et al. (141) | V | V | cog | stress, ang |
| HR, brain | blink, PERCLOS, facial | wheel, brake, speed | road, traf, wea | seat, ste w, saf b, cam*, wea d | V* | real |
||
Mittal et al. (146) | V |
|
| HR, brain, pupil | blink, PERCLOS, body | V | wheel, lane, brake, speed | cam, elec | V | ext cam | real |
|||
Murugan et al. (151) | V |
|
| HR, breath, brain, EDA, pupil | blink, PERCLOS, body | V | wheel, lane, speed | cam, elec | V | sim |
||||
Nair et al. (153) | V |
| V | alc | gaze, PERCLOS, facial, body | lane | seat, cam* | V | radar |
|
||||
Němcová et al. (160) | V |
| stress |
| HR, breath, brain, EDA | gaze, blink, PERCLOS, facial, body | wheel, lane, brake, speed | seat, ste w, cam, wea d, eye t | V | real, sim |
||||
Ngxande et al. (156) | V |
|
| blink, PERCLOS, facial, body | cam |
|
||||||||
Oviedo-Trespalacios et al. (164) | V | V |
| gaze | wheel, lane, brake, speed | real, sim |
||||||||
Papantoniou et al. (167) | V | V |
| HR, breath, brain | gaze, blink, speech | V | wheel, lane, speed | cam | ext cam, radar | real, sim |
||||
Pratama et al. (174) | V |
|
| HR, brain, EDA | gaze, blink, PERCLOS, facial, body, hands | V | wheel, lane | cam, wea d, elec | ext cam | real, sim |
||||
Ramzan et al. (176) | V |
|
| HR, breath, brain | blink, PERCLOS, facial, body | wheel, lane, speed | cam, wea d, elec | V | real, sim |
|||||
Sahayadhas et al. (186) | V |
|
| HR, brain, pupil | gaze, blink, PERCLOS, body | V | wheel, lane | seat, ste w, cam, wea d | V | real, sim |
||||
Scott-Parker (195) |
| stress, ang |
| HR, brain, EDA | gaze, facial | V | wheel, lane, brake, speed | traf | eye t | ext cam | real, sim |
|||
Seth (196) | V |
|
| cam | V | real |
||||||||
Shameen et al. (198) | V |
|
| brain | gaze, blink | elec | sim |
|||||||
Sigari et al. (202) | V |
|
| gaze, blink, PERCLOS, facial, body | cam | real |
||||||||
Sikander and Anwar (203) | V |
|
| HR, brain, pupil | gaze, blink, PERCLOS, body | V | wheel, lane | seat, ste w, saf b, cam, wea d, elec | real |
|||||
Singh and Kathuria (205) | V | V | V | V |
| pupil | gaze, blink, PERCLOS, facial | wheel, brake, speed | road, traf | cam, wea d | V | ext cam, radar | real |
|
Subbaiah et al. (213) | V |
|
| HR, brain, pupil | blink, PERCLOS, facial, body | cam | real, sim |
|||||||
Tu et al. (219) | V |
|
| HR, brain | blink, PERCLOS, facial, body | wheel, lane, speed | cam*, wea d, elec | V | real, sim |
|||||
Ukwuoma and Bo (220) | V |
|
| HR, breath, brain | blink, PERCLOS, facial, body | wheel, lane, brake | cam, wea d, elec | real |
||||||
Vilaca et al. (225) | V |
| V |
| brain | gaze, body | wheel, lane, brake, speed | cam, mic | V | ext cam |
|
|||
Vismaya and Saritha (227) |
| V |
| gaze, blink, PERCLOS, body | cam, eye t | real, sim |
||||||||
Wang et al. (229) | V |
|
| brain, pupil | gaze, blink, PERCLOS, body | lane | cam, wea d | real, sim |
||||||
Welch et al. (230) |
| stress, ang |
| HR, breath, brain, EDA | blink, facial, speech | wheel, brake, speed | seat, ste w, cam, wea d | V | real, sim |
|||||
Yusoff et al. (244) |
| vis, cog |
| HR, brain, EDA, pupil | gaze, body | V | lane, speed | eye t |
|
|||||
Zhang et al. (248) | V |
|
| HR, brain | gaze, blink, PERCLOS, body | lane, speed | cam | ext cam | real, sim |
|||||
The 56 references are listed in the first column, labelled “References”, in alphabetical order of first author. The three megacolumns following the first column successively correspond to the three key items above, and are accordingly labelled “States”, “Indicators”, and “Sensors”. The last column, labelled “Tests”, indicates whether the technique or system described in a reference was tested in the laboratory, in real conditions (“in the wild”), or both.
The “States” megacolumn is divided into 5 columns corresponding to the 5 (sub)states of interest. Each of the “Indicators” and “Sensors” megacolumns is divided into 3 columns corresponding to the 3 previously-listed items that a DMS should ideally monitor, i.e., the driver, vehicle, and environment. The column corresponding to the indicators for the driver is further divided into 3 subcolumns corresponding to the qualifiers “physiological”, “behavioral”, and “subjective”. Some other columns could be further subdivided, such as for “Distraction”, but the table deals with such additional subdivisions in a different way.
We successively describe the three megacolumns of Table 2.
States. For each of the 56 papers, we indicate which (sub)state(s) it addresses. If a paper addresses drowsiness, we place the checkmark “V” in the corresponding column, and similarly for mental workload. For the three other states, we either use a general “V” or give more specific information, often via an abbreviation. There are four types of distraction, i.e., manual, visual, auditory, and cognitive, respectively abbreviated as man, vis, aud, and cog. These types are largely self-explanatory; they are discussed later. For emotions, we indicate the type, i.e., stress or anger (ang). For the state of being under the influence, we also indicate the type; in all cases, it turns out to be alcohol (alc).
As an example, the second paper, by Alluhaibi et al. (6), addresses drowsiness, distraction, and the emotion of anger.
All abbreviations used in Table 2, for this and other (mega)columns, are defined in Table 3.
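The encoding of the “States” cells described above can be sketched in a few lines. This is a minimal illustration, not part of the original study: the function name `decode_state_cell` and the label `"general"` are hypothetical, and the abbreviation map contains only the state abbreviations named in the text.

```python
# State abbreviations named in the text; "stress" is written out in full
# in the table and is therefore passed through unchanged.
STATE_ABBREVIATIONS = {
    "man": "manual",
    "vis": "visual",
    "aud": "auditory",
    "cog": "cognitive",
    "ang": "anger",
    "alc": "alcohol",
}

def decode_state_cell(cell: str):
    """Decode one cell of the "States" megacolumn.

    Returns None for an empty cell (state not addressed), ["general"] for
    the checkmark "V", and a list of expanded type names otherwise.
    """
    cell = cell.strip()
    if not cell:
        return None
    if cell == "V":
        return ["general"]  # hypothetical label for an unqualified checkmark
    tokens = [token.strip() for token in cell.split(",")]
    return [STATE_ABBREVIATIONS.get(token, token) for token in tokens if token]

# The row of Alluhaibi et al. (6), as described in the text:
# drowsiness, distraction, and the emotion of anger.
alluhaibi_states = {
    "Drowsiness": "V",
    "Mental workload": "",
    "Distraction": "V",
    "Emotions": "ang",
    "Under the influence": "",
}
decoded = {state: decode_state_cell(cell) for state, cell in alluhaibi_states.items()}
```

Here `decoded["Emotions"]` is `["anger"]`, and the two states that the paper does not address map to `None`.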
Indicators. The indicators used by a paper are marked in the same way as above.
Sensors. The sensors used by a paper are marked in a similar, but not identical, way. If a sensor is embedded in a mobile device (typically a smartphone), rather than in the vehicle, we add a “*”, leading to cam*/mic* for a camera/microphone of a mobile device. In the vehicle column, “V” indicates that the sensor is integrated in the vehicle, whereas “V*” indicates that it is part of a mobile device. For example, the vehicle speed can be obtained via the controller-area-network (CAN) system/bus or via a mobile device.
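The “*” convention lends itself to a simple parsing rule. The following sketch assumes that a sensor cell is a comma-separated list of tokens and that a trailing “*” is the only marker; the names `SensorEntry` and `parse_sensor_cell` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SensorEntry:
    name: str     # e.g., "cam", "mic", or "V" for a vehicle sensor
    mobile: bool  # True when the table entry carries a trailing "*"

def parse_sensor_cell(cell: str) -> list[SensorEntry]:
    """Split a "Sensors" cell into entries, flagging mobile-device sensors."""
    entries = []
    for token in cell.split(","):
        token = token.strip()
        if not token:
            continue  # skip empty fragments, e.g., from an empty cell
        mobile = token.endswith("*")
        entries.append(SensorEntry(name=token.rstrip("*"), mobile=mobile))
    return entries

# Cells copied from Table 2:
driver_sensors = parse_sensor_cell("ste w, cam*, mic*, wea d")  # Kaplan et al. (96)
vehicle_sensors = parse_sensor_cell("V*")                        # Alluhaibi et al. (6)
```

With this representation, the references that rely on a mobile device can be selected by checking whether any entry has `mobile` set to `True`.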
| States | | Indicators | | Sensors | | Tests | |
| Distraction | | Driver | | Driver | | real | real conditions |
| aud | auditory | blink | blink dynamics | cam | camera | sim | simulated conditions |
| cog | cognitive | body | body posture | elec | electrode(s) | | |
| man | manual | brain | brain activity | eye t | eye tracker | | |
| vis | visual | breath | breathing activity | mic | microphone | | |
| Emotions | | EDA | electrodermal activity | saf b | safety belt | | |
| ang | anger | facial | facial expressions | ste w | steering wheel | | |
| Under the influence | | hands | hands parameters | Environment | | | |
| alc | alcohol | HR | heart rate/activity | ext cam | external camera | | |
| | | pupil | pupil diameter | | | | |
| | | Vehicle | | | | | |
| | | brake | braking behavior | | | | |
| | | lane | lane discipline | | | | |
| | | wheel | wheel steering | | | | |
| | | Environment | | | | | |
| | | road | road geometry | | | | |
| | | traf | traffic density | | | | |
| | | wea | weather | | | | |
Table 2 reveals the following trends.
Drowsiness is the most covered state (with 44 references among the total of 56), distraction is the second most covered (with 20 references), and more than one (sub)state is considered in only 19 references.
Indicators are widely used in most references, in various numbers and combinations. Subjective indicators are infrequent, which is to be expected given the constraints of real-time operation. While several authors, such as Dong et al. (44) and Sahayadhas et al. (186), emphasize the importance of the environment and of its various characteristics (e.g., road type, weather conditions, and traffic density), only 6 references take it into account.
While the three “Sensors” columns seem well filled, several references either neglect to talk about the sensor(s) they use, or cover them in an incomplete way. Some references give a list of indicators, but do not say which sensor(s) to use to get access to them. References simply saying that, e.g., drowsiness can be measured via a camera or an eye tracker do not help the reader. Indeed, these devices can be head- or dashboard-mounted, and they can provide access to a variety of indicators such as blink dynamics, PERCLOS, and gaze parameters.
Many systems are tested in real conditions, perhaps after initial development and validation in a simulator. Many papers do not, however, systematically document the test conditions for each method that they describe.
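Trends of this kind can be checked by tallying the relevant column of Table 2. As a small illustration, the sketch below counts the test conditions for five rows only, with the “Tests” cells copied from the table; it is not the full count over the 56 references, and the function name `tally_tests` and the category labels are hypothetical.

```python
from collections import Counter

# "Tests" cells copied from five rows of Table 2 (a subset, for illustration).
TESTS_CELLS = {
    "Ahir and Gohokar (2)": "real, sim",
    "Arun et al. (13)": "sim",
    "Balandong et al. (18)": "sim",
    "Begum (20)": "real, sim",
    "Chacon-Murguia and Prieto-Resendiz (28)": "real",
}

def tally_tests(cells: dict[str, str]) -> Counter:
    """Count references tested in real conditions, simulated conditions, or both."""
    counts = Counter()
    for cell in cells.values():
        conditions = {token.strip() for token in cell.split(",") if token.strip()}
        if conditions == {"real", "sim"}:
            counts["both"] += 1
        elif conditions == {"real"}:
            counts["real only"] += 1
        elif conditions == {"sim"}:
            counts["sim only"] += 1
        else:
            counts["undocumented"] += 1  # empty or unrecognized cell
    return counts

subset_counts = tally_tests(TESTS_CELLS)
# For this five-row subset: both = 2, real only = 1, sim only = 2.
```

The same pattern applies to the other columns, for instance to count how many references fill the environment indicator column.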
Other trends are not directly observable in Table 2, but can be identified in some individual references.
Experts agree that there is no globally accepted definition for any of the first four states that we decided to consider. For example, even though many authors try to give a proper definition of drowsiness, considerable confusion and inconsistency remain about the concepts of drowsiness and fatigue, and about the difference between them. There is thus a need to define, as precisely as possible, what the first four states are, and this is done in the sequel.
In the more recent references, one sees a growing use of mobile devices, and of smartphones in particular (6; 29; 31; 51; 95; 96; 130; 141; 153; 219). A smartphone is relatively low-cost, and one can easily link it to a DMS. This DMS can then use the data provided by the smartphone’s many sensors, such as its inertial devices, microphones, cameras, and navigation system(s). A smartphone can also receive data from wearable sensors (e.g., from a smartwatch), which can provide information such as heart rate (HR), skin temperature, and electrodermal activity (EDA). A smartphone can also be used for its processing unit.