Smart Interrogation System by Detection of Visual Focus of Attention

This paper presents an approach to detect and control the focus of attention of a suspect using his/her eye gaze and head movement direction, in order to build an automatic interrogation system, especially for detecting lies. To this end, we classified interrogation conversation into different criteria and identified the critical ones. First, we conducted psychological experiments on a sampled population to identify the parameters connected with the symptoms a suspect shows when telling lies, and built our knowledgebase from the results. This knowledgebase helps the system make strategic decisions and optimize accuracy. A monitoring camera captures the interrogation continuously and feeds the frames to our proposed system. A 3D head tracker is used to track the head in the image, and an Active Shape Model (ASM) is used to localize face points. The Vector Field of Image Gradient (VFIG) is calculated to track the eyeball and its rotation within the eye area. Random eye and head movement and changes of the eyebrow at critical levels of the questionnaire make it possible to detect lies. Finally, experiments are conducted in a controlled environment to validate our psychological findings.


Introduction
Head and eye gaze behavior has been an active research topic for many years in psychology and in the computer vision community. In particular, detection of a suspect's visual focus of attention (VFOA) [1,2] has become an active research area for lie detection [3]. However, there are very few works in which a suspect's visual attention is controlled accurately through a human machine interface (HMI) and human posture recognition [4]. Generally, humans have a very poor ability to detect deceit and hostile intent, with an accuracy rate of 40-60% [5]. However, when a person is being deceitful, she/he is making something up in the brain. This results in increased brain activity and rapid physiological responses that can be measured on the face (including the facial blood flow pattern). Humans cannot control these physiological responses to emotions. Stress causes abrupt changes in local skin temperature and distinctive facial signatures, which provide the main key to lie detection.
Some previous research on lie detection systems used the polygraph [6][7][8]. Developed in 1921, it measures and records several physiological indices such as blood pressure, pulse, respiration and skin conductivity, and tries to find correlations between these measurements. However, it is highly invasive, very slow (it requires several experts) and cannot be used in covert operations (where the suspect is unaware of the experiment). At present several other techniques are in use [9][10][11][12][13][14], but most of them use body sensors or high-cost thermal cameras, and their detection accuracy is also low.
Humans have a well-defined "rigid" skull structure and facial muscle structure, which means there is a finite number of facial expressions a human can perform. These are described by Action Units in the Facial Action Coding System (FACS), and there are 46 such Action Units. To this end, our proposed technique finds the best parameters for lie detection in the visual domain, such as random eye movement, changes in the orientation of the head and changes in the shape of the eyebrow. Based on these, we create our knowledgebase to store how people react to different stimuli. This knowledgebase helps us detect the phase of the questionnaire (performed during interrogation) and its corresponding possibility of a lie. A low-cost USB camera captures continuous video of the suspect during interrogation and feeds it to our proposed system. A multistage computer vision approach is used to track the most critical parameters that show the symptoms of a lie. Our proposed technique is cost-effective and unbiased. It also provides proper video evidence of the interrogation. The proposed system can be a helpful tool for existing interrogation systems.

Lie Detection
Human behavior is highly varied in nature. It is very hard to read a person's inner state, especially when he/she is telling lies. In an interrogation environment the problem becomes harder because the suspect prepares for some common questions. In our approach, we classify the answers of the suspect into four major criteria as follows:
Criteria (1, 1): The suspect has done it and admits it. This is considered "True." For example, the suspect has stolen and says "Yes, I have done it."
Criteria (1, 0): The suspect has done it but denies it. This is considered "False." For example, the suspect has stolen and says "No, I have not done it."
Criteria (0, 1): The suspect has not done it but pretends that he/she has. This is considered "False." For example, the suspect does not have a red shirt but says "Yes, I have."
Criteria (0, 0): The suspect has not done it and denies it. This is considered "True." For example, the suspect does not have a red shirt and says "No, I don't have."
In our approach, Criteria (1, 1) and Criteria (0, 0) are considered TRUE and hence are bypassed by the system, while Criteria (1, 0) and Criteria (0, 1) are considered FALSE and must be tracked by the system. When the interrogation begins, even the weakest liar can pass our test without showing any external symptoms, because at first s/he is prepared with some ready answers. However, interrogation is a chain of questions, and to support one lie the suspect must tell another lie. Over time it becomes harder for the suspect to answer with his/her ready wit. So weak liars cannot sustain the "chain of lies" and show some external symptoms. The symptoms are most obvious for Criteria (0, 1).
The reason is that anyone who claims to have done something that s/he has not done needs to imagine the situation before answering questions about it. During this imagination there is a change in body language showing symptoms of a possible lie. Our main goal is to track those symptoms to detect a lie. To this end, we have classified the symptoms into three major types:

Symptom Type-1:
The suspect makes very random eye movements, which is a clear indication of nervousness. S/he also moves the eyeball to the upper right corner of the eye area as an indication that s/he is thinking of something.

Symptom Type-2:
The suspect moves his/her head. In most cases it is hard for a weak liar to continue the conversation while keeping direct eye contact with the interrogator. Too much loss of direct eye contact is a clear indication of telling a lie. To avoid direct eye contact, the suspect moves his/her head downward. Sometimes s/he also makes frequent head movements.

Symptom Type-3:
The suspect changes the shape of his/her eyebrow as an indication of worry.

If we detect Symptom Type-1, 2 or 3, there is a possibility that the suspect is telling a lie. To confirm the possibility, we compare it with our knowledgebase. The knowledgebase is the storehouse of symptoms shown by different suspects at different levels of the questionnaire, obtained from our manual interrogation. Each question in the questionnaire is marked as Phase 1, 2, …, N. The symptoms shown by suspects also vary with age. Teenagers are generally impatient and cannot remain calm, and hence show clear and useful symptoms, as well as useless ones, during interrogation. Older groups show very low levels of symptoms, which are very hard to track. So, to build a knowledgebase of how people react while telling a lie, we need samples from different age groups. The statistics from the samples help us obtain the threshold value to confirm a lie.
Since human emotion is the prime parameter of our lie detection strategy, we classify crime patterns as follows:
Crime Type-1: Emotion is connected with the crime, and there is no economic profit. Any type of revenge (for example, killing over an extramarital relationship) can be considered a Type-1 crime. Here the suspects show more symptoms of telling a lie.
Crime Type-2: The crime is committed only for economic profit; there is no emotional issue. Any type of fraud activity such as stealing, cheating, abduction or mugging can be considered a Type-2 crime. Here the suspects show fewer symptoms. A sample questionnaire for this type of crime (here, stealing) shows that the suspect exhibits external symptoms at different levels of questions. The changed focus of attention shown by the suspect is our key to lie detection. The overall system is depicted in Fig. 1.

Method overview
First, a camera or sensor is needed to detect the VFOA of the suspect. The captured video frames are fed one by one to an intelligent system. Each video frame is then converted to grayscale for easier processing and analyzed pixel by pixel. The analyzed data are used to detect loss of attention when the suspect changes his/her VFOA to another direction. How the head direction of the suspect changes while telling a lie was observed in our psychological tests. Even a small deviation of the head orientation can be considered a clue for detecting a lie, because suspects change the tilt/pan angle of their head while telling a lie. A step-by-step approach to detect head movement and Random Eye Movement (REM) is discussed below.

Questionnaire excerpt (question; answer; symptom level):
"Last day was not a vacation. What were they doing with you in school time?"; "They had a fever and did not go to school."; HIGH
Phase 7: "Both of them had a fever?"; "May be."; HIGH

Fig. 1. Programming flow chart.

Head pose detection
The main goal of head pose detection is to track the head across the continuous image sequence, whether or not the head is moving. In our present work, we have used Seeing Machines' faceAPI [15] to detect and track the head pose, $h_p$, of the target person. To detect the head, we have used a Haar cascade classifier as a 3-D head tracker [16]. We draw a rectangle around the head. For Symptom Type-2, the suspect moves the head within this head rectangle. The orientation of the middle line of the face within this face rectangle clearly defines the head movement in a particular direction, as proposed in [17]. If the suspect turns his/her face downwards without moving the head, the change in the orientation of the nose point with respect to the face rectangle is used as an alternative measure of Symptom Type-2.
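The nose-point measure described above can be illustrated with a minimal sketch. It assumes the face rectangle and the nose point have already been obtained from the tracker; the `Rect` type, the function names and the `threshold` value of 0.25 are hypothetical choices for illustration and are not taken from the original system.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned face rectangle in image coordinates (y grows downward)."""
    x: float
    y: float
    width: float
    height: float


def nose_offset(rect: Rect, nose_x: float, nose_y: float) -> Tuple[float, float]:
    """Return the nose position relative to the rectangle center.

    Each component is divided by half the rectangle size, so values stay
    within [-1, 1] while the nose point lies inside the rectangle.
    """
    cx = rect.x + rect.width / 2.0
    cy = rect.y + rect.height / 2.0
    return ((nose_x - cx) / (rect.width / 2.0),
            (nose_y - cy) / (rect.height / 2.0))


def head_direction(rect: Rect, nose_x: float, nose_y: float,
                   threshold: float = 0.25) -> str:
    """Label the dominant offset direction (threshold is an assumed value)."""
    dx, dy = nose_offset(rect, nose_x, nose_y)
    if max(abs(dx), abs(dy)) < threshold:
        return "frontal"
    if abs(dy) >= abs(dx):
        return "down" if dy > 0 else "up"
    # Left/right are expressed in image coordinates, not from the suspect's view.
    return "right" if dx > 0 else "left"
```

A downward label that persists over consecutive frames would then be a candidate for Symptom Type-2; how long it must persist is a separate design decision.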

Face points extraction by active shape model
Our modelling method works by examining the statistics of the coordinates of the labeled points in the head rectangle. In order to compare equivalent points from different shapes, the shapes must be aligned with respect to a set of axes. We achieve the required alignment by scaling, rotating and translating each shape so that the shapes correspond as closely as possible. In our approach, the alignment minimizes a weighted sum of squared distances between equivalent points on different shapes. Finally, the facial feature points are extracted from the active shape model [18].
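The alignment step can be sketched as a weighted least-squares similarity fit between two 2-D point sets. The sketch below is an illustration under stated assumptions, not the authors' implementation: it treats each point as a complex number so that scaling and rotation become a single complex factor, and the function name `align_shape` and the uniform default weights are hypothetical.

```python
import cmath


def align_shape(shape, reference, weights=None):
    """Align `shape` to `reference` with scale, rotation and translation.

    Minimizes sum_i w_i * |a * z_i + b - t_i|^2 over complex a and b, where
    z_i and t_i are corresponding points written as complex numbers.
    Returns (aligned_points, scale, rotation_in_radians).
    """
    n = len(shape)
    if n == 0 or n != len(reference):
        raise ValueError("shapes must be non-empty and of equal length")
    w = [1.0] * n if weights is None else list(weights)
    total = sum(w)

    z = [complex(x, y) for x, y in shape]
    t = [complex(x, y) for x, y in reference]

    # Weighted centroids remove the translation component.
    z_mean = sum(wi * zi for wi, zi in zip(w, z)) / total
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / total
    zc = [zi - z_mean for zi in z]
    tc = [ti - t_mean for ti in t]

    denom = sum(wi * abs(zi) ** 2 for wi, zi in zip(w, zc))
    if denom == 0.0:
        raise ValueError("degenerate shape: all points coincide")

    # Closed-form least-squares solution for the complex factor a.
    a = sum(wi * zi.conjugate() * ti for wi, zi, ti in zip(w, zc, tc)) / denom

    aligned = [a * zi + t_mean for zi in zc]
    return [(p.real, p.imag) for p in aligned], abs(a), cmath.phase(a)
```

For example, a square that has been doubled in size, rotated by 90° and shifted is mapped back onto the unit square with a scale of 0.5 and a rotation of -90°.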

Iris center detection
The facial feature points are used to roughly locate the eye regions in the face. The VFIG is then used to detect the iris center, as follows. Let $I_c$ be a possible iris center and $I_{gi}$ be the gradient vector at position $I_{xi}$. If $I_{di}$ is the normalized displacement vector, then it should have the same absolute orientation as the gradient $I_{gi}$. We can determine the optimal center $I_c^{*}$ of the iris (the darkest position of the eye) by computing the dot products of $I_{di}$ and $I_{gi}$ and finding the global maximum over the eye image:

$$I_c^{*} = \arg\max_{I_c} \frac{1}{N} \sum_{i=1}^{N} P_i, \qquad P_i = \left( I_{di}^{T} I_{gi} \right)^{2}, \qquad I_{di} = \frac{I_{xi} - I_c}{\lVert I_{xi} - I_c \rVert_2}, \qquad i = 1, 2, \ldots, N,$$

where the displacement vector $I_{di}$ is scaled to unit length in order to obtain an equal weight for all pixel positions in the image.
We create an eye rectangle around the eye. The fluctuation of the eyeball coordinates within this eye rectangle provides a clue for transient attention detection, as proposed by Debnath [17], and thereby also a clue to Symptom Type-1.
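The dot-product objective described in this section can be sketched as a brute-force search over candidate centers. This is a minimal illustration, assuming a small grayscale eye image stored as a list of rows, central-difference gradients and interior pixels only; the function names are hypothetical, and a practical system would use an optimized implementation.

```python
import math


def image_gradients(img):
    """Central-difference gradients at interior pixels as (x, y, gx, gy)."""
    h, w = len(img), len(img[0])
    grads = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            if gx != 0.0 or gy != 0.0:  # flat pixels contribute nothing
                grads.append((x, y, gx, gy))
    return grads


def iris_center(img):
    """Return the (x, y) candidate maximizing the mean squared dot product."""
    grads = image_gradients(img)
    h, w = len(img), len(img[0])
    best_score, best_center = -1.0, None
    for cy in range(1, h - 1):
        for cx in range(1, w - 1):
            score = 0.0
            for x, y, gx, gy in grads:
                dx, dy = x - cx, y - cy
                norm = math.hypot(dx, dy)
                if norm == 0.0:
                    continue  # displacement undefined at the candidate itself
                dot = (dx * gx + dy * gy) / norm  # unit displacement . gradient
                score += dot * dot
            score /= max(len(grads), 1)
            if score > best_score:
                best_score, best_center = score, (cx, cy)
    return best_center
```

On a synthetic image whose intensity grows radially from a dark point, every gradient points straight away from that point, so the search returns it as the center.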

Eyebrow Movement Detection
The American psychologist Gibson introduced the concept of optical flow in the 1940s [19]. To describe the visual stimulus provided to animals, he considered the pattern of apparent motion in a visual scene. This motion is caused by the relative motion between the observer and the scene. To estimate the optical flow of the eyebrow we need a sequence of ordered images. The approach estimates the motion between two consecutive images taken at times t and t + Δt. In our approach, we have used the Lucas-Kanade method [20]. For a pixel under consideration, it assumes that the optical flow is constant in its neighborhood, and it solves the optical flow equations for all neighborhood pixels using the least squares criterion. To select distinctive image points to track, we have used the Shi-Tomasi corner detection algorithm [21]. In our proposed work, the eye area is taken as the Region of Interest (ROI), where the change in the shape of the eyebrow is detected.
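The least-squares step of the Lucas-Kanade method can be sketched for a single window as follows. This is a simplified single-scale illustration, assuming grayscale frames stored as lists of rows, central-difference spatial gradients taken from the first frame and a square window; the function name and the `radius` parameter are hypothetical, and corner selection is not included.

```python
def lucas_kanade_flow(frame1, frame2, cx, cy, radius=2):
    """Estimate the flow (u, v) at pixel (cx, cy) between two frames.

    Assumes the flow is constant inside the (2*radius+1)^2 window and solves
    the 2x2 normal equations of  Ix*u + Iy*v = -It  in the least-squares sense.
    """
    h, w = len(frame1), len(frame1[0])
    if not (radius + 1 <= cx < w - radius - 1 and radius + 1 <= cy < h - radius - 1):
        raise ValueError("window (plus one-pixel border) must lie inside the image")

    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0  # spatial gradient x
            iy = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0  # spatial gradient y
            it = frame2[y][x] - frame1[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")

    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

The determinant check reflects why corner-like points are preferred for tracking: in a window with gradients in only one direction the system is singular and the flow cannot be recovered.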

Field of view of the suspect
The field of view (FOV) of the suspect is divided into three regions:
• Central Field of View (CFV): This zone lies at the center of the human FOV. It is set to a 30° cone-shaped area (75° to 105°).
• Near Peripheral Field of View (NPFV): This zone is defined as the 45° fan-shaped area on each side of the CFV. On the right side of the CFV (30° to 75°) it is the right near peripheral field of view (RNPFV), and on the left side of the CFV (105° to 150°) it is the left near peripheral field of view (LNPFV).
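The angular zones listed above can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, assigning the boundary angles 75° and 105° to the CFV is an assumption, and angles outside 30° to 150° are simply labeled "outside" because no further zone is specified here.

```python
def classify_fov(angle_deg: float) -> str:
    """Map a gaze or head angle in degrees to the FOV zone it falls in."""
    if 75.0 <= angle_deg <= 105.0:
        return "CFV"      # central field of view
    if 30.0 <= angle_deg < 75.0:
        return "RNPFV"    # right near peripheral field of view
    if 105.0 < angle_deg <= 150.0:
        return "LNPFV"    # left near peripheral field of view
    return "outside"      # placeholder label for angles beyond the listed zones
```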

Data collection for knowledgebase creation
People of different ages show different levels of body language while telling lies. To validate this, we collected data from different age groups. There were a total of 20 participants. We divided them by age into four consecutive groups as follows:
Group-1 (10 to 14 years old): Four members aged 11, 11, 12 and 13 years.
Group-2 (15 to 19 years old): Four members aged 16, 16, 18 and 19 years.
Group-3 (20 to 24 years old): Four members aged 23, 24, 24 and 24 years.
Group-4 (25 to 29 years old): Four members aged 25, 26, 27 and 28 years.
They were interrogated with the same questionnaire (previously unknown to them) and instructed to tell lies instantly. The resulting external symptoms (rotation of the head and eyes, change of the eyebrow) were monitored very carefully. The summarized results of the age-variant symptom analysis are given in Table 2. To trace the relationship of Crime Type-1 and Type-2 with emotion, we conducted two separate experiments in the controlled environment. We set the threshold for sustained symptoms to a duration greater than 3 sec; symptoms lasting less than 3 sec are considered transient. The summary of the collected data is given in Table 4.
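The 3 sec threshold can be applied to per-frame detections as in the following sketch. It assumes a boolean flag per video frame indicating whether a symptom was detected; the function name, the default frame rate of 12 frames per sec and the treatment of a duration of exactly 3 sec as transient are assumptions made for illustration.

```python
def symptom_episodes(flags, fps=12.0, sustained_threshold_s=3.0):
    """Group consecutive detected frames into episodes and label each one.

    Returns a list of (first_frame, last_frame, duration_s, label) tuples,
    where label is "sustained" if the duration exceeds the threshold and
    "transient" otherwise.
    """
    episodes = []
    start = None
    # A trailing False acts as a sentinel that closes a run ending on the last frame.
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            duration = (i - start) / fps
            label = "sustained" if duration > sustained_threshold_s else "transient"
            episodes.append((start, i - 1, duration, label))
            start = None
    return episodes
```

For instance, at 12 frames per sec a run of 12 detected frames lasts 1 sec and is labeled transient, while a run of 48 frames lasts 4 sec and is labeled sustained.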

Data collection for symptom detection
To validate the performance of the system in a controlled environment, we carried out several experiments. First, to detect eyeball rotation within the eye area, the participants were asked to look into the LNPFV and RNPFV for varying durations while keeping their head stable, that is, facing the CFV. The illumination of the room was 200 lux and the distance from the camera was 0.5 m. After that, to track head rotation, they were asked to rotate their head with varying rotation times, i.e. 1, 2 and 3 sec. They were also asked to move their eyebrows (shrink or expand them) so that the movement could be tracked by the optical flow feature. The average video duration was 3 min, at an average of 12 frames per sec.

Performance Evaluation
In our lie detection technique, we detected Symptom Type-1 (st1) based on random fluctuation of eye movement. The detection accuracy during interrogation is expressed by equation (1), from which we obtain Fig. 4, showing the eye rotation detection accuracy for varying eye rotation times. Similarly, Symptom Type-2 is tracked based on random head movement or change in the orientation of the head. Its accuracy is expressed by equation (2), from which we obtain Fig. 5, showing the head rotation detection accuracy for varying head rotation times. Symptom Type-3 is tracked through the change in the shape of the eyebrow by applying the optical flow feature in the ROI. The accuracy of Symptom Type-3 detection is illustrated in Fig. 6.

Conclusion
We aimed to detect lies in order to build an automatic interrogation system by tracking the visual focus of attention of the suspect. First, we created a knowledgebase from manual experiments to define the parameters of suspicious behavior while telling lies and to optimize accuracy. From the experimental results it can be concluded that the detection accuracy of Symptom Types-1 and -2 depends on the transient duration, and that the accuracy increases linearly with the duration. Symptom Type-3 detection accuracy, however, depends on the gender of the suspect. The reason is that the eyebrows of female suspects are narrower than those of male suspects, so it becomes harder to track the change in eyebrow shape using optical flow features. We conducted our experiments in a controlled environment, so the results may not match those of real-life environments. We leave this for future research.