In the last two decades, the number of terrorist attacks has increased significantly. Driven by psychological, political, religious, or ideological motives, these cruel acts have been carried out by terrorists of different races, religions, and countries (Aldahadha, 2018). Terrorists who appear in images and videos, which occupy a large share of the media stream, typically cover every part of their bodies and either remain silent or distort their voices, leaving nothing that could expose their identity, in the belief that there “is a place to hide” in such a digital world (Bradbury, 2005). However, we found that in a large number of cases these terrorists appear holding up the so-called victory sign, a V-shape formed by the index and middle fingers. Since this V-shape might be the only visible evidence in such situations, we target it for identifying these persons.
Hand shape biometrics is not a new technology; it has been around since the 1970s, and, driven by its low cost, user-friendliness, and reliability, the number of publications on it has increased exponentially (Dutagac, et al., 2008). Hand shape biometrics can be considered a set of methods and techniques for identifying persons based on their hand silhouette and geometric features, including but not limited to finger lengths, widths, ratios, and angles. These features, among others, are extracted from acquired hand images (typically using a camera or a special scanner) and compared to stored templates for person identification (Duta, 2009). Research on hand shape biometrics is attractive for the following reasons:
- It is user-friendly (non-intrusive) and can use low-cost cameras, such as mobile phone cameras (Amayeh, et al., 2006; Kumar, et al., 2006).
- The template size is relatively small compared to other biometrics, and high-resolution images are not necessarily needed to obtain hand shape features (Sidlauskas, 1994).
- Unlike fingerprint biometrics, for instance, hand shape biometrics has no criminal connotation and is therefore more acceptable to end users, which makes it easier to collect data for research (Kukula & Elliott, 2006).
- If high-quality hand images are acquired, more distinctive features such as knuckles, fingers, and palm minutiae can be incorporated into the identification system (Dutagac, et al., 2008).
- The processing time of hand shape methods is relatively short compared to some other biometrics (Sanchez-Reillo, et al., 2000).
Over the last 30 years, several methods have been proposed in the literature for hand shape biometrics. These include the work of (Sanchez-Reillo, et al., 2000), who extracted several features from the hand shape, such as heights, widths, deviations, and angles. Using Gaussian Mixture Models, their method achieved a 97% accuracy rate when tested on a dataset of 20 persons with 10 hand images each. (Ma, et al., 2004) showed that B-spline curves can accurately represent the shape of fingers. (Varchol & Levicky, 2007) employed several geometric features describing finger heights and widths and the palm. (Aragonès, 2013) used holistic and geometric features. (Luque-Baena, et al., 2013) used Mutual Information, a Genetic Algorithm, and Linear Discriminant Analysis (LDA) to achieve high accuracy rates on different hand databases.
A method based on Fourier descriptors and finger area functions was employed by (Kang & Wu, 2014) to achieve high accuracy rates when evaluated on the Bosphorus Hand Database (Bogazici-University, 2015). Other similar works include, but are not limited to, (Dhole, 2012; Guo, et al., 2012; Basheer & Robinson, 2013; Hassanat, et al., 2015; Amayeh, et al., 2010; Pavešić, et al., 2004) and (Sanchez-Reillo, et al., 2000). Recently, deep features have received a great deal of attention from the hand biometrics research community, for example, in the work of (Tarawneh, et al., 2018; Sinha, et al., 2016).
As seen in the literature, hand shape biometrics based on the whole shape of the human hand is a well-researched area. To the best of our knowledge, identifying a person from only one part of the hand, such as the victory sign, has never been investigated. One exception is the use of hand parts in sign-language gestures, such as the work of (Fong, et al., 2013). However, such behavioral characteristics, which are designed specifically for sign language, will not work well for identifying terrorists, who show only the victory sign, as seen in digital images and videos worldwide.
The goal of this paper is to bridge the gap in this research area by proposing new methods that allow building a computer system able to identify a terrorist from the shape of the fingers they show, the victory sign in our case. The motivation behind this research is not identifying persons in general from two fingers, as using two fingers neglects many other discriminative features, and more distinctive features yield higher accuracy rates (Dutagac, et al., 2008). Rather, this research is dedicated to identifying terrorists, particularly those seen performing the victory sign (v-sign) in digital images, because such a v-sign might be the only identifying or incriminating evidence an investigator can pursue. Moreover, shape information is the only feature that can be targeted in such a case: the hand is normally far from the camera, so finer discriminative features of the hand, such as minutiae, cannot be captured by the ordinary cameras normally used for such images.
Since the number of terrorist images that can be downloaded from the media stream is small, and each terrorist typically appears in only one image, whereas each person must provide at least two v-signs (one for training and one for testing), we opted to create an in-house victory sign hand image database (VSHI) for the purpose of this study. It consists of the v-signs of 400 different volunteers (male and female), with 10 images per user, which allowed us to train and test our proposed methods. The rest of this paper describes the database used for this study, presents the proposed methods, and presents and discusses the results.
Data and Methods
A mobile phone camera (8 megapixels, 3264 × 2448) was used to capture right-hand victory sign images of 400 persons (male and female) from different age groups, aged 14 to 50 years. The images were taken in two sessions at least three days apart, with five images per person in each session, for a total of 4000 images. All hand images were upright, with some rotation (roughly –45 to 45 degrees) allowed. The mobile phone camera was upright when most images were taken, which is why the hands appear horizontal in these images. Some images were taken with the camera in a horizontal position; we therefore rotated those images to the left to ease image processing and analysis in the subsequent steps.
All hands appear in the foreground against a black background to ease hand segmentation. Since shape information is not significantly affected by image resizing, we scaled all images in VSHI down to 0.125 of their original size (i.e., from 3264 × 2448 to 408 × 306 pixels), solely to speed up our experiments.
VSHI was designed mainly for the purpose of this study, namely identifying persons from their victory signs, particularly in the case of terrorists. However, it can also be used for other purposes, such as distinguishing between males and females and estimating a person's age group. These tasks are also relevant to identifying a terrorist, but they are not investigated in this work.
We used the following naming convention for the image files: the file name starts with P, followed by the person’s number, then the subject’s gender (male: M or female: F), then the person’s age in years, then the session number (S1 or S2), and finally the image number within the session, from 1 to 5. Table 1 depicts the naming system of the image files in the VSHI database. This naming system is important for extracting the information needed by the classifier from the file name.
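As an illustration of how this convention can be decoded automatically, the Python sketch below parses a file name into its fields. Table 1's exact layout (separators, zero-padding, file extension) is not reproduced in this text, so the pattern is an assumption that tolerates optional `_` or `-` separators between fields; the name `parse_vshi_name` is hypothetical.

```python
import re

# Assumed layout: P<person><M|F><age>S<session><image>, with optional "_" or "-"
# separators and an optional extension. Adjust to the exact format in Table 1.
_NAME = re.compile(
    r"^P(?P<person>\d+)[_-]?(?P<gender>[MF])[_-]?(?P<age>\d+)[_-]?"
    r"S(?P<session>[12])[_-]?(?P<image>[1-5])(?:\.\w+)?$",
    re.IGNORECASE,
)


def parse_vshi_name(filename: str) -> dict:
    """Split a (hypothetical) VSHI file name into person, gender, age, session, image."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"not a VSHI-style file name: {filename!r}")
    fields = match.groupdict()
    return {
        "person": int(fields["person"]),
        "gender": fields["gender"].upper(),
        "age": int(fields["age"]),
        "session": int(fields["session"]),
        "image": int(fields["image"]),
    }


print(parse_vshi_name("P12M23S1_3.jpg"))
# {'person': 12, 'gender': 'M', 'age': 23, 'session': 1, 'image': 3}
```

The person number (or the pair person and session) can then serve directly as the class label for the classifiers.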
We made the VSHI database publicly available for download at the Mutah University website: https://www.mutah.edu.jo/biometrix.
All necessary approvals for using human subjects in this study were obtained from Mutah University, where the study was conducted, including an informed consent form signed by each volunteer, consenting to their hand images (the v-sign part) being used for scientific research and made publicly available. As can be seen from Table 1 and the figures below, all subjects’ images are anonymised, and we refer to each subject by P followed by a number from 1 to 400.
A typical hand shape biometric system consists of a sequence of several major steps, including Image Acquisition, hand Segmentation, Feature Extraction, Training/Testing and storing the Hand Templates for identification. Figure 1 shows these steps. In this work, we opted for such a system to fulfill our main goal.
The Image Acquisition step is already done by creating the VSHI database.
Image segmentation is an important processing step in many image, video, and computer vision applications. Extensive research has been conducted on different approaches and algorithms for image segmentation (Hassanat, 2014; Hassanat, et al., 2015, 2016). However, it is still difficult to assess whether one algorithm produces more accurate segmentation than another, whether for a particular image or a set of images (Zhang, et al., 2008). The most common way to evaluate a segmentation method is subjective evaluation, in which a human visually inspects the segmentation results, as there is no ground truth against which the results can be compared objectively.
For the purpose of our work, and because the images were recorded against a black background (which differs from skin color), we used Otsu’s thresholding method (Otsu, 1979), as in our previous work (Hassanat, et al., 2017). See Figures 2 and 3.
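Otsu's method picks the grey level that maximises the between-class variance of the image histogram. A minimal NumPy sketch is given below, assuming the colour image has already been converted to an 8-bit greyscale array `gray` in which the hand is brighter than the black background; the function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray):
    """Grey level t maximising the between-class variance of an 8-bit (uint8) image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the class {0..t}
    mu = np.cumsum(prob * np.arange(256))      # first cumulative moment up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        # sigma_B^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
        sigma_b2 = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b2))


def segment_hand(gray):
    """Binary hand mask: pixels brighter than the Otsu threshold."""
    return gray > otsu_threshold(gray)
```

Established libraries provide equivalent, well-tested implementations, e.g. OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag and scikit-image's `skimage.filters.threshold_otsu`.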
Perhaps the only features that can be targeted (given the goal of this paper) are geometric features, because other, stronger features might not be available in a digital image of a terrorist due to the small size of the hand sub-image. Geometric features are normally simple to compute and are not affected by image size, and their usefulness has been demonstrated in biometric problems (Liao & Pawlak, 1996).
As can be seen from Figure 3, the shape of the palm (the rest of the hand, excluding the victory sign) differs across images of the same subject, because subjects behave differently each time they raise their v-signs; i.e., the within-class geometric features vary, which adversely affects the learning process. Therefore, we use the shapes of the fingers only.
To segment the two fingers, three key points are extracted: 1) the two fingertips (of the index and middle fingers), and 2) the valley point between the fingers.
Algorithm (1) finds the two tip points, and Algorithm (2) finds the valley point between fingers.
Algorithm 1: finds fingers tips
Input: Hand image
Output: two points (x,y), Tip1 and Tip2.
Step 1: Read img1 file from a folder of the VSHI database
Step 2: img2 = segmentation (img1).
Step 3: img3 = Morphology_open (img2)//optional to save time.
Step 4: (assuming all hands are horizontal in the images) Scan img3 from top to bottom and left to right until the first white pixel (p) is found; Tip1 = p
Step 5: Scan img3 from bottom to top and left to right until the first white pixel (p) is found; Tip2 = p
Step 6: if (Abs (p.x – Tip2.x) < img3.width/10) //this avoids the bent fingers. See figure(x), P130.
Repeat steps 5, 6
The condition in Step 6 of Algorithm (1) applies a threshold to the difference between the valley point and the tip point: if the difference is small (say, a tenth of the image width), the candidate belongs to a short finger, most likely a bent one, rather than to the tip of the middle finger, so the algorithm keeps scanning upward until it finds the proper tip point.
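Steps 4 and 5 of Algorithm (1) amount to taking the first white pixel of the top-most and bottom-most foreground rows. The NumPy sketch below assumes a binary mask that has already been segmented (and optionally opened); the bent-finger check of Step 6 is deliberately left out, because its reference point is not fully specified in the listing. The name `find_tips` is hypothetical, and points are returned as (x, y).

```python
import numpy as np


def find_tips(mask):
    """Return (Tip1, Tip2) as (x, y) tuples for a binary hand mask.

    Tip1: first white pixel when scanning rows top-to-bottom, each row left-to-right.
    Tip2: first white pixel when scanning rows bottom-to-top, each row left-to-right.
    Step 6 (skipping bent fingers) is not reproduced here.
    """
    rows = np.flatnonzero(mask.any(axis=1))        # rows containing foreground pixels
    if rows.size == 0:
        raise ValueError("mask contains no foreground pixels")
    top, bottom = rows[0], rows[-1]
    tip1 = (int(np.flatnonzero(mask[top])[0]), int(top))
    tip2 = (int(np.flatnonzero(mask[bottom])[0]), int(bottom))
    return tip1, tip2
```

Working on whole rows with `any` and `flatnonzero` avoids an explicit pixel-by-pixel loop while visiting pixels in the same order as the scan described in the listing.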
Algorithm 2: finds the valley point (the point between fingers)
Input: img3, Tip1, Tip2
Output: the valley point (x,y), valley.
Step 1: Draw a white line between Tip1 and Tip2
Step 2: startPoint = midpoint between Tip1 and Tip2
Step 3: Array BlackPointsBetweenFingers = FloodFillofBlackPoints(img3, startPoint)
Step 4: valley = the point in BlackPointsBetweenFingers furthest from startPoint
We used a queue-based flood-fill algorithm to find all the black points between the two fingers after closing this area with a white line between the two tips; the point in this area furthest from startPoint is taken as the valley point. We used this method because of the rotated hand images: simpler methods, such as the left profile and the methods used in our earlier work (Hassanat, et al., 2017), failed to locate this point in a large number of images.
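Algorithm (2) can be sketched in NumPy as follows. Two details are assumptions of this sketch rather than part of the listing: the tip-to-tip line is rasterised with evenly spaced rounded samples (giving an 8-connected line that a 4-connected fill cannot cross), and, because startPoint itself lies on that white line, the fill is seeded from the first black pixel met when walking from startPoint toward the centroid of the hand. Function names are hypothetical; points are (x, y).

```python
from collections import deque

import numpy as np


def draw_line(img, p, q, value=True):
    """Rasterise segment p-q (x, y) into img; consecutive samples are 8-connected."""
    n = max(abs(q[0] - p[0]), abs(q[1] - p[1])) + 1
    xs = np.rint(np.linspace(p[0], q[0], n)).astype(int)
    ys = np.rint(np.linspace(p[1], q[1], n)).astype(int)
    img[ys, xs] = value


def find_valley(mask, tip1, tip2):
    """Valley point between the fingers: furthest enclosed black pixel from startPoint."""
    img = mask.astype(bool).copy()
    draw_line(img, tip1, tip2)                         # Step 1: close the V with a white line
    start = np.array([(tip1[0] + tip2[0]) / 2.0, (tip1[1] + tip2[1]) / 2.0])  # Step 2
    ys, xs = np.nonzero(mask)
    direction = np.array([xs.mean(), ys.mean()]) - start
    direction /= np.linalg.norm(direction)
    h, w = img.shape
    seed = None
    for t in range(1, h + w):                          # walk toward the palm to leave the line
        x, y = np.rint(start + t * direction).astype(int)
        if not (0 <= x < w and 0 <= y < h):
            break
        if not img[y, x]:
            seed = (int(x), int(y))
            break
    if seed is None:
        raise ValueError("no black region found between the fingers")
    # Step 3: queue-based (breadth-first) 4-connected flood fill over black pixels.
    seen = np.zeros_like(img)
    seen[seed[1], seed[0]] = True
    queue, region = deque([seed]), []
    while queue:
        x, y = queue.popleft()
        region.append((x, y))
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not img[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                queue.append((nx, ny))
    # Step 4: the enclosed black point furthest from startPoint.
    d2 = ((np.array(region, dtype=float) - start) ** 2).sum(axis=1)
    vx, vy = region[int(np.argmax(d2))]
    return int(vx), int(vy)
```

Using a queue (`collections.deque`) gives a breadth-first fill, which avoids the recursion-depth problems of a naive recursive flood fill on large regions.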
After identifying these key points, we define two lines: the line SV between startPoint and the valley point, and its perpendicular at the valley point (see Figure 4-Left). Based on these lines, we used simple line equations to assign pixels to each finger: pixels above line SV and to the left of its perpendicular belong to finger 1, and pixels below line SV and to the left of its perpendicular belong to finger 2 (see Figure 4-Right).
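The two line tests can be written as sign checks: a dot product with the direction from startPoint S to the valley V tells on which side of the perpendicular at V a pixel lies, and a 2-D cross product tells on which side of line SV it lies. The sketch below assumes (x, y) image coordinates with y growing downward and the S-to-V direction pointing from the fingertips into the palm, so that a negative cross product means "above" when S-to-V points to the right. The name `split_fingers` is hypothetical.

```python
import numpy as np


def split_fingers(mask, start, valley):
    """Split a binary hand mask into two finger masks using line SV and its perpendicular at V."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]                       # row (y) and column (x) of every pixel
    dx, dy = valley[0] - start[0], valley[1] - start[1]
    # Dot product with S->V: negative means the pixel is on S's side of the perpendicular at V.
    along = (xs - valley[0]) * dx + (ys - valley[1]) * dy
    # Cross product (S->V) x (S->P): the sign tells which side of line SV the pixel is on.
    side = dx * (ys - start[1]) - dy * (xs - start[0])
    fingers = mask.astype(bool) & (along < 0)
    return fingers & (side < 0), fingers & (side > 0)
```

Because both tests are evaluated on whole coordinate grids, the split costs two vectorised passes over the image regardless of the hand's rotation.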
Limited by the information available (only two fingers and only geometric features), we investigated several methods for extracting distinctive features for identification (Hassanat, et al., 2017). We focus on the shape of the fingers only, leaving out the shape of the palm and of the other, bent fingers, because the way the other fingers are bent varies within the same class, which results in different signatures for the same subject.
After several pilot experiments, we opted for two methods to extract geometric features from the two fingers: a) the seven Hu invariant moments (Hu, 1962) plus the eccentricity of each finger, as described in our previous work (Hassanat, et al., 2017), and b) the histogram of the summation of the distances between the contour points of each finger and its regression line.
Method 1 (shape moments)
Shape moments are used in the literature to describe the shapes of objects because they represent important statistical properties of the segmented object, and their effectiveness in describing objects has been demonstrated (Kang & Wu, 2014) and (Birajdar & Mankar, 2013).
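The central (j, k)th moment of a set of points or pixels S, with centroid $(\bar{x}, \bar{y})$, is defined by
$$\mu_{jk} = \sum_{(x,y) \in S} (x - \bar{x})^{j} (y - \bar{y})^{k}$$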
To make a moment $\eta_{jk}$ invariant to both translation and scale, we divide the corresponding central moment by a power of the (0,0)th moment:
$$\eta_{jk} = \frac{\mu_{jk}}{\mu_{00}^{\,1 + (j+k)/2}}, \qquad j + k \geq 2$$
We can now calculate a set of seven invariant moments, the so-called Hu moments (Hu, 1962). These moments contain information about the geometry of the segmented object and, more importantly, are invariant to translation, scale changes, and rotation. Hu moments are calculated as follows:
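- $H_1 = \eta_{20} + \eta_{02}$
- $H_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
- $H_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$
- $H_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$
- $H_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$
- $H_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$
- $H_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$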
In addition to these seven features, we add another important shape descriptor, the eccentricity (E), which can be calculated using
We calculated the Hu moments (H1–H7) and E for each finger in each image in VSHI, obtaining 16 features (8 per finger) to identify each person.
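The moment features can be computed directly from a binary finger mask. The NumPy sketch below follows the standard definitions of the central moments, the normalised moments, and Hu's seven invariants. Because the eccentricity formula is not reproduced in this text, the sketch assumes the common second-moment definition E = sqrt(1 − λmin/λmax), where λmin and λmax are the eigenvalues of the shape's second-moment matrix. Function names are hypothetical.

```python
import numpy as np


def _eta(mask):
    """Return a function eta(j, k) giving the normalised central moments of a binary mask."""
    ys, xs = np.nonzero(mask)                     # pixel coordinates (row = y, column = x)
    xs, ys = xs.astype(float), ys.astype(float)
    area = float(xs.size)                         # mu_00 of a binary shape is its pixel count
    dx, dy = xs - xs.mean(), ys - ys.mean()

    def eta(j, k):
        return np.sum(dx ** j * dy ** k) / area ** (1 + (j + k) / 2)

    return eta


def hu_moments(mask):
    """The seven Hu invariant moments H1..H7 of a binary shape."""
    n = _eta(mask)
    n20, n02, n11 = n(2, 0), n(0, 2), n(1, 1)
    n30, n03, n21, n12 = n(3, 0), n(0, 3), n(2, 1), n(1, 2)
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])


def eccentricity(mask):
    """Assumed definition: eccentricity of the ellipse with the shape's second moments."""
    n = _eta(mask)
    cov = np.array([[n(2, 0), n(1, 1)], [n(1, 1), n(0, 2)]])
    lam_min, lam_max = np.linalg.eigvalsh(cov)    # eigenvalues in ascending order
    return float(np.sqrt(max(0.0, 1.0 - lam_min / lam_max)))


def finger_features(finger1, finger2):
    """16 values: H1..H7 and E for each of the two finger masks."""
    return np.concatenate([
        hu_moments(finger1), [eccentricity(finger1)],
        hu_moments(finger2), [eccentricity(finger2)],
    ])
```

OpenCV's `cv2.moments` and `cv2.HuMoments` compute the same seven invariants and can serve as a cross-check.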
Method 2 (Histogram of summation of distances) (HSD)
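This method is based on calculating the distance of each contour point from the regression line of its finger. The regression line is fitted to the contour points of each finger using simple least-squares linear regression:
$$\mathrm{Slope} = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}}, \qquad \mathrm{Yintercept} = \frac{\sum y - \mathrm{Slope}\sum x}{n}$$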
where n is the number of contour points of each finger, and x and y are the coordinates of these points. The regression line equation thus becomes y = Slope·x + Yintercept; we need this line equation later to calculate the distance of each contour point from the regression line.
To capture the essence of the finger shape, we record both the inner and outer shape of each finger. Therefore, the contour points of each finger are split into two categories, upper points and lower points, according to the regression line equation, and the points in each category are sorted by their distance from the tip point. The distance between each contour point and the regression line can then be easily calculated. These distances are summed to fill the bins of a histogram; to overcome the scale problem (i.e., different distances from the camera), we divide the value of each bin by the total sum of the histogram.
For example, with 5 bins we get four histograms, two for each finger (upper and lower distances), and therefore a feature vector of 20 features recording the inner and outer shape of each finger, as shown in Figure 5.
The ith value in the histogram can be calculated by:
where i ranges from 0 to b − 1; n is the number of points in the upper or lower contour of a finger; b is the chosen number of bins, which can be any number ≤ n/2 so that each bin receives at least two values; ED is the Euclidean distance; contour[j] is the jth point of the upper or lower contour of a finger; and RegLine is the regression line of the finger. The size of the feature vector is therefore 4b.
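The HSD computation for one finger can be sketched as follows. The exact assignment of sorted points to bins is an assumption here: the sorted distances are split into b consecutive, nearly equal chunks with `np.array_split`, and points with a negative residual (y grows downward in image coordinates) are treated as the upper contour. `contour` is assumed to be an (n, 2) array of (x, y) points of one finger; function names are hypothetical.

```python
import numpy as np


def fit_regression_line(points):
    """Least-squares fit y = slope * x + intercept to an (n, 2) array of (x, y) points."""
    x = points[:, 0].astype(float)
    y = points[:, 1].astype(float)
    n = len(x)
    slope = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x * x) - x.sum() ** 2)
    intercept = (y.sum() - slope * x.sum()) / n
    return slope, intercept


def hsd_histograms(contour, tip, bins):
    """Normalised upper and lower distance histograms of one finger (2 * bins values)."""
    contour = np.asarray(contour, dtype=float)
    slope, intercept = fit_regression_line(contour)
    residual = contour[:, 1] - (slope * contour[:, 0] + intercept)
    dist = np.abs(residual) / np.hypot(slope, 1.0)            # perpendicular distance to the line
    tip_dist = np.hypot(*(contour - np.asarray(tip, dtype=float)).T)
    features = []
    # With y growing downward, a negative residual means the point lies above the line.
    for side in (residual < 0, residual >= 0):
        order = np.argsort(tip_dist[side])                     # walk away from the fingertip
        d = dist[side][order]
        hist = np.array([chunk.sum() for chunk in np.array_split(d, bins)])
        total = hist.sum()
        features.append(hist / total if total > 0 else hist)  # scale normalisation
    return np.concatenate(features)


def hsd_features(contours, tips, bins=10):
    """Concatenate the histograms of both fingers: 4 * bins values in total."""
    return np.concatenate([hsd_histograms(c, t, bins) for c, t in zip(contours, tips)])
```

Dividing each histogram by its own sum makes the features independent of the finger's apparent size, which is what removes the dependence on the distance from the camera.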
Results and discussions
To evaluate the identification of the persons recorded in VSHI, we conducted three sets of experiments: 1) identification using the invariant Hu moments; 2) identification using the shape of the inner and outer finger contours (HSD); and 3) identification using the concatenated features of both methods.
For all experiments in this work, we used 10-fold cross-validation with different classifiers from the Weka Workbench 3.8 (Frank, et al., 2016). These classifiers include:
- K-Nearest Neighbor (KNN), default parameters with K = 1 and Euclidean distance.
- Naïve Bayes, default parameters.
- Support Vector Machines (SVM), default parameters with a radial basis function (RBF) kernel.
- Artificial Neural Networks (ANN), default parameters with backpropagation.
- Random Forest, default parameters.
- Linear Discriminant Analysis (LDA), default parameters.
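Weka performs this evaluation internally. For readers working outside Weka, the NumPy sketch below illustrates the same protocol for the 1-NN (Euclidean) case: stratified 10-fold cross-validation, in which each example is tested exactly once by a model trained on the other nine folds. The round-robin fold assignment is an assumption of this sketch and does not reproduce Weka's exact folds; function names are hypothetical.

```python
import numpy as np


def stratified_folds(labels, k=10, seed=0):
    """Assign examples to k folds, dealing each class's shuffled examples round-robin."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    fold = np.empty(len(labels), dtype=int)
    offset = 0
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        fold[idx] = (offset + np.arange(len(idx))) % k
        offset += len(idx)          # continue the deal so fold sizes stay balanced
    return fold


def one_nn_cv_accuracy(X, y, k=10, seed=0):
    """Accuracy of a 1-nearest-neighbour classifier (Euclidean) under stratified k-fold CV."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    fold = stratified_folds(y, k, seed)
    correct = 0
    for f in range(k):
        test, train = fold == f, fold != f
        X_test, X_train = X[test], X[train]
        # Squared Euclidean distance from every test example to every training example.
        d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        predicted = y[train][np.argmin(d2, axis=1)]
        correct += int(np.sum(predicted == y[test]))
    return correct / len(y)
```

With 400 subjects and five images per subject in a session, this assignment places each subject's five images in five different folds, so every test image still has four images of the same subject in the training data.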
We calculate the precision and recall for each experiment and, to summarize both in a single measure, the F1-score, which is the harmonic mean of precision (P) and recall (R):
$$F_1 = \frac{2PR}{P + R}$$
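The averaging used for the reported multiclass precision, recall, and F1-scores is not stated here. As one plausible convention, the sketch below computes per-class values and their averages weighted by class support, which is what Weka prints in its "Weighted Avg." row; the function name is hypothetical.

```python
import numpy as np


def precision_recall_f1(y_true, y_pred):
    """Per-class precision/recall/F1 and their averages weighted by class support."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    prec, rec, f1, support = [], [], [], []
    for c in np.unique(y_true):
        tp = np.sum((y_true == c) & (y_pred == c))
        predicted = np.sum(y_pred == c)
        actual = np.sum(y_true == c)           # > 0 because c is taken from y_true
        p = tp / predicted if predicted else 0.0
        r = tp / actual
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)   # harmonic mean of p and r
        support.append(actual)
    w = np.array(support, dtype=float) / np.sum(support)
    weighted = tuple(float(w @ np.array(v, dtype=float)) for v in (prec, rec, f1))
    return np.array(prec), np.array(rec), np.array(f1), weighted
```

Note that the weighted F1 here is the weighted average of per-class F1 values, which in general differs from the harmonic mean of the weighted precision and weighted recall.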
The results of the first set of experiments are presented in Table 2.
|Classifier||Hu on S1||Hu on S2|
As can be seen from Table 2, the Hu method performs well with most classifiers, which we attribute to its statistical features being invariant to rotation, translation, and scale. Other notable observations are the significant increase in performance with the LDA and the very low performance of the SVM. We attribute both observations to the same cause: the large number of classes and the small number of examples per class, as the database contains 400 different subjects (classes), with only five examples per subject in each session.
Before applying the HSD method, we need to determine the best number of bins for our experiments. To answer this question, we extracted HSD features using different numbers of bins and, since the LDA classifier performed best in the first set of experiments, used it to identify the subjects of both sessions. The accuracy results are presented in Table 3 and Figure 6.
As can be seen from Table 3 and Figure 6, the highest average accuracies were recorded with 10 and 12 bins. Although the accuracy with 10 bins is slightly lower than with 12 bins, we opted for 10 bins to reduce the number of features (dimensionality). It can also be noted that the identification results for Session 2 were higher than those for Session 1; this might be due to the behavior of the subjects themselves, as they learned to raise their v-signs more consistently in Session 2.
The second set of experiments focuses on the HSD feature extraction method. To evaluate its discriminative power, again we used the same classifiers with the same parameters. The identification precision, recall and F1-score are recorded in Table 4.
|Classifier||HSD on S1||HSD on S2|
Again, the same observations on the SVM and LDA can be made from the data in Table 4. Perhaps, for this particular data, the SVM does not work well for multiclass classification, particularly with a very small number of examples per class (5) versus a large number of classes (400). In contrast, the LDA performs better on multiclass classification.
Since the LDA performed best in the previous sets of experiments, we used it for the third set of experiments, which used the features extracted by both methods, Hu and HSD. Since merging the two feature sets increases the number of features, we applied dimensionality reduction using principal component analysis (PCA), in addition to min/max data normalization. The identification results of the LDA classifier are shown in Table 5.
|Preprocess||Hu + HSD on S1||Hu + HSD on S2||Hu + HSD on S1 & S2|
The significant increase in identification performance shown in Table 5 is expected, because the features come from two different methods, Hu and HSD: the first describes the inner geometry of both fingers using invariant statistical moments, and the second describes the shape of the fingers’ contours. Concatenating such features provides a more informative description for the learning process. Moreover, combining both sessions into one dataset increases the number of examples per class and therefore allows for better learning, which explains the significant increase in identification performance seen in the last three columns of Table 5.
The effect of data normalization on identification is obvious in Table 5, showing that the LDA classifier works better on normalized data. However, this excellent performance comes at an extra cost: the increased number of features (16 + 40 = 56). Dimensionality reduction using PCA, preserving 95%, 97%, 98%, and 99% of the data variance, reduced the dimensionality of the three datasets to 23, 27, 30, and 35 features, respectively, while still obtaining considerable identification performance. For example, with only 35 features, the F1-score drops by less than 2% on the third dataset (Hu + HSD on S1 & S2). However, when PCA preserved less than 99% of the data variance, the drop in identification performance became significant. This shows the importance of the features used for identification, since PCA removes some important information, which negatively affects identification. Nevertheless, PCA is still important for dimensionality reduction, particularly with a very large number of image features, as in face recognition (Oh, et al., 2013) and (Zhao, et al., 1998). Feature selection can also be applied to reduce the dimensionality of the training data, but this is beyond the scope of this paper.
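The preprocessing used in the third set of experiments can be sketched with NumPy: min/max scaling of each feature to [0, 1], followed by projection onto the smallest number of principal components whose cumulative explained variance reaches a chosen fraction (0.95 to 0.99 in the experiments above). This is an illustrative sketch, not Weka's filter implementation; function names are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale every feature (column) to [0, 1]; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


def pca_keep_variance(X, keep=0.99):
    """Project X onto the fewest principal components explaining at least `keep` of the variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # PCA operates on centred data
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)   # cumulative explained-variance ratio
    n_comp = min(int(np.searchsorted(explained, keep)) + 1, len(s))
    return Xc @ vt[:n_comp].T, n_comp
```

For example, `Z, k = pca_keep_variance(min_max_normalize(features), keep=0.99)`, with `features` a hypothetical (number of images × 56) array, returns the reduced data and the number of components retained. In a cross-validated setting, the minima, maxima, and principal axes would ideally be estimated on the training folds only, so that no information from the test fold leaks into the model.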
The novelty of this work stems from its application rather than from its methodology and algorithms, since it demonstrates that a person can be identified from the shape of his or her victory sign alone. This is important for identifying terrorists, who show their v-sign without showing any other biometric traits. Consider the following hypothetical scenario:
‘A terrorist has been seen in a video killing someone, with his face covered, showing only his v-sign. The police department has some suspects, but the detectives have no evidence against any of them.’
Using the proposed methods, the detectives can photograph the suspects’ v-signs and compare them with the one seen in the video. If there is a match, they get a clue to pursue; if not, they can follow other leads or set those suspects aside. Moreover, some well-known terrorists or suspects have been photographed showing their v-signs with their faces visible, without committing any killing on camera; these images can also be compared with the terrorist’s v-sign to identify him.
Conclusion and future work
This work extends our published conference paper on the same subject (Hassanat, et al., 2017) and mainly addresses the major limitations of that work. The main contributions of this work include:
- Creating a new v-sign database with more subjects (400 versus 50 in the previous work), mainly to allow more reliable conclusions to be drawn from such a large database.
- Proposing a new feature extraction method, versus the simple measurements used in the previous work.
- Using more advanced classifiers, versus only the KNN classifier in the previous work.
Our work proposes a new approach for identifying persons, particularly terrorists, from the geometric features of their victory signs alone, as such a sign might be the only information available to identify them. For this purpose, we created a new image database of victory signs, made it publicly available, and investigated new and existing approaches to identify persons from those images. Since no other information, such as the terrorist’s face, is available to associate with the victory sign, we assume that the police or other security agencies already have images, including hand images, of suspects that can be matched against the v-sign of an unknown terrorist.
The proposed methods can also be used for person identification in general. However, general person identification usually has stronger evidence (biometrics) available, such as the whole hand and fingers, the face, and fingerprints, whereas, when recognizing a terrorist, the v-sign might sometimes be the only available evidence.
The results of the conducted experiments show great potential for identifying persons from weak evidence (the shape of the victory sign alone). The experiments also reveal some limitations of this work, which focuses only on geometric features obtained from the two fingers forming the v-sign; this focus was important for avoiding the different behavior of the same subject when raising their v-sign. However, much information remains unused and needs to be investigated, as it could enhance the performance of the overall identification process. This neglected information includes, but is not limited to, 1) the shape of the palm and wrist, 2) the shape of the bent fingers, 3) the different ways of bending the fingers, and 4) skin color and texture. The problems related to rotation angles in three dimensions also need to be investigated. These limitations will be addressed in future work using deep learning methods, together with feature selection to reduce the dimensionality of the training data.
The additional files for this article are as follows:
- The segmented binary images of the V-signs: these zipped files contain 4000 binary images of the segmented V-signs of 400 subjects obtained from two sessions, S1 and S2, with five images for each subject in each session. DOI: https://doi.org/10.5334/dsj-2018-027.s1
- The images of the segmented fingers: these zipped files contain 4000 images of the segmented fingers of the V-signs of 400 subjects obtained from two sessions, S1 and S2, with five images for each subject in each session. DOI: https://doi.org/10.5334/dsj-2018-027.s2
- The extracted contours of the segmented fingers: these zipped files contain 4000 images of the extracted contours of the fingers of the V-signs of 400 subjects obtained from two sessions, S1 and S2; each image shows the upper and lower contours of each finger, with five images for each subject in each session. DOI: https://doi.org/10.5334/dsj-2018-027.s3