Most existing gesture recognition algorithms have low recognition rates under rotation, translation, and scaling of hand images, as well as across different hand types. We propose a new hand gesture recognition algorithm that combines a hand-type adaptive algorithm with feature matching based on the effective-area ratio. Samples are divided into several groups according to the subjects’ palm shapes, and the algorithm is trained using self-collected data. The hand-type adaptive algorithm pairs the user’s hand type with one of the sample libraries. To further improve accuracy, the effective-area ratio of the gesture is calculated from the minimum bounding rectangle, and the gesture is preliminarily recognized using this effective-area ratio feature. Experimental results demonstrate that the proposed algorithm recognizes gestures accurately in real time and adapts well to different hand types. The overall recognition rate is over 94%, and it still exceeds 93% when hand gesture images are rotated, translated, or scaled.
1. Introduction
Human–computer interaction is currently realized mainly through a mouse, keyboard, remote control, or touch screen. However, interpersonal communication is primarily performed in a more natural and intuitive noncontact manner, such as through sound and physical movements, which are considered flexible and efficient. Researchers have been attempting to develop machines that recognize human intentions through noncontact communication modes, as humans do, such as sound,^{1} facial expressions,^{2} body language,^{3} and gestures.^{4}^{,}^{5} Among these modes, hand gestures^{6} are an important part of human language, and hence the development of hand gesture recognition affects how natural and flexible human–computer interaction can be.^{7}^{–}^{11} Over the past few decades, gesture recognition has typically been based on the angle and position information obtained through data gloves.^{12} However, this method is expensive and relies on wearable sensors, which makes it inconvenient. Hand gesture data are also collected using optical cameras^{13}^{–}^{15} or radar.^{16}^{–}^{19} Optical gesture recognition mainly uses cameras to capture gesture images and then applies machine-learning methods^{20}^{,}^{21} for feature extraction and recognition. Coelho et al.^{14} used Kinect to capture RGB and depth images of hand gestures. Compared with wearable sensors, machine-vision-based noncontact recognition methods are currently popular because they offer low cost, convenience, and comfort for the human body. In this paper, we propose a hand-adaptive algorithm that can significantly reduce the impact of hand type on gesture recognition. It has advantages when the number of gestures to be recognized is small. Compared with the currently popular artificial intelligence and deep learning algorithms, this algorithm has lower hardware requirements and is more suitable for embedded edge computing, which is also very popular at present.
The contributions of this paper are listed as follows:
2. Relevant Work
Extraction of gesture features is one of the most important aspects of gesture recognition. In general, the accuracy and range of gesture recognition depend on the amount of gesture feature information extracted. Many algorithms have been proposed to recognize hand gestures from images, including simple image processing algorithms similar to the proposed one. Some researchers have sought simple and practical algorithms such as the convex hull.^{22}^{,}^{23} Woun Bo Shen and Tan Guat Yew used a convex hull in the feature extraction stage. This algorithm is simple and efficient, but the number of gestures recognized is small. A Kinect-based method that detects the angles of the hull^{24} likewise recognizes only a small number of gestures. The hardware required for the above algorithms is expensive and therefore not conducive to popularization. Furthermore, these algorithms are not sufficiently intuitive to represent the hand gestures formed by different hand types. Other algorithms consider fingers as the features and detect them on the basis of ridge detection,^{25}^{–}^{30} a circle drawn on the hand centroid,^{31}^{,}^{32} or convex decomposition.^{33} However, the method in Ref. 34 is time-consuming, while the others^{28}^{–}^{32} do not handle distorted fingers effectively. Subsequent classification algorithms^{28}^{,}^{32} are learning-based and require many training images for each class. Moreover, algorithms that use rule classifiers and zero training images^{29}^{–}^{31} lack adaptability for certain gestures with distortion and varying postures. Therefore, a balance should be maintained between convenience and robustness. Zhang et al.^{34} proposed a recognition algorithm based on Hu moment invariants for rotating images.
The paper improves the algorithm by changing the characteristic values of the Hu algorithm and calculating the similarity between the image to be recognized and the template image. However, this method is not intuitive enough, and high accuracy and real-time detection cannot be guaranteed using the Hu moment feature alone. Dardas and Georganas^{35} performed scale-invariant feature transform (SIFT) and vectorization feature extraction on images and then used bag-of-features and multiclass support vector machines to recognize gestures. The SIFT algorithm has a higher recognition rate,^{36} but its computational complexity is also higher, so recognition is slower and real-time performance is poor. To recognize and classify signatures of hand gestures, numerous techniques have been applied, such as machine learning,^{37}^{–}^{40} principal component analysis,^{41}^{,}^{42} and differentiate/cross-multiply algorithms.^{43}^{,}^{44} Conventional supervised machine learning extracts and classifies gestures using predefined characteristic parameters (features).^{45}^{,}^{46} However, the optimal features are unknown in many cases, so the performance of the classifier varies significantly with the selected features. Some deep learning algorithms are large in scale and require both high hardware performance^{47}^{–}^{49} and a large number of training samples. Some deep networks require both training and GPU support for online deployment, which places high demands on hardware and is thus not conducive to small, embedded artificial intelligence systems.^{50}^{–}^{52} The above algorithms did not design and select features to reduce the amount of computation. They also failed to solve the problem that complex gesture recognition algorithms cannot be used, or lack real-time performance, when applied to embedded artificial intelligence systems with limited hardware resources.
3. Proposed Algorithm
In this study, gesture recognition is divided into two parts: (1) establishment of a sample library by the process shown in Fig. 1, in which three sample libraries are built according to the hand-type classification; and (2) gesture recognition by the process shown in Fig. 2. The hand-adaptive algorithm matches the suitable sample library to the user’s hand type, thus reducing the interference of hand type with gesture recognition accuracy. Preliminary recognition reduces the number of samples considered in the final identification, thereby allowing a fast recognition process.
3.1. Building Libraries for Hand-Type Adaptation
First, subjects are selected and their palms are measured. Next, the subjects are divided into three groups according to palm size, namely, slim, normal, and broad. Then, the gesture features of the three groups of subjects are calculated separately to establish the sample library.
3.1.1. Selection of subjects
In this study, 40 subjects were selected after obtaining their informed consent: 27 young people (13 females and 14 males) aged 15 to 35, 8 middle-aged people (4 females and 4 males) aged 36 to 55, and 5 elderly people (3 males and 2 females) aged 56 to 70. The collected samples are presented in Fig. 3.
3.1.2. Obtaining hand-type data
First, the maximum length of the palm, $L_1$, is measured from the longest fingertip to the root of the palm, which is the first distinct line between the palm and the wrist, close to the palm. Then, the maximum length of the finger, $L_2$, is measured from the longest fingertip to its finger root. Finally, the palm width $L_3$ is measured. The measurement diagram is shown in Fig. 4. To reduce the error, an average of three measurements is taken. The $L_1$, $L_2$, and $L_3$ measurements provide a peripheral contour convex hull of the entire hand. In image processing, a convex hull can be considered a convex set that surrounds the outermost layer of the image.
The measurement of the peripheral contour convex hull of the hand is shown in Fig. 5. The convex hull defect and its starting point are determined. The relative positions of the palm and fingers are determined, and the center point and contour of the palm are calibrated. The center point and the radius of the palm are used to obtain the coordinates of the lowest point of the palm contour, after which the image of the wrist below the lowest point is eliminated. The ordinates of the middle finger fingertip $A_1$, the palm contour lowest point $A_2$, the middle finger convex hull defect $A_3$, and the palm center point $A_0$ are obtained. $L_1$, $L_2$, and $L_3$ are calculated as follows:
3.1.3. Classification of hand type
On the basis of a large number of sample statistics, the $L_2$ to $L_1$ ratio is multiplied by 0.3 and the $L_3$ to $L_1$ ratio is multiplied by 0.7 to obtain a weighted value that reflects human hand types more accurately. The multiplication factors are empirical values. The subjects are divided into the following three groups by weighting calculation using Eq. (2):^{18} slim, normal, or broad. Table 1 lists the measurement and grouping of the selected 40 subjects:
Table 1 Hand parameters and grouping.
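The weighting described in Sec. 3.1.3 can be illustrated with a minimal sketch. It assumes that the two weighted ratios are summed into a single score and that a larger score corresponds to a broader hand; both are assumptions, since Eq. (2) and its thresholds are not reproduced in this text. The function names and the threshold parameters `slim_max` and `normal_max` are hypothetical and must be supplied by the caller.

```python
def hand_type_score(l1, l2, l3, w_finger=0.3, w_width=0.7):
    """Weighted hand-shape score from palm length L1, finger length L2, palm width L3.

    Assumption: the two weighted ratios are added together.
    The weights 0.3 and 0.7 are the empirical factors quoted in the text.
    """
    if l1 <= 0:
        raise ValueError("palm length L1 must be positive")
    return w_finger * (l2 / l1) + w_width * (l3 / l1)


def classify_hand_type(score, slim_max, normal_max):
    """Map a score to slim / normal / broad.

    Hypothetical thresholds: the actual grouping rule is Eq. (2) of the paper,
    which is not available here.
    """
    if score <= slim_max:
        return "slim"
    if score <= normal_max:
        return "normal"
    return "broad"
```

Averaging three measurements of each length before calling `hand_type_score`, as the text recommends, reduces the effect of measurement error on the score.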
3.1.4. Building three sample libraries
A total of 360 images, with no change in angle, were used to build the sample library. The algorithm designed in this paper extracts nine-dimensional features, comprising the area–perimeter ratio $C$, the effective-area ratio $E$, and the seven Hu moments Hu1, Hu2, …, Hu7. The feature values are calculated for the nine hand gestures in each group, and the mid-values are taken as the feature vector $O = \{C, E, \mathrm{Hu1}, \mathrm{Hu2}, \dots, \mathrm{Hu7}\}$. The feature vectors of the slim group, normal group, and broad group are denoted by $O_S$, $O_N$, and $O_C$, respectively. The mid-value is obtained according to Eq. (3), where $X$ denotes the mid-value, $x$ denotes the gesture sequence number, and $n$ denotes the total number of subjects in the group:
3.2. Gesture Recognition
In this study, only the gesture image of the palm is processed, without other parts such as the arm. The gesture image is median filtered in the image preprocessing stage and then converted from the RGB color space to the YCbCr color space, because skin color clusters well in the YCbCr color space, which enables segmentation of the gesture by thresholding. The result is presented in Fig. 6, and its distribution satisfies Eq. (4). Then, a morphological operation is performed on the segmented image to regularize the gesture pixels so that they meet the accuracy requirements of the subsequent operations. This series of operations ensures the feasibility of gesture recognition:
Eq. (4)
$$50 \le Y \le 255,\quad 87 \le \mathrm{Cb} \le 142,\quad 132 \le \mathrm{Cr} \le 151.$$
3.2.1. Feature extraction
In previous studies,^{28}^{–}^{30} the Euclidean distance between a pixel and its nearest boundary, computed in linear time, was used to extract gesture features. Here, we propose other features as follows.
Area–perimeter ratio
The area–perimeter ratio $C=\frac{S}{L}$ is not sensitive to the scaling and rotation of gestures, and it discriminates well between hand types. Perimeter $L$ is calculated as follows:
Eq. (5)
$$L=\sum_{x}\sum_{y} f(x,y),\qquad f(x,y)=\begin{cases}1, & (x,y)\in V\\ 0, & (x,y)\notin V\end{cases}.$$
Area $S$ is calculated as follows:
Eq. (6)
$$S=\sum_{x}\sum_{y} q(x,y),\qquad q(x,y)=\begin{cases}1, & (x,y)\in R\\ 0, & (x,y)\notin R\end{cases}.$$
In Eqs. (5) and (6), $V$ represents the pixel area of the gesture edge, indicated in blue in Fig. 7, and $R$ represents the gesture pixel area, indicated in white in Fig. 7. Hence, the first important parameter of this study, the area–perimeter ratio, is obtained. Noise and light factors adversely affect gesture segmentation, producing burrs at the edge of the gesture. However, this effect is negligible.
Effective-area ratio
Unlike Erdem Yavuz^{26} and Jiajun Zhang,^{27} we use the effective-area ratio of the gesture as a feature. The effective-area ratio of the gesture is defined as the ratio of the gesture area to the area of the minimum bounding rectangle (MBR, which is the rectangle that can contain the entire gesture):
Eq. (7)
$$E=\frac{S_{\text{hand}}}{S_{\mathrm{MBR}}}.$$
In the above equation, $E$ represents the effective-area ratio at acceptable noise levels, $S_{\text{hand}}$ is the sum of white pixels in the entire gesture area, and $S_{\mathrm{MBR}}$ is the sum of all the pixels in the MBR in the binary image, as shown in Fig. 7. The effective-area ratio is introduced to achieve high accuracy while keeping the noise within an acceptable range.
The Hu invariant moment algorithm is effective for image recognition. It describes the image through its overall features, and the seven Hu invariant moments remain unchanged under rotation, translation, and scale transformation of the image. Therefore, the Hu invariant moment algorithm extracts mathematical features that are constant under both image rotation and scaling.
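As an illustration of Eqs. (5) and (6) and of the effective-area ratio defined above, the following minimal sketch computes $C$ and $E$ from a binary mask stored as a list of rows of 0/1 values. Two points are assumptions rather than the authors' implementation: an edge pixel (set $V$) is taken to be any gesture pixel that touches the image border or a background 4-neighbor, and the MBR is taken to be the axis-aligned bounding box. The function name is hypothetical.

```python
def area_perimeter_and_effective_area(mask):
    """Return (C, E) for a binary mask given as a list of rows of 0/1."""
    rows, cols = len(mask), len(mask[0])
    area = 0       # S: number of gesture pixels (region R)
    perimeter = 0  # L: number of edge pixels (region V)
    min_r, max_r, min_c, max_c = rows, -1, cols, -1
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            # Track the axis-aligned bounding box (assumed form of the MBR).
            min_r, max_r = min(min_r, r), max(max_r, r)
            min_c, max_c = min(min_c, c), max(max_c, c)
            # Edge pixel: touches the image border or a background 4-neighbor.
            neighbors = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(
                nr < 0 or nr >= rows or nc < 0 or nc >= cols or not mask[nr][nc]
                for nr, nc in neighbors
            ):
                perimeter += 1
    if area == 0:
        raise ValueError("mask contains no gesture pixels")
    mbr_area = (max_r - min_r + 1) * (max_c - min_c + 1)
    return area / perimeter, area / mbr_area
```

For a filled rectangle the sketch returns $E = 1$, whereas a gesture with spread fingers leaves much of its bounding box empty and yields a smaller $E$, which is what makes the ratio useful for a quick first-pass comparison.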
This method has the advantages of good stability and accurate recognition in the gesture recognition process and is suitable for discriminating gestures with small variation. The $(p+q)$-order geometric moment of a digital image $f(x,y)$ is defined as follows:
Eq. (8)
$$m_{pq}=\sum_{x}\sum_{y}x^{p}y^{q}f(x,y),$$
where $p,q=0,1,2,\dots$. The central moment is
Eq. (9)
$$\mu_{pq}=\sum_{x}\sum_{y}(x-\overline{x})^{p}(y-\overline{y})^{q}f(x,y).$$
The centroid is $(\overline{x},\overline{y})$:
Eq. (10)
$$\overline{x}=\frac{m_{10}}{m_{00}},\quad \overline{y}=\frac{m_{01}}{m_{00}}.$$
In Eq. (10), $m_{10}$ and $m_{01}$ are the first-order geometric moments of the image, and $m_{00}$ is the zeroth-order geometric moment of the image. For binary images, the geometric center of the image is the point $(\overline{x},\overline{y})$. $m_{pq}$ changes as the image changes. Although $\mu_{pq}$ is translation invariant, it is sensitive to image rotation. Therefore, if features are represented directly by the geometric moments and central moments, the feature parameters cannot be invariant to translation, scaling, and rotation at the same time. The central moments can be normalized to obtain invariance to translation and scaling. The normalized central moment is
Eq. (11)
$$\eta_{pq}=\frac{\mu_{pq}}{\mu_{00}^{r}},$$
where $p,q=0,1,2,\dots$ and $r=\frac{p+q+2}{2}$.
The seven invariant moments are defined by the normalized second- and third-order central moments and are invariant to translation, rotation, and scaling of the target. The calculation of the invariant moments of a binary or gray image is quite complex, which limits its use. To achieve faster invariant moment calculation, in this study, Hu moment extraction is performed on the contour of the gesture.
3.2.2. Hand-type adaptation
The classifier used in the following gesture recognition process is mainly template matching, which is realized mainly through distance calculation. The hand-type adaptive algorithm is implemented using the area–perimeter ratio of the gesture.
To achieve hand-type adaptation, the user needs to input gestures 1 to 9 in sequence. Then, the area–perimeter ratios $c$ of the nine gestures are calculated and used to construct the area–perimeter ratio vector $\mathbf{C}=(c_1,\cdots,c_9)$. The algorithm calculates the Euclidean distance between $\mathbf{C}$ and each of $O_S$, $O_N$, and $O_C$, and selects the sample library with the smallest Euclidean distance as the paired sample library. Each sample library $O_S$, $O_N$, or $O_C$ contains nine vectors, and $\mathbf{C}$ is compared with the first element of each of the nine vectors. Because the first element of each vector is also the area–perimeter ratio, this distance is used to determine which sample library is the paired sample library. The Euclidean distance is calculated as follows:
Eq. (12)
$$D(\mathbf{C},O_S)=\sqrt{(c_1-o_{11})^2+\cdots+(c_9-o_{91})^2},\quad c\in\mathbf{C},\ o\in\mathbf{O},\ \mathbf{O}\in O_S.$$
3.2.3. Gesture preliminary recognition
The main purpose of this step is to reduce the amount of calculation for final recognition, especially when the number of recognized samples is very large. Candidate samples can be quickly determined by the effective-area ratio, thereby greatly reducing the amount of calculation required in the template matching process based on Hu invariant moments and improving the speed of gesture recognition. In our experience, the effective-area ratio is easy to calculate, so gestures can be recognized quickly with a high recognition rate. For preliminary recognition, the Euclidean distance of $E$ between the current gesture and each gesture in the sample library is calculated according to Eq. (12), which yields nine distance values based on $E$.
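The distance-based matching of Eq. (12) can be sketched as follows, covering both the hand-type pairing of Sec. 3.2.2 and the candidate-then-final matching described in Secs. 3.2.3 and 3.2.4. The data layout is an assumption made for illustration: each library is a list of nine feature vectors ordered as `[C, E, Hu1, ..., Hu7]`, and the function names are hypothetical rather than taken from the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences, as in Eq. (12)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_library(user_c, libraries):
    """Return the name of the library whose C column is closest to user_c.

    user_c: the nine area-perimeter ratios of the user's gestures 1 to 9.
    libraries: dict mapping a name to a list of nine vectors [C, E, Hu1..Hu7].
    """
    return min(
        libraries,
        key=lambda name: euclidean(user_c, [vec[0] for vec in libraries[name]]),
    )


def recognize(features, library, n_candidates=3):
    """Two-stage matching: shortlist by E, then decide by Hu-moment distance.

    features: [C, E, Hu1..Hu7] of the gesture to be recognized.
    Returns the 1-based gesture number of the best template.
    """
    # Preliminary recognition: rank templates by distance in E alone.
    ranked = sorted(range(len(library)), key=lambda i: abs(features[1] - library[i][1]))
    candidates = ranked[:n_candidates]
    # Final recognition: smallest Hu-moment distance among the candidates.
    best = min(candidates, key=lambda i: euclidean(features[2:], library[i][2:]))
    return best + 1
```

Because the Hu-moment distance is evaluated for only `n_candidates` templates, the cost of the final stage does not grow with the size of the library, which is the motivation the text gives for the preliminary step.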
The nine Euclidean distances are sorted from smallest to largest, $\{H_{E1},\cdots,H_{E9}\}$, and the first three are taken as candidate samples.
3.2.4. Gesture final recognition
The feature used in the final recognition step is the set of seven Hu moments. The seven-dimensional feature value in the sample library can be regarded as a middle point. The algorithm now needs to operate on only the three candidate samples that have been selected. The Euclidean distance $H_{GV}$ of the Hu moments between the gesture to be recognized and each candidate sample is calculated by Eq. (12). The minimum distance $H_{GV_z}$ is obtained using Eq. (13), and gesture $V_z$ among the three candidate samples is the final recognition result:
Eq. (13)
$$H_{GV_z}=\min\{H_{GV_1},H_{GV_2},H_{GV_3}\},\quad z\in\{1,2,3\}.$$
Figure 8 shows a screenshot of the algorithm recognizing gestures in a video stream.
4. Experiment and Results
In this study, nine gestures commonly used in daily life were selected for recognition experiments, as shown in Fig. 9. In this experiment, 40 subjects were reselected according to the above selection rules. The experiment was conducted under stable illumination and low noise, with no face appearing in the picture. Before the start of the experiment, 40 male and female subjects with different palm shapes were selected to establish the corresponding gesture sample libraries for each gesture at a distance of 55 cm from a camera, as shown in Fig. 3. For each gesture, the three groups A, B, and C sequentially store the area–perimeter ratio, the effective-area ratio, and the seven Hu moments in the gesture sample library. The experimental environment of this study was as follows: Windows 10 system, Intel^{®} Core™ i7-10700F @2.90 GHz hardware platform, with 16 GB RAM.
Using an ordinary USB camera as the gesture acquisition device, the whole experiment, including image preprocessing, hand-type adaptation, and gesture recognition, was implemented on the MATLAB 2012a software platform, TensorFlow, and FPGA. A Kinect camera was used in Ref. 14, but it is complicated and difficult to commercialize on a large scale due to its high cost.
4.1. Fixed Position
Forty new subjects, different from the 40 subjects used to build the sample library, were selected. Each subject made the gestures shown in Fig. 9 at a fixed position (the positive direction of the $y$ axis in Fig. 10, that is, the 0-deg position, 55 cm from the camera), with 10 experiments conducted per subject. Table 2 presents the recognition rates.
Table 2 Fixed position recognition rates.
4.2. Rotation Condition
For each subject, five experiments for each gesture (600 experiments for each gesture) were performed at three rotation angles of $-45$ deg, 45 deg, and 90 deg (i.e., the angle between the gesture and the $y$ axis, as shown in Fig. 10). Table 3 presents the recognition rates. Both Dardas^{35} and Sykora^{36} can identify rotated images, but their methods are complex and have poor real-time performance. Hu moments are invariant to rotation and scale, so the results are not expected to change with rotation and scale changes of the hand. However, the algorithm in this paper combines multiple features, not just Hu features, so this experiment is necessary.
Table 3 Recognition rates under three rotation angles.
4.3. Different Distances
For each subject, five experiments were performed for each gesture (600 experiments for each gesture) at three distances from the camera: 40, 70, and 85 cm. The recognition rates are presented in Table 4.
Table 4 Recognition rates under different distances.
4.4. Algorithm Comparison
To better illustrate the innovation and advantages of this algorithm, the following comparison experiments were designed. Related statistics are shown in Fig. 11. Considering the accuracy and real-time requirements of the proposed algorithm, the number of subtypes is set to three. We compare dividing the hand types into three subtypes against using no subtypes. As shown in Fig. 11(a), dividing the hand types into three subtypes improved the overall accuracy by nearly 3%. The proposed algorithm was also compared with two well-performing algorithms with similar design concepts, as shown in Fig. 11(b). Under the same experimental environment, the recognition rate of the proposed algorithm is slightly higher than that of the other two algorithms, and the proposed algorithm is more suitable for offline scenarios such as embedded artificial intelligence. In addition, owing to the introduction of candidate gestures, the number of gestures can be expanded while maintaining real-time performance and accuracy. The response times of the three algorithms in the experiment are basically the same. The proposed algorithm uses candidate gestures to overcome the inherent weaknesses of template matching: the consumption of computing resources increases with the number of templates, which makes it difficult to guarantee the accuracy and real-time performance of the algorithm. The experimental results are shown in Figs. 11(c) and 11(d). The response time of the algorithm with candidate gestures is better than that without candidate gestures. The following experiments were conducted with the deep-learning (CNN)^{47}^{–}^{49} and Hu moment algorithms.^{34} Shen et al.^{47} combined CNN with different methods, such as x-ray, and CNN is used for edge detection, feature extraction, and recognition.
The network architecture from Dayal et al.^{49} includes five combined convolutional layers, each composed of several subconvolutional layers, nonlinear ReLU layers, and pooling layers. The double channel CNN (DC-CNN)^{48} has improved the rate of hand gesture recognition and enhanced the generalization ability of the CNN. Multiple channels can obtain more abundant information and make the identification more accurate. We therefore mainly used the DC-CNN to conduct comparative experiments. The DC-CNN structure is composed of two relatively independent convolutional neural networks. Each channel contains the same number of convolutional layers and parameters. After the pooling layer, the two channels are each connected to a fully connected layer, and a full connection map is performed. Two hundred images are used for each gesture. Such a small number of samples is unfair to deep learning. However, this comparative experiment is designed to show the advantage of the small sample demand of the proposed algorithm. It is intended only to illustrate this weakness of deep learning, not to suggest that the accuracy of deep learning is low. First, experiments were performed at a fixed position: 0 deg rotation angle and 55 cm distance from the camera. The recognition rates are plotted in Fig. 12. Further, experiments were conducted with the subjects’ rotation angles at $-45$ deg, 45 deg, and 0 deg for each gesture and a fixed distance of 55 cm; the recognition rates are plotted in Fig. 13. Finally, experiments were performed for each subject at distances of 40, 70, and 85 cm for each gesture. The recognition rates are plotted in Fig. 14. The low accuracy of gesture 3 and gesture 7 is caused by the features adopted by the algorithm in this paper.
Although the area–perimeter ratio and the effective-area ratio have the advantage of a small amount of calculation, gesture 3 is more likely to be recognized as gesture 2 or gesture 4, and gesture 7 is more likely to be recognized as gesture 6. Because the overall recognition rate is still good, we weighed the advantages and disadvantages and decided to continue using these features. For a more comprehensive comparison, we also summarized the accuracy, response time, and hardware platform in Table 5. The hardware verification platform of the proposed algorithm is Altera’s EP4CE15F17C8N chip and the Intel^{®} Core™ i7-10700F @2.90 GHz hardware platform, with 16 GB RAM. The other two algorithms were also reproduced on the Intel^{®} Core™ i7-10700F @2.90 GHz hardware platform, with 16 GB RAM. The accuracy rate is the average of the three cases (Fig. 15).
Table 5 Further comparison in terms of response time and hardware.
We want to provide a simple and efficient algorithm of a kind that is often overlooked but urgently needed in some areas. It is not intended for complex human–computer conversations, and within this scope its accuracy and generalization ability are quite good. As shown in Table 6, its accuracy is very close to that of deep learning, but the hardware resources it consumes are very small. As shown in Fig. 16, the FPGA resource consumption is relatively small. Therefore, smaller hardware resources can be selected in practical applications. This is consistent with this paper’s aim of providing a simple and efficient algorithm for high-performance, low-power embedded scenarios.
Table 6 The rate of the use of FPGA hardware resources.
4.5. Results of Hand-Type Adaptation
The experimental results for hand-type adaptive partial pairing are presented in Figs. 16 and 17. The hand-type adaptive algorithm needs to process nine gestures, and the pairing result can be output only after the last gesture is processed. Therefore, only the last image is presented here with the final pairing result.
5. Discussion
Forty subjects participated in the experiments, and a total of 5400 gesture images in different situations were processed and identified. The proposed method, which is a simple hand gesture recognition method that combines the hand-type adaptive algorithm with the effective-area ratio, realizes real-time gesture recognition well. At the fixed position, the overall recognition rate was more than 94%. The recognition rate was more than 94% at different distances from the camera, and it exceeded 93% at different rotation angles. Moreover, because the Hu moment features are also used, a high recognition rate could still be maintained for gestures with a small degree of differentiation, such as gesture 6, gesture 7, and gesture 8. The nine gesture types presented in Fig. 9 were identified in the same situation, and the recognition time of each gesture was recorded. The average recognition time of the nine gestures was 355.27 ms for the algorithm based on Hu moments and 41.79 ms for the proposed algorithm. This confirms that the proposed method has the potential to expand the scope of gesture recognition in the future.
6. Conclusions
In this study, a simple hand gesture recognition algorithm that combines the hand-type adaptive algorithm and the effective-area ratio has been proposed. The sample library is paired using the hand-type adaptive algorithm. The effective-area ratio of the target is extracted to realize the initial recognition of the gesture and improve the speed of gesture recognition.
By additionally using the Hu moment features, gestures with a small degree of differentiation can be recognized well. Experiments showed that the proposed algorithm has a high recognition rate and good robustness under different hand-to-camera distances, rotation angles, and hand types. In particular, the hand-type adaptive algorithm and the initial recognition of gestures improve the overall recognition rate and speed. The proposed recognition algorithm is simple and easy to implement. It has strong stability and practicability under relatively stable ambient lighting, even with a complex background. However, the proposed algorithm has some limitations. Although the algorithm can cope with a complex background, a relatively stable illumination condition is required for effective recognition. In addition, the number of recognized gestures is relatively small. Future work will focus on handling lighting effects and identifying a larger number of gestures.
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China under Grant Nos. 2017YFA0206200 and 2018YFB2202601 and in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61834005 and 61902443. This manuscript was approved by the People’s Government of Nangang District, Harbin. The informed consent of all subjects was waived by the People’s Government of Nangang District, Harbin.
References
R. Huang and G. Shi,
“Design of the control system for hybrid driving two-arm robot based on voice recognition,” in Proc. IEEE 10th Int. Conf. Ind. Inf., 602–605 (2012). https://doi.org/10.1109/INDIN.2012.6300736
Y. Liu et al., “Facial expression recognition with fusion features extracted from salient facial areas,” Sensors, 17 (4), 712 (2017). https://doi.org/10.3390/s17040712
W. Takano and Y. Nakamura, “Action database for categorizing and inferring human poses from video sequences,” Rob. Auton. Syst., 70, 116–125 (2015). https://doi.org/10.1016/j.robot.2015.03.001
D. Q. Leite et al., “Hand gesture recognition from depth and infrared Kinect data for CAVE applications interaction,” Multimedia Tools Appl., 76 (20), 20423–20455 (2017). https://doi.org/10.1007/s11042-016-3959-0
X. L. Guo and T. T. Yang, “Gesture recognition based on HMM-FNN model using a Kinect,” J. Multimodal User Interfaces, 11 (1), 1–7 (2017). https://doi.org/10.1007/s12193-016-0215-x
L. Yu and J. Y. Hou, “Large-screen interactive imaging system with switching federated filter method based on 3D sensor,” Complexity, 11 8730281 (2018). https://doi.org/10.1155/2018/8730281
V. Gonzalez-Pacheco et al., “Teaching human poses interactively to a social robot,” Sensors, 13 (9), 12406–12430 (2013). https://doi.org/10.3390/s130912406
D. Sidobre et al., “Human–robot interaction,” Advanced Bimanual Manipulation, 123–172, Springer, Heidelberg, Berlin (2012).
G. Kollegger et al., “BIMROB-bidirectional interaction between human and robot for the learning of movements,” in Proc. 11th Int. Symp. Comput. Sci. Sport, 151–163 (2017).
M. Daushan et al., “Organising body formation of modular autonomous robots using virtual embryogenesis,” New Trends in Medical and Service Robots, 73–86, Springer, Cham (2018).
T. Petrič, M. Cevzar, J. Babič, “Shared control for human robot cooperative manipulation tasks,” Advances in Service and Industrial Robotics, 787–796, Springer, Cham (2018).
N. T. Do et al., “Robust hand shape features for dynamic hand gesture recognition using multi-level feature LSTM,” Appl. Sci., 10 (18), 6293 (2021). https://doi.org/10.3390/app10186293
Y. Li et al., “Deep attention network for joint hand gesture localization and recognition using static RGB-D images,” Inf. Sci., 441, 66–78 (2018). https://doi.org/10.1016/j.ins.2018.02.024
Y. L. Coelho, J. M. Salomao and H. R. Kulitz, “Intelligent hand posture recognition system integrated to process control,” IEEE Latin Am. Trans., 15 (6), 1144–1153 (2017). https://doi.org/10.1109/TLA.2017.7932703
Z. Hu et al., “3D separable convolutional neural network for dynamic hand gesture recognition,” Neurocomputing, 318, 151–161 (2018). https://doi.org/10.1016/j.neucom.2018.08.042
B. Dekker et al., “Gesture recognition with a low power FMCW radar and a deep convolutional neural network,” in Proc. Eur. Radar Conf., 163–166 (2017). https://doi.org/10.23919/EURAD.2017.8249172
L. Rong et al.,
“Improved reduced order fault detection filter design for polytopic uncertain discrete-time Markovian jump systems with time-varying delays,”
Complexity, 2018 1
–15
(2018). https://doi.org/10.1155/2018/9489620 COMPFS 1076-2787 Google Scholar
S.-J. Ryu et al.,
“Feature-based hand gesture recognition using an FMCW radar and its temporal feature analysis,”
IEEE Sens. J., 18
(18), 7593
–7602
(2018). https://doi.org/10.1109/JSEN.2018.2859815 ISJEAZ 1530-437X Google Scholar
L. Yu,
“Image noise preprocessing of interactive projection system based on switching filtering scheme,”
Complexity, 10 1258306
(2018). https://doi.org/10.1155/2018/1258306 COMPFS 1076-2787 Google Scholar
Y. Kim and B. Toomajian,
“Hand gesture recognition using micro-Doppler signatures with convolutional neural network,”
IEEE Access, 4 7125
–7130
(2016). https://doi.org/10.1109/ACCESS.2016.2617282 Google Scholar
Y. Li, X. Cheng and G. Gui,
“Co-robust-ADMM-net: joint ADMM framework and DNN for robust sparse composite regularization,”
IEEE Access, 6 47943
–47952
(2018). https://doi.org/10.1109/ACCESS.2018.2867435 Google Scholar
X. Z. Zhao et al.,
“A rotating machinery fault diagnosis method using composite multiscale fuzzy distribution entropy and minimal error of convex hull approximation,”
Meas. Sci. Technol., 32
(2), 025010
(2021). https://doi.org/10.1088/1361-6501/abbd11 MSTCEP 0957-0233 Google Scholar
W. B. Shen and T. G. Yew,
“A partitioning method in noise reduction and a hybrid convex hull algorithm for fingertips detection on an outstretched hand,”
AIP Conf. Proc., 1830 020011
(2017). https://doi.org/10.1063/1.4980874 Google Scholar
X. Ma and J. Peng,
“Kinect sensor-based long-distance hand gesture recognition and fingertip detection with depth information,”
J. Sens., 9 5809769
(2018). https://doi.org/10.1155/2018/5809769 Google Scholar
A. L. V. Coelho and C. A. M. Lima,
“Assessing fractal dimension methods as feature extractors for EMG signal classification,”
Eng. Appl. Artif. Intell., 36 81
–98
(2014). https://doi.org/10.1016/j.engappai.2014.07.009 EAAIE6 0952-1976 Google Scholar
E. Yavuz and C. Eyupoglu,
“A cepstrum analysis-based classification method for hand movement surface EMG signals,”
Med. Biol. Eng. Comput., 57 2179
–2201
(2019). https://doi.org/10.1007/s11517-019-02024-8 MBECDY 0140-0118 Google Scholar
J. Zhang and Z. Shi,
“Deformable deep convolutional generative adversarial network in microwave based hand gesture recognition system,”
in 9th Int. Conf. Wireless Commun. and Signal Process.,
1144
–1153
(2017). https://doi.org/10.1109/WCSP.2017.8170976 Google Scholar
G. Benitez-Garcia et al.,
“Improving real-time hand gesture recognition with semantic segmentation,”
Sensors, 21
(2), 356
(2021). https://doi.org/10.3390/s21020356 SNSRES 0746-9462 Google Scholar
Y. Li,
“Hand gesture recognition using Kinect,”
in Proc. IEEE Int. Conf. Comput. Sci. and Autom. Eng.,
22
–24
(2012). https://doi.org/10.1109/ICSESS.2012.6269439 Google Scholar
J. M. Palacios et al.,
“Human-computer interaction based on hand gestures using RGB-D sensors,”
Sensors, 13 11842
–11860
(2013). https://doi.org/10.3390/s130911842 SNSRES 0746-9462 Google Scholar
Z. H. Chen et al.,
“Real-time hand gesture recognition using finger segmentation,”
Sci. World J., 2014 267872
(2014). https://doi.org/10.1155/2014/267872 Google Scholar
Z. Ren et al.,
“Robust part-based hand gesture recognition using kinect sensor,”
IEEE Trans. Multimedia, 15 1110
–1120
(2013). https://doi.org/10.1109/TMM.2013.2246148 Google Scholar
Z. Ren, J. Yuan and W. Liu,
“Minimum near-convex shape decomposition,”
IEEE Trans. Pattern Anal. Mach. Intell., 35 2546
–2552
(2013). https://doi.org/10.1109/TPAMI.2013.67 ITPIDJ 0162-8828 Google Scholar
T. Z. Zhang, X. H. Gao and J. Y. Li,
“The improved Hu moment and its application in gesture recognition,”
in Proc. Int. Conf. Comput. Vision, Image and Deep Learn.,
577
–580
(2020). https://doi.org/10.1109/CVIDL51233.2020.00-24 Google Scholar
N. H. Dardas and N. D. Georganas,
“Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques,”
IEEE Trans. Instrum. Meas., 60
(11), 3592
–3607
(2011). https://doi.org/10.1109/TIM.2011.2161140 IEIMAO 0018-9456 Google Scholar
P. Sykora and P. Kamencay,
“Comparison of SIFT and SURF methods for use on hand gesture recognition based on depth map,”
AASRI Procedia, 9 19
–24
(2014). https://doi.org/10.1016/j.aasri.2014.09.005 Google Scholar
H. Y. Lam et al.,
“Classification of moving targets using mirco-Doppler radar,”
in Proc. 17th Int. Radar Symp.,
1
–6
(2016). https://doi.org/10.1109/IRS.2016.7497317 Google Scholar
P. Molchanov et al.,
“Multi-sensor system for driver’s hand-gesture recognition,”
in 11th IEEE Int. Conf. and Workshops Autom. Face and Gesture Recognit.,
1
–8
(2015). https://doi.org/10.1109/FG.2015.7163132 Google Scholar
Y. Sun, Z. Wu and F. Meng,
“Common weak linear copositive Lyapunov functions for positive switched linear systems,”
Complexity, 2018 1365960
(2018). https://doi.org/10.1155/2018/1365960 COMPFS 1076-2787 Google Scholar
Z. Zhang, Z. Tian and M. Zhou,
“Latern: dynamic continuous hand gesture recognition using FMCW radar sensor,”
IEEE Sens. J., 18
(8), 3278
–3289
(2018). https://doi.org/10.1109/JSEN.2018.2808688 ISJEAZ 1530-437X Google Scholar
M. Mustafal et al.,
“EEG spectrogram classification employing ANN for IQ application,”
in Proc. Int. Conf. Technol. Adv. Electr., Electron. Comput. Eng.,
199
–203
(2013). https://doi.org/10.1109/TAEECE.2013.6557222 Google Scholar
M. Liang et al.,
“Reconfigurable array design to realize principal component analysis (PCA)-based microwave compressive sensing imaging system,”
IEEE Antennas Wireless Propag. Lett., 14 1039
–1042
(2015). https://doi.org/10.1109/LAWP.2014.2386356 IAWPA7 1536-1225 Google Scholar
C. Zheng et al.,
“Doppler bio signal detection-based time-domain hand gesture recognition,”
in Proc. IEEE MTT-S Int. Microw. Workshop Ser. RF Wireless Technol. Biomed. Healthcare Appl.,
3
(2013). https://doi.org/10.1109/IMWS-BIO.2013.6756200 Google Scholar
J. Wang et al.,
“Noncontact distance and amplitude-independent vibration measurement based on an extended DACM algorithm,”
IEEE Trans. Instrum. Meas., 63
(1), 145
–153
(2014). https://doi.org/10.1109/TIM.2013.2277530 IEIMAO 0018-9456 Google Scholar
D. Miao et al.,
“Doppler radar-based human breathing patterns classification using support vector machine,”
in Proc. IEEE Radar Conf.,
456
–459
(2017). https://doi.org/10.1109/RADAR.2017.7944246 Google Scholar
L. Song et al.,
“Application of federal Kalman filter with neural networks in the velocity and attitude matching of transfer alignment,”
Complexity, 2018 3039061
(2018). https://doi.org/10.1155/2018/3039061 COMPFS 1076-2787 Google Scholar
X. Shen et al.,
“Research on bone age automatic judgment algorithm based on deep learning and hand x-ray image,”
J. Med. Imaging Health Inf., 11 156
–161
(2021). https://doi.org/10.1166/jmihi.2021.3443 Google Scholar
X. Y. Wu,
“A hand gesture recognition algorithm based on DC-CNN,”
IEEE Multimedia Tools Appl., 79 9193
–9205
(2020). https://doi.org/10.1007/s11042-019-7193-4 Google Scholar
A. Dayal et al.,
“Design and implementation of deep learning based contactless authentication system using hand gestures,”
Electronics, 10
(2), 182
(2021). https://doi.org/10.3390/electronics10020182 ELECAD 0013-5070 Google Scholar
Y. T. Qassim, T. R. Cutmore and D. D. Rowlands,
“Optimized FPGA based continuous wavelet transform,”
Comput. Elect. Eng., 49 84
–94
(2016). https://doi.org/10.1016/j.compeleceng.2014.11.012 CPEEBQ 0045-7906 Google Scholar
E. Nurvitadhi et al.,
“Can FPGAs beat GPUs in accelerating next generation deep neural networks?,”
in Proc. ACM/SIGDA Int. Symp. Field-Program. Gate Arrays,
5
–14
(2017). Google Scholar
S. Han et al.,
“EIE: efficient inference engine on compressed deep neural network,”
in Proc. 43rd Int. Symp. Comput. Archit.,
243
–254
(2016). Google Scholar
Biography
Qiang Zhang is a PhD candidate at the School of Microelectronics Science and Technology, Sun Yat-sen University, Zhuhai, China. His research interests include machine vision, image processing, and embedded artificial intelligence.

Shanlin Xiao received his BS degree in communications engineering and his MS degree in communications and information systems from the University of Electronic Science and Technology of China, Chengdu, China, in 2009 and 2012, respectively. He received his PhD in communications and computer engineering from Tokyo Institute of Technology, Tokyo, Japan, in 2017. He is currently an associate research professor at the School of Electronics and Information Technology in Sun Yat-sen University, Guangzhou, China.

Zhiyi Yu is with the School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China, and also with the School of Microelectronics Science and Technology, Sun Yat-sen University, Zhuhai, China.