 
       
Volume 14 | Issue 5
This paper proposes a hybrid network that takes RGB, depth, and skeleton inputs and combines CNNs with bidirectional LSTMs (BiLSTMs). To learn the combined temporal features of an action, CNNs characterize the RGB and depth data, while LSTMs encode the skeletal data in both directions. The L2 distance metric is then used to select among the probability distributions generated from the three inputs. Coupling the hybrid CNN-BiLSTM network with an L2 distance measure in place of score fusion improved performance to 94.73%. Finally, the proposed models were compared with both state-of-the-art deep learning methods and classic machine learning models.
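The text does not spell out how the L2 distance is applied to the three probability distributions, so the following is only an illustrative sketch under one possible reading: each stream's class-probability vector is compared with the one-hot vector of its own top class, and the stream with the smallest L2 distance (the most peaked distribution) supplies the prediction. The function names, the stream labels, and the probability values are hypothetical and are not taken from the paper.

```python
import math


def l2_distance(p, q):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def select_by_l2(stream_probs):
    """Pick the stream whose distribution is closest to a one-hot vector.

    ASSUMPTION: this selection rule is one plausible interpretation,
    not the paper's confirmed procedure.

    stream_probs maps a stream name to its list of class probabilities.
    Returns (stream_name, predicted_class, distance).
    """
    best = None
    for name, probs in stream_probs.items():
        top = max(range(len(probs)), key=probs.__getitem__)
        dist = l2_distance(probs, one_hot(top, len(probs)))
        if best is None or dist < best[2]:
            best = (name, top, dist)
    return best


# Made-up softmax outputs for a 3-class problem (illustration only).
streams = {
    "rgb": [0.50, 0.30, 0.20],
    "depth": [0.40, 0.35, 0.25],
    "skeleton": [0.10, 0.85, 0.05],
}
name, cls, dist = select_by_l2(streams)
print(name, cls, round(dist, 3))
```

Unlike score fusion, which would average or multiply the three vectors, a rule of this kind keeps a single stream's distribution intact, so one confident stream is not diluted by two uncertain ones.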