Fine-Grained Activity Recognition Using Machine Learning

Abstract. The present disclosure relates to a custom framework for fine-grained human activity recognition. One or more input videos may be accessed, where the one or more input videos comprise one or more frames depicting one or more actors and one or more objects. A plurality of object-pose interaction graphs may be generated for individual frames from the one or more input videos based at least in part on one or more objects of interest from the one or more objects and on one or more joint keypoints of the one or more actors. A first graph neural network may be trained based at least in part on the plurality of object-pose interaction graphs to identify spatial information for the one or more actors, the one or more objects of interest, and one or more interactions between the one or more actors and the one or more objects of interest. A second graph neural network may be trained based at least in part on the plurality of object-pose interaction graphs and one or more keyframes from the one or more frames to identify temporal information for the one or more actors, the one or more objects of interest, and the one or more interactions between the one or more actors and the one or more objects of interest. A classifier may be trained to identify one or more actions in the one or more input videos based at least in part on the spatial information and the temporal information.
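
To make the idea of an object-pose interaction graph concrete, the following is a minimal sketch for a single frame, not the method of the disclosure. It assumes 2D joint keypoints and object centers are already available, and every name in it (`FrameGraph`, `build_object_pose_graph`, `mean_aggregate`, the three-joint skeleton) is hypothetical. The rule that connects every object of interest to every joint is likewise an assumption, and the unweighted neighbor averaging only stands in for a learned graph neural network layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical, abbreviated skeleton: joint names and bones are illustrative only.
SKELETON_EDGES = [("shoulder", "elbow"), ("elbow", "wrist")]


@dataclass
class FrameGraph:
    nodes: List[str]               # node labels: joints first, then objects
    features: List[Point]          # (x, y) position for each node
    edges: List[Tuple[int, int]]   # undirected edges as pairs of node indices


def build_object_pose_graph(joints: Dict[str, Point],
                            objects: Dict[str, Point]) -> FrameGraph:
    """Build one graph for one frame from joint keypoints and object centers."""
    joint_names = list(joints)
    # Prefix object labels so they cannot collide with joint names.
    object_names = [f"obj:{name}" for name in objects]
    nodes = joint_names + object_names
    features = [joints[n] for n in joint_names] + list(objects.values())
    index = {name: i for i, name in enumerate(nodes)}

    # Pose edges: bones between joints that are present in this frame.
    edges = [(index[a], index[b]) for a, b in SKELETON_EDGES
             if a in index and b in index]
    # Interaction edges (assumed rule): every object connects to every joint.
    for obj in object_names:
        for joint in joint_names:
            edges.append((index[obj], index[joint]))
    return FrameGraph(nodes, features, edges)


def mean_aggregate(graph: FrameGraph) -> List[Point]:
    """One round of neighbor averaging with self-loops and no learned weights."""
    neighbors = [[i] for i in range(len(graph.nodes))]
    for a, b in graph.edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    out = []
    for nbrs in neighbors:
        xs = [graph.features[j][0] for j in nbrs]
        ys = [graph.features[j][1] for j in nbrs]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out


# Toy frame: an arm reaching toward a cup.
joints = {"shoulder": (0.0, 0.0), "elbow": (1.0, 0.0), "wrist": (2.0, 0.0)}
objects = {"cup": (2.0, 1.0)}
graph = build_object_pose_graph(joints, objects)
smoothed = mean_aggregate(graph)
```

In this toy frame the graph has four nodes and five edges (two bones plus three object-joint links). Under the same assumptions, a spatial model would operate on one such graph per frame, while a temporal model would consume the sequence of graphs taken at keyframes.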

Links: Patent

Amit Agarwal