With the widespread use of wearable devices, human activity recognition (HAR) holds immense potential in health monitoring and smart environments. Notably, the temporal sensor sequences collected from wearable devices can accurately reflect daily activities. Nonetheless, existing CNN-based and LSTM-based methods have predominantly concentrated on feature extraction from univariate sequences, overlooking the implicit frequency information. Therefore, we first applied the Short-Time Fourier Transform (STFT) to HAR tasks to extract the inherent frequency features. We then introduced a multi-branch network that combines a CNN and an LSTM. The CNN component captures spatial information across different dimensions. The LSTM component comprises two parts: one focuses on temporal relationships within a single channel, and the other on relationships across channels at a given time point. In addition, recognizing the limitations of the available datasets, particularly their insufficient coverage of daily activities, we collected a custom dataset encompassing eight distinct daily activity categories. Finally, we evaluated the proposed model against benchmark models. The results demonstrate that our network generalizes better across different datasets, achieving accuracies of 91.70%, 95.79%, and 87.81% on the PAMAP2, UCI HAR, and our own datasets, respectively.
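As a rough illustration of the STFT step mentioned above, the following minimal sketch computes a magnitude spectrogram per sensor channel using NumPy only. The function name `stft_magnitude`, the Hann window, the frame length, the hop size, the 50 Hz sampling rate, and the synthetic two-channel signal are all assumptions chosen for the example; they are not the settings or data used in this work.

```python
import numpy as np

def stft_magnitude(x, frame_len=50, hop=25):
    """Magnitude STFT of a (time, channels) array.

    Returns an array of shape (channels, n_frames, frame_len // 2 + 1).
    frame_len and hop are illustrative defaults, not values from the paper.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)  # assumed window choice
    n_frames = 1 + (x.shape[0] - frame_len) // hop
    # Slice overlapping frames: (n_frames, frame_len, channels)
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    frames = frames * window[None, :, None]
    # Real FFT along the frame axis: (n_frames, n_bins, channels)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return spec.transpose(2, 0, 1)

# Synthetic stand-in for two sensor channels sampled at an assumed 50 Hz:
# channel 0 oscillates at 5 Hz, channel 1 at 10 Hz.
fs = 50
t = np.arange(250) / fs
x = np.stack([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 10 * t)], axis=1)

spec = stft_magnitude(x)
# With frame_len == fs, each frequency bin is 1 Hz wide,
# so the dominant bin index equals the dominant frequency in Hz.
peak_bins = spec.mean(axis=1).argmax(axis=1).tolist()
```

The resulting per-channel time-frequency maps are the kind of input that a CNN branch could consume alongside the raw sequences fed to an LSTM branch; how the branches are actually wired together is not specified here.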