Multi-Feature Video Recommendation Based on Hypergraph Convolution for Mobile Edge Environment

With the massive growth of edge devices, providing users with video recommendation services in a mobile edge environment has become a research hotspot. Most traditional video recommendation methods regard the relationship between a user and a neighbor as linear and ignore higher-order connectivity among users, which results in poor recommendation performance. Moreover, these methods use a single feature to represent user preferences, which cannot effectively alleviate the data sparsity problem. To improve the performance of video recommendation, this article proposes a multi-feature video recommendation method based on hypergraph convolution (MVRHC). Hypergraph convolution is adopted to compute user neighborhood-level features by modeling high-order correlations among users. Final features are obtained by fusing multiple features through an attention mechanism, and video recommendation is then performed based on these features. Experimental results on two real-world datasets demonstrate that MVRHC outperforms the baseline methods.

In video recommendation, the attributes of videos are not as explicit as those of other products, so the challenge of extracting effective features from videos through keywords must be addressed. As a result, video recommendation has become a research hotspot.
In the mobile edge environment, there are many interactions between users, and these data contain a wealth of information that is of great significance for feature extraction and video recommendation. Consider the scenario of pushing videos to users: recommendations can be made based on the characteristics of neighbors. Videos can be classified into types such as romance, comedy, and mystery. As shown in Figure 1, both Bob and Alice like to watch comedy, so Bob can be defined as Alice's neighbor. According to the preferences of this neighbor, videos that Alice has not yet watched, such as mystery, can be recommended to her.
Traditional video recommendation methods include content-based video recommendation (Dong et al., 2018; Ramadhan & Musdholifah, 2021; Subercaze et al., 2016), collaborative filtering video recommendation (Di Yu & Chen, 2020; Shen et al., 2020), and hybrid video recommendation (Pérez-Marcos et al., 2020; Yan et al., 2015; Zhou et al., 2019). Content-based methods use video content to predict the preferences of target users, recommending videos similar to those the user previously liked or watched. Collaborative filtering methods use user feedback, such as previous ratings and viewing history, to predict user preferences. Hybrid methods combine user feedback and consumed video content to improve recommendations. These methods only consider simple interaction relationships and cannot mine comprehensive representations of user interest (Stoica & Chaintreau, 2019; Xu et al., 2021; Yang et al., 2019; Yang et al., 2021). Some methods (Cai et al., 2022; Pingali et al., 2022) improve model performance by mining information in video data, but their excessive computational overhead makes them unsuitable for low-latency scenarios such as mobile edge environments. In recent years, some studies have used deep learning to construct user features. J. Chen et al. (2017) proposed two attention models to construct user features at the component level and item level, and Chen et al. (2018) proposed an attention model to combine user features at the category level and item level. However, such recommendation systems may still produce poor recommendation quality for the following reasons:
1. Most traditional video recommendation methods regard the relationship between a user and a neighbor as linear, ignore higher-order connectivity among users, and cannot obtain neighborhood-based user features well, which leads to poor recommendation quality.
2. Most traditional video recommendation methods mine a single user feature from single-source data, which is usually sparse and cannot yield valid representations of user interest.
In view of the above analysis, this paper proposes a multi-feature video recommendation method based on hypergraph convolution (MVRHC). In real video recommendation, users often interact with multiple neighbors, and simply treating this complex high-order relationship as a set of pairwise relationships inevitably loses valuable information. A natural extension is to rely on hypergraphs, where hyperedges can accurately model high-order relationships among users. In addition, a multi-feature representation aggregates information from multi-source data, which makes the representation learned by the model more complete. Data from multiple sources are semantically interrelated and complement each other, which helps alleviate the data sparsity problem and obtain more reliable predictions. Specifically, the authors first use hypergraph convolution to learn the neighborhood-level feature of users by modeling higher-order correlations among users. Then, an attention mechanism is used to learn the user item-level feature from the user's historical interaction records with videos. The neighborhood-level and item-level features are fused to obtain users' multi-feature representations, on which video recommendation is finally based.
The main contributions of this paper are summarized as follows:
1. This paper proposes a video recommendation method based on hypergraph convolution that combines multi-feature representations of users.
2. This paper uses hypergraph convolution to model the high-order correlations among users and obtain the user neighborhood-level feature, and uses an attention mechanism to obtain the user item-level feature, reflecting the different importance of videos.
3. Experimental results on two real-world datasets show that the proposed method achieves better recommendation performance than the baseline methods.

Recommendation Methods Based on Hypergraph
A hypergraph is a generalization of a graph that uses hyperedges to connect multiple vertices. Recommendation methods based on hypergraphs are similar to those based on graphs; however, hypergraphs can provide more information and reflect high-order correlations between vertices (Zhao et al., 2018). Building on traditional hypergraph learning, hypergraphs have gradually been combined with deep learning, often to solve problems such as classification and image processing. Hypergraph neural networks were first used to deal with complex interactions between data (Jiang et al., 2019) and have also been applied to recommendation in recent studies. Jiang et al. (2019) proposed a dynamic hypergraph construction method, which uses k-NN to generate basic hyperedges and extends the adjacent hyperedge set with a clustering algorithm, so that both local and global relations can be extracted. Wu et al. (2019) proposed a collaboration matrix factorization model (CMF) that combines projection methods with convolutional matrix factorization to extract the collaboration between rating-based and review-based latent factors. Ji et al. (2020) proposed the DHCF model, which combines hypergraphs and collaborative filtering to learn high-order correlations between users and items. Xia et al. (2021) proposed DHCN, a dual channel hypergraph convolutional network that models session data as a hypergraph to extract user preference embeddings and integrates a self-supervised task into network training to enhance hypergraph modeling and improve recommendation. Yu et al. (2021) proposed a multi-channel hypergraph convolutional network that improves social recommendation by leveraging high-order user relations.

Multi-Feature Representations
A multi-feature representation aggregates information from multi-source data, which makes the representation learned by the model more complete. There have been many studies on multi-feature representations. AFM (Xiao et al., 2017) uses the attention mechanism to improve factorization machines by distinguishing the importance of feature interactions. ACF (J. Chen et al., 2017) uses attention-based collaborative filtering to model features at the item level and component level. THACIL (Chen et al., 2018) uses a temporal hierarchical attention model to construct user features at the category level and item level. LS-PLM (Gai et al., 2017) first uses an embedding layer to process the sparse initial input and then uses a transfer function to obtain the relationships between features. DIN is a deep interest network that adaptively expresses users' interests through the correlation of their historical behaviors. DAFC (Yin et al., 2020) uses an auto-encoder and clustering methods to mitigate overfitting in feature extraction. MUIR uses a deep network to extract multi-feature representations of users. However, these methods still do not make good use of high-order correlations to represent user preferences.

MULTI-FEATURE VIDEO RECOMMENDATION BASED ON HYPERGRAPH CONVOLUTION
This paper proposes a multi-feature video recommendation method based on hypergraph convolution for the mobile edge environment. As illustrated in Figure 2, MVRHC consists of four parts: 1) feature extraction; 2) neighborhood-level feature extraction through hypergraph convolution; 3) item-level feature extraction through the attention mechanism; and 4) user-side feature representation and prediction. Bold uppercase letters denote matrices and bold lowercase letters denote vectors. To compress user i's tag-aware feature vector $\mathbf{u}_i^{tag}$ and video j's tag-aware feature vector $\mathbf{v}_j^{tag}$ into a low-dimensional feature space, both are fed into an MLP with shared parameters. Then, the user neighborhood-level feature $\mathbf{x}_u^N$ (u means user-side and N means neighborhood-level) is obtained through hypergraph convolution, and the user item-level feature $\mathbf{x}_u^V$ (V means item-level) is obtained through the attention mechanism. The final user-side feature $\mathbf{x}_u^*$ is obtained by fusing these features through the attention mechanism. Finally, videos are recommended according to the probability obtained from the inner product of the user's final feature vector and the video feature vector. Frequently used notations are listed in Table 1.

Feature Extraction
First, a tuple $F = (U, V, T)$ is used to represent the user set U, the video set V, and the tag set T. Specifically, an MLP is used to compress the sparse user-tag feature vector $\mathbf{u}_i^{tag}$ and the video-tag feature vector $\mathbf{v}_j^{tag}$ from a high-dimensional space into a low-dimensional one, and the two sides share parameters so that both are mapped into the same space. Taking the user side as an example, $\mathbf{W}^{(l)}$ denotes the l-th layer parameters shared between the user side and the video side, and ReLU is used as the activation function. The input and output of the first layer can be expressed as follows:
$$\mathbf{x}_u^{(1)} = \mathrm{ReLU}\big(\mathbf{W}^{(1)}\mathbf{u}_i^{tag}\big)$$
For each subsequent layer of the MLP, the input and output can be expressed as follows:
$$\mathbf{x}_u^{(l)} = \mathrm{ReLU}\big(\mathbf{W}^{(l)}\mathbf{x}_u^{(l-1)}\big)$$
Therefore, the user-side feature representation $\mathbf{x}_u$ is obtained as the output of the last layer. Similarly, the video-side feature representation is $\mathbf{x}_v$.

Table 1 (excerpt). Frequently used notations
$\mathbf{x}_v(j)$: video-side feature after the MLP (v means video-side)
$\mathbf{x}_u^V$: user-side item-level feature (u means user-side and V means item-level)
$\mathbf{x}_u^N$: user-side neighborhood-level feature (u means user-side and N means neighborhood-level)
$\mathbf{x}_u^*$: user-side final feature (u means user-side)
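As an illustration of the shared-parameter compression described above, the following plain-Python sketch applies the same stack of weight matrices to a user-tag vector and a video-tag vector. It is a minimal sketch under stated assumptions, not the authors' implementation: layers are bias-free, weights are drawn uniformly from [-0.1, 0.1], the tag vectors are toy multi-hot vectors, and the toy dimensions (6 tags reduced to 4 and then 2) stand in for the 512-256-128 configuration used in the experiments. The helper names are hypothetical.

```python
import random

def relu(vec):
    return [max(0.0, x) for x in vec]

def linear(W, x):
    # W is a list of rows (out_dim x in_dim); returns W @ x with no bias (assumption)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def init_layer(in_dim, out_dim, rng):
    # small uniform random weights; the initialization scheme is an assumption
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def build_shared_mlp(dims, seed=0):
    # one weight matrix W^(l) per layer, shared by the user side and the video side
    rng = random.Random(seed)
    return [init_layer(dims[k], dims[k + 1], rng) for k in range(len(dims) - 1)]

def mlp_forward(layers, x):
    # x^(l) = ReLU(W^(l) x^(l-1)), applied layer by layer
    for W in layers:
        x = relu(linear(W, x))
    return x

shared = build_shared_mlp([6, 4, 2])
u_tag = [1, 0, 1, 0, 0, 1]   # user i's tag vector (toy data)
v_tag = [0, 1, 1, 0, 0, 0]   # video j's tag vector (toy data)
x_u = mlp_forward(shared, u_tag)  # the same parameters are used on both sides
x_v = mlp_forward(shared, v_tag)
print(x_u, x_v)
```

Because both sides pass through the same `shared` layers, $\mathbf{x}_u$ and $\mathbf{x}_v$ land in one common space, which is what later allows user and video features to be compared or combined directly.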

Neighborhood-Level Feature Through Hypergraph Convolution
Following the idea of user-based collaborative filtering, the features of highly similar neighbors can reflect the features of the target user. As illustrated in Figure 3, the authors use hypergraph convolution to model higher-order connectivity among users and extract the neighborhood-level feature.

Hypergraph Construction
Unlike an edge in an ordinary graph, which connects exactly two vertices, a hyperedge in a hypergraph can connect more than two vertices. A hypergraph can be expressed as $G = (V, E)$, where V is the set of vertices and E is the set of hyperedges. In this paper, each vertex of the hypergraph represents a user with feature $\mathbf{x}_u$. The Euclidean distance is calculated for every pair of users, and for each vertex, the set of its k nearest neighbors is regarded as a hyperedge. The incidence matrix $\mathbf{H}_G$ of hypergraph G is then obtained, where each element $h(v, e)$ indicates whether vertex v belongs to hyperedge e:
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e \end{cases}$$
Therefore, $\mathbf{H}_G$ can be used to represent the user hypergraph. The overall hypergraph construction algorithm is presented in Algorithm 1. In lines 1-9, the algorithm constructs the hyperedge set E from the input user features $\mathbf{x}_u$. Specifically, it computes the Euclidean distance between users in line 3 and constructs a hyperedge from each user's top-k most similar users in lines 4-8. In lines 10-18, the algorithm constructs the hypergraph G and obtains its incidence matrix $\mathbf{H}_G$. In short, the algorithm selects the k most similar users from the user set to construct a hyperedge for each user, so its complexity is $O(n^2)$, where n is the size of the user set U.
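The k-nearest-neighbor construction of Algorithm 1 can be sketched as follows. This is an illustrative version under two assumptions: each hyperedge contains its center user in addition to the k nearest users, and distance ties are broken by user index. The function name `build_knn_hypergraph` is hypothetical.

```python
import heapq
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_knn_hypergraph(features, k):
    """Return the incidence matrix H (rows: users/vertices, columns: hyperedges).

    One hyperedge is built around each user e. It contains e itself (assumption)
    and the k users closest to e in Euclidean distance.
    """
    n = len(features)
    H = [[0] * n for _ in range(n)]
    for e in range(n):
        # (distance, index) pairs; ties are broken by the smaller user index
        candidates = ((euclidean(features[e], features[v]), v) for v in range(n) if v != e)
        nearest = heapq.nsmallest(k, candidates)
        for v in [e] + [idx for _, idx in nearest]:
            H[v][e] = 1  # h(v, e) = 1 iff vertex v belongs to hyperedge e
    return H

# Toy user features: two tight pairs of users far apart from each other
users = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
H = build_knn_hypergraph(users, k=1)
for row in H:
    print(row)
```

Computing the distances from one user to all others costs O(n) for a fixed feature dimension, and `heapq.nsmallest` selects the k closest in O(n log k), so the whole construction is O(n^2 log k), which matches the $O(n^2)$ bound when k is fixed.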

Hypergraph Convolution
Similar to graph convolution, hypergraph convolution updates vertex features through a vertex-hyperedge-vertex process. Following the spectral hypergraph convolution formulation in the literature, each hypergraph convolution layer is parameterized by a learnable parameter matrix $\boldsymbol{\Theta}^{(l)}$ on the l-th layer and normalized by the degree matrix D, whose diagonal elements are the degrees derived from $\mathbf{H}_G$. Multiplication by $\mathbf{H}_G^{T}$ aggregates vertex features into hyperedge features, and multiplication by $\mathbf{H}_G$ aggregates hyperedge features back into vertex features. In addition, following the idea of ResNet (He et al., 2016), each hypergraph convolution layer preserves the previous features by adding its input $\mathbf{X}_u^{(l)}$ to its output. After $\mathbf{x}_u$ passes through the stacked hypergraph convolution layers, the final output is taken as the user neighborhood-level feature $\mathbf{X}_u^N$.
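The layer described above can be sketched with the widely used spectral form from HGNN, in which the single degree matrix D is split into a vertex-degree matrix $\mathbf{D}_v$ and a hyperedge-degree matrix $\mathbf{D}_e$:
$$\mathbf{X}_u^{(l+1)} = \mathrm{ReLU}\big(\mathbf{D}_v^{-1/2}\mathbf{H}_G\mathbf{D}_e^{-1}\mathbf{H}_G^{T}\mathbf{D}_v^{-1/2}\mathbf{X}_u^{(l)}\boldsymbol{\Theta}^{(l)}\big) + \mathbf{X}_u^{(l)}$$
The split degree matrices, the identity hyperedge-weight matrix, and the ReLU nonlinearity are assumptions, since the exact equation is not reproduced in the text. The residual term requires $\boldsymbol{\Theta}^{(l)}$ to be square. The plain-Python helpers below are hypothetical.

```python
import math

def matmul(A, B):
    # plain-Python matrix product of A (p x q) and B (q x r)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def hypergraph_conv(X, H, Theta):
    """One residual layer: ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta) + X.

    X: n x d vertex features, H: n x m incidence matrix, Theta: d x d.
    Every vertex must lie in at least one hyperedge (true for the k-NN construction).
    """
    n, m = len(H), len(H[0])
    dv = [sum(H[v]) for v in range(n)]                        # vertex degrees
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]   # hyperedge degrees
    Xn = [[x / math.sqrt(dv[v]) for x in X[v]] for v in range(n)]
    # vertex -> hyperedge aggregation (multiplication by H^T), averaged by De^-1
    E = matmul(transpose(H), Xn)
    E = [[x / de[e] for x in E[e]] for e in range(m)]
    # hyperedge -> vertex aggregation (multiplication by H), normalized by Dv^-1/2
    V = matmul(H, E)
    V = [[x / math.sqrt(dv[v]) for x in V[v]] for v in range(n)]
    Y = matmul(V, Theta)
    # ReLU, then the ResNet-style residual connection
    return [[max(0.0, y) + x for y, x in zip(Y[v], X[v])] for v in range(n)]

H = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
X = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
Theta = [[1.0, 0.0], [0.0, 1.0]]      # identity, so only the propagation is visible
out = hypergraph_conv(X, H, Theta)    # one layer
X_N = hypergraph_conv(out, H, Theta)  # two stacked layers, as in the experiments
print(out)
```

With an identity $\boldsymbol{\Theta}$ and two disjoint pairs of identical hyperedges, each layer adds the mean feature of a user's hyperedge neighborhood to the user's own feature, which makes the vertex-hyperedge-vertex propagation easy to check by hand.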

Item-Level Feature Through Attention Mechanism
Based on users' historical interaction records with videos, the features of the videos a user has interacted with can be used to express that user's features and infer the user's interests. To obtain the influence weights of different videos on a user's interest representation, the authors introduce the attention mechanism (Bahdanau et al., 2014). The historical interaction video set $V_i$ of each user i is obtained from the user-video rating matrix, and user i's interest representation over $V_i$ is then constructed through the attention mechanism. The attention score is computed by a two-layer network, where $\mathbf{w}_{u1}$, $\mathbf{w}_{v1}$, and $\mathbf{b}_1$ are the parameters of the first layer, $\mathbf{w}_1^{T}$ and $c_1$ are the parameters of the second layer, and $a(i, j)$ represents the weight of video j relative to user i.
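The item-level feature is then calculated from the historical interaction records by weighting the feature of each video in $V_i$:
$$\mathbf{x}_u^V(i) = \sum_{j \in V_i} a(i, j)\,\mathbf{x}_v(j)$$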
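One plausible reading of the two-layer attention network is sketched below. The following are assumptions, not taken from the paper: the raw score has the form $a'(i,j) = \mathbf{w}_1^{T}\,\mathrm{ReLU}(\mathbf{w}_{u1}\mathbf{x}_u(i) + \mathbf{w}_{v1}\mathbf{x}_v(j) + \mathbf{b}_1) + c_1$, with $\mathbf{w}_{u1}$ and $\mathbf{w}_{v1}$ acting as matrices, and $a(i,j)$ is obtained by a softmax over $V_i$. All parameter values and function names are hypothetical toy choices.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def item_level_feature(x_u_i, history, w_u1, w_v1, b1, w1, c1):
    """history: the video features x_v(j) for every j in V_i."""
    u_part = matvec(w_u1, x_u_i)          # the user term is shared by all videos in V_i
    raw = []
    for x_v_j in history:
        # first layer: ReLU(w_u1 x_u(i) + w_v1 x_v(j) + b1)
        h = [max(0.0, a + b + c) for a, b, c in zip(u_part, matvec(w_v1, x_v_j), b1)]
        # second layer: w1^T h + c1
        raw.append(sum(w * hk for w, hk in zip(w1, h)) + c1)
    alpha = softmax(raw)                  # a(i, j), normalized over V_i
    dim = len(history[0])
    x_u_V = [sum(a * x_v_j[d] for a, x_v_j in zip(alpha, history)) for d in range(dim)]
    return x_u_V, alpha

x_u_i = [1.0, 0.0]
history = [[2.0, 0.0], [0.0, 0.0]]        # toy x_v(j) for two watched videos
w_u1 = [[0.0, 0.0], [0.0, 0.0]]
w_v1 = [[1.0, 0.0], [0.0, 1.0]]
b1, w1, c1 = [0.0, 0.0], [1.0, 1.0], 0.0
x_u_V, alpha = item_level_feature(x_u_i, history, w_u1, w_v1, b1, w1, c1)
print(alpha, x_u_V)
```

In this toy setting the first video produces the larger activation, so it receives most of the attention weight and dominates the item-level feature, which is the intended effect of weighting historical interactions by importance.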

User-Side Feature Representation and Prediction
To integrate the user-side neighborhood-level feature $\mathbf{x}_u^N$ and item-level feature $\mathbf{x}_u^V$, a self-attention mechanism is used to calculate the weights of the multiple features, where $\mathbf{w}_2$ and $c_2$ are the parameters of the attention mechanism and $\mathbf{x}_{ui}^k$ is either the neighborhood-level feature $\mathbf{x}_u^N$ or the item-level feature $\mathbf{x}_u^V$. The weighted combination yields the final user-side feature representation $\mathbf{x}_u^*$.
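A hedged sketch of the fusion step follows. Because the scoring function is not reproduced in the text, it assumes $\beta_k = \mathrm{softmax}_k\big(\tanh(\mathbf{w}_2^{T}\mathbf{x}_{ui}^k + c_2)\big)$ and $\mathbf{x}_u^* = \sum_k \beta_k\,\mathbf{x}_{ui}^k$ for $k \in \{N, V\}$. The tanh is an assumption chosen so that $c_2$ does not simply cancel inside the softmax; the function name is hypothetical.

```python
import math

def fuse_features(features, w2, c2):
    """features: [x_u^N(i), x_u^V(i)], equal-length vectors; returns (x_u^*(i), weights)."""
    # assumed score: tanh(w2^T x_k + c2), followed by a softmax over the feature types
    scores = [math.tanh(sum(w * x for w, x in zip(w2, f)) + c2) for f in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    beta = [e / total for e in exps]
    dim = len(features[0])
    fused = [sum(b * f[d] for b, f in zip(beta, features)) for d in range(dim)]
    return fused, beta

x_N = [1.0, 3.0]   # toy neighborhood-level feature
x_V = [3.0, 1.0]   # toy item-level feature
x_star, beta = fuse_features([x_N, x_V], w2=[0.0, 0.0], c2=0.0)
print(x_star, beta)  # [2.0, 2.0] [0.5, 0.5]: equal weights when w2 is zero
x_star2, beta2 = fuse_features([x_N, x_V], w2=[1.0, 0.0], c2=0.0)
```

With learned $\mathbf{w}_2$, the weights shift toward whichever feature type scores higher, so the model can lean on neighborhood information for some users and on viewing history for others.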
To make full use of the interactions between features, the final user-side feature vector $\mathbf{x}_u^*$ is first concatenated with the video feature vector $\mathbf{x}_v(j)$. A fully connected layer followed by a sigmoid function then yields user i's predicted probability for video j.
The model is trained with the cross-entropy loss computed over the training sample set TS.
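The prediction and loss steps can be sketched as follows, assuming binary labels $y_{ij} \in \{0, 1\}$ and averaging over TS:
$$\hat{y}_{ij} = \sigma\big(\mathbf{w}^{T}[\mathbf{x}_u^*(i); \mathbf{x}_v(j)] + b\big), \qquad \mathcal{L} = -\frac{1}{|TS|}\sum_{(i,j) \in TS}\big[y_{ij}\log\hat{y}_{ij} + (1 - y_{ij})\log(1 - \hat{y}_{ij})\big]$$
The names $\mathbf{w}$ and $b$ for the fully connected layer, and the helper functions, are hypothetical.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(x_u_star, x_v_j, w, b):
    """Fully connected layer on the concatenation [x_u*(i); x_v(j)], then a sigmoid."""
    concat = x_u_star + x_v_j              # list concatenation
    return sigmoid(sum(wk * xk for wk, xk in zip(w, concat)) + b)

def bce_loss(samples, eps=1e-12):
    """Mean binary cross-entropy over (y_hat, y) pairs from TS, with y in {0, 1}."""
    total = 0.0
    for y_hat, y in samples:
        y_hat = min(max(y_hat, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total / len(samples)

p = predict([0.2, 0.4], [0.1, 0.3], w=[0.0] * 4, b=0.0)  # 0.5 with zero weights
loss = bce_loss([(p, 1), (p, 0)])                         # equals log(2)
print(p, loss)
```

In a PyTorch implementation the same objective is usually computed on logits with a built-in loss for numerical stability; the explicit clipping here plays that role in plain Python.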

Experiment
In this section, the authors verify the feasibility and effectiveness of the proposed model through experiments. Note that users' interactions in the mobile edge environment are abstracted as neighbor relations, and the features of target users are obtained through hypergraph convolution.
The authors first introduce the datasets, experimental environment, parameter settings, baseline methods and evaluation metrics. Then, the experimental results are compared and analyzed in detail.

Datasets
To test the effectiveness of MVRHC, MovieLens-Latest and MovieLens-10M are adopted as the experimental datasets. MovieLens, provided by the University of Minnesota, is a movie dataset released in multiple versions that records user interactions such as ratings and tags; MovieLens-Latest and MovieLens-10M are two of these versions. The statistics of the two datasets are shown in Table 4.

Experimental Environment and Parameter Settings
The experiments were run on an Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz with 128GB of memory. The model was implemented in Python with the PyTorch deep learning framework and the NumPy scientific computing library, and trained with the Adam optimizer. The MLP has 3 layers with 512, 256, and 128 neurons, respectively; there are 2 hypergraph convolution layers; the dropout rate is 0.5; and the batch size is 128.
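The reported settings can be gathered into a small configuration object, shown here as an illustrative Python dataclass (the class and field names are hypothetical). Values not stated in this paragraph, such as the learning rate and the neighbor count k, are deliberately omitted rather than guessed. The derived shape of $\boldsymbol{\Theta}^{(l)}$ is an inference, assuming the residual hypergraph layers operate directly on the 128-dimensional MLP output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MVRHCConfig:
    """Hyperparameters reported in the experimental setup (illustrative container)."""
    mlp_layers: tuple = (512, 256, 128)  # neurons in each of the 3 shared MLP layers
    num_hypergraph_layers: int = 2
    dropout: float = 0.5
    batch_size: int = 128
    optimizer: str = "Adam"
    # The learning rate and number of epochs are not reported here,
    # so they are intentionally left without defaults.

cfg = MVRHCConfig()
# A residual layer adds its input to its output, so Theta^(l) must be square (assumption:
# hypergraph convolution runs on the 128-d MLP output).
theta_shape = (cfg.mlp_layers[-1], cfg.mlp_layers[-1])
print(cfg, theta_shape)
```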

Baseline Methods and Evaluation Metrics
The proposed method is compared with four baseline methods:
(1) NCF: a collaborative filtering algorithm that uses neural networks to obtain features for recommendation.
(2) CFA (Zuo et al., 2016): this algorithm uses deep neural networks to extract deep features from tags and combines them with user-based collaborative filtering for recommendation.
(3) TNAM (Huang et al., 2020): a tag-aware neural attention model for Top-K recommendation. It considers both user-item interactions and tag information and uses neural attention networks for recommendation.
(4) DHCF (Ji et al., 2020): a dual channel hypergraph collaborative filtering method that applies hypergraph convolution on both the user side and the item side for recommendation.
This paper uses the following evaluation metrics:
HR@K (hit ratio for Top-K recommendation):
$$\mathrm{HR@}K = \frac{\mathrm{NumberOfHits@}K}{|GT|}$$
where |GT| is the size of the test set and NumberOfHits@K is the number of successfully predicted items in the Top-K recommendation.
nDCG@K (normalized discounted cumulative gain for Top-K recommendation):
$$\mathrm{DCG@}K = \sum_{i=1}^{K} \frac{rel_i}{\log_2(i+1)}, \qquad \mathrm{nDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}$$
where $rel_i$ represents the relevance of the i-th recommendation, DCG represents the discounted cumulative gain, and IDCG represents the maximum DCG value under ideal conditions.
MRR@K (mean reciprocal rank for Top-K recommendation):
$$\mathrm{MRR@}K = \frac{1}{|U|}\sum_{i=1}^{|U|} \frac{1}{\mathrm{rank}_i}$$
where $\mathrm{rank}_i$ refers to the rank position of the first relevant item for target user i in the recommended list.
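The three metrics can be computed as follows on a toy example. This sketch assumes binary relevance for nDCG (so $rel_i \in \{0, 1\}$), counts the reciprocal rank as 0 when no relevant item appears in the top K, and uses hypothetical function names and data.

```python
import math

def hit_ratio_at_k(ranked_lists, test_items, k):
    """ranked_lists[u]: items recommended to user u, best first.
    test_items[u]: set of held-out test items of user u; |GT| = total test items."""
    hits = sum(len(set(ranked_lists[u][:k]) & test_items[u]) for u in test_items)
    gt_size = sum(len(items) for items in test_items.values())
    return hits / gt_size

def ndcg_at_k(ranked, relevant, k):
    """Binary relevance: rel_i = 1 if the i-th recommended item is relevant."""
    # position i (0-based) has discount log2(i + 2), i.e. log2(rank + 1)
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_k(ranked_lists, test_items, k):
    """Reciprocal rank of the first relevant item in the top K (0 if none), averaged over users."""
    total = 0.0
    for u, ranked in ranked_lists.items():
        for pos, item in enumerate(ranked[:k], start=1):
            if item in test_items[u]:
                total += 1.0 / pos
                break
    return total / len(ranked_lists)

ranked = {"alice": [1, 2, 3], "bob": [4, 5, 6]}
test = {"alice": {2}, "bob": {9}}
print(hit_ratio_at_k(ranked, test, 2))  # 0.5
print(mrr_at_k(ranked, test, 3))        # 0.25
print(ndcg_at_k(ranked["alice"], test["alice"], 3))
```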

Experimental Results
Experimental results are analyzed from four aspects: a) Overall analysis; b) compare and analyze MVRHC with variant models; c) compare and analyze MVRHC on training datasets of different sizes; and d) compare and analyze MVRHC with different parameters.

Overall Analysis
In this section, MVRHC is compared with the baseline methods on both datasets. Tables 5 and 6 show that the overall performance of MVRHC is better than that of the baseline methods, which verifies the effectiveness of the proposed method; the improvements over the best baseline are 4.23% on MovieLens-Latest and 5.08% on MovieLens-10M. Specifically, NCF and CFA only use neural networks to extract static features of users and items without considering the relationships between users or users' historical interactions, so these two baselines are less effective. In contrast, MVRHC uses hypergraph convolution to model users' high-order correlations and extract the user neighborhood-level feature, following the idea of user-based collaborative filtering by constructing features from similar users. It also uses the attention mechanism to capture the influence of different videos in users' historical interaction records when constructing the user item-level feature. These two components explain why the proposed method outperforms the baselines. In addition, some methods improve model performance by mining video data but incur significant computational overhead (Carreira et al., 2017). By combining hypergraph convolution with multiple features and mining textual information rather than raw video, our model is better suited to scenarios with strict real-time requirements, such as mobile edge environments.

Compare With Variant Models
Two variants of the model are compared and analyzed: (1) MVRHC-1, which does not use hypergraph convolution; and (2) MVRHC-2, which does not use the attention mechanism.
As shown in Figures 4 and 5, both variants perform worse than the full model. Hypergraph convolution extracts the neighborhood-level feature from the high-order correlations among vertices, feeding the features of similar users back to the target user, while the attention mechanism learns the weights of different historical interactions for each user, reflecting the influence of different videos on user preferences. The results confirm the necessity of both components of the proposed model.

Compare on Training Datasets of Different Sizes
The authors select 5%, 10%, 20%, 40%, 60%, and 80% of MovieLens-10M as the training dataset to analyze the impact of data sparsity on performance. Figure 6 shows that MVRHC is less affected by data sparsity than the baseline methods. This is because MVRHC uses hypergraph convolution and the attention mechanism to obtain high-order correlations and multi-feature information, which make the learned representation more complete. These semantically related sources provide complementary information to each other, alleviating the sparsity problem of single-feature representations.

Compare on Model Parameter
The authors analyze the impact of the hyperparameter k, the number of nearest neighbors selected by each vertex, on recommendation performance. As Figure 7 shows, performance improves as k increases, but after k reaches 10, further increases degrade performance. When k is small, the neighborhood-level feature of users cannot be well represented: the features of neighbors may resemble those of the target user in some dimensions, but too few neighbor nodes are available to enrich the target user's features. On the other hand, a larger k means that features of low-similarity neighbors are included in the hyperedge, which introduces noise.

DISCUSSION
In this paper, we propose a multi-feature video recommendation method based on hypergraph convolution. The theoretical and practical contributions of our work are listed as follows:

Theoretical Contributions
Our model systematically investigates the relationships among users in recommendation systems. By introducing hypergraph convolution to model the nonlinear relationships among users, it captures the preference features in users' interaction records, extending the modeling of users' social relationships from linear to nonlinear. Additionally, our model takes into account the semantic correlation between different types of source data, including user-user interaction information and user-item historical interaction information. By fusing this information through a multi-feature method, the feature representation is semantically enriched, which provides an alternative solution to the data sparsity problem in recommendation systems.

Practical Contributions
Our model aims to strike a good balance between recommendation performance and computational overhead in the mobile edge environment. The inherent ambiguity of text information may hurt the performance of traditional recommendation models based on text mining. Some methods (Cai et al., 2022; Pingali et al., 2022) improve recommendation performance by mining video data features, but they take significant time to process video data and therefore cannot meet users' low-latency requirements in mobile edge environments. Our model combines hypergraph convolution with multiple features and obtains accurate recommendation results by mining textual information, which gives it practical advantages. In addition, our model is designed for mobile edge environments and can handle large amounts of interaction data, whose noise degrades the performance of some traditional models. From the perspective of data governance, our model employs deep learning to explore the common patterns in disordered data and thereby obtain effective video recommendation sequences, which provides a diverse approach to data management.

CONCLUSION
This paper proposed a multi-feature video recommendation method based on hypergraph convolution for the mobile edge environment. Through hypergraph convolution and the attention mechanism, users' neighborhood-level and item-level features are integrated into final feature representations, which are then used for multi-feature video recommendation. Experimental results indicate that MVRHC can capture high-order correlations between users and integrate multiple features effectively. The authors' current work focuses on improving video recommendation by mining the interaction information between users and their neighbors. In future research, the authors will investigate fine-grained features for representing user preferences, such as real-time properties and location, to model users' dynamic preferences for video recommendation.
Haiyan Wang received the Ph.D. degree from the School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China, in 2008