One paper has been accepted by IEEE TPAMI 🎉
Title: Hypergraph-Based High-Order Correlation Analysis for Large-Scale Long-Tailed Data Classification
High-order correlations, which capture complex interactions among multiple entities, extend beyond traditional graph representations and support a wider range of applications. However, existing neural network models for high-order correlations face scalability issues on large datasets because of the substantial computational cost of processing large-scale structures. In addition, long-tailed distributions, which are common in real-world data, leave some categories underrepresented and hinder the model’s ability to learn effective high-order interaction patterns for rare instances. To address these issues, we introduce a novel framework, HyperGraph-based High-order Correlation analysis (HGHC), for large-scale long-tailed data classification. First, to tackle the long-tailed distribution problem, HGHC uses a SMOTE-inspired oversampling module, termed HSMOTE, which generates synthetic vertices and computes their attributed high-order correlations to enhance the representation of tail categories. Second, for efficient computational scaling, we treat the data as having two modalities: the structural modality, which captures high-order relationships, and the feature modality, which represents individual attributes. We perform computations separately on the CPU and GPU and then fuse the results, yielding a lightweight vertex transformation and aggregation scheme for high-order correlation data. Additionally, we contribute Amazon-LT, the first benchmark of large-scale long-tailed datasets involving high-order correlations, which includes multiple datasets with varying imbalance ratios. Our experimental results demonstrate that HGHC achieves state-of-the-art performance in high-order correlation analysis for large-scale long-tailed data.
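To give a feel for what "SMOTE-inspired oversampling on a hypergraph" can mean, here is a minimal sketch, assuming plain SMOTE interpolation for the features and a hypothetical rule for the hyperedges. It is not HSMOTE itself: the function name `smote_hypergraph_oversample`, the parameters `k` and `seed`, and the rule "a synthetic vertex joins every hyperedge that contains either parent" are all illustrative assumptions. Each synthetic tail vertex is placed at a random point on the segment between a tail vertex and one of its k nearest tail-class neighbours, as in standard SMOTE.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_hypergraph_oversample(
    features: Dict[int, Vector],
    hyperedges: List[Set[int]],
    tail_vertices: List[int],
    n_new: int,
    k: int = 3,
    seed: int = 0,
) -> Tuple[Dict[int, Vector], List[Set[int]]]:
    """Create n_new synthetic tail-class vertices by SMOTE interpolation.

    Hypothetical hyperedge rule (an assumption, not the paper's HSMOTE):
    a synthetic vertex joins every hyperedge containing either parent.
    Inputs are copied, so the caller's dict and sets are not modified.
    """
    if len(tail_vertices) < 2:
        raise ValueError("need at least two tail vertices to interpolate")
    rng = random.Random(seed)
    new_features = dict(features)
    new_edges = [set(e) for e in hyperedges]
    next_id = max(features) + 1
    for _ in range(n_new):
        anchor = rng.choice(tail_vertices)
        # k nearest tail-class neighbours of the anchor, excluding itself
        others = [v for v in tail_vertices if v != anchor]
        others.sort(key=lambda v: euclidean(features[anchor], features[v]))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        # SMOTE: a random point on the segment from anchor to neighbour
        x_new = [a + lam * (b - a)
                 for a, b in zip(features[anchor], features[neighbour])]
        new_features[next_id] = x_new
        # Attach the synthetic vertex to its parents' hyperedges
        for edge in new_edges:
            if anchor in edge or neighbour in edge:
                edge.add(next_id)
        next_id += 1
    return new_features, new_edges
```

For example, with tail vertices `[0, 1, 2]` at `[0, 0]`, `[1, 0]`, `[0, 1]` and hyperedges `[{0, 1}, {2, 3}]`, calling `smote_hypergraph_oversample(feats, edges, [0, 1, 2], n_new=2)` adds vertices 4 and 5, each inside the unit square and each attached to at least one existing hyperedge. Inheriting the parents' hyperedges is only one plausible way to give synthetic vertices high-order correlations; the paper's module may compute them differently.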
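The idea of treating structure and features as two modalities can be illustrated with a generic "precompute, then transform" decoupling, similar in spirit to SGC-style graph models. This is a minimal sketch under assumptions, not HGHC's actual scheme. A parameter-free vertex → hyperedge → vertex mean aggregation is computed once (the kind of sparse, structural work that could stay on a CPU). A cheap per-vertex dense step then fuses each vertex's own features with the aggregated ones and applies one linear layer (the kind of work that could run on a GPU). The function names, the mean aggregation, the concatenation-based fusion, and the weight shapes are all hypothetical.

```python
from typing import Dict, List, Set

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def precompute_structure(features: Dict[int, Vector],
                         hyperedges: List[Set[int]]) -> Dict[int, Vector]:
    """Parameter-free vertex -> hyperedge -> vertex mean aggregation.

    Assumes every hyperedge is non-empty. Runs once before training, so
    its cost is not repeated per training step.
    """
    # Stage 1: each hyperedge averages the features of its member vertices
    edge_msgs = [mean([features[v] for v in e]) for e in hyperedges]
    dim = len(next(iter(features.values())))
    out: Dict[int, Vector] = {}
    # Stage 2: each vertex averages the messages of its incident hyperedges
    for v in features:
        incident = [m for m, e in zip(edge_msgs, hyperedges) if v in e]
        out[v] = mean(incident) if incident else [0.0] * dim
    return out


def fuse_and_transform(features: Dict[int, Vector],
                       structural: Dict[int, Vector],
                       weight: List[Vector]) -> Dict[int, Vector]:
    """Concatenate own and aggregated features, then apply a linear layer.

    weight is a list of output rows, each of length 2 * dim (hypothetical).
    """
    result: Dict[int, Vector] = {}
    for v, x in features.items():
        z = x + structural[v]  # list concatenation: [own ; aggregated]
        result[v] = [sum(w * zi for w, zi in zip(row, z)) for row in weight]
    return result
```

For instance, with features `{0: [1, 0], 1: [3, 0], 2: [0, 2]}` and hyperedges `[{0, 1}, {1, 2}]`, vertex 1 belongs to both hyperedges and receives the average of their messages, `[1.75, 0.5]`. The design point the sketch tries to capture is that the expensive structural pass is done once and independently of the learnable parameters, so training touches only the lightweight per-vertex transform.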
