Het2Hom: Representation of Heterogeneous Attributes into Homogeneous Concept Spaces for Categorical-and-Numerical-Attribute Data Clustering

Yiqun Zhang, Yiu-ming Cheung, An Zeng

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 3758-3765. https://doi.org/10.24963/ijcai.2022/522

Data sets composed of a mixture of categorical and numerical attributes (hereinafter called mixed data) are common in real-world cluster analysis. However, gaining insight from such data through unsupervised clustering is extremely challenging, because the information provided by the two types of attributes is heterogeneous and lies at different concept hierarchies. That is, the values of a categorical attribute represent a set of distinct concepts (e.g., professor, lawyer, and doctor for the attribute "occupation"), whereas the values of a numerical attribute describe tendencies toward two different concepts (e.g., low and high for the attribute "income"). To use such heterogeneous information appropriately in clustering, this paper proposes a novel attribute representation learning method called Het2Hom, which first converts the heterogeneous attributes into a homogeneous form and then learns attribute representations and data partitions on this homogeneous basis. Het2Hom has low time complexity and is intuitively interpretable. Extensive experiments show that Het2Hom outperforms state-of-the-art counterparts.
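To make the abstract's distinction between the two concept hierarchies concrete, the sketch below reads a numerical value as a pair of tendencies toward "low" and "high", and a categorical value as a selection among its distinct concepts. It is only an illustration under stated assumptions and is not the Het2Hom algorithm: the min-max scaling rule and the function names `numerical_to_concepts` and `categorical_to_concepts` are hypothetical choices made here, not taken from the paper.

```python
from typing import Dict, List, Sequence


def numerical_to_concepts(values: Sequence[float]) -> List[Dict[str, float]]:
    """Map each numerical value to tendencies toward 'low' and 'high'.

    Assumption (illustrative only, not Het2Hom's rule): the degree of 'high'
    is the min-max scaled value, and the degree of 'low' is its complement.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    result = []
    for v in values:
        # A constant attribute carries no tendency, so both concepts get 0.5.
        h = (v - lo) / span if span > 0 else 0.5
        result.append({"low": 1.0 - h, "high": h})
    return result


def categorical_to_concepts(values: Sequence[str]) -> List[Dict[str, float]]:
    """Map each categorical value to a one-hot vector over its distinct concepts."""
    concepts = sorted(set(values))
    return [{c: 1.0 if c == v else 0.0 for c in concepts} for v in values]


# Toy data mirroring the abstract's "income" and "occupation" examples.
income = [20.0, 50.0, 80.0]
occupation = ["professor", "lawyer", "doctor"]
print(numerical_to_concepts(income))
print(categorical_to_concepts(occupation))
```

After this mapping, every attribute value is a vector of nonnegative concept memberships summing to 1, so both attribute types live in the same kind of space. How Het2Hom actually builds and learns its homogeneous concept spaces is described in the paper itself.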
Keywords:
Machine Learning: Clustering