Hierarchical Matching Network for Heterogeneous Entity Resolution
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3665-3671. https://doi.org/10.24963/ijcai.2020/507
Entity resolution (ER) aims to identify data records that refer to the same real-world entity. Most existing ER approaches assume that the entity records to be resolved are homogeneous, i.e., that their attributes are aligned. Unfortunately, entities in real-world datasets are often heterogeneous: they usually come from different sources and are represented using different attributes. Furthermore, the entities' attribute values may be redundant, noisy, missing, misplaced, or misspelled, which we refer to as the dirty data problem. To address these problems, this paper proposes an end-to-end hierarchical matching network (HierMatcher) for entity resolution, which jointly matches entities at three levels: token, attribute, and entity. At the token level, a cross-attribute token alignment and comparison layer is designed to adaptively compare heterogeneous entities. At the attribute level, an attribute-aware attention mechanism is proposed to denoise dirty attribute values. Finally, the entity-level matching layer aggregates all matching evidence for the final ER decision. Experimental results show that our method significantly outperforms previous ER methods on homogeneous, heterogeneous, and dirty datasets.
Natural Language Processing: Coreference Resolution
Natural Language Processing: Information Extraction
Data Mining: Classification, Semi-Supervised Learning
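
The three matching levels described in the abstract can be illustrated with a small, non-neural sketch. This is not the HierMatcher model: the paper learns its token comparison and attention weights end to end, whereas the sketch below substitutes a plain string similarity (`difflib.SequenceMatcher`) and a length-based weighting heuristic. All function names (`token_sim`, `align_tokens`, `attribute_score`, `entity_score`) and the weighting formula are assumptions made for illustration only.

```python
import math
from difflib import SequenceMatcher


def token_sim(a, b):
    """String similarity in [0, 1]; a stand-in for learned token comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_tokens(record_a, record_b):
    """Token level: score each token of record_a against its best-matching
    token drawn from ALL attributes of record_b (cross-attribute alignment),
    so the two records do not need aligned schemas."""
    other_tokens = [t for value in record_b.values() for t in value.split()]
    scores = {}
    for attr, value in record_a.items():
        scores[attr] = [
            (tok, max((token_sim(tok, o) for o in other_tokens), default=0.0))
            for tok in value.split()
        ]
    return scores


def attribute_score(token_scores):
    """Attribute level: weighted average of the token scores.
    The weights grow with token length (a hypothetical heuristic) so that
    very short tokens contribute less; the paper learns such weights with
    attribute-aware attention instead."""
    if not token_scores:
        return None  # missing attribute value: contributes no evidence
    weights = [math.exp(min(len(tok), 10) / 5.0) for tok, _ in token_scores]
    total = sum(weights)
    return sum(w * s for w, (_, s) in zip(weights, token_scores)) / total


def entity_score(record_a, record_b):
    """Entity level: average the attribute scores from both directions."""
    scores = []
    for left, right in ((record_a, record_b), (record_b, record_a)):
        for token_scores in align_tokens(left, right).values():
            s = attribute_score(token_scores)
            if s is not None:
                scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0


# Made-up records with different schemas and a missing value.
phone_a = {"title": "Apple iPhone 11 64GB", "brand": ""}
phone_b = {"name": "iPhone 11", "maker": "Apple", "storage": "64GB"}
phone_c = {"name": "Samsung Galaxy S10", "maker": "Samsung", "storage": "128GB"}
```

With these made-up records, `entity_score(phone_a, phone_b)` is higher than `entity_score(phone_a, phone_c)` even though no attribute names are shared, because each token is compared against tokens from every attribute of the other record. A decision threshold on the score would be a further assumption and is left out.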