Crowdsourced Semantic Matching of Multi-Label Annotations / 3483
Lei Duan, Satoshi Oyama, Masahito Kurihara, Haruhiko Sato
Most multi-label domains lack an authoritative taxonomy. Therefore, different taxonomies are commonly used in the same domain, which results in complications. Although this situation occurs frequently, there has been little study of it using a principled statistical approach. Given that (1) different taxonomies used in the same domain are generally founded on the same latent semantic space, where each possible label set in a taxonomy denotes a single semantic concept, and that (2) crowdsourcing is beneficial in identifying relationships between semantic concepts and instances at low cost, we proposed a novel probabilistic cascaded method for establishing a semantic matching function in a crowdsourcing setting that maps label sets in one (source) taxonomy to label sets in another (target) taxonomy in terms of the semantic distances between them. The established function can be used to detect the associated label set in the target taxonomy for an instance directly from its associated label set in the source taxonomy without any extra effort. Experimental results on real-world data (emotion annotations for narrative sentences) demonstrated that the proposed method can robustly establish semantic matching functions exhibiting satisfactory performance from a limited number of crowdsourced annotations.