Computing Text Semantic Relatedness Using the Contents and Links of a Hypertext Encyclopedia: Extended Abstract
Majid Yazdani, Andrei Popescu-Belis
We propose methods for computing semantic relatedness between words or texts using knowledge from hypertext encyclopedias such as Wikipedia. A network of concepts is built by filtering the encyclopedia's articles, so that each concept corresponds to one article. A random walk model based on the notion of Visiting Probability (VP) is used to compute the distance between nodes, and then between sets of nodes.

To transfer what is learned from the network of concepts to text analysis tasks, we develop two approaches to building a common (shared) representation. In the first approach, the shared representation space is the set of concepts in the network, and every text is represented in this space. In the second approach, a latent space serves as the shared representation, and a transformation from words to the latent space is trained over VP scores.

We applied our methods to four important natural language processing tasks: word similarity, document similarity, document clustering and classification, and ranking in information retrieval. Performance is at or close to the state of the art for each task, which demonstrates the generality of the proposed knowledge resource and the associated methods.
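The abstract does not spell out how VP is computed. The following is a minimal Python sketch of one common formulation, a truncated first-visit probability, assuming that the concept network is given as a row-stochastic transition table (each node maps to neighbours with probabilities summing to 1), that the walker continues at each step with probability `alpha`, and that the walk is cut off after `max_steps` steps. The function names, the `alpha` and `max_steps` parameters, and the symmetric averaging used to relate two weighted sets of concepts are all hypothetical illustrations, not the authors' exact definitions. In practice the transition weights might mix hyperlink and content-similarity edges, but here they are simply taken as input.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical graph format: node -> {neighbour: transition probability}.
# Each row is assumed to sum to 1 (a row-stochastic random-walk matrix).
Graph = Mapping[str, Mapping[str, float]]


def visiting_probability(graph: Graph, start: Mapping[str, float],
                         targets: Iterable[str], alpha: float = 0.8,
                         max_steps: int = 10) -> float:
    """Probability that a walker drawn from the `start` distribution reaches
    any node in `targets` within `max_steps` steps, continuing at each step
    with probability `alpha`. Only first visits are counted."""
    target_set = set(targets)
    # Start mass already on a target counts as visited at step 0.
    hit = sum(w for n, w in start.items() if n in target_set)
    # Probability mass of walkers that have not yet reached the target set.
    mass: Dict[str, float] = {n: w for n, w in start.items()
                              if n not in target_set}
    for _ in range(max_steps):
        nxt: Dict[str, float] = {}
        for node, w in mass.items():
            for nbr, p in graph.get(node, {}).items():
                nxt[nbr] = nxt.get(nbr, 0.0) + alpha * w * p
        # Absorb mass landing on a target so it is not counted twice.
        for t in target_set:
            hit += nxt.pop(t, 0.0)
        mass = nxt
        if not mass:
            break
    return hit


def set_relatedness(graph: Graph, a: Mapping[str, float],
                    b: Mapping[str, float], alpha: float = 0.8,
                    max_steps: int = 10) -> float:
    """Hypothetical symmetrised relatedness between two weighted concept
    sets: the average of VP(a -> b) and VP(b -> a)."""
    def normalise(ws: Mapping[str, float]) -> Dict[str, float]:
        total = sum(ws.values())
        return {n: w / total for n, w in ws.items()}

    forward = visiting_probability(graph, normalise(a), b.keys(),
                                   alpha, max_steps)
    backward = visiting_probability(graph, normalise(b), a.keys(),
                                    alpha, max_steps)
    return 0.5 * (forward + backward)


if __name__ == "__main__":
    # Toy three-concept chain a - b - c.
    toy = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5}, "c": {"b": 1.0}}
    print(visiting_probability(toy, {"a": 1.0}, ["c"]))
    print(set_relatedness(toy, {"a": 1.0}, {"c": 1.0}))
```

On the toy chain, the walker from `a` reaches `c` with probability 0.32 every second step, scaled by a further factor of 0.32 each time. Ten steps therefore give about 0.469, and letting `max_steps` grow approaches 0.32 / (1 − 0.32) ≈ 0.471. Truncation keeps the cost bounded on a large encyclopedia graph, at the price of ignoring very long paths.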