Large Scale Homophily Analysis in Twitter Using a Twixonomy
Stefano Faralli, Giovanni Stilo, Paola Velardi
In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests, which we call a Twixonomy. To build a population, community, or single-user Twixonomy, we first associate the "topical" friends in users' friendship lists (i.e., friends representing an interest rather than a social relation between peers) with Wikipedia categories. A word-sense disambiguation algorithm selects the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with the topmost Wikipedia category nodes, and we then efficiently prune the resulting graph G to induce a directed acyclic graph. This graph is the Twixonomy. To analyze homophily, we compare different methods for detecting communities in a Twitter network of peer friends, and for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means of describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that mid-to-low-level categories in the Twixonomy offer the best balance between informativeness and compactness of the representation.
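The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the category graph, page names, and users are invented toy data; the cycle removal is a plain depth-first search that drops back edges, standing in for the pruning step; and Jaccard overlap of ancestor categories stands in for the pairwise semantic similarity measure, which the abstract does not specify. All function names are hypothetical.

```python
from itertools import combinations

# Toy child -> parents graph (invented data). "Music" -> "Rock bands"
# closes a cycle, as can happen in real category hierarchies.
EDGES = {
    "Page:Radiohead": ["Rock bands"],
    "Page:Coldplay": ["Rock bands"],
    "Page:NASA": ["Space agencies"],
    "Rock bands": ["Music"],
    "Music": ["Arts", "Rock bands"],
    "Space agencies": ["Science"],
}

def break_cycles(edges, start_pages):
    """Walk upward from the pages and drop every edge that would close a cycle.

    Assumption: a simple DFS back-edge removal, not the paper's pruning method.
    """
    dag = {}
    state = {}  # node -> "active" (on the current DFS path) or "done"

    def visit(node):
        state[node] = "active"
        dag.setdefault(node, [])
        for parent in edges.get(node, []):
            if state.get(parent) == "active":
                continue  # back edge: keeping it would create a cycle
            dag[node].append(parent)
            if parent not in state:
                visit(parent)
        state[node] = "done"

    for page in sorted(start_pages):
        if page not in state:
            visit(page)
    return dag

def ancestors(dag, node):
    """Return all categories reachable upward from a node."""
    seen, stack = set(), [node]
    while stack:
        for parent in dag.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def user_categories(dag, pages):
    """Union of ancestor categories over a user's topical pages."""
    cats = set()
    for page in pages:
        cats |= ancestors(dag, page)
    return cats

def jaccard(a, b):
    """Set overlap, used here as a stand-in similarity measure."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def community_homophily(dag, community):
    """Mean pairwise similarity over all user pairs in a community."""
    profiles = [user_categories(dag, pages) for pages in community.values()]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

PAGES = [n for n in EDGES if n.startswith("Page:")]
dag = break_cycles(EDGES, PAGES)

# Hypothetical community: user -> wikipages of their topical friends.
COMMUNITY = {
    "alice": ["Page:Radiohead"],
    "bob": ["Page:Coldplay", "Page:NASA"],
    "carol": ["Page:NASA"],
}
print(community_homophily(dag, COMMUNITY))
```

Starting the traversal from the pages rather than from arbitrary nodes keeps the upward edges (page to category to broader category) and discards the edge that points back down. Restricting `user_categories` to a given depth of the hierarchy would be one way to experiment with the trade-off between informativeness and compactness mentioned in the abstract.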