Unsupervised Storyline Extraction from News Articles
Deyu Zhou, Haiyang Xu, Xin-Yu Dai, Yulan He
Storyline extraction from news streams aims to extract events under a given news topic and reveal how those events evolve over time. It requires algorithms capable of accurately extracting events from news articles published in different time periods and linking the extracted events into coherent stories. The two tasks are often solved separately, which makes the pipeline prone to error propagation. Existing unified approaches often treat events as topics, ignoring their structured representations. In this paper, we propose a non-parametric generative model that extracts structured representations and evolution patterns of storylines simultaneously. In the model, each storyline is modelled as a joint distribution over locations, organizations, persons, keywords and a set of topics. We further combine this model with the Chinese restaurant process so that the number of storylines is determined automatically, without human intervention. Moreover, a per-token Metropolis-Hastings sampler based on light latent Dirichlet allocation is employed to reduce sampling complexity. The proposed model has been evaluated on three news corpora, and the experimental results show that it outperforms several baseline approaches.
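To illustrate how a Chinese restaurant process lets the number of clusters grow with the data instead of being fixed in advance, the following is a minimal sketch of the standard CRP prior alone. It is not the model proposed in the paper: the function name `crp_assign`, the concentration parameter `alpha`, and the use of integer labels in place of storylines are assumptions made for illustration, and no likelihood over locations, persons, organizations, keywords or topics is included.

```python
import random
from typing import List, Optional


def crp_assign(n_items: int, alpha: float, seed: Optional[int] = None) -> List[int]:
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Hypothetical helper: alpha > 0 is the concentration parameter; larger
    values make it more likely that an item opens a new cluster.
    """
    rng = random.Random(seed)
    counts: List[int] = []       # counts[k] = items already assigned to cluster k
    assignments: List[int] = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)     # open a new cluster
        else:
            counts[k] += 1       # join an existing cluster
        assignments.append(k)
    return assignments


if __name__ == "__main__":
    labels = crp_assign(n_items=20, alpha=1.0, seed=0)
    print(labels)
    print("clusters used:", len(set(labels)))
```

In a full sampler, these prior weights would typically be multiplied by the likelihood of each document under each candidate storyline before drawing the assignment; that step depends on the model's details and is omitted here.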