WikiWrite: Generating Wikipedia Articles Automatically
Siddhartha Banerjee, Prasenjit Mitra
The growth of Wikipedia, limited by the availability of knowledgeable authors, cannot keep pace with the ever-increasing demands of its readers. In this work, we propose WikiWrite, a system that automatically generates content for new Wikipedia articles. First, our technique obtains feature representations of entities on Wikipedia: we adapt existing work on document embeddings to obtain vector representations of words and paragraphs. Using these representations, we identify existing articles that are very similar to the new entity. We then train machine learning classifiers on content from the similar articles to assign web-retrieved content about the new entity to the relevant sections of its Wikipedia article. Second, we propose a novel abstractive summarization technique that uses a two-step integer linear programming (ILP) model to synthesize the content assigned to each section and rewrite it into a well-formed, informative summary. Our experiments show that our technique reconstructs existing Wikipedia articles with high accuracy. We also used our approach to create several articles in the English Wikipedia, most of which have been retained in the online encyclopedia.
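As a rough illustration of the step that identifies similar articles from vector representations, the following minimal sketch ranks existing articles by cosine similarity to a new entity's vector. It is not the authors' implementation: the function names `cosine` and `most_similar`, the article titles, and the three-dimensional vectors are all hypothetical, and cosine similarity is an assumed choice of measure, since the abstract does not say how similarity is computed. A real system would use learned paragraph embeddings rather than hand-written vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_vec, article_vecs, k=2):
    """Return the k (title, score) pairs with the highest cosine similarity to query_vec."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in article_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for learned article embeddings.
article_vecs = {
    "Article A": [0.9, 0.1, 0.0],
    "Article B": [0.8, 0.2, 0.1],
    "Article C": [0.0, 0.1, 0.9],
}
new_entity_vec = [1.0, 0.0, 0.0]

top = most_similar(new_entity_vec, article_vecs, k=2)
for title, score in top:
    print(f"{title}: {score:.3f}")
```

In a pipeline like the one described, the sections of the top-ranked articles would then supply training data for the classifiers that assign retrieved content to sections.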