Generating all Possible Palindromes from Ngram Corpora
Alexandre Papadopoulos, Pierre Roy, Jean-Charles Régin, François Pachet
We address the problem of generating all possible palindromes from a corpus of Ngrams. Palindromes are texts that read the same both ways. Short palindromes ("race car") usually carry precise, significant meanings. Long palindromes are often less meaningful, but they are even harder to generate. To our knowledge, the palindrome generation problem has never been addressed from a strictly combinatorial point of view. The main difficulty is that generating palindromes requires the simultaneous consideration of two interrelated levels of a sequence: the "character" level and the "word" level. Although the problem seems highly combinatorial, we propose an elegant yet non-trivial graph structure that can be used to generate all possible palindromes from a given corpus of Ngrams, with linear complexity. We illustrate our approach with short and long palindromes obtained from the Google Ngram corpus. We show how the semantics can be controlled, to some extent, by using arbitrary text corpora to bias the probabilities of certain sets of words. More generally, this work addresses the issue of modelling human virtuosity from a combinatorial viewpoint, as a means to understand human creativity.
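To make the two-level difficulty concrete, here is a minimal sketch of a naive brute-force baseline. It is not the authors' graph structure, whose details are not given here. The function names `is_char_palindrome` and `brute_force_palindromes` and the toy bigram set are hypothetical. The sketch assumes that a word sequence counts as a palindrome when its characters, with spaces removed, read the same both ways, and that only consecutive word pairs present in a bigram set are allowed.

```python
from itertools import product

def is_char_palindrome(words):
    """Character level: join the words (spaces dropped) and compare with the reverse."""
    s = "".join(words)
    return s == s[::-1]

def brute_force_palindromes(bigrams, max_words):
    """Hypothetical naive baseline, NOT the paper's method.

    Enumerates every word sequence of 1..max_words words whose consecutive
    pairs all appear in `bigrams` (word level), and keeps those that are
    character-level palindromes. The number of candidates grows
    exponentially with max_words.
    """
    vocab = sorted({w for pair in bigrams for w in pair})
    results = []
    for n in range(1, max_words + 1):
        for seq in product(vocab, repeat=n):
            # Word-level constraint: every adjacent pair must be a known bigram.
            if all((a, b) in bigrams for a, b in zip(seq, seq[1:])):
                # Character-level constraint: the letters must read the same both ways.
                if is_char_palindrome(seq):
                    results.append(" ".join(seq))
    return results

# Toy corpus (hypothetical): only these two bigrams are allowed.
toy_bigrams = {("race", "car"), ("car", "race")}
print(brute_force_palindromes(toy_bigrams, 3))  # ['race car']

# Reversing the characters of "race car" keeps the letters but moves the word boundary.
print("race car"[::-1])  # rac ecar
```

The last line shows why both levels must be handled together: the reversed character string is identical once spaces are dropped, yet its segmentation into words differs. Because the enumeration above is exponential in the number of words, it only serves to illustrate the problem and is not a practical alternative to a linear-complexity approach.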