Constructing Career Histories: A Case Study in Disentangling the Threads

Paul R. Cohen

We present an algorithm for organizing partially ordered observations into multiple "threads," some of which may be concurrent. The algorithm is applied to the problem of constructing career histories for individual scientists from the abstracts of published papers. Because abstracts generally do not provide rich information about the contents of papers, we developed a novel relational method for judging the similarity of papers. We report four experiments that demonstrate the advantage of this method over the traditional Dice and Tanimoto coefficients, and that evaluate the quality of induced multi-thread career histories.
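
For reference, the two baseline coefficients named above can be sketched as follows, using their standard set-based definitions. Representing each paper as a set of terms is an assumption made here for illustration, as are the function names; the relational method itself is not shown.

```python
def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


def tanimoto(a, b):
    """Tanimoto coefficient on sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical term sets for two papers.
paper_1 = {"planning", "uncertainty", "heuristics"}
paper_2 = {"uncertainty", "heuristics", "evaluation"}

print(dice(paper_1, paper_2))
print(tanimoto(paper_1, paper_2))
```

Both measures depend only on how many terms two papers share, which is why they tend to be weak when the abstracts are short and the term sets are sparse.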