Abstract

Efficient Dominant Point Algorithms for the Multiple Longest Common Subsequence (MLCS) Problem

Finding the longest common subsequence of multiple strings is a classical computer science problem and has many applications in the areas of bioinformatics and computational genomics. In this paper, we present a new sequential algorithm for the general case of MLCS problem, and its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points. When applied to find a MLCS of 3 strings, our general algorithm is shown to exhibit the same performance as the best existing MLCS algorithm by Hakata and Imai, designed specifically for the case of 3 strings. Moreover, we show that for a general case of more than 3 strings, the algorithm is significantly faster than the best existing sequential approaches, reaching up to 2-3 orders of magnitude faster on the large-size problems. Finally, we propose a parallel implementation of the algorithm. Evaluating the parallel algorithm on a benchmark set of both random and biological sequences reveals a near-linear speed-up with respect to the sequential algorithm.

Qingguo Wang, Dmitry Korkin, Yi Shang