The DNA Word Design Problem: A New Constraint Model and New Results

The DNA Word Design Problem: A New Constraint Model and New Results

Michael Codish, Michael Frank, Vitaly Lagoon

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 585-591. https://doi.org/10.24963/ijcai.2017/82

A fundamental problem in coding theory concerns the computation of the maximum cardinality of a set S of length n code words over an alphabet of size q, such that every pair of code words has Hamming distance at least d, and the set of additional constraints U on S is satisfied. This problem has application in several areas, one of which is the design of DNA codes where q=4 and the alphabet is {A,C,G,T}. We describe a new constraint model for this problem and demonstrate that it improves on previous solutions (computes better lower bounds) for various instances of the problem. Our approach is based on a clustering of DNA words into small sets of words. Solutions are then obtained as the union of such clusters. Our approach is SAT based: we specify constraints on clusters of DNA words and solve these using a Boolean satisfiability solver.
Keywords:
Constraints and Satisfiability: Constraint Satisfaction
Multidisciplinary Topics and Applications: Computational Biology and e-Health
Constraints and Satisfiability: Constraints and Satisfiability