Ranking Structured Documents: A Large Margin Based Approach for Patent Prior Art Search

We propose an approach for automatically ranking structured documents, applied to patent prior art search. Our model, SVM Patent Ranking (SVM_PR), incorporates margin constraints that directly capture the specific characteristics of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features derived from several general-purpose Information Retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, and it effectively handles hundreds of thousands of patent-pair judgements in a high-dimensional feature space. Experiments on a homogeneous dataset of essential wireless patents show that SVM_PR performs on average 30%-40% better than many other state-of-the-art general-purpose Information Retrieval methods, as measured by NDCG at different cut-off positions.
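The abstract does not spell out SVM_PR's exact margin constraints, features, or its Pegasos extension. The sketch below is therefore only a generic illustration: plain Pegasos stochastic subgradient descent applied to a pairwise hinge loss (a RankSVM-style objective), followed by NDCG@k using the common gain (2^rel - 1) / log2(position + 1). The function names `pegasos_pairwise`, `dcg_at_k`, `ndcg_at_k`, the hyperparameters (`lam`, `iters`), and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
import math
import random


def pegasos_pairwise(pairs, dim, lam=0.01, iters=10000, seed=0):
    """Hypothetical sketch: plain Pegasos on a pairwise hinge loss (not SVM_PR itself).

    pairs: list of (x_hi, x_lo) feature vectors where x_hi should outrank x_lo.
    Objective: lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w.(x_hi - x_lo)).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    radius = 1.0 / math.sqrt(lam)
    for t in range(1, iters + 1):
        eta = 1.0 / (lam * t)                      # Pegasos step size 1/(lam*t)
        x_hi, x_lo = rng.choice(pairs)             # one random pair per step
        diff = [a - b for a, b in zip(x_hi, x_lo)]
        margin = sum(wi * di for wi, di in zip(w, diff))
        w = [(1.0 - eta * lam) * wi for wi in w]   # shrink from the regularizer
        if margin < 1.0:                           # pairwise margin constraint violated
            w = [wi + eta * di for wi, di in zip(w, diff)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:                          # optional projection onto the ball
            w = [wi * radius / norm for wi in w]
    return w


def dcg_at_k(rels, k):
    # Gain (2^rel - 1) discounted by log2(position + 1), positions starting at 1
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(rels, k):
    # Normalize by the DCG of the ideal (relevance-sorted) ordering
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Made-up toy data: two "cited" candidates (rel=1) and two non-cited ones (rel=0)
docs = [[1.0, 0.2], [0.1, 0.3], [0.9, 0.5], [0.0, 0.6]]
rels = [1, 0, 1, 0]
pairs = [(docs[i], docs[j]) for i in range(len(docs)) for j in range(len(docs))
         if rels[i] > rels[j]]

w = pegasos_pairwise(pairs, dim=2)
scores = [sum(wi * xi for wi, xi in zip(w, d)) for d in docs]
order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
ranked_rels = [rels[i] for i in order]
print("NDCG@2 =", ndcg_at_k(ranked_rels, 2))
```

Each training step touches a single pair, so the per-step cost does not depend on the total number of pair judgements. This property is what makes Pegasos-style training attractive when there are hundreds of thousands of pairs.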

Yunsong Guo, Carla Gomes