Learning Hostname Preference to Enhance Search Relevance / 3903
Jingjing Wang, Changsung Kang, Yi Chang, Jiawei Han
Hostnames such as en.wikipedia.org and www.amazon.com are strong indicators of the content they host. The relevant hostnames for a query can be a signature that captures the query intent. In this study, we learn the hostname preference of queries, which are further utilized to enhance search relevance. Implicit and explicit query intent are modeled simultaneously by a feature aware matrix completion framework. A block-wise parallel algorithm was developed on top of the Spark MLlib for fast optimization of feature aware matrix completion. The optimization completes within minutes at the scale of a million x million matrix, which enables efficient experimental studies at the web scale. Evaluation of the learned hostname preference is performed both intrinsically on test errors, and extrinsically on the impact on search ranking relevance. Experimental results demonstrate that capturing hostname preference can significantly boost the retrieval performance.