Machine Learning and Information Filtering on the Internet

Michael Pazzani

Course Description

The vast amount of information available on the Internet has given rise to a number of agents that locate relevant, useful, or interesting information for an individual. Such agents prioritize, filter, or sort electronic mail; filter newsgroup articles and locate interesting articles in unread newsgroups; guide a user to relevant information on the World Wide Web; notify a user when a significant change occurs to a web site; or provide access to information relevant to a user's current tasks.

To perform such tasks, a profile of the user's interests must be created. In this tutorial, we will focus on the learning and representation of user profiles, the methods for collecting user feedback, and the representation of information sources. The tutorial will first review a variety of findings from several decades of research on information retrieval, focusing on approaches to information filtering and classification. Next, machine learning approaches to classification will be described, including decision trees, nearest neighbor algorithms, Bayesian classifiers, and neural networks, and we will discuss how they may be used to learn user profiles. The relationship between machine learning and classic approaches from information retrieval will also be discussed. Finally, recent developments such as collaborative filtering, efficient rule learners, combining multiple models, weighted majority algorithms, and infinite attribute models will be described.

The technology will be illustrated with examples from a variety of information agents, including LIRA, NewsWeeder, WebWatcher, WebDoggie, InfoFinder, Inquery, Letizia, firefly, Syskill & Webert, DICA, and the Remembrance Agent.

Prerequisite Knowledge

The intended audience of this tutorial is practitioners and researchers interested in the issues involved in applying machine learning and information retrieval algorithms to the classification and ranking of information on the Internet. Basic knowledge of mathematics and probability will be assumed.

About the Lecturer

Michael Pazzani received an M.S. degree in computer science specializing in Natural Language Processing in 1980, and a Ph.D. in computer science specializing in Machine Learning from UCLA in 1987. He is now a professor and department chair of Information and Computer Science at the University of California, Irvine. He has been active in Machine Learning research for the past decade, with numerous publications in IJCAI, AAAI, Cognitive Science, and the International Machine Learning Conferences.
Last modified: Thu Feb 20 13:18:52 JST 1997