Data Mining and Knowledge Discovery in Databases

Usama Fayyad and Evangelos Simoudis

Course Description

Knowledge Discovery in Databases (KDD) is a rapidly growing AI field that combines techniques from machine learning, pattern recognition, statistics, databases, and visualization to automatically extract knowledge (or information) from lower-level data (databases). This knowledge is then used to support human decision-making (e.g., in prediction and classification tasks), to summarize the contents of databases, or to explain observed phenomena. KDD systems enable decision makers to automatically analyze the large and complex data sets collected today without requiring detailed prior knowledge about the data. Successful KDD systems have been implemented and are currently in use in financial modeling, fraud detection, market data analysis, astronomy, diagnosis, manufacturing, and biology.

This tutorial presents a comprehensive picture of current research paradigms in the field of KDD, along with examples from the state of practice. The tutorial introduces KDD, defines the basic terms and the relation between data mining and the KDD process, presents methods for data preparation and preprocessing, describes major data mining techniques from the fields of AI, pattern recognition, databases, and visualization, discusses major KDD systems from academia and industry, and provides a guide for developing a KDD system. Along the way, the tutorial addresses issues such as the role played by the various steps in the KDD process (e.g., sampling, data selection, projection and dimensionality reduction, extraction of patterns and models) and the use of extracted knowledge in decision-making.

Prerequisite Knowledge

There are no prerequisites for this tutorial other than familiarity with basic concepts in AI.

About the Lecturers

Usama Fayyad is a Senior Researcher at Microsoft Research, a Distinguished Visiting Scientist at the Jet Propulsion Laboratory, Caltech, and an adjunct professor of computer science at the University of Southern California. Prior to joining Microsoft, he headed the Machine Learning Systems Group at JPL. He received his Ph.D. in Computer Science (1991) from the University of Michigan, Ann Arbor. He was program co-chairman of KDD-94 and KDD-95, general chair of KDD-96, and Editor-in-Chief of the Journal of Knowledge Discovery and Data Mining.

Evangelos Simoudis is Vice President of Decision Support Solutions at IBM and an adjunct professor of computer engineering at Santa Clara University. Prior to joining IBM, Dr. Simoudis led the development and market introduction of the Recon data mining system and led research on knowledge discovery in databases and machine learning. Dr. Simoudis received his Ph.D. in Computer Science from Brandeis University. He is Editor-in-Chief of the Artificial Intelligence Review and served as program co-chairman of KDD-96.

Last modified: Thu Feb 20 13:26:33 JST 1997