An Introduction to Statistical Data Mining

Ranjan Maitra

Department of Mathematics and Statistics
University of Maryland, Baltimore County

Time: Series of three lectures, at 327 Yost Hall
Tuesday, Feb. 27, 4:00-5:00
Thursday, Mar. 1, 11:30-12:30
Friday, Mar. 2, 4:00-5:00

The topics of Data Mining and Knowledge Discovery in Databases have gained considerable prominence with the automated methods of data collection of this information age. Broadly speaking, data mining is the extraction of useful information from large amounts of data, often collected without any pre-defined purpose in mind. Most present-day applications are commercial, although other areas exist. Some examples include discerning customer preferences from transaction data for better store layout and targeted advertising; clustering software-metrics databases to develop automated techniques for determining which procedures need to be upgraded together; deciding what is of related interest to a person who has entered the query "car" in a search engine; and scheduling classes to minimize commuting students' discomfort. Algorithms used in data mining are both data- and computer-intensive. Because the underlying database is observational in nature, statistical techniques play a natural role. The lecture series will focus on the basic needs of data mining, the available statistical methodology, and areas requiring further attention. Applications will be highlighted throughout the series.

The basic outline for the three-part series is:

  1. Why data mining, automated collection of data, data warehousing, examples and applications. Market-Basket Analysis, Link Analysis and Graphical Representations.
  2. Classification and Clustering
  3. Artificial Neural Networks, Latent Semantic Indexing, Online Automated Processing.

Questions? Contact Nidhan Choudhuri.