Statistics 326/426, Multivariate Analysis and Data Mining
Spring 2008
- Instructor:
- Professor Jiayang Sun, Office: Yost 326, Phone: 368-0630
e-mail: jsun at case edu, Office Hrs: 4-4:45PM TTH (subject to change)
- Teaching Assistant:
- Mr. Peng Liu, Office: Yost 230, Phone: 368-0416
e-mail: peng.liu at case edu,
Office Hrs: 1-5PM Monday at Yost 234, Phone: 368-2656
- Other TAs/Tutors - To be updated by the dept
- Lab Administrator:
- 1:30-5:30pm on Monday and
12:15-4:15pm on Thursday at Yost 232, Phone 368-0417
- e-mail: help at stat case edu or
stats-help at case edu
- Class: 2:45-4:00PM on TTH at Yost 300
- Course Description:
- Stat 326/426 introduces classical and modern techniques for modeling, analyzing, and mining multivariate data. The general outline is:
- Introduction
- Graphical Methods for Multivariate Data
- One Multivariate Sample,
Multivariate ANOVA and Multivariate Regression
- Dimension Reduction Techniques:
Principal components, Correspondence analysis and Projection pursuit
- Classification and Clustering:
Multidimensional scaling, Discriminant and cluster analysis,
and Classification and regression trees (CART).
- Analysis of Covariance Structures/Latent Variable Models:
Principal components (revisited), Factor analysis and Covariance
structure models (time permitting).
- Other Data Mining Techniques:
Nearest neighbor, support vector machines, EM algorithms, Boosting and bagging,
... (time permitting)
There will be case studies, labs, and group discussions. Participation in group discussions is required. Most course information is accessible via my Web page:
sun.case.edu/~jiayang/426/
- Prerequisite:
Stat 325/425.
- Computing:
You will be using S-Plus (or R), some SAS, and the XGobi/GGobi packages.
- References:
- Recommended Texts:
- Johnson, R. A., Wichern, D. W. (2008),
Applied Multivariate Statistical
Analysis, Sixth edition, Prentice Hall, ISBN-10: 0131877151
- Everitt, Brian S. (2007), An R and S-Plus Companion to Multivariate Analysis,
Springer-Verlag, ISBN: 978-1-85233-882-4 (Reserved in the Kelvin Smith Library).
- Other References:
- Everitt, B. and Dunn, G. (2001), Applied Multivariate Data Analysis, Second Edition,
New York : Oxford University Press, ISBN: 0195209370
(Reserved in the Kelvin Smith Library).
(Here are the Data Sets.)
- Härdle, W. and Simar, L. (2007), Applied Multivariate Statistical
Analysis, Springer, ISBN: 978-3-540-72243-4
- Bishop, C.M. (2006), Pattern Recognition and Machine Learning,
Springer-Verlag,
ISBN: 978-0-387-31073-2 (Reserved in the Kelvin Smith Library).
- Hastie, T., Tibshirani, R. and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag
(Reserved in the Kelvin Smith Library).
- Breiman, Friedman, Olshen and Stone (1984),
Classification and Regression Trees,
Second Edition, The Wadsworth statistics/probability series, ISBN: 053498054 (Reserved in the Kelvin Smith Library).
- Sun, J. (1998), Projection Pursuit, Encyclopedia of Statistical
Sciences (updated volumes), Vol. 2, pp 554-560, Wiley, (Edited by: Samuel Kotz, Campbell Read, David
Banks, and Norman Johnson.)
- Hair, Black, Babin, Anderson and Tatham (2006), Multivariate Data
Analysis, Sixth Edition, Prentice Hall, ISBN-10: 0130329290.
- Some S/Splus/R, SAS and E-books.
- Final Examination: 12:30-3:30PM on May 1, 2008
- Grading Policy:
- Undergraduate:
- The assessment is based on homework assignments (50%) and a final examination (50%).
- Graduate:
- The assessment is based on homework assignments (45%), an oral presentation (10%), and a final examination (45%). A graduate student's presentation can be based on his/her ongoing research project that uses or needs to use some multivariate data analysis, or on an interesting topic/article approved by the instructor. As part of the homework, graduate students will also write a personal ``handbook'' defining goals and outlining a structured approach for data analyses.
- Notes:
No late homework!
The minimum score to pass the course is 60 on a 100-point scale.
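For a quick check of where you stand, the grading weights above can be combined as in the following sketch. It assumes that each component (homework, presentation, final examination) is scored on a 0-100 scale and that the components are combined as a simple weighted sum; the syllabus does not spell out either detail, and the function and variable names are hypothetical.

```python
# Weights are taken from the grading policy above.
# Assumption: every component is scored on a 0-100 scale and
# the course score is a plain weighted sum of the components.
WEIGHTS = {
    "undergraduate": {"homework": 0.50, "final": 0.50},
    "graduate": {"homework": 0.45, "presentation": 0.10, "final": 0.45},
}
PASSING_SCORE = 60.0  # minimum passing score on the 100-point scale


def course_score(level, scores):
    """Return the weighted course score for 'undergraduate' or 'graduate'."""
    weights = WEIGHTS[level]
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def passes(level, scores):
    """Return True if the weighted score meets the passing threshold."""
    return course_score(level, scores) >= PASSING_SCORE


if __name__ == "__main__":
    # Hypothetical undergraduate: 80 on homework, 60 on the final.
    print(course_score("undergraduate", {"homework": 80, "final": 60}))  # 70.0
```

Under these assumptions, an undergraduate with a homework average of 50 and a final examination score of 60 would end at 55 and would not pass.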