DATA 401

Statistical Machine Learning

4 Undergraduate credits
Effective December 17, 2018 – Present

Statistical machine learning (often called simply statistical learning) is a relatively recent subfield of statistics that emphasizes the interpretability, precision, and uncertainty of machine learning models. This course covers how to assess the accuracy of several supervised and unsupervised machine learning methods, including supervised methods for both regression and classification. Topics include the bias-variance trade-off, training and test datasets, resampling methods, shrinkage and dimension reduction methods, non-linear modeling techniques such as regression splines and generalized additive models, and decision tree-based methods. Applications include examples from medicine, biology, marketing, finance, insurance, and sports.

Learning outcomes


  • Assess the accuracy of both supervised and unsupervised machine learning models.
  • Explain the trade-off between flexibility and interpretability of several machine learning methods.
  • Describe the competing properties of bias and variance of statistical learning methods.
  • Identify and apply the appropriate statistical learning model for analyzing a dataset using statistical software.
  • Interpret the software output of a statistical learning analysis.
  • Document and articulate the results and conclusions of statistical learning techniques applied to real data in a variety of disciplines.