Date: 
Topics: 
Lab and Handouts: 
Reading and References: 
Project deadline: 
#1
Tues 27 August

Syllabus; review of loading CSV files, processing data, plots, and filtering

Syllabus
weather data (use KNYC.csv)
Package installation commands
Pre-class, empty: Lab 1 - Review of dates and plots
Pre-class, completed: Lab 1 - Review of dates and plots
From class: Lab 1 - Review of dates and plots

Academic Integrity Policy
FiveThirtyEight article using the weather data
Time/date components


#2
Thurs 29 August

Review: bar charts, filtering 
Lab 2 - Review of bar charts and filtering
From class: Lab 2 - Review of bar charts and filtering

Bar chart in Pandas
Filtering in Pandas
Condensed filtering examples


Mon 2 September 
CUNY: No classes (Labor Day) 
#3
Tues 3 Sept 
Groupby, Seaborn plots 
Lab 3 - groupby and more plots
From class: Lab 3 - groupby and more plots
Pandas groupby tutorial (intro)
Pandas groupby tutorial (medium)
Pandas' user guide to groupby (detailed)
Seaborn package: gallery and tutorials
Another Seaborn tutorial


Thurs 5 Sept 
Classes follow a Monday schedule 
#4
Tues 10 Sept

Normal and exponential probability distributions 
Lab 4 - Probability distributions
From class: Lab 4 - Probability distributions
babyboom.dat.txt

Sampling with numpy
Introduction to Normal distribution
Normal distribution in Scipy
Exponential distribution in Scipy


#5
Thurs 12 Sept

Nonparametric distributions, confidence intervals, bootstrap, comparing means 
Lab 5 - Nonparametric distributions and bootstrap
From class: Lab 5 - Nonparametric distributions and bootstrap
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
Introduction to GitHub

Parametric vs. nonparametric data (first two sections)
Parametric and nonparametric bootstrap (starting at section "The notion of a Sampling Distribution")
Central Limit Theorem

Milestone 1: find dataset 
#6
Tues 17 Sept

Review of linear regression 
Introduction to GitHub
Lab 6 - Review of linear regression
From class: Lab 6 - Review of linear regression

Another tutorial on linear regression using the Boston housing data
Online stats book: linear regression 

#7
Thurs 19 Sept 
Linear regression continued: R-squared, predictions, dummy variables
Lab 7 - Linear regression continued (empty)
From class: Lab 7 - Linear regression continued
Introduction to Linear regression tutorial
More detailed introduction to linear regression
Insurance data set on Kaggle (click on kernels to see how others have analyzed it)

Milestone 2: GitHub account and upload data 
#8
Tues 24 Sept

Linear regression continued: more on dummy variables, mean square error, validation 
Lab 8 - Mean Squared Error and validation
From class: Lab 8 - Mean Squared Error and validation
insurance.csv 
Dummy variables
Training and test data, cross-validation in Python


#9
Thurs 26 Sept 
Overfitting, underfitting, cross-validation
Lab 9 - Overfitting and underfitting, fitting polynomials, k-fold cross-validation
From class: Lab 9

Anscombe's Quartet
Over- and underfitting, cross-validation in Python
Milestone 3: webpage and data description 
30 September - 1 October
CUNY: No classes 
#10
Thurs 3 October

Logistic Regression 
Lab 9b - 2-fold cross-validation
From class: Lab 9b - 2-fold cross-validation
Lab 10 - Logistic Regression
From class: Lab 10 - Logistic Regression

Logistic regression tutorial 
Milestone 4: missing data and column distributions 
8-9 October
CUNY: No classes 
#11
Thurs 10 October 
Logistic Regression Continued: multiple independent variables, accuracy and precision
Lab 11 - Logistic Regression Continued
From class: Lab 11
Lab 11b - Classwork

Precision, recall, sensitivity, specificity
Methods for evaluating binary classification

Milestone 5: outliers and multicolumn relationships 
#12
Tues 15 October 
Decision trees: Classification 
Lab 12 - Decision trees
From class: Lab 12

A visual introduction to machine learning via decision trees
Introduction to decision trees in scikit-learn
Sklearn: decision trees


#13
Thurs 17 October 
Decision trees: Regression 
Lab 13 - Decision trees for regression
From class: Lab 13 
Detailed explanation of decision trees
Another detailed explanation of decision trees
Gini impurity


#14
Tues 22 October

Choropleth maps 
Lab 14 - Mapping data
nyc_school_districts.json
math_district.csv

Folium tutorial 

#15
Thurs 24 October 
Review for midterm 


Milestone 6: Linear or logistic regression 
#16
Tues 29 October

Midterm 



#17
Thurs 31 October 
Cross tabulation (contingency tables) and more probability 
Lab 17 - Cross tabulation
Cross tabulation in Pandas


5 November 
Last day to withdraw from class with a grade of W
#18
Tues 5 November

Introduction to vectors and distances 



#19
Thurs 7 November

K-nearest neighbors
Lab 19 - k-nearest neighbors
From class: Lab 19 - k-nearest neighbors
k-nearest neighbors using scikit-learn
k-nearest neighbors concept

#20
Tues 12 November 
Hierarchical clustering 
Lab 20 - Hierarchical clustering
From class: Lab 20 - Hierarchical clustering
Labor market data

Hierarchical clustering
Scikit-learn: hierarchical clustering

#21
Thurs 14 November 
k-means clustering
Lab 21 - k-means clustering
From class: Lab 21 - k-means clustering

Interactive visualization of k-means clustering
Another interactive visualization of k-means clustering
Visualization of k-means clustering algorithm
k-means clustering in depth
Limitations of k-means clustering
images of the digits

Milestone 7: Decision trees 
#22
Tues 19 November

Determining the number of clusters: elbow method and silhouette score 
Lab 22 - Determining the number of clusters
From class: Lab 22 - Determining the number of clusters
Starbucks dataset

Estimating k with the elbow method
Silhouette analysis


#23
Thurs 21 November 
Principal Components Analysis 
Lab 23 - Silhouette Score revisited and Principal Components Analysis
From class: Lab 23 - Silhouette Score revisited and Principal Components Analysis
Sklearn: Selecting the number of clusters with silhouette analysis
PCA Explained Visually 
Milestone 8: k-nearest neighbors
#24
Tues 26 November

PCA continued 
Lab 24 - Simulated clusters
From class: Lab 24 - Simulated clusters



28 November - 1 December
Thanksgiving Recess: College Closed 
#25
Tues 3 December 
More Hypothesis testing: Testing with multiple categories 
Lab 25 - Hypothesis testing with multiple categories
From class: Lab 25 - Hypothesis testing with multiple categories
Lab 25 - Part 2
From class: Lab 25 - Part 2
Mar3_4_2019_311_Service_Requests.csv

Hypothesis testing for multiple categories
Steps in hypothesis testing

#26
Thurs 5 December

Hypothesis testing: Testing means of groups with permutation testing 
Lab 26 - Permutation tests
From class: Lab 26 - Permutation tests
Hypothesis testing to compare two samples 
Milestone 9: your choice 
#27
Tues 10 December

Project presentations 



#28
Thurs 12 December

Review for final exam 



Tues 17 December 
Final exam 3:45pm - 5:45pm, Gillet 231