Date: |
Topics: |
Lab and Handouts: |
Reading and References: |
Project deadline: |
#1
Tues 27 August
|
Syllabus; review of loading CSV files, processing data, plots, and filtering
|
Syllabus
weather data (use KNYC.csv)
Package installation commands
Pre-class, empty: Lab 1 - Review of dates and plots
Pre-class, completed: Lab 1 - Review of dates and plots
From class: Lab 1 - Review of dates and plots
|
Academic Integrity Policy
FiveThirtyEight article using the weather data
Time/date components
|
|
#2
Thurs 29 August
|
Review: bar charts, filtering |
Lab 2 - Review of bar charts and filtering
From class: Lab 2 - Review of bar charts and filtering
|
Bar chart in Pandas
Filtering in Pandas
Condensed filtering examples
|
|
Mon 2 September |
CUNY: No classes (Labor Day) |
#3
Tues 3 Sept |
Groupby, Seaborn plots |
Lab 3 - groupby and more plots
From class: Lab 3 - groupby and more plots |
Pandas groupby tutorial (intro)
Pandas groupby tutorial (medium)
Pandas' user guide to groupby (detailed)
Seaborn package: gallery and tutorials
Another Seaborn tutorial
|
|
Thurs 5 Sept |
Classes follow a Monday schedule |
#4
Tues 10 Sept
|
Normal and exponential probability distributions |
Lab 4 - Probability distributions
From class: Lab 4 - Probability distributions
babyboom.dat.txt
|
Sampling with numpy
Introduction to Normal distribution
Normal distribution in Scipy
Exponential distribution in Scipy
|
|
#5
Thurs 12 Sept
|
Non-parametric distributions, confidence intervals, bootstrap, comparing means |
Lab 5 - Non-parametric distributions and bootstrap
From class: Lab 5 - Non-parametric distributions and bootstrap
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
Introduction to GitHub
|
Parametric vs. non-parametric data (first two sections)
Parametric and non-parametric bootstrap (starting at section "The notion of a Sampling Distribution")
Central Limit Theorem
|
Milestone 1: find dataset |
#6
Tues 17 Sept
|
Review of linear regression |
Introduction to GitHub
Lab 6 - Review of linear regression
From class: Lab 6 - Review of linear regression
|
Another tutorial on linear regression using the Boston housing data
Online stats book: linear regression |
|
#7
Thurs 19 Sept |
Linear regression continued: r-squared, predictions, dummy variables |
Lab 7 - Linear regression continued (empty)
From class: Lab 7 - Linear regression continued |
Introduction to Linear regression tutorial
More detailed introduction to linear regression
Insurance data set on Kaggle (click on kernels to see how others have analyzed it)
|
Milestone 2: GitHub account and upload data |
#8
Tues 24 Sept
|
Linear regression continued: more on dummy variables, mean square error, validation |
Lab 8 - Mean Squared Error and validation
From class: Lab 8 - Mean Squared Error and validation
insurance.csv |
Dummy variables
Training and test data, cross-validation in Python
|
|
#9
Thurs 26 Sept |
Overfitting, underfitting, cross-validation |
Lab 9 - Overfitting and underfitting, fitting polynomials, k-fold cross validation
From class: Lab 9
|
Anscombe's Quartet
Over- and under-fitting, cross-validation in Python |
Milestone 3: webpage and data description |
30 September - 1 October |
CUNY: No classes |
#10
Thurs 3 October
|
Logistic Regression |
Lab 9b - 2-fold cross validation
From class: Lab 9b - 2-fold cross validation
Lab 10 - Logistic Regression
From class: Lab 10 - Logistic Regression
|
Logistic regression tutorial |
Milestone 4: missing data and column distributions |
8-9 October |
CUNY: No classes |
#11
Thurs 10 October |
Logistic Regression Continued: multiple independent variables, accuracy and precision |
Lab 11 - Logistic Regression Continued
From class: Lab 11
Lab 11b - Classwork
|
Precision, recall, sensitivity, specificity
Methods for evaluating binary classification
|
Milestone 5: outliers and multi-column relationships |
#12
Tues 15 October |
Decision trees: Classification |
Lab 12 - Decision trees
From class: Lab 12
|
A visual introduction to machine learning via decision trees
Introduction to decision trees in scikit-learn
Sklearn: decision trees
|
|
#13
Thurs 17 October |
Decision trees: Regression |
Lab 13 - Decision trees for regression
From class: Lab 13 |
Detailed explanation of decision trees
Another detailed explanation of decision trees
Gini impurity
|
|
#14
Tues 22 October
|
Choropleth maps |
Lab 14 - Mapping data
nyc_school_districts.json
math_district.csv
|
Folium tutorial |
|
#15
Thurs 24 October |
Review for midterm |
|
|
Milestone 6: Linear or logistic regression |
#16
Tues 29 October
|
Midterm |
|
|
|
#17
Thurs 31 October |
Cross tabulation (contingency tables) and more probability |
Lab 17 - Cross tabulation |
Cross tabulation in Pandas
|
|
5 November |
Last day to withdraw from class with a grade of W |
#18
Tues 5 November
|
Introduction to vectors and distances |
|
|
|
#19
Thurs 7 November
|
K-nearest neighbors |
Lab 19 - k-nearest neighbors
From class: Lab 19 - k-nearest neighbors |
k-nearest neighbors using scikit-learn
k-nearest neighbors concept |
|
#20
Tues 12 November |
Hierarchical clustering |
Lab 20 - Hierarchical clustering
From class: Lab 20 - Hierarchical clustering
Labor market data
|
Hierarchical clustering
Scikit-learn: hierarchical clustering |
|
#21
Thurs 14 November |
k-means clustering |
Lab 21 - k-means clustering
From class: Lab 21 - k-means clustering
|
Interactive visualization of k-means clustering
Another interactive visualization of k-means clustering
Visualization of k-means clustering algorithm
k-means clustering in depth
Limitations of k-means clustering
images of the digits
|
Milestone 7: Decision trees |
#22
Tues 19 November
|
Determining the number of clusters: elbow method and silhouette score |
Lab 22 - Determining the number of clusters
From class: Lab 22 - Determining the number of clusters
Starbucks dataset
|
Estimating k with the elbow method
Silhouette analysis
|
|
#23
Thurs 21 November |
Principal Components Analysis |
Lab 23 - Silhouette Score revisited and Principal Components Analysis
From class: Lab 23 - Silhouette Score revisited and Principal Components Analysis |
Sklearn: Selecting the number of clusters with silhouette analysis
PCA Explained Visually |
Milestone 8: k-nearest neighbors |
#24
Tues 26 November
|
PCA continued |
Lab 24 - Simulated clusters
From class: Lab 24 - Simulated clusters
|
|
|
28 November - 1 December |
Thanksgiving Recess: College Closed |
#25
Tues 3 December |
More Hypothesis testing: Testing with multiple categories |
Lab 25 - Hypothesis testing with multiple categories
From class: Lab 25 - Hypothesis testing with multiple categories
Lab 25 - Part 2
From class: Lab 25 - Part 2
Mar3_4_2019_311_Service_Requests.csv
|
Hypothesis testing for multiple categories
Steps in hypothesis testing |
|
#26
Thurs 5 December
|
Hypothesis testing: Testing means of groups with permutation testing |
Lab 26 - Permutation tests
From class: Lab 26 - Permutation tests |
Hypothesis testing to compare two samples |
Milestone 9: your choice |
#27
Tues 10 December
|
Project presentations |
|
|
|
#28
Thurs 12 December
|
Review for final exam |
|
|
|
Tues 17 December |
Final exam 3:45pm - 5:45pm, Gillet 231 |