Date: | Topics: | Lab and Handouts: | Reading: | Classwork & Quiz Topics: | |
#1
Tues 29 January |
Syllabus; What is Data Science; Introduction to Python (printing and variables); line plots with Pandas | Syllabus Citi Bike data example Data Science Process Lab 1 (Jupyter notebook) nycHistPop.csv |
Academic Integrity Policy, 3.1,3.2,3.3 Variables Line graphs |
||
#2
Thurs 31 January |
Statistical variables; proportions; column operations | Lab 2 - Plotting NYC's shelter population | Statistical variables | Academic Integrity | |
#3
Tues 5 February |
Bar charts; statistical variables cont'd | Lab 3 - Bar charts Feb5_2017_Green_Taxi_Trip_Data.csv |
7.1 | Groups:variables in statistics | |
#4
Thurs 7 February |
Histograms | Lab 4 - Histograms | 7.2, Histograms | Quiz: Lab 1 | |
Tues 12 February | Lincoln's Birthday - Lehman is closed | ||||
#5
Thurs 14 February |
Mean, median, and mode; filtering | Lab 5 - Mean, Median, and Mode |
Online Stats: Median and mean Non-technical overview: mean, median, mode |
Quiz: Lab 2, types of statistical variables | |
Mon 18 February | President's Day - Lehman is closed | ||||
#6
Tues 19 February |
Measures of Spread: range, variance, and standard deviation | Lab 6 - Measures of spread (Range, Variance, and Standard Deviation) | Measures of Variability | Groupwork: mean, median, and variance | |
#7
Thurs 21 February |
Behavior of sample vs. population; boxplots | Lab 7 - Samples and boxplots | 10.2 Sampling from a Population, percentiles, boxplots | Online quiz: Labs 3 and 4 | |
#8
Tues 26 February |
Introduction to probability, computing probabilities, filtering | Lab 8 - Computing probabilities | 9.5 Finding Probabilities,
Introduction to Probability, Computing probabilities Filtering |
Paper quiz: Labs 1 and 2 (assignments 1-4), statistical variables; 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed | |
#9
Thurs 28 February |
Computing probabilities with and/or, subsets of dataframes | Lab 9 - Computing probabilities 2 Feb19_2019_311_Service_Requests.csv |
9.5 Finding Probabilities, Filtering with Pandas |
Online quiz: review, Labs 5 and 6 | |
#10
Tues 5 March |
Sampling and empirical distributions, iteration | Lab 10 - Sampling Distributions, Part 1, Lab 10 - Sampling Distributions, Part 2 |
10.3 Empirical Distribution of a Statistic 9.3 Iteration Iteration with turtles |
Online quiz: review, Labs 7 and 8 | |
#11
Thurs 7 March |
Comparing distributions visually, data and time in pandas | Lab 11 - Comparing distributions visually | Paper quiz: Labs 3 and 4 (assignments 5-8); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed | ||
#12
Tues 12 March |
Simulating data and introduction to hypotheses | Lab 12 - Simulations and hypotheses | Classwork | ||
#13
Thurs 14 March |
Hypothesis testing of proportions | Lab 13 - Hypothesis testing of proportions | 11.1 Assessing Models Introduction to Hypothesis Test |
No quiz | |
#14
Tues 19 March |
Hypothesis testing: two samples, qualitative data | Lab 14 - Hypothesis testing with multiple categories | 11.2 Multiple Categories Creating dataframes |
Paper quiz: Labs 5, 6, and 7 (assignments 9-14); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed | |
#15
Thurs 21 March |
Hypothesis testing: two samples, quantitative data | Lab 15 - More hypothesis testing | 11.3 Hypothesis testing steps 12.1 A/B Testing |
Classwork | |
#16
Tues 26 March |
Bootstrap and confidence intervals | Lab 16 - Bootstrap and confidence intervals | 13.1 Percentiles 13.2 The Bootstrap 13.3 Confidence Intervals Much more detail about the dataset |
Online quiz | |
#17
Thurs 28 March |
Normal distributions | Lab 17 - Normal distributions | 14.3 The SD and the Normal Distribution Online stats book: normal distributions Visualizing the normal distribution |
Paper quiz: Labs 8 and 9 (assignments 15-18); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed | |
1 April | Last day to withdraw from class with a grade of W | ||||
#18
Tues 2 April |
Central Limit Theorem | Lab 18 - Central Limit Theorem Data: starbucks-menu-nutrition-drinks.csv |
14.4 The Central Limit Theorem 14.5 The Variability of the Sample Mean |
Classwork: Probabilities | |
#19
Thurs 4 April |
Functions and conditional statements | tree_data.csv Lab 19 code from class |
8.1 Applying a function to a column 9.1 Conditional statements |
Quiz: online on labs 12 and 13 | |
#20
Tues 9 April |
Correlation, Causation, and Heat maps | Lab 20 - Correlation, causation, and heat maps Data: Feb2019_labor_market_majors.csv |
Spurious correlations Correlation guessing game 15.1 Correlation Online stats book: intro to correlation |
Paper quiz: Labs 10 and 11 (assignments 19-22); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed | |
#21
Thurs 11 April |
Simple linear regression, checking residuals for normality | Lab 21 - Simple linear regression | 15.2 The Regression Line 15.5 Visual Diagnostics Introduction to linear regression |
Classwork | |
#22
Tues 16 April |
Multi-linear regression, R-squared, and prediction | Lab 22 - Multi-linear regression, R-squared, and prediction | 15.4 Least Squares Regression 17.6 Multiple Regression |
||
#23
Thurs 18 April |
Confidence intervals for the slope of linear regression | Lab 23 - Confidence intervals for regression UPDATED to correct issues found during class: Lab 23 - Confidence intervals for regression |
16.1 A regression model, 16.2 Inference for the true slope | Paper quiz: Labs 12 and 13 | |
19 - 28 April | Spring recess: no classes | ||||
#24
Tues 30 April |
Intro to Machine Learning: understanding the data | Kaggle: Titanic: Machine Learning from Disaster titanic_train.csv titanic_test.csv Lab 24 - Understanding the Titanic data |
Classwork | ||
#25
Thurs 2 May |
k-nearest neighbors (machine learning) | Lab 25 - k-Nearest Neighbors classifier 1 | 17 Classification 17.1 Nearest Neighbors 17.2 Training and Testing KNN classification using Scikit-Learn (Datacamp) |
Paper quiz: Labs 17 (normal distribution), 18 (Central Limit Theorem), 19 (functions), 20 (correlation and heatmaps) | |
#26
Tues 7 May |
k-nearest neighbors continued (machine learning) | Lab 26 - k-Nearest Neighbors classifier 2 Completed Lab 26 - k-Nearest Neighbors classifier 2 | |||
#27
Thurs 9 May |
The data science process revisited | Exam review 1 | Paper quiz: Labs 21, 22, 23 (linear regression) | ||
#28
Tues 14 May |
Review | Sample Final Exam (answers) original NBA dataset (see blackboard for how to clean) |
|||
Thurs 16 May | Final exam 1:30pm - 3:30pm, Gillet 231 |