Date: | Topics: | Handouts: | Reading: | Classwork & Quiz Topics: | |
#1
Tues 30 January |
First Day Details; What is Data Science; Introduction to Python: printing, variables; plotting with Pandas | Syllabus,
DS venn diagram, Gallery: NY density, nearest airport, precincts, citibike, buses vs. subways, ebola, disease Data Science Process |
Academic Integrity Policy, Think CS: Chapter 1 & Chapter 2, Online Stats: Variables & Line graphs |
Academic Integrity | |
#2
Thurs 1 February |
Introduction to plotting and Pandas; types of statistical variables | Lab 1 | Online Stats: Variables | Groups: variables in statistics | |
#3
Tues 6 February |
More plotting and column operations | Lab 2 | Groups: | ||
#4
Thurs 8 February |
Histograms; Mean, median, mode | Lab 3 | Online Stats: Median and mean Non-technical overview: mean, median, mode |
Quiz: printing, variables, plotting, types of variables | |
Mon 12 February | Lincoln's Birthday - Lehman is closed | ||||
#5
Tues 13 February |
Variance and boxplots | Lab 4, lab4.py Anscombe's quartet |
Online Stats: percentiles, boxplots, variance | Quiz: printing and (computer) variables, lab 2, types of statistical variables | |
#6
Thurs 15 February |
Sample and Population Means and Variances | Lab 5 | Online Stats: http://onlinestatbook.com/2/summarizing_distributions/variability.html | Quiz: lab 3, variance, review | |
19 February | Presidents' Day - Lehman is closed | ||||
20 February | Classes follow Monday schedule | ||||
#7
Thurs 22 February |
Hypotheses, selecting rows in a dataframe | Lab 6 | Selecting pandas Dataframe rows based on conditions | Classwork: making and checking a hypothesis | |
#8
Tues 27 February |
Introduction to probability, probability mass functions, sample vs. distribution | Lab 7 Code | Think CS: Generating random numbers Online stats: Introduction to Probability |
Quiz: Lab 6 and review | |
#9
Thurs 1 March |
Computing probabilities, bar plots, counting unique values of data | Lab 8 | Online stats: Probability Basic Concepts, Bar charts |
Quiz: boxplots and review | |
#10
Tues 6 March |
Probability density distributions, uniform distribution, estimating probabilities continued | Lab 9 Code | Paper quiz: Homework 1-8; 1 sheet of paper (8" x 11") with handwritten notes on both sides is allowed | ||
#11
Thurs 8 March |
Normal distribution, date and time in pandas | Lab 10 Code (normal distribution), Lab 10 Code (rodent complaints) |
Online stats: Normal distribution Visualizing normal distributions |
Quiz: probabilities and review | |
#12
Tues 13 March |
Central Limit Theorem in action | Lab 11 Lab 11 code from class |
Visualizing the Central Limit Theorem 1 Visualizing the Central Limit Theorem 2 Online stats: Introduction to sampling distributions Sampling distribution of the mean |
Quiz: Lab 9 and review | |
#13
Thurs 15 March |
Review | IMDb dataset Lab 12 code from class |
|||
#14
Tues 20 March |
Confidence Intervals | Lab 13 code from class | Online stats: confidence intervals | Paper quiz: Homework 9-16; 1 sheet of paper (8" x 11") with handwritten notes on both sides is allowed | |
#15
Thurs 22 March |
Correlation and causation, scatter plots, heatmaps | Labor market data, Lab 14 Code |
Spurious Correlation Correlation Guessing Game Online stats: correlation 1 correlation 2 |
||
#16
Tues 27 March |
Regression: Simple Linear Regression | Labor market data Lab 15 Code |
Introduction to Linear Regression Online stats: Introduction to Linear Regression Think stats: Statsmodel A more comprehensive example of using linear regression |
Quiz: Central Limit Theorem and review | |
#17
Thurs 29 March |
Regression continued: R-squared and Multiple Linear Regression | Lab 16 (partial answers) Lab 16 code from class |
Online stats: Multiple linear regression, R-squared Picture illustrating R-squared in section 6 |
Classwork: Multiple Linear Regression | |
30 March - 8 April | Spring recess: no classes | ||||
#18
Tues 10 April |
Introduction to hypothesis testing | Lab 17 code from class Background for lab at top - the code is different |
Online stats: Introduction to Hypothesis Testing | Paper quiz: Homework 17-24; 1 sheet of paper (8" x 11") with handwritten notes on both sides is allowed | |
11 April | Classes follow Friday schedule | ||||
#19
Thurs 12 April |
Hypothesis testing | Lab 18, Code from class | Steps for hypothesis testing | Classwork | |
16 April | Last day to withdraw from class with a grade of W | ||||
#20
Tues 17 April |
Hypothesis testing continued | Lab 18, Lab 18 Details Code from class |
Classwork: hypothesis testing | ||
#21
Thurs 19 April |
Introduction to R; vectors; plotting in R | DataCamp Introduction to R Lab 19 |
Try R | Paper quiz: Estimating probabilities; 1 sheet of paper (8" x 11") with handwritten notes on both sides is allowed | |
#22
Tues 24 April |
Dataframes in R | DataCamp Introduction to R Lab 20 |
Try R | ||
#23
Thurs 26 April |
ggplot2 - fancy plotting in R | Buzzfeed's cleaned FBI NICS Firearm Background Check Data Lab 21 code from class ggplot2 cheatsheet NY Times article based on NICS data |
DataCamp Introduction to ggplot2 | Paper quiz: Confidence intervals, correlation, heatmaps; 1 sheet of paper (8" x 11") with handwritten notes on both sides is allowed | |
#24
Tues 1 May |
Intro to Machine Learning: understanding the data | Kaggle: Titanic: Machine Learning from Disaster
titanic_train.csv titanic_test.csv |
Classwork: understanding the titanic dataset | ||
#25
Thurs 3 May |
Guest talk by Violet Fredericks, continuation of machine learning | Titanic tutorial part 1 Titanic tutorial part 2 Titanic tutorial part 3 Code from class |
|||
#26
Tues 8 May |
Continuation of machine learning introduction | Code from class | |||
#27
Thurs 10 May |
Review | Exam review 1 | |||
#28
Tues 15 May |
Review | Sample Final Exam (answers) original NBA dataset (see blackboard for how to clean) |
|||
Thurs 24 May | Final exam 1:30pm - 3:30pm, Gillet 231 |