Instructor: Prof. Megan Owen (she/her)
E-mail: megan.owen@lehman.cuny.edu
Office: Gillet 137E
Office hours: 2:40pm - 4:00pm on Tuesdays and Thursdays (Gillet 231 or 137E), or by appointment

Course time: 1:00pm - 2:40pm on Tuesdays and Thursdays, Gillet 231

Student mentor: Josephine (Jo) Bagcal
E-mail: josephine.bagcal@lc.cuny.edu

Syllabus

Python and Jupyter Hub:

Textbooks:

Assignments: see Blackboard

In-class Quizzes: see Blackboard

Outline:

Date: Topics: Lab and Handouts: Reading: Classwork & Quiz Topics:
#1
Tues 29 January
Syllabus; What is Data Science; Introduction to Python (printing and variables); line plots with Pandas Syllabus
Citi Bike data example
Data Science Process

Lab 1 (Jupyter notebook)
nycHistPop.csv
Academic Integrity Policy,
3.1, 3.2, 3.3
Variables
Line graphs
Academic Integrity
#2
Thurs 31 January
Statistical variables; proportions; column operations Lab 2 - Plotting NYC's shelter population Statistical variables Academic Integrity
#3
Tues 5 February
Bar charts; statistical variables cont'd Lab 3 - Bar charts
Feb5_2017_Green_Taxi_Trip_Data.csv
7.1 Groups:variables in statistics
#4
Thurs 7 February
Histograms Lab 4 - Histograms 7.2, Histograms Quiz: Lab 1
Tues 12 February Lincoln's Birthday - Lehman is closed
#5
Thurs 14 February
Mean, median, and mode; filtering Lab 5 - Mean, Median, and Mode Online Stats: Median and mean
Non-technical overview: mean, median, mode
Quiz: Lab 2, types of statistical variables
Mon 18 February President's Day - Lehman is closed
#6
Tues 19 February
Measures of Spread: range, variance, and standard deviation Lab 6 - Measures of spread (Range, Variance, and Standard Deviation) Measures of Variability Groupwork: mean, median, and variance
#7
Thurs 21 February
Behavior of sample vs. population; boxplots Lab 7 - Samples and boxplots 10.2 Sampling from a Population, percentiles, boxplots Online quiz: Labs 3 and 4
#8
Tues 26 February
Introduction to probability, computing probabilities, filtering Lab 8 - Computing probabilities 9.5 Finding Probabilities, Introduction to Probability, Computing probabilities
Filtering
Paper quiz: Labs 1 and 2 (assignments 1-4), statistical variables; 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
#9
Thurs 28 February
Computing probabilities with and/or, subsets of dataframes Lab 9 - Computing probabilities 2
Feb19_2019_311_Service_Requests.csv
9.5 Finding Probabilities, Filtering with Pandas
Online quiz: review, Labs 5 and 6
#10
Tues 5 March
Sampling and empirical distributions, iteration Lab 10 - Sampling Distributions, Part 1,
Lab 10 - Sampling Distributions, Part 2
10.3 Empirical Distribution of a Statistic
9.3 Iteration
Iteration with turtles
Online quiz: review, Labs 7 and 8
#11
Thurs 7 March
Comparing distributions visually, data and time in pandas Lab 11 - Comparing distributions visually Paper quiz: Labs 3 and 4 (assignments 5-8); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
#12
Tues 12 March
Simulating data and introduction to hypotheses Lab 12 - Simulations and hypotheses Classwork
#13
Thurs 14 March
Hypothesis testing of proportions Lab 13 - Hypothesis testing of proportions 11.1 Assessing Models
Introduction to Hypothesis Test
No quiz
#14
Tues 19 March
Hypothesis testing: two samples, qualitative data Lab 14 - Hypothesis testing with multiple categories 11.2 Multiple Categories
Creating dataframes
Paper quiz: Labs 5, 6, and 7 (assignments 9-14); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
#15
Thurs 21 March
Hypothesis testing: two samples, quantitative data Lab 15 - More hypothesis testing 11.3 Hypothesis testing steps
12.1 A/B Testing
Classwork
#16
Tues 26 March
Bootstrap and confidence intervals Lab 16 - Bootstrap and confidence intervals 13.1 Percentiles
13.2 The Bootstrap
13.3 Confidence Intervals
Much more detail about the dataset
Online quiz
#17
Thurs 28 March
Normal distributions Lab 17 - Normal distributions 14.3 The SD and the Normal Distribution
Online stats book: normal distributions
Visualizing the normal distribution
Paper quiz: Labs 8 and 9 (assignments 15-18); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
1 April Last day to withdraw from class with a grade of W
#18
Tues 2 April
Central Limit Theorem Lab 18 - Central Limit Theorem
Data: starbucks-menu-nutrition-drinks.csv
14.4 The Central Limit Theorem
14.5 The Variability of the Sample Mean
Classwork: Probabilities
#19
Thurs 4 April
Functions and conditional statements tree_data.csv
Lab 19 code from class
8.1 Applying a function to a column
9.1 Conditional statements
Online quiz: Labs 12 and 13
#20
Tues 9 April
Correlation, Causation, and Heat maps Lab 20 - Correlation, causation, and heat maps
Data: Feb2019_labor_market_majors.csv
Spurious correlations
Correlation guessing game
15.1 Correlation
Online stats book: intro to correlation
Paper quiz: Labs 10 and 11 (assignments 19-22); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
#21
Thurs 11 April
Simple linear regression, checking residuals for normality Lab 21 - Simple linear regression 15.2 The Regression Line
15.5 Visual Diagnostics
Introduction to linear regression
Classwork
#22
Tues 16 April
Multi-linear regression, R-squared, and prediction Lab 22 - Multi-linear regression, R-squared, and prediction 15.4 Least Squares Regression
17.6 Multiple Regression
Online quiz: Labs 17 and 18
#23
Thurs 18 April
Confidence intervals for the slope of linear regression Lab 23 - Confidence intervals for regression
UPDATED to correct issues found during class: Lab 23 - Confidence intervals for regression
16.1 A regression model, 16.2 Inference for the true slope Paper quiz: Labs 12 and 13
19 - 28 April Spring recess: no classes
#24
Tues 30 April
Intro to Machine Learning: understanding the data Kaggle: Titanic: Machine Learning from Disaster
titanic_train.csv
titanic_test.csv

Lab 24 - Understanding the Titanic data
Classwork
#25
Thurs 2 May
k-nearest neighbors (machine learning) Lab 25 - k-Nearest Neighbors classifier 1 17 Classification
17.1 Nearest Neighbors
17.2 Training and Testing
KNN classification using Scikit-Learn (Datacamp)
Paper quiz: Labs 17 (normal distribution), 18 (Central Limit Theorem), 19 (functions), 20 (correlation and heatmaps)
#26
Tues 7 May
k-nearest neighbors continued (machine learning) Lab 26 - k-Nearest Neighbors classifier 2
Completed Lab 26 - k-Nearest Neighbors classifier 2
#27
Thurs 9 May
The data science process revisited Exam review 1 Paper quiz: Labs 21, 22, 23 (linear regression)
#28
Tues 14 May
Review Sample Final Exam (answers)
Original NBA dataset (see Blackboard for how to clean)
Thurs 16 May Final exam 1:30pm - 3:30pm, Gillet 231