Instructor: Prof. Megan Owen (she/her)
E-mail: megan.owen@lehman.cuny.edu
Office: Gillet 137E
Office hours: 2:40 - 4pm on Tuesdays and Thursdays (Gillet 231 or 137E), or by appointment

Course time: 1:00pm - 2:40pm on Tuesdays and Thursdays, Gillet 231

Student mentor: Tricia Cavaliero
E-mail: trishcavaliero@gmail.com
Office hours: 5:30-7pm on Monday and Wednesday, Gillet 233-B


Python and Jupyter:


Assignments: see Blackboard

In-class Quizzes: see Blackboard


Date: Topics: Lab and Handouts: Reading: Classwork & Quiz Topics:
Tues 27 August
Syllabus; What is Data Science; Introduction to Python (math, variables, and printing); line plots with Pandas Syllabus
Citi Bike data example
Data Science Process

Lab 1 - Introduction to Python and Pandas (Jupyter notebook)
Academic Integrity Policy,
Line graphs
Online quiz: Academic Integrity
Thurs 29 August
Statistical variables; proportions; column operations Lab 2 - Plotting NYC's shelter population Statistical variables Classwork: Statistical variables
Mon 2 September CUNY: No classes (Labor Day)
Tues 3 Sept
Bar charts Lab 3 - Bar charts
7.1 Online quiz: variables and functions (all of Lab 1 except the plotting section)
Thurs 5 Sept Classes follow a Monday schedule
Tues 10 Sept
Histograms Lab 4 - Histograms 7.2, Histograms Online quiz: Lab 1, statistical variables
Thurs 12 Sept
Mean, median, and mode; filtering Lab 5 - Mean, Median, and Mode Online Stats: Median and mean
Non-technical overview: mean, median, mode
Online quiz: Lab 2
Tues 17 Sept
Measures of Spread: range, variance, and standard deviation Lab 6 - Measures of spread (Range, Variance, and Standard Deviation) Measures of Variability
Subway trip variability
Classwork: mean, median, and variance
Thurs 19 Sept
Behavior of sample vs. population; boxplots Lab 7 - Samples and boxplots 10.2 Sampling from a Population, percentiles, boxplots Online quiz: Labs 3 and 4
Tues 24 Sept
Introduction to probability, computing probabilities, filtering Lab 8 - Computing probabilities 9.5 Finding Probabilities, Introduction to Probability, Computing probabilities
Paper quiz: Labs 1 and 2 (assignments 1-4), statistical variables; 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
Thurs 26 Sept
Filtering Lab 9 - Filtering
Filtering with Pandas Classwork
30 September - 1 October CUNY: No classes
Thurs 3 October
Computing probabilities with and/or, subsets of dataframes Lab 10 - Computing probabilities 2
9.5 Finding Probabilities Online quiz: review, Labs 5 and 6
8-9 October CUNY: No classes
Thurs 10 October
Iteration, Sampling and Empirical Distributions Lab 11 - Iteration and Sampling Distributions 10.3 Empirical Distribution of a Statistic
9.3 Iteration
Iteration with turtles
Paper quiz: Labs 3 and 4 (assignments 5-8); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
Tues 15 October
Comparing distributions visually, date and time in pandas Lab 12 - Comparing distributions visually Online quiz: review, Labs 7, 8, and 9
Thurs 17 October
Simulations and hypotheses Lab 13 - Simulations and hypotheses 11.1 Assessing Models
Introduction to Hypothesis Test
Classwork: Introduction to hypotheses
Tues 22 October
Hypothesis testing of proportions Lab 14 - Hypothesis testing of proportions 11.1 Assessing Models
Introduction to Hypothesis Test
Paper quiz: Labs 5, 6, and 7 (assignments 9-14); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
Thurs 24 October
Hypothesis testing of proportions continued Lab 15 - Hypothesis testing of proportions continued Online quiz: Review, Labs 10, 11, 12
Tues 29 October
Bootstrap and confidence intervals Lab 16 - Bootstrap and confidence intervals 13.1 Percentiles
13.2 The Bootstrap
13.3 Confidence Intervals
Much more detail about the dataset
Online quiz: Review and Labs 13 and 14
Thurs 31 October
Normal distributions Lab 17 - Normal distributions 14.3 The SD and the Normal Distribution
Online stats book: normal distributions
Visualizing the normal distribution
Paper quiz: Labs 8, 9, 10 (assignments 15-20); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
5 November Last day to withdraw from class with a grade of W
Tues 5 November
Central Limit Theorem Lab 18 - Central Limit Theorem
Data: starbucks-menu-nutrition-drinks.csv
14.4 The Central Limit Theorem
14.5 The Variability of the Sample Mean
Visualization of the Central Limit Theorem
Classwork: Probabilities
Thurs 7 November
Functions and conditional statements Lab 19 - Functions and conditional statements 8.1 Applying a function to a column
9.1 Conditional statements
Online quiz: Review, Labs 15 and 16
Tues 12 November
Correlation, Causation, and Heat maps Lab 20 - Correlation, causation, and heat maps
Data: Feb2019_labor_market_majors.csv
Spurious correlations
Correlation guessing game
15.1 Correlation
Online stats book: intro to correlation
Paper quiz: Labs 11 and 12 (assignments 21 - 24); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz (note that labs 10 and 11 in the sample quiz are labs 11 and 12 this term)
Thurs 14 November
Simple linear regression, checking residuals for normality Lab 21 - Simple linear regression
Data: Feb2019_labor_market_majors.csv
15.2 The Regression Line
15.5 Visual Diagnostics
Introduction to linear regression
Visual explanation of linear regression
Classwork: linear regression
Tues 19 November
Multi-linear regression, R-squared, and prediction Lab 22 - Multi-linear regression, R-squared, and prediction 15.4 Least Squares Regression
17.6 Multiple Regression
Online quiz: Labs 17 and 18
Thurs 21 November
Confidence intervals for the slope of linear regression Lab 23 - Confidence intervals for regression 16.1 A regression model, 16.2 Inference for the true slope Paper quiz: Labs 13, 14, and 15 (assignments 25 - 30); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
Tues 26 November
Intro to Machine Learning: understanding the data Lab 24 - Understanding the Titanic data

Kaggle: Titanic: Machine Learning from Disaster
Classwork: Understanding the Titanic data
28 November - 1 December Thanksgiving Recess: College Closed
Tues 3 December
k-nearest neighbors (machine learning) Lab 25 - k-Nearest Neighbors classifier 1 17 Classification
17.1 Nearest Neighbors
17.2 Training and Testing
KNN classification using Scikit-Learn (Datacamp)
Paper quiz: Labs 16 (bootstrap and confidence interval), 17 (normal distribution), 18 (Central Limit Theorem), 19 (functions), 20 (correlation and heatmaps)
Sample quiz
Thurs 5 December
k-nearest neighbors continued (machine learning) Lab 26 - k-Nearest Neighbors classifier 2
Tues 10 December
The data science process revisited Lab 27 - The data science process revisited Paper quiz: Labs 21, 22, 23 (linear regression)
Sample quiz
Thurs 12 December
Review Sample final Spring 2019 (answers)
Final Spring 2019 (answers)
Thurs 19 December Final exam 1:30pm - 3:30pm, Gillet 231