(Created with wordle with text from wiki)
Instructor: Prof. Megan Owen (she/her)
E-mail: megan.owen@lehman.cuny.edu
Office: Gillet 137E
Office hours: 2:40 - 4pm on Tuesdays and Thursdays (Gillet 231 or 137E), or by appointment

Course time: 1:00pm - 2:40pm on Tuesdays and Thursdays, Gillet 231

Student mentor: Tricia Cavaliero
E-mail: trishcavaliero@gmail.com
Office hours: 5:30-7pm on Monday and Wednesday, Gillet 233-B

Syllabus

Python and Jupyter:

Textbooks:

Assignments: see Blackboard

In-class Quizzes: see Blackboard

Outline:

Date: Topics: Lab and Handouts: Reading: Classwork & Quiz Topics:
#1
Tues 27 August
Syllabus; What is Data Science; Introduction to Python (math, variables, and printing); line plots with Pandas Syllabus
Citi Bike data example
Data Science Process

Lab 1 - Introduction to Python and Pandas (Jupyter notebook)
nycHistPop.csv
Academic Integrity Policy,
3.1,3.2,3.3
Variables
Line graphs
Online quiz: Academic Integrity
#2
Thurs 29 August
Statistical variables; proportions; column operations Lab 2 - Plotting NYC's shelter population Statistical variables Classwork: Statistical variables
Mon 2 September CUNY: No classes (Labor Day)
#3
Tues 3 Sept
Bar charts Lab 3 - Bar charts
Sept3_2018_Green_Taxi_Trip_Data.csv
7.1 Online quiz: variables and functions (all of Lab 1 except the plotting section)
Thurs 5 Sept Classes follow a Monday schedule
#4
Tues 10 Sept
Histograms Lab 4 - Histograms 7.2, Histograms Online quiz: Lab 1, statistical variables
#5
Thurs 12 Sept
Mean, median, and mode; filtering Lab 5 - Mean, Median, and Mode Online Stats: Median and mean
Non-technical overview: mean, median, mode
Online quiz: Lab 2
#6
Tues 17 Sept
Measures of Spread: range, variance, and standard deviation Lab 6 - Measures of spread (Range, Variance, and Standard Deviation) Measures of Variability
Subway trip variability
Classwork: mean, median, and variance
#7
Thurs 19 Sept
Behavior of sample vs. population; boxplots Lab 7 - Samples and boxplots 10.2 Sampling from a Population, percentiles, boxplots Online quiz: Labs 3 and 4
#8
Tues 24 Sept
Introduction to probability, computing probabilities, filtering Lab 8 - Computing probabilities 9.5 Finding Probabilities, Introduction to Probability, Computing probabilities
Filtering
Paper quiz: Labs 1 and 2 (assignments 1-4), statistical variables; 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
#9
Thurs 26 Sept
Filtering Lab 9 - Filtering
imdb_1000.csv
Filtering with Pandas Classwork
30 September - 1 October CUNY: No classes
#10
Thurs 3 October
Computing probabilities with and/or, subsets of dataframes Lab 10 - Computing probabilities 2
Sept17_2019_311_Service_Requests.csv
9.5 Finding Probabilities Online quiz: review, Labs 5 and 6
8-9 October CUNY: No classes
#11
Thurs 10 October
Iteration, Sampling and Empirical Distributions Lab 11 - Iteration and Sampling Distributions 10.3 Empirical Distribution of a Statistic
9.3 Iteration
Iteration with turtles
Paper quiz: Labs 3 and 4 (assignments 5-8); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
#12
Tues 15 October
Comparing distributions visually, date and time in pandas Lab 12 - Comparing distributions visually Online quiz: review, Labs 7, 8, and 9
#13
Thurs 17 October
Simulations and hypotheses Lab 13 - Simulations and hypotheses 11.1 Assessing Models
Introduction to Hypothesis Test
Classwork: Introduction to hypotheses
#14
Tues 22 October
Hypothesis testing of proportions Lab 14 - Hypothesis testing of proportions 11.1 Assessing Models
Introduction to Hypothesis Test
Paper quiz: Labs 5, 6, and 7 (assignments 9-14); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
#15
Thurs 24 October
Hypothesis testing of proportions continued Lab 15 - Hypothesis testing of proportions continued Online quiz: Review, Labs 10, 11, 12
#16
Tues 29 October
Bootstrap and confidence intervals Lab 16 - Bootstrap and confidence intervals 13.1 Percentiles
13.2 The Bootstrap
13.3 Confidence Intervals
Much more detail about the dataset
Online quiz: Review and Labs 13 and 14
#17
Thurs 31 October
Normal distributions Lab 17 - Normal distributions 14.3 The SD and the Normal Distribution
Online stats book: normal distributions
Visualizing the normal distribution
Paper quiz: Labs 8, 9, 10 (assignments 15-20); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
5 November Last day to withdraw from class with a grade of W
#18
Tues 5 November
Central Limit Theorem Lab 18 - Central Limit Theorem
Data: starbucks-menu-nutrition-drinks.csv
14.4 The Central Limit Theorem
14.5 The Variability of the Sample Mean
Visualization of the Central Limit Theorem
Classwork: Probabilities
#19
Thurs 7 November
Functions and conditional statements Lab 19 - Functions and conditional statements 8.1 Applying a function to a column
9.1 Conditional statements
Online quiz: Review, Labs 15 and 16
#20
Tues 12 November
Correlation, Causation, and Heat maps Lab 20 - Correlation, causation, and heat maps
Data: Feb2019_labor_market_majors.csv
Spurious correlations
Correlation guessing game
15.1 Correlation
Online stats book: intro to correlation
Paper quiz: Labs 11 and 12 (assignments 21 - 24); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz (note that labs 10 and 11 in the sample quiz are labs 11 and 12 this term)
#21
Thurs 14 November
Simple linear regression, checking residuals for normality Lab 21 - Simple linear regression
Data: Feb2019_labor_market_majors.csv
15.2 The Regression Line
15.5 Visual Diagnostics
Introduction to linear regression
Visual explanation of linear regression
Classwork: linear regression
#22
Tues 19 November
Multi-linear regression, R-squared, and prediction Lab 22 - Multi-linear regression, R-squared, and prediction 15.4 Least Squares Regression
17.6 Multiple Regression
Online quiz: Labs 17 and 18
#23
Thurs 21 November
Confidence intervals for the slope of linear regression Lab 23 - Confidence intervals for regression 16.1 A regression model, 16.2 Inference for the true slope Paper quiz: Labs 13, 14, and 15 (assignments 25 - 30); 1 sheet of paper (8 1/2" x 11") with handwritten notes on both sides is allowed
Sample quiz
#24
Tues 26 November
Intro to Machine Learning: understanding the data Lab 24 - Understanding the Titanic data

Kaggle: Titanic: Machine Learning from Disaster
train.csv
test.csv
Classwork: Understanding the Titanic data
28 November - 1 December Thanksgiving Recess: College Closed
#25
Tues 3 December
k-nearest neighbors (machine learning) Lab 25 - k-Nearest Neighbors classifier 1 17 Classification
17.1 Nearest Neighbors
17.2 Training and Testing
KNN classification using Scikit-Learn (Datacamp)
Paper quiz: Labs 16 (bootstrap and confidence interval), 17 (normal distribution), 18 (Central Limit Theorem), 19 (functions), 20 (correlation and heatmaps)
Sample quiz
#26
Thurs 5 December
k-nearest neighbors continued (machine learning) Lab 26 - k-Nearest Neighbors classifier 2
#27
Tues 10 December
The data science process revisited Lab 27 - The data science process revisited Paper quiz: Labs 21, 22, 23 (linear regression)
Sample quiz
#28
Thurs 12 December
Review Sample final Spring 2019 (answers)
Final Spring 2019 (answers)
Thurs 19 December Final exam 1:30pm - 3:30pm, Gillet 231