Date: | Topics: | Handouts: | Reading: | Quiz Topics: | HW/Project: |
#1
Mon 30 January |
First Day Details, Topics Overview, Mean, variance and random variables | Syllabus,
DS Venn diagram, Gallery: NY density, nearest airport, precincts, citibike, buses vs. subways, transit + census, life spans, ebola, disease, jobs; Printing (from __future__), Textbook's repo, summaries sometimes hide the big picture, Anscombe's Quartet |
Academic Integrity Policy, Chapters 1-3 |
||
#2
Wed 1 February Lab |
Python 2 vs. 3, Python Refresher: basics;
Quick look at matplotlib's
line and bar charts; More on matplotlib:
histograms and scatterplots;
Data as vectors: scaling, dot products;
Python Refresher: list comprehensions & zip |
basic stats, list comprehension examples, list comprehension tutorial, matplotlib, Textbook's repo, Plotting recipes | Chapters 2,4,5 | ||
#3
Mon 6 February |
Scaling and dot product; correlation & causation
Python Refresher: lists & tuples |
weather3.py, lymeScaled.py, book's stats.py (depends on linear_algebra.py), Anscombe's Quartet, correlation guessing | Chapters 2,5 | #1: Academic Integrity | HW #1: Simple graphs with pyplot |
#4
Wed 8 February Lab |
Applying correlation, Simpson's Paradox; Getting Data: CSV Files; Python Refresher: lists & tuples, dictionaries |
lists vs. tuples, dictionary examples, lymeScaled.py, simple csv example & data, Simpson's paradox wiki, wage growth paradox | Chapters 2,6,9 | ||
Mon 13 February | Lincoln's Birthday - Lehman is closed | ||||
15 February | Classes follow Monday schedule | ||||
#5
Wed 15 February Monday schedule |
Probability: Distributions & Central Limit Theorem;
CSV Files |
Simpson's paradox wiki, wage growth paradox, simple csv example & data, normal distribution calculator, rolling dice, Central Limit Theorem Visualized, Matt Nedrich on CLT | Chapters 5,6, 9 | #2: Python Basics | HW #2: Scaling Vector Data |
20 February | President's Day - Lehman is closed | ||||
#6
Wed 22 February Lab |
Causation vs. Correlation, CSV Files
Python Refresher: collections, regular expressions |
simple csv example & data, dsWiki.txt (for group work), regex cheat sheet, regex online tester, correlation does not equal causation | Chapters 2,9 | #3: Vectors, Means, and Variances | HW #3: Binning Data & Measuring Dispersion |
#7
Mon 27 February |
Bayes Theorem; Naive Bayes: Spam Filter Example |
regex online tester, book's naive Bayes spam filter, spam dataset | Chapters 6,13 | #4: Python Lists, Dictionaries, & csv | HW #4: Correlations & Distributions |
#8
Wed 1 March Lab |
Naive Bayes: Spam Filter Example; Python Refresher: more on matplotlib & sets | twoPlots.py, subplots, book's naive Bayes spam filter, spam dataset | Chapters 2,7 | ||
#9
Mon 6 March |
Hypothesis & Inference: Confidence Intervals; More on Confidence Intervals, A/B Testing | Khan Academy on confidence intervals, Khan Academy on hypothesis testing, normal distribution calculator, numpy, plotting revisited | Chapters 7,25 | #5: Correlation & Regular Expressions | HW #5: Bayes Theorem, Simpson's Paradox, & Regular Expressions |
#10
Wed 8 March Lab |
Hypothesis & Inference: Confidence Intervals; More on Confidence Intervals, A/B Testing continued | scipy lecture notes on arrays, arrays & images, 3d surface example code, mplot3d tutorial, matplotlib colormaps | Chapters 8,9,25 | ||
#11
Mon 13 March |
Gradient descent; Linear Algebra Refresher: Eigenvalues & Eigenvectors; Example: Simple Linear Regression |
Matt Nedrich's intro to gradient descent &
example,
Quinn Liu's
gradient descent image, Andrew Ng's linear regression notes; Eigenvectors & eigenvalues, visually, linear transformations example |
Chapters 2,8,9 | #6: Bayes Theorem | HW #6: A/B Testing |
#12
Wed 15 March Lab |
Manipulating image files with numpy
Python Refresher: numpy |
numpy:
plotting revisited,
detailed numpy tutorial,
numpy cheatsheet; scipy lecture notes on arrays, arrays & images; regression and GitHub classwork |
Chapters 9,10 | ||
#13
Mon 20 March |
Eigenvectors and eigenvalues; review: gradient descent and linear regression |
Matt Nedrich's intro to gradient descent &
example; Eigenvectors & eigenvalues, visually |
Chapters 2,10,25 | #7: Hypothesis & Inference | HW #7: Gradient Descent & Images |
#14
Wed 22 March Lab |
Using github; using Pandas and Seaborn for correlation and regression |
github for beginners,
github Hello World, github student pack,
github cheat sheet; regression and GitHub classwork; Folium classwork, Folium tutorial |
Chapters 5,25 | ||
#15
Mon 27 March |
Computing eigenvalues and eigenvectors; Working with Multidimensional Data: Rescaling, Principal Components Analysis |
Example of using numpy to compute eigenvalues and eigenvectors; PCA, explained visually, Lindsay Smith's computing PCA, Sebastian Raschka's PCA overview and implementation in Python; scipy, sklearn's PCA, pca on iris dataset, NY Fed's unemployment rates and by major |
Chapters 2,10,25 | #8: Gradient Descent & numpy | HW #8: Mapping Data |
#16
Wed 29 March Lab |
Principal Components Analysis via scikit-learn; JSON and geoJSON; choropleth maps |
ESRI's shapefiles,
shapefile wikipage, JSON,
KML,
summary & comparison; geometric interpretation of covariance matrix, PCA explained in greater and greater detail (first answer), sample PCA code, PCA method in scikit-learn, PCA on the iris dataset; geoJSON and choropleth Lab, geoJSON specifications, geoJSON editor |
Chapters 2,11,12 | ||
#17
Mon 3 April |
Nearest Neighbors & Voronoi Diagrams; Clustering: k-means |
nearest airport, precincts' Voronoi diagram,
Voronoi diagrams from triangulations, scipy Voronoi module;
k-means (wiki), k-means image example, scikit-learn clustering |
Chapters 12,19 | #9: Eigenvectors & eigenvalues | HW #9: Shading Maps & PCA Project: Proposal |
#18
Wed 5 April Lab |
Scraping webpages: Beautiful Soup; k-Nearest Neighbors |
beautifulSoup,
soup documentation,
where's beautifulSoup?, Frances Zlotnick's
tutorial,
DOM tutorial,
book's code; k-nearest neighbors tutorial |
Chapters 10,19 | ||
10-18 April | Spring recess: no classes | ||||
19 April | Last day to withdraw from class with a grade of W | ||||
#19
Wed 19 April Lab |
k-Nearest Neighbors |
book's code; k-nearest neighbors tutorial |
Chapters 14-15 | #10: Using github & beautifulSoup | HW #10: Nearest Neighbors
Project: Timeline |
20 April | Classes follow Monday schedule | ||||
#20
Thurs 20 April |
Voronoi Diagrams, Clustering: k-means |
nearest airport, precincts' Voronoi diagram,
Voronoi diagrams from triangulations, scipy Voronoi module;
k-means (wiki), k-means image example, scikit-learn clustering |
Chapter 16 | ||
#21
Mon 24 April |
k-means continued; hierarchical clustering; Multi-dimensional Scaling (MDS) | k-means (wiki),
k-means image example, k-means example, k-nearest-neighbor versus k-means,
scikit-learn clustering;
hierarchical clustering; Noel O'Boyle's map example, Zachary Nichols' NYC scaled to commute time and part 2 |
Chapters 16,20 | #11: PCA | HW #11: k-Nearest Neighbors and Voronoi Diagrams
Project: Data Collection |
#22
Wed 26 April Lab |
Voronoi Diagrams and Clustering Labs | Voronoi Diagram Lab, Voronoi function in SciPy; scikit-learn clustering, k-means image example | Chapters 17,20 | ||
#23
Mon 1 May |
Refresher: Trees & Graphs;
Network Analysis |
networkx tutorial, Cambridge tutorial, graph review | Chapter 21 | #12: Nearest Neighbors & Clustering | HW #12: MDS & Regression Project: Analysis |
#24
Wed 3 May Lab |
Regression Cont'd |
regression recap,
logistic regression wiki, Marcel Caraciolo's university entrance example,
dummies on iris data set,
sklearn logistic regression,
sklearn logistic regression example,
311 Requests (filter for Descriptor = "Pothole"), sklearn's MDS, middle school data |
Chapters 18, 22 | ||
#25
Mon 8 May |
MapReduce & PageRank | PageRank as applied lin. alg. (SIAM Review 2006) | Chapter 23 | #13: Regression & NLP | Project: Visualization & Draft Slide |
#26
Wed 10 May Lab |
Crash Course in SQL | Khan Academy on SQL, sqlitebrowser, sqlite, SQL lab | Chapter 24 | ||
#27
Mon 15 May |
Not from scratch: iPython (jupyter), pandas, and seaborn |
Thomas Wiecki's modern guide to data science,
OpenTechSchool iPython tutorial, pandas cookbook, cheat sheet, seaborn, elevator data |
Chapter 25 | Complete Project Project: Sneak Preview Slide |
|
#28
Wed 17 May Lab |
Project Presentations | ||||
Wed 24 May | Final exam 1:30pm - 3:30pm |