Homework #9

CMP 464/788:
Topics Course: Data Science
Spring 2017

Topics: Shading Maps & PCA
Deadline: Thursday, 6 April 2017, 11:59pm

Data

For this assignment, you will need to download three different data sets:

  1. New York Federal Reserve Labor Market for Recent Graduates: Download the Excel file at the bottom of the page and convert it to CSV for use in the problem set below:
    https://www.newyorkfed.org/research/college-labor-market/college-labor-market_compare-majors.html
  2. geoJSON for New York City School Districts: This file is available from Open Data NYC Planning (scroll down to "School, Police, Health & Fire" and export as geoJSON, called schoolDistricts.json).
    (If you have trouble downloading, here's the file: schoolDistricts.json)

  3. Test Scores for New York City School Districts: For this homework, we will be using the District "Math Data Files":
    http://schools.nyc.gov/Accountability/data/TestResults/ELAandMathTestResults

We will use these data sets in later homework assignments. Since scraping the data takes time, save them so you can reuse them in future programs.

Assignment

The work to be submitted is the same for the undergraduate and graduate versions of the course.

CMP 464/788 Homework:
#1-3 Analyse the NY Fed's Labor Market Data for Recent Graduates (see link above) using a Principal Components Analysis. There are three parts to this exercise:
  1. Compute and display the covariance matrix for the data,
  2. Generate a 3D plot of the data under the first three axes of a Principal Components Analysis. On this plot, highlight (using a different color) the computer science and mathematics majors, and
  3. Include the Python code that you used to generate your plots.
Make sure to include in the title of your plot the date plotted.

#1: Submit your Python program as a .py file.
#2: Submit a text file or screen shot that includes the covariance matrix.
#3: Submit a screen shot of the graphics window containing the plot.
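
As a rough illustration of parts 1 and 2 above, here is a minimal sketch of computing a covariance matrix and projecting data onto its first three principal components with NumPy. The function name covariance_and_pca and the random demo array are assumptions made for illustration only; they stand in for the numeric columns of the labor market CSV, and this is one possible approach rather than the required one.

```python
import numpy as np


def covariance_and_pca(X, n_components=3):
    """Return the covariance matrix of X and X projected onto its top components.

    X is expected to have one row per observation and one column per variable.
    """
    X = np.asarray(X, dtype=float)
    # Center each column so the covariance is taken about the mean.
    centered = X - X.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is appropriate for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues, largest first.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    projected = centered @ components
    return cov, projected


# Hypothetical stand-in data: 20 observations of 5 variables.
rng = np.random.default_rng(0)
demo = rng.normal(size=(20, 5))
cov, projected = covariance_and_pca(demo)
print(cov.shape, projected.shape)  # (5, 5) (20, 3)
```

The three columns of projected could then be passed as the x, y, and z coordinates of a matplotlib 3D scatter plot, with a separate color for the rows of interest.
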
#4-5 OPTIONAL (FOR EXTRA CREDIT) Using folium, create a map of the New York City School Districts (elementary and middle school) and shade each district by borough (that is, all districts in the Bronx will be the same color; the districts in Brooklyn will be another color, etc.).

#4: Submit your Python program as a .py file.
#5: Submit a screen shot of the graphics window containing the plot.
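
One way to approach the borough shading is to write a style function that looks up each district's borough and returns a fill color. The sketch below assumes that each geoJSON feature stores its district number in a property named "SchoolDist" and that districts map to boroughs as listed in borough_of; both are assumptions to check against the downloaded file, and the colors are arbitrary.

```python
# Arbitrary example colors, one per borough.
BOROUGH_COLORS = {
    "Manhattan": "#1f77b4",
    "Bronx": "#ff7f0e",
    "Brooklyn": "#2ca02c",
    "Queens": "#d62728",
    "Staten Island": "#9467bd",
}


def borough_of(district):
    """Map a school district number to a borough (assumed numbering)."""
    d = int(district)
    if 1 <= d <= 6:
        return "Manhattan"
    if 7 <= d <= 12:
        return "Bronx"
    if 13 <= d <= 23 or d == 32:
        return "Brooklyn"
    if 24 <= d <= 30:
        return "Queens"
    if d == 31:
        return "Staten Island"
    raise ValueError("unknown district: {}".format(district))


def style_function(feature):
    """Return a style dict for one geoJSON feature.

    "SchoolDist" is an assumed property name; inspect the file to confirm it.
    """
    district = feature["properties"]["SchoolDist"]
    return {
        "fillColor": BOROUGH_COLORS[borough_of(district)],
        "color": "black",
        "weight": 1,
        "fillOpacity": 0.6,
    }
```

A function of this shape can be passed as the style_function argument of folium.GeoJson when adding schoolDistricts.json to a map.
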
#6-7 Using the New York City data for district test scores, shade your map above by the percentage of students proficient in mathematics (i.e., students who scored a 3 or 4 on the exam; this is the last column in the CSV file).

#6: Submit your Python program as a .py file.
#7: Submit a screen shot of the graphics window containing the plot.
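
For shading by proficiency, each district's percentage has to be turned into a color. The sketch below linearly interpolates between white and dark blue. The names proficiency_color and make_style_function, the scores dictionary (district number to percent proficient), and the "SchoolDist" property name are all hypothetical and would need to match how you load the CSV and geoJSON.

```python
def proficiency_color(percent, low=0.0, high=100.0):
    """Map a percentage to a hex color between white (low) and dark blue (high)."""
    t = (percent - low) / (high - low)
    t = min(1.0, max(0.0, t))  # clamp values outside [low, high]
    start = (255, 255, 255)
    end = (8, 48, 107)
    r, g, b = (round(s + t * (e - s)) for s, e in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def make_style_function(scores):
    """Build a style function from a dict of district number -> percent proficient."""
    def style_function(feature):
        # "SchoolDist" is an assumed property name in the geoJSON file.
        district = int(feature["properties"]["SchoolDist"])
        return {
            "fillColor": proficiency_color(scores[district]),
            "color": "black",
            "weight": 1,
            "fillOpacity": 0.7,
        }
    return style_function


# Hypothetical scores for two districts, for illustration only.
example_style = make_style_function({1: 0.0, 2: 100.0})
print(example_style({"properties": {"SchoolDist": 2}})["fillColor"])  # #08306b
```

The returned function can be used in the same way as the borough version, as the style_function argument of folium.GeoJson.
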

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.