Homework #4

CMP 464/788:
Topics Course: Data Science
Spring 2017

Topics: Distributions & Correlations
Deadline: Thursday, March 2, 2017, 11:59pm

Textbook's Code

This assignment uses the basic statistics functions developed by the textbook's author and available at:

https://github.com/joelgrus/data-science-from-scratch/blob/master/code-python3/stats.py

Datasets

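This assignment uses the following datasets: the incidence of Lyme Disease in Connecticut, New Jersey, and New York; New York City's historical population; and the birthday data set of collisions.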

Assignment

CMP 464/788 Homework:
#1 Examine the correlation between the change in incidence of Lyme Disease in Connecticut, New Jersey, and New York. Use the textbook's code to compute the pairwise correlations, that is, ρ(CT,NJ), ρ(CT,NY), and ρ(NJ,NY). Include all correlations that you computed in your written answer.

#1: Submit a .txt or .pdf file with your answer.
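
The pairwise computation in #1 can be sketched as follows. This is only an illustration under stated assumptions: the case counts are made up (not the real dataset), "change in incidence" is read as the year-over-year difference, and the local correlation function is a stand-in with the usual Pearson definition so that the sketch runs on its own. For the assignment itself, use the correlation function from the textbook's stats.py.

```python
import math
from itertools import combinations


def correlation(xs, ys):
    """Pearson correlation (stand-in for the textbook's function)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    if sx == 0 or sy == 0:
        return 0.0  # no variation means no measurable correlation
    return cov / (sx * sy)


def changes(values):
    """Year-over-year differences (one reading of 'change in incidence')."""
    return [b - a for a, b in zip(values, values[1:])]


# Made-up yearly case counts, NOT the real Lyme Disease data.
cases = {
    "CT": [10, 14, 13, 20, 26],
    "NJ": [8, 11, 12, 18, 21],
    "NY": [30, 28, 35, 33, 40],
}

# One correlation per unordered pair of states.
pairwise = {
    (a, b): correlation(changes(cases[a]), changes(cases[b]))
    for a, b in combinations(sorted(cases), 2)
}

for (a, b), rho in pairwise.items():
    print(f"rho({a},{b}) = {rho:.3f}")
```

Correlating the changes rather than the raw counts matters here: two series that both trend upward can look strongly correlated even when their year-to-year movements are unrelated.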

#2-3 Use the dataset of New York City's historical population to answer the following: Which borough's change in population is most closely correlated with the city's change in population? Justify your answer. Use the textbook's code to compute the correlation between each borough's population and the overall city population. Include all correlations that you computed in your written answer. For the second part, include a plot of the population of the most closely correlated borough, together with the city's population, from 1790 to 2010. Make sure the title of your plot includes the dates plotted.

#2: Submit a .txt or .pdf file with your answer.
#3: Submit a screenshot of the graphics window containing the plot of the borough population that most closely correlated with the city's population, as well as the city's population.
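
Picking the most closely correlated borough in #2 can be sketched as below. Everything specific here is an assumption for illustration: the population figures are made up, only three boroughs are shown, the city total is taken as the sum of those three, and the local correlation function is a stand-in for the one in the textbook's stats.py.

```python
import math


def correlation(xs, ys):
    """Pearson correlation (stand-in for the textbook's function)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    if sx == 0 or sy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / (sx * sy)


def changes(values):
    """Differences between consecutive census counts."""
    return [b - a for a, b in zip(values, values[1:])]


# Made-up populations (in thousands) for five census years, NOT the real data.
populations = {
    "Bronx": [200, 430, 730, 1265, 1395],
    "Brooklyn": [1167, 1634, 2018, 2560, 2698],
    "Manhattan": [1850, 2332, 2284, 1867, 1890],
}

# Hypothetical city total: the sum of the boroughs listed above, year by year.
city = [sum(year_counts) for year_counts in zip(*populations.values())]

# Correlate each borough's changes with the city's changes.
rho_by_borough = {
    name: correlation(changes(counts), changes(city))
    for name, counts in populations.items()
}

# Here "most closely correlated" is taken to mean the largest rho;
# ranking by absolute value would be another defensible choice.
best = max(rho_by_borough, key=rho_by_borough.get)

for name, rho in rho_by_borough.items():
    print(f"rho({name}, city) = {rho:.3f}")
print("Most closely correlated:", best)
```

The borough named by best is the one whose population would then be plotted alongside the city's population for #3.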

#4-6 Could it be the case that drivers racing to get somewhere by the top of the hour drive more recklessly and get into more accidents than those who are driving just past the hour? Using the birthday data set, display the number of collisions that occurred on your birthday, binned by minute. The x-axis of your plot should be the minutes from 0 to 59 (minutes after the hour), and the y-axis should be the number of accidents that occurred at each minute after the hour. Include in your plot a label containing the correlation, ρ(minutes,# accidents).

#4: Submit your Python program as a .py file.
#5: Submit a screenshot of the graphics window containing the plot.
#6: Do minutes after the hour correlate with more accidents in your birthday dataset? Justify your answer. Include a .txt or .pdf file with your answer.

Hint: See Homework #3, #3-4.
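
The binning step in #4 can be sketched as below. The collision times are made up, and the "H:MM" text format is an assumption about how times appear in the birthday data set; adjust the parsing to match your file. The local correlation function is again a stand-in for the one in the textbook's stats.py.

```python
import math
from collections import Counter


def correlation(xs, ys):
    """Pearson correlation (stand-in for the textbook's function)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    if sx == 0 or sy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / (sx * sy)


# Made-up collision times in an assumed "H:MM" format, NOT real data.
times = ["0:05", "8:00", "8:00", "13:30", "17:45",
         "17:45", "17:45", "23:59", "9:30"]

# Count collisions by minute after the hour, ignoring the hour itself.
counts = Counter(int(t.split(":")[1]) for t in times)

# Build aligned lists covering every minute, including minutes with no collisions.
minutes = list(range(60))
accidents = [counts[m] for m in minutes]  # a Counter gives 0 for missing keys

rho = correlation(minutes, accidents)
label = f"rho(minutes, # accidents) = {rho:.3f}"
print(label)
```

The minutes and accidents lists are the values for the x- and y-axes, and label is the text to attach to the plot. Filling in zeros for minutes with no collisions keeps the two lists the same length, which the correlation computation requires.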

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.