# Homework #4

## CMP 464/788: Topics Course: Data Science Spring 2017

Topics: Distributions & Correlations
Deadline: Thursday, March 2, 2017, 11:59pm

### Textbook's Code

This assignment uses the basic statistics functions developed by the textbook's author and available at:

https://github.com/joelgrus/data-science-from-scratch/blob/master/code-python3/stats.py

### Datasets

This assignment uses the following datasets:
• The Lyme disease incidence data for Connecticut, New Jersey, and New York from 2003 to 2011 (see lymeScaled.py).
• The historical population for New York City used in Homework #2.
• The birthday collisions dataset you collected in Homework #3.

### Assignment

CMP 464/788 Homework:
#1 Examine the correlation between the changes in incidence of Lyme disease in Connecticut, New Jersey, and New York. Use the textbook's code to compute the pairwise correlations, that is, ρ(CT,NJ), ρ(CT,NY), and ρ(NJ,NY). Include all correlations that you computed in your written answer.
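
The pairwise computation described above can be sketched as follows. This is only an illustration under stated assumptions: "change in incidence" is read here as the year-over-year difference, the `pearson` function is a self-contained stand-in for the `correlation(x, y)` function in the textbook's stats.py, the helper names `changes` and `pairwise_correlations` are hypothetical, and the numbers are placeholders rather than the contents of lymeScaled.py.

```python
from itertools import combinations
from math import sqrt


def changes(values):
    """Year-over-year differences (an assumed reading of 'change')."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def pearson(xs, ys):
    """Pearson correlation; stand-in for the textbook's correlation(x, y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx)
    var_y = sum(d * d for d in dy)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so report no correlation
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(var_x * var_y)


def pairwise_correlations(series):
    """Map each unordered pair of keys to the correlation of their changes."""
    return {
        (a, b): pearson(changes(series[a]), changes(series[b]))
        for a, b in combinations(sorted(series), 2)
    }


# Placeholder values only, NOT the real Lyme disease data.
incidence = {
    "CT": [10, 12, 15, 14, 18],
    "NJ": [20, 22, 27, 25, 30],
    "NY": [30, 29, 27, 28, 24],
}

rhos = pairwise_correlations(incidence)
for (a, b), rho in rhos.items():
    print(f"rho({a},{b}) = {rho:.3f}")
```

With three states, `combinations` yields exactly the three pairs the question asks for, so no pair is computed twice.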

#2-3 Use the dataset of New York City's historical population to answer the following: Which borough's change in population is most closely correlated with the city's change in population? Justify your answer. Use the textbook's code to compute the correlation between each borough's population and the overall city population. Include all correlations that you computed in your written answer. For the second part, include a plot of the population of the most closely correlated borough together with the city's population from 1790 to 2010. Make sure the title of your plot includes the date plotted.
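
One way to organize the borough comparison is sketched below. It is a minimal illustration, not a prescribed solution: the `pearson` function again stands in for the textbook's `correlation(x, y)`, the helper `most_correlated` and the borough labels are hypothetical, and the populations are made-up placeholders rather than the Homework #2 dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; stand-in for the textbook's correlation(x, y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx)
    var_y = sum(d * d for d in dy)
    if var_x == 0 or var_y == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(var_x * var_y)


def most_correlated(boroughs, city):
    """Return the borough name with the highest correlation, plus all scores."""
    scores = {name: pearson(pop, city) for name, pop in boroughs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder values only, NOT the real population data.
city = [100, 150, 210, 300]
boroughs = {
    "Borough A": [10, 30, 35, 90],
    "Borough B": [50, 75, 105, 150],  # exactly half the city in every period
}

best, scores = most_correlated(boroughs, city)
for name, rho in scores.items():
    print(f"rho({name}, city) = {rho:.3f}")
print("Most closely correlated:", best)
```

Keeping every score in the returned dictionary makes it easy to report all of the computed correlations in the written answer, not only the largest one.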

#3: Submit a screen shot of the graphics window containing the plot of the borough population that most closely correlates with the city's population, as well as the city's population.

#4-6 Could it be the case that drivers racing to get somewhere by the top of the hour drive more recklessly and get in more accidents than those driving just past the hour? Using the birthday dataset, display the number of collisions that occurred on your birthday, binned by minute. The x-axis of your plot should be the minutes from 0 to 59 (minutes after the hour), and the y-axis should be the number of accidents that occur at each minute after the hour. Include in your plot a label containing the correlation, ρ(minutes,# accidents).

#4: Submit your Python program as a .py file.
#5: Submit a screen shot of the graphics window containing the plot.

Hint: See Homework #3, #3-4.
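
The binning step for the collision question might look like the sketch below. It assumes each collision time is available as a string of the form "H:MM" or "HH:MM"; the actual column name and format in your birthday dataset may differ. The helper `collisions_per_minute` is hypothetical, `pearson` stands in for the textbook's `correlation(x, y)`, and the sample times are invented for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; stand-in for the textbook's correlation(x, y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx)
    var_y = sum(d * d for d in dy)
    if var_x == 0 or var_y == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(var_x * var_y)


def collisions_per_minute(times):
    """Count collisions at each minute after the hour (0 to 59)."""
    counts = Counter(int(t.split(":")[1]) for t in times)
    return [counts[m] for m in range(60)]  # Counter gives 0 for missing minutes


# Invented sample times, NOT real collision records.
times = ["0:00", "13:00", "9:15", "23:59", "7:30", "18:30", "18:00"]

minutes = list(range(60))
accidents = collisions_per_minute(times)
rho = pearson(minutes, accidents)
label = f"rho(minutes, # accidents) = {rho:.3f}"
print(label)
```

The `minutes` and `accidents` lists can serve as the x and y values of the plot, and `label` is the text to attach to it.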

### Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.