Topics Course: Data Science

Spring 2017

This assignment uses the basic statistics functions developed by the textbook's author and available at:

https://github.com/joelgrus/data-science-from-scratch/blob/master/code-python3/stats.py

It also uses the following datasets:

- The Lyme disease incidence data for Connecticut, New Jersey, and New York from 2003 to 2011 (see lymeScaled.py).
- The historical population for New York City used in Homework #2.
- The birthday collisions dataset you collected in Homework #3.

CMP 464/788 Homework: | |
---|---|

#1 |
Examine the correlation between the changes in incidence of Lyme disease in Connecticut, New Jersey, and New York. Compute the pairwise correlations, that is, ρ(CT,NJ), ρ(CT,NY), and ρ(NJ,NY), using the textbook's code. Include all correlations that you computed in your written answer.
#1: Submit a .txt or .pdf file with your answer. |
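The pairwise computation in #1 can be sketched as follows. This is only an illustration under stated assumptions: `correlation` here is a self-contained stand-in for the function of the same name in the textbook's stats.py, `changes` and `pairwise_correlations` are hypothetical helper names, "change" is assumed to mean the year-over-year difference, and the numbers in `toy` are made up rather than taken from lymeScaled.py.

```python
from itertools import combinations
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation; returns 0 if either series has no variation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        return 0
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def changes(series):
    """Year-over-year differences (assumed meaning of 'change')."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def pairwise_correlations(data):
    """Map each pair of keys to the correlation of their changes."""
    return {
        (a, b): correlation(changes(data[a]), changes(data[b]))
        for a, b in combinations(sorted(data), 2)
    }


# Made-up values for illustration only, NOT the real incidence data.
toy = {"CT": [1, 3, 2, 5], "NJ": [2, 6, 4, 10], "NY": [5, 3, 4, 1]}
result = pairwise_correlations(toy)
for (a, b), rho in result.items():
    print(f"rho({a},{b}) = {rho:.3f}")
```

With the real data, each list would hold one value per year from 2003 to 2011, so each series of changes has eight entries.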

#2-3 |
Use the dataset of New York City's historical population to answer the following:
Which borough's change in population is most closely correlated with the city's change in population? Justify your answer. Use the textbook's code to compute the correlation between each borough's population and the overall city population. Include all correlations that you computed in your written answer. For the second part, include a plot of the population of the most closely correlated borough, together with the city's population, from 1790 to 2010.
Make sure the title of your plot includes the date plotted. #2: Submit a .txt or .pdf file with your answer. #3: Submit a screenshot of the graphics window containing the plot of the borough population that most closely correlated with the city's population, as well as the city's population. |
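One possible way to pick the borough for #2 is to compute every borough's correlation with the city total and take the largest. The sketch below assumes "change" means the difference between consecutive census counts; `most_correlated` is a hypothetical helper name, `correlation` is a self-contained stand-in for the textbook's function, and the two "boroughs" and their values are invented placeholders, not the Homework #2 data.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation; returns 0 if either series has no variation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def changes(series):
    """Differences between consecutive entries of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def most_correlated(boroughs, total):
    """Return (best_name, all_correlations) for changes vs. the total's changes."""
    total_changes = changes(total)
    rhos = {
        name: correlation(changes(pops), total_changes)
        for name, pops in boroughs.items()
    }
    best_name = max(rhos, key=rhos.get)
    return best_name, rhos


# Invented placeholder values, NOT the real census figures.
boroughs = {"Borough A": [1, 2, 4, 7], "Borough B": [5, 5, 4, 6]}
total = [6, 7, 8, 13]
best, rhos = most_correlated(boroughs, total)
print(best, rhos)
```

Returning the full dictionary of correlations alongside the winner makes it easy to report every value in the written answer.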

#4-6 |
Could it be the case that drivers racing to get somewhere by the top of the hour drive more recklessly, and get into more accidents, than those who are driving just past the hour? Using the birthday dataset, display the number of collisions that occurred on your birthday, binned by minute. The x-axis of your plot should be the minutes from 0 to 59 (minutes after the hour), and the y-axis should be the number of accidents that occurred at each minute after the hour. Include in your plot a label containing the correlation, ρ(minutes,# accidents). #4: Submit your Python program as a .py file. #5: Submit a screenshot of the graphics window containing the plot. #6: Do minutes after the hour correlate with more accidents in your birthday dataset? Justify your answer. Include a .txt or .pdf file with your answer. Hint: See Homework #3, #3-4. |
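The binning step in #4 might look roughly like the sketch below. It assumes each collision record carries a time string of the form "H:MM" or "HH:MM"; that format, the helper names `counts_by_minute` and `plot_counts`, and the sample times are all assumptions made for illustration. `correlation` is again a stand-in for the textbook's function, and matplotlib is imported only inside the plotting helper.

```python
from collections import Counter
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation; returns 0 if either series has no variation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def counts_by_minute(times):
    """Count collisions per minute after the hour (0-59), zeros included."""
    tally = Counter(int(t.split(":")[1]) for t in times)
    return [tally.get(minute, 0) for minute in range(60)]


def plot_counts(counts, rho, title):
    """Bar chart of counts per minute, with the correlation in the legend."""
    import matplotlib.pyplot as plt  # imported lazily; needed only for plotting

    plt.bar(range(60), counts, label=f"correlation = {rho:.3f}")
    plt.xlabel("Minutes after the hour")
    plt.ylabel("Number of collisions")
    plt.title(title)
    plt.legend()
    plt.show()


# Made-up sample times, NOT real collision records.
sample_times = ["0:05", "13:05", "9:59", "23:00"]
counts = counts_by_minute(sample_times)
rho = correlation(list(range(60)), counts)
# plot_counts(counts, rho, "Collisions by minute")  # uncomment to display
```

Minutes with no collisions are kept as zeros so that the two series passed to `correlation` both have 60 entries.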