This assignment uses collision data collected and made publicly available by New York City Open Data. The data can be found at:
https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95/data.
For this assignment, you will need to download two different data sets:
We will use these data sets in later homework assignments. Since scraping the data takes time, save these data sets so that you can reuse them in future programs.
02/01/2016,0:09,BRONX,10465,40.8341548,-73.8174815,"(40.8341548, -73.8174815)",BARKLEY AVENUE,DEAN AVENUE,,0,0,0,0,0,0,0,0,Driver Inattention/Distraction,Driver Inattention/Distraction,,,,3381301,PASSENGER VEHICLE,PASSENGER VEHICLE,,,
All lines are formatted similarly: they start with the date, then the time, the borough, the zip code, and the latitude and longitude, and they also include the cross streets, the types of vehicles involved, the number of injuries and fatalities, and the possible cause. The first line of the file is a header that names the entries in the order they occur in each row.
The sample entry above gives details for a crash that occurred just past midnight at the corner of Barkley and Dean Avenues. There were no injuries and two passenger vehicles were involved. The probable cause was driver inattention on the part of both drivers. Each entry also includes a unique key that can be used to look up the report of the incident.
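
As a minimal sketch of how the sample entry breaks into fields, the snippet below parses that one line with Python's csv module. The variable names are illustrative only, and the index 6 used for the location is read off the sample entry rather than taken from the file's header. Note that the location field is quoted and contains a comma, so splitting the line on "," would produce one field too many.

```python
import csv
import io

# The sample entry from above, as one line of text.
sample = ('02/01/2016,0:09,BRONX,10465,40.8341548,-73.8174815,'
          '"(40.8341548, -73.8174815)",BARKLEY AVENUE,DEAN AVENUE,,'
          '0,0,0,0,0,0,0,0,'
          'Driver Inattention/Distraction,Driver Inattention/Distraction,,,,'
          '3381301,PASSENGER VEHICLE,PASSENGER VEHICLE,,,')

# csv.reader expects an iterable of lines, so wrap the string in StringIO.
row = next(csv.reader(io.StringIO(sample)))

# The first four positions follow the order described above.
date_str, time_str, borough, zip_code = row[0], row[1], row[2], row[3]
location = row[6]  # quoted field: the embedded comma is kept inside the field

print(date_str, time_str, borough, zip_code)
print(location)
print(len(row), "fields with csv.reader,", len(sample.split(",")), "with split")
```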
The textbook has a nice explanation (p. 107 and sample code, line 160) of using the CSV module. You should use that as the basis for the programs below that take CSV files as input.
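
For orientation, here is one possible way to read a whole collision file with the csv module. This is not the textbook's code. The function name load_rows and the file name collisions.csv in the comment are hypothetical. The sketch assumes only what is stated above: the first line is a header and every later line is one collision.

```python
import csv

def load_rows(csv_file):
    """Return (header, rows) from an open CSV file or file-like object."""
    reader = csv.reader(csv_file)
    header = next(reader)                  # first line names the columns
    rows = [row for row in reader if row]  # skip any completely blank lines
    return header, rows

# Possible usage, assuming the data was saved as collisions.csv:
#
#     with open("collisions.csv", newline="") as f:
#         header, rows = load_rows(f)
#     print(header)
#     print(len(rows), "collisions read")
```

Passing newline="" to open is what the csv module documentation recommends, so that line endings inside quoted fields are handled correctly.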
The work to be submitted differs by whether you are enrolled in the undergraduate or graduate course.
CMP 464 Homework: | CMP 788 Homework: | |
---|---|---|
#1-2 |
Using the birthday data set (see above), display a histogram of the number of collisions
that occur each hour. That is, your x-axis will have the hours from 0 to 23 and the y-axis
will be the number of collisions.
Make sure to include the date plotted in the title of your plot.
#1: Submit your Python program as a .py file.
#2: Submit a screen shot of the graphics window containing the plot.
Hint: In this file, times of collisions are stored as "H:MM" or "HH:MM". To get the hour to use as a key for your dictionary, first find the location of the ":" in the string and then take the part of the string before it. For example, if timeString holds the time, then c = timeString.find(":") finds the location of the colon, and hour = int(timeString[:c]) gives the hour as an integer. |
|
#3-4 |
Using the birthday data set, display the fraction of collisions that occur in Queens each
hour. That is, for hour 0 (midnight to just before 1am), the y-value should be the number
of collisions that occurred in Queens at hour 0 divided by the number of collisions across
the whole city that occurred at hour 0.
#3: Submit your Python program as a .py file.
#4: Submit a screen shot of the graphics window containing the plot. |
Using the birthday data set, compute the mean and variance of the hours at which all collisions occur
(this is the binned data from #1), and compute the mean and variance of the hours at which just
the collisions in Queens occur.
#3: Submit your Python program as a .py file.
#4: Submit the output of your Python program as a text file or a screen shot of the shell output. |
#5-6 |
Using the zip code data set (see above), display a histogram of the number of collisions that
occur each month. That is, your x-axis will have the numbers from 1 to 12 representing the
months of the year and the y-axis will be the number of collisions.
Make sure to include the zip code plotted in the title of your plot.
#5: Submit your Python program as a .py file.
#6: Submit a screen shot of the graphics window containing the plot.
Hint: Since all dates have the same format, "MM/DD/YYYY", you can extract the month from dateString by monthNum = int(dateString[:2]). |
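
The two hints above can be sketched as small helper functions, together with a dictionary-based tally of the kind the first hint mentions. The function names hour_of, month_of, and tally are illustrative, not part of the assignment. This is only a starting point: it does not read the data file or draw any plot.

```python
def hour_of(time_string):
    """Return the hour from a time stored as "H:MM" or "HH:MM"."""
    c = time_string.find(":")    # position of the colon
    return int(time_string[:c])  # everything before the colon is the hour

def month_of(date_string):
    """Return the month from a date stored as "MM/DD/YYYY"."""
    return int(date_string[:2])  # the first two characters are the month

def tally(keys):
    """Count how many times each key occurs, using a dictionary."""
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return counts

# Small made-up example: three collision times, two of them in hour 0.
times = ["0:09", "0:45", "13:30"]
print(tally(hour_of(t) for t in times))
```

Hours or months in which no collision occurred will be missing from the dictionary, so a plot over 0 to 23 or 1 to 12 needs to treat missing keys as zero, for example with counts.get(h, 0).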