Homework #2

CMP 464/788:
Topics Course: Data Science
Spring 2017

Topics: Data as vectors, more on matplotlib & Weather Underground data
Deadline: Thursday, 16 February 2017, 11:59pm

Weather Data and matplotlib

This assignment relies on data collected in Homework #1. See it for directions on getting started with matplotlib and scraping the Weather Underground website.

Python has built-in functions for downloading pages ('scraping data') directly from the web. We will use the urllib library to download historical weather data and matplotlib to plot it.

Messy Data

Real data can be messy. For example, in the weather data question below, you are asked to scale the size of the "bubbles" in the scatter plot to reflect the snow depth on the ground that day. As you scrape this data, you will see many values of "0", but then a "T" pops up halfway through the month. For example, here's part of the Weather Underground page for January 19, 2016:

The raw HTML that produces the last 2 lines looks like:

		<td class="indent"><span>Since 1 July snowfall</span></td>
		<td>0.6</td>
		<td>10.0</td>
		<td> </td>
		</tr>
		<tr>
		<td class="indent"><span>Snow Depth</span>
		<td>
  <span class="wx-data"><span class="wx-value">T</span> in</span></span>
</td>
		<td> 
		<td> 
		</tr>
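
One way to cope with an entry like "T" is to convert every scraped snow-depth string to a number before plotting. The sketch below assumes that "T" stands for a trace amount and maps it to a small placeholder value. The helper names parse_snow_depth and extract_wx_value, and the value 0.05, are assumptions made for illustration and are not part of the assignment.

```python
import re

# Assumed placeholder (in inches) for a trace amount; adjust to suit your plot.
TRACE_DEPTH = 0.05

def parse_snow_depth(text, trace_value=TRACE_DEPTH):
    """Convert a scraped snow-depth string such as "0", "1.5", or "T" to a float."""
    text = text.strip()
    if text.upper() == "T":
        return trace_value
    try:
        return float(text)
    except ValueError:
        # Missing or unrecognized entries are treated as no snow on the ground.
        return 0.0

def extract_wx_value(html):
    """Return the text inside the first wx-value span, or None if there is none."""
    match = re.search(r'<span class="wx-value">([^<]*)</span>', html)
    return match.group(1) if match else None

# The span from the raw HTML shown above.
row = '<span class="wx-data"><span class="wx-value">T</span> in</span>'
print(parse_snow_depth(extract_wx_value(row)))
```

Treating unrecognized entries as 0.0 is a design choice; you may prefer to skip those days or raise an error instead.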

Assignment

This assignment is the same for the undergraduate and graduate course.

CMP 464/788 Homework:
#1-2 Using the data you collected for Homework #1, #5, use matplotlib to produce a plot that shows the fluctuation of the daily min temperature with respect to the month's average. That is, first compute the average min temperature of the 31 daily min temperatures and then scale each daily min temperature to reflect its percentage of the average min temperature. For an example, see lymeScaled.py which does a similar (but not identical) scaling to this problem. Make sure to change the title of your plot to reflect the information plotted.
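
The scaling described above can be sketched as follows. This is a minimal illustration, not the code in lymeScaled.py; the function name scale_to_percent_of_mean is hypothetical and the sample temperatures are made up.

```python
def scale_to_percent_of_mean(values):
    """Return each value as a percentage of the mean of all values."""
    mean = sum(values) / len(values)
    return [100.0 * v / mean for v in values]

# Made-up temperatures, for illustration only (not real weather data).
sample = [20, 30, 40, 30]
print(scale_to_percent_of_mean(sample))
```

A value equal to the average maps to 100, values below the average map to less than 100, and values above it map to more. The sketch assumes the average is not zero.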

#1: Submit your Python program as a .py file.
#2: Submit a screen shot of the graphics window containing the plot.
#3-4 For the January minimum temperature data, compute and display the 5-day running average of the temperatures. That is, for each day, display the average of that day's temperature and the four days before it (if fewer than five days exist, use as many as do exist).
For example, if the temperatures were 10,20,10,20,15,35,30,... :
  • The first day has no previous values, so would be 10.
  • The second day is (10+20)/2 = 15.
  • The third day is (10+20+10)/3 = 40/3 ≈ 13.3.
  • The fourth day is (10+20+10+20)/4 = 15.
  • The fifth day: we now have enough to do the running average of a full 5 days: (10+20+10+20+15)/5 = 75/5 = 15.
  • The sixth day uses the most recent 5 days: (20+10+20+15+35)/5 = 100/5 = 20...
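
The running average in the list above can be sketched with a slice that holds at most the last 5 values up to and including the current day. The function name running_average and its window parameter are assumptions made for illustration.

```python
def running_average(values, window=5):
    """Average each value with up to window - 1 values that precede it."""
    averages = []
    for i in range(len(values)):
        # Early days have fewer than `window` values, so start no earlier than 0.
        start = max(0, i - window + 1)
        recent = values[start:i + 1]
        averages.append(sum(recent) / len(recent))
    return averages

# The example temperatures from the list above.
temps = [10, 20, 10, 20, 15, 35, 30]
print(running_average(temps))
```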
#3: Submit your Python program as a .py file.
#4: Submit a screen shot of the graphics window containing the plot.
#5-6 Collect the snow depths for January 2017. Display the January minimum temperatures (collected in Homework #1) as a scatter plot (of day versus temperature) with the size of each "bubble" proportional to the snow depth on that day (see scatter_plot.py for a sample of varying "bubble" sizes).
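
As a rough sketch (this is not the code in scatter_plot.py), bubble sizes can be passed to matplotlib's scatter function through its s argument, which sets the marker area in points squared. The helper names, the scale factor, and the small offset that keeps zero-depth days visible are all assumptions; set min_size=0 for sizes that are strictly proportional to depth. The sample values are made up.

```python
def bubble_sizes(snow_depths, scale=100.0, min_size=10.0):
    """Map snow depths to marker areas that grow linearly with depth."""
    return [min_size + scale * depth for depth in snow_depths]

def plot_bubbles(days, temps, snow_depths):
    """Scatter plot of day versus temperature, sized by snow depth."""
    # Imported here so bubble_sizes can be used without matplotlib installed.
    import matplotlib.pyplot as plt
    plt.scatter(days, temps, s=bubble_sizes(snow_depths), alpha=0.5)
    plt.xlabel("Day of January")
    plt.ylabel("Minimum temperature")
    plt.title("January minimum temperatures, bubble size by snow depth")
    plt.show()

# Made-up values, for illustration only (not real weather data).
days = [1, 2, 3, 4]
temps = [30, 25, 28, 22]
depths = [0, 0.05, 2, 1]
print(bubble_sizes(depths))
```

Calling plot_bubbles(days, temps, depths) opens the graphics window with the plot.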

#5: Submit your Python program as a .py file.
#6: Submit a screen shot of the graphics window containing the plot.
#7-8 Plot the percentage of New York City's population that lives in each borough. The raw historical population data for New York City from 1790 to 2010 is available here. Your plot should not display the raw population numbers, but instead give the percentages. For example, in 1790, 31,131 people lived in Manhattan out of the 49,447 that lived in New York City overall. The displayed value for Manhattan in 1790 would be 31,131/49,447 * 100 ≈ 63 percent.
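
The percentage calculation can be sketched as below. The function name borough_percentages is hypothetical, and the only figures used are the two 1790 numbers quoted above; the "Rest of city" entry is derived from them by subtraction and stands in for the other boroughs.

```python
def borough_percentages(populations):
    """Map each area's population to its percentage of the combined total."""
    total = sum(populations.values())
    return {name: 100.0 * count / total for name, count in populations.items()}

# 1790 figures quoted above; "Rest of city" is derived as 49,447 - 31,131.
pop_1790 = {"Manhattan": 31131, "Rest of city": 49447 - 31131}
for name, pct in borough_percentages(pop_1790).items():
    print(f"{name}: {pct:.0f} percent")
```

Applying the same function to each year's row of the data gives the values to plot.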

#7: Submit your Python program as a .py file.
#8: Submit a screen shot of the graphics window containing the plot.

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.