This assignment relies on data collected in Homework #1. See it for directions on getting started with matplotlib and scraping the Weather Underground website.
Python has built-in functions for downloading pages ('scraping data') directly from the web. We will use the urllib library to download historical weather data, which we will then plot.
Real data can be messy. For example, on the weather data question below, you are asked to scale the size of the "bubbles" in the scatter plot to reflect the snow depth on the ground that day. As you scrape this data, you will see many values of "0", but then a "T" (for a trace amount) pops up half-way through the month. For example, here's part of the Weather Underground page for January 19, 2016:
The raw html file that produces the last 2 lines looks like:
<td class="indent"><span>Since 1 July snowfall</span></td>
<td>0.6</td>
<td>10.0</td>
<td> </td>
</tr>
<tr>
<td class="indent"><span>Snow Depth</span>
<td>
<span class="wx-data"><span class="wx-value">T</span> in</span></span>
</td>
<td>
<td>
</tr>
A good approach is to run your program and, if you discover non-numeric data where you expect numbers, examine the data (as we did above) to decide whether it's a coding error or an unexpected value. Let's look at the code from weather3.py:
def getTempFromWeb(kind,url):
    page = urllib.request.urlopen(url)
    lines = page.readlines()
    for i in range(len(lines)):
        if lines[i].decode("utf8").find(kind+" Temperature") >= 0:
            m = i
            break
    searchObj = re.search('\d+', lines[m+2].decode("utf8"))
    return int(searchObj.group(0))

What does this code do? (Again, we will discuss it on 2/8, but here are the notes as a reminder.) It opens the URL and reads through the lines until it finds kind+" Temperature", and then searches the line 2 lines later for a number ('\d+' is the regular expression for a number of 1 or more digits). re.search returns a match object if a match is found. What does it do if there is no number on that line? It returns None, Python's default I-don't-know-what-to-say value. But the code above assumes that searchObj holds a match and continues processing. Instead, there should be a test here to make sure searchObj is not None, and then process the data appropriately.
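To see this behavior in isolation, here is a small sketch that runs the same regular expression on two sample lines modeled on the HTML excerpt above. The variable names `numeric_line` and `trace_line` are made up for this illustration and do not appear in weather3.py.

```python
import re

# Two sample lines modeled on the HTML excerpt above (illustrative only).
numeric_line = "<td>10.0</td>"
trace_line = '<span class="wx-value">T</span> in</span>'

# A line with digits gives a match object; group(0) is the matched text.
print(re.search(r"\d+", numeric_line).group(0))  # prints 10

# A line with no digits gives None instead of a match object.
print(re.search(r"\d+", trace_line))             # prints None

# Calling .group on None is what makes the unguarded code fail.
try:
    int(re.search(r"\d+", trace_line).group(0))
except AttributeError as err:
    print("no number on this line:", err)
```

Note that '\d+' stops at the decimal point, so "10.0" yields "10".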
How can we do this? Here's the pseudocode for a function that looks for the snow
depth and returns the number given or 0 if trace amounts are reported:
def getSnowDepth(url):
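One way such a function could be fleshed out is sketched below. It is only a sketch under stated assumptions: the helper `parseSnowDepth` is a hypothetical name introduced here so the parsing can be tried without a network connection, the value is assumed to sit two lines below the "Snow Depth" label (as in the excerpt above), and returning 0 when no number is found at all is a design choice rather than something the assignment specifies.

```python
import re
import urllib.request


def parseSnowDepth(lines):
    """Return the snow depth found in a list of HTML text lines.

    Returns 0.0 when a trace amount ("T") is reported, or when no
    number can be found. (Hypothetical helper, not from weather3.py.)
    """
    for i, line in enumerate(lines):
        if "Snow Depth" in line:
            # Assumption: the value is two lines below the label,
            # as in the excerpt above.
            if i + 2 >= len(lines):
                return 0.0
            match = re.search(r"\d+(\.\d+)?", lines[i + 2])
            if match is None:
                # No digits on that line, e.g. a "T" for trace amounts.
                return 0.0
            return float(match.group(0))
    # No "Snow Depth" row on the page at all.
    return 0.0


def getSnowDepth(url):
    """Download the page at url and return its snow depth."""
    page = urllib.request.urlopen(url)
    lines = [raw.decode("utf8") for raw in page.readlines()]
    return parseSnowDepth(lines)
```

The parsing step can be tried on a few lines copied from the excerpt:

```python
sample = [
    '<td class="indent"><span>Snow Depth</span>',
    '<td>',
    '<span class="wx-data"><span class="wx-value">T</span> in</span></span>',
]
print(parseSnowDepth(sample))  # prints 0.0
```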
This assignment is the same for the undergraduate and graduate courses.
CMP 464/788 Homework: | |
---|---|
#1-2 |
Using the data you collected for Homework #1, #5, use matplotlib
to produce a plot that shows the fluctuation of the daily min temperature with respect to
the month's average. That is, first compute the average min temperature of the 31 daily min temperatures
and then scale each daily min temperature to reflect its percentage of the average min temperature.
For an example, see lymeScaled.py which does a similar (but
not identical) scaling to this problem. Make sure to change the title of your plot to
reflect the information plotted. #1: Submit your Python program as a .py file. #2: Submit a screen shot of the graphics window containing the plot. |
#3-4 |
For the January minimum temperature data, compute and display the running average of the
temperatures over the previous 5 days. That is, for each day, display the average temperature over
the previous 5 days (or over as many of them as exist, if fewer than 5).
For example, if the temperatures were 10,20,10,20,15,35,30,... :
#3: Submit your Python program as a .py file. #4: Submit a screen shot of the graphics window containing the plot. |
#5-6 |
Collect the snow depths for January 2017. Display the January minimum temperatures
(collected in Homework #1) as a scatter plot (of day versus temperature)
with the size of each "bubble" proportional to the snow depth on that day (see
scatter_plot.py
for a sample of varying "bubble" sizes). #5: Submit your Python program as a .py file. #6: Submit a screen shot of the graphics window containing the plot. |
#7-8 |
Plot the percentage of New York City's population that lives in each borough.
The raw historical
population data for New York City from 1790 to 2010 is available
here. Your plot should not display the raw population numbers,
but instead give the percentages. For example, in 1790, 31,131 people lived in Manhattan
out of the 49,447 that lived in New York City overall. The displayed value for Manhattan
in 1790 would be 31,131/49,447 * 100, or approximately 63 percent.
#7: Submit your Python program as a .py file. #8: Submit a screen shot of the graphics window containing the plot. |
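The arithmetic behind the problems above can be sketched in a few lines of plain Python, before any plotting. This is an illustration, not a reference solution: the function names `percent_of_average`, `running_average` and `percent_of_total` are made up here, and `running_average` assumes that the 5-day window includes the current day, which is one possible reading of "previous 5 days".

```python
def percent_of_average(values):
    """Scale each value to its percentage of the mean of all values.

    Assumes the mean is not zero.
    """
    average = sum(values) / len(values)
    return [v / average * 100 for v in values]


def running_average(values, window=5):
    """Average of each value and the ones just before it.

    Assumption: the window includes the current day, and early days
    use as many values as exist (fewer than `window`).
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def percent_of_total(part, total):
    """Return part as a percentage of total."""
    return part / total * 100


temps = [10, 20, 10, 20, 15, 35, 30]
print(running_average(temps)[5])                # prints 20.0
print(round(percent_of_total(31131, 49447)))    # prints 63
print(percent_of_average([10, 20, 30]))         # prints [50.0, 100.0, 150.0]
```

If the window is instead meant to exclude the current day, only the slice bounds in `running_average` change.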