Homework #1

CMP 464/788:
Topics Course: Data Science
Spring 2017

Topics: Simple plots using matplotlib, scraping data from WeatherUnderground
Deadline: Thursday, February 9, 11:59pm
This page has been updated to correspond to Python 3. If you are using Python 2, see the Weather Data section from last year's course.

Getting Started with matplotlib

This homework uses matplotlib, which is included with Anaconda or can be obtained from the matplotlib website.

If you would prefer not to upgrade your Python installation (or would like to work through a browser), there are several web-based services. One that already has the libraries we are using is PythonAnywhere.

Weather Data

Python has built-in functions for downloading pages ('scraping data') directly from the web. We will use the urllib.request library to download the historical weather data that we plot.

We will use just one function from urllib.request, urlopen(), which takes as input a URL (uniform resource locator of a web page) and opens the page for reading. The format is:

	import urllib.request
	page = urllib.request.urlopen("http://lehman.edu")

Once the page variable is set up, it can be used just like a file variable. For example, you can read all the lines into a list, in the same way as a file:

	lines = page.readlines()

In Python 3, the lines read from the page returned by urlopen() are bytes objects instead of strings, so we must convert each line into a string before we use it as one. We do that with the method decode("utf8"), which tells the computer that the characters were encoded as bytes using the UTF-8 character encoding.
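As a minimal sketch of the decoding step: to keep the example runnable without a network connection, an in-memory io.BytesIO object stands in for the page returned by urlopen(). The stand-in and its sample contents are assumptions made for illustration; like a real page, it returns bytes objects from readlines().

```python
import io

# Stand-in for: page = urllib.request.urlopen("http://lehman.edu")
# (hypothetical sample content; a real page would have many more lines)
page = io.BytesIO(b"<html>\n<title>Lehman College</title>\n</html>\n")

lines = page.readlines()   # each element is a bytes object, e.g. b"<html>\n"

# Convert every line from bytes to str using the UTF-8 encoding
decoded = [line.decode("utf8") for line in lines]

for line in decoded:
    print(line.strip())    # strip() removes the trailing newline
```

With a real page, only the first assignment changes; the readlines() and decode("utf8") calls stay the same.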

We can also combine data from multiple pages into a single program. We will use Weather Underground's historical weather data to plot temperatures. The idea is:

The hard part is figuring out the URL for the webpages. Let's look at the URLs


The only thing that changes is the year; the prefix before it and the suffix after it stay the same. Let's store those in variables and then loop through the years:

    prefix = "http://www.wunderground.com/history/airport/KLGA/"
    suffix = "/02/02/DailyHistory"
    for year in range(2000,2017):
        url = prefix+str(year)+suffix

Each time through the loop, the url variable will hold the prefix+str(year)+suffix.
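To see what the loop produces, the sketch below collects each URL in a list and prints the first and last one. The urls list is an addition for illustration only; the prefix, suffix, and range are the ones given above.

```python
prefix = "http://www.wunderground.com/history/airport/KLGA/"
suffix = "/02/02/DailyHistory"

urls = []
for year in range(2000, 2017):   # 2000 through 2016; the end value is excluded
    url = prefix + str(year) + suffix
    urls.append(url)

print(urls[0])     # URL for February 2, 2000
print(urls[-1])    # URL for February 2, 2016
print(len(urls))   # 17 years in total
```

To plot a different date, only the month and day in suffix need to change.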

Try running the program weather3.py, and then start the assignment.


The work to be submitted differs by whether you are enrolled in the undergraduate (CMP 464) or graduate (CMP 788) course.

CMP 464 Homework: CMP 788 Homework:
#1-2 Using the above as a starting point, use matplotlib to produce a plot of the high temperature over the last 25 years for your birthday. For example, if you were born on February 2, then your plot would be the same as the first plot of the sample program weather3.py. Make sure to change the title of your plot to include your name and birthday.

#1: Submit your Python program as a .py file.
#2: Submit a screen shot of the graphics window containing the plot.

Note: You will use this same data set below. Since scraping the data takes up most of the program's running time, save the data and reuse it in the later programs.
#3-4 (CMP 464) Modify the above program to plot both the minimum and maximum temperature for the last 25 years for your birthday.

#3: Submit your Python program as a .py file.
#4: Submit a screen shot of the graphics window containing the plot.

#3-4 (CMP 788) Using the data you collected, compute the average high and low temperatures for your birthday over the last 25 years. Plot the maximum and minimum temperatures from above as well as two constant lines representing the average high and low temperatures (i.e. y = aveHigh).

#3: Submit your Python program as a .py file.
#4: Submit a screen shot of the graphics window containing the plot.
#5-6 Collect the minimum temperatures for January 2017. Display the collected data as a histogram.

#5: Submit your Python program as a .py file.
#6: Submit a screen shot of the graphics window containing the plot.
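
One possible way to save the scraped data for reuse, and to draw the constant average lines (y = aveHigh), is sketched below. Everything here is an assumption for illustration: the file name birthday_temps.csv, the helper names, and the three made-up temperature values are not part of weather3.py, and the real data would come from your own scraping loop.

```python
import csv

def save_temps(filename, years, highs, lows):
    """Write one row per year so the data need not be scraped again."""
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "high", "low"])
        for row in zip(years, highs, lows):
            writer.writerow(row)

def load_temps(filename):
    """Read the saved rows back into three lists."""
    years, highs, lows = [], [], []
    with open(filename, newline="") as f:
        for row in csv.DictReader(f):
            years.append(int(row["year"]))
            highs.append(float(row["high"]))
            lows.append(float(row["low"]))
    return years, highs, lows

def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def plot_with_averages(years, highs, lows):
    """Plot highs and lows plus two horizontal lines at their averages."""
    import matplotlib.pyplot as plt   # imported here so the rest runs without matplotlib
    aveHigh = average(highs)
    aveLow = average(lows)
    plt.plot(years, highs, label="High")
    plt.plot(years, lows, label="Low")
    plt.plot(years, [aveHigh] * len(years), label="Average high")  # y = aveHigh
    plt.plot(years, [aveLow] * len(years), label="Average low")    # y = aveLow
    plt.xlabel("Year")
    plt.ylabel("Temperature")
    plt.legend()
    plt.show()

# Made-up sample values (NOT real weather data), used only to show the round trip
years = [2014, 2015, 2016]
highs = [40.0, 45.0, 50.0]
lows = [30.0, 25.0, 35.0]

save_temps("birthday_temps.csv", years, highs, lows)
saved_years, saved_highs, saved_lows = load_temps("birthday_temps.csv")
print(average(saved_highs), average(saved_lows))
```

Calling plot_with_averages(saved_years, saved_highs, saved_lows) opens the graphics window. Once the file exists, later programs can call load_temps() instead of scraping the pages again.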

Submitting Homework

To submit your homework, log on to the Blackboard system, and go to "Homework". For each part of the homework, there is a separate input box. You may submit the homework as many times as you would like before the deadline.