Graphing & Plotting Recipes

CMP 464/788:
Topics Course: Data Science

Spring 2017

Book's Repository

The textbook's GitHub repository has a concise demonstration of many fun features of matplotlib. If you have used matplotlib before, go straight to the Chapter 3 code in the repository. If you have not, the brief tutorial below will get you started.

Graphing Mathematical Functions:

The pyplot module of matplotlib provides many useful ways to plot data to the screen. Let's use it to answer a question: which grows faster,
y = log(x) or y = √x?

To test out this question, we will write a program that:

  1. Uses the math and plotting libraries.
  2. Sets up a list of numbers (x-values) for our functions.
  3. Computes the y-values of our numbers for our functions.
  4. Creates plots of the two functions.
  5. Shows the plots in a separate graphics window.
Let's add the Python code for each of these steps:
  1. Uses the math and plotting libraries.
    import math
    import matplotlib.pyplot as plt

    Since it's unwieldy to type "matplotlib.pyplot" before every function we'd like to use from that library, we'll use the common abbreviation "plt" instead. With this, we can write plt.plot() instead of matplotlib.pyplot.plot().
  2. Sets up a list of numbers (x-values) for our functions.
    x = range(1,101)
    
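    Remember: range(1,101) goes up to, but does not include, its second argument, 101. (With only one argument, as in range(101), Python starts counting at 0.) So this gives the numbers 1, 2, ..., 100. In Python 3, range produces a range object rather than a list, but the loops below and plt.plot accept it just the same; use list(range(1,101)) if you need an actual list.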
  3. Computes the y-values of our numbers for our functions.
    y1 = []
    for i in x:
        y = math.log(i)     # natural logarithm (base e) of each x-value
        y1.append(y)
    y2 = []
    for i in x:
        y = math.sqrt(i)    # square root of each x-value
        y2.append(y)
    
    We need two separate lists since we have two separate functions to graph.
  4. Creates plots of the two functions.
    plt.plot(x,y1,label='y1 = log(x)')
    plt.plot(x,y2,label='y2 = sqrt(x)')
    plt.legend()
    
    These lines create the plots (plt.legend() adds a key built from each label) but do not display them until told to (see the next step).
  5. Shows the plots in a separate graphics window.
    plt.show()
    
    This line opens a new graphics window that displays the plots.
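Assembled into a single script, the five steps might look like the sketch below. It is one possible arrangement, not the only one: the two loops from step 3 are written as list comprehensions, which build the same two lists in one line each, and range(1, 101) is wrapped in list() so that x is an actual list in Python 3.

```python
import math
import matplotlib.pyplot as plt

# Step 2: x-values 1, 2, ..., 100
x = list(range(1, 101))

# Step 3: y-values, one list per function
# (each list comprehension builds the same list as one of the for-loops)
y1 = [math.log(i) for i in x]   # natural logarithm of each x-value
y2 = [math.sqrt(i) for i in x]  # square root of each x-value

# Step 4: create the two plots and a legend built from their labels
plt.plot(x, y1, label='y1 = log(x)')
plt.plot(x, y2, label='y2 = sqrt(x)')
plt.legend()

# Step 5: display the plots in a graphics window
plt.show()
```

If you are working on a machine without a display, calling plt.savefig('functions.png') in place of plt.show() writes the figure to an image file instead (the file name here is just an example).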

From your plots, which do you think grows faster: log(x) or sqrt(x)?

Challenges

Using the Python program you wrote above, try the following:

Plotting Data:

We can use the same techniques to plot data. As a warm-up, download scatter_plot.py. Run the program, and then, with a partner, figure out what each line of the program does.

Next, let's focus on the question "Has Lyme disease increased?" and examine data from the Centers for Disease Control and Prevention (CDC) to answer it. Let's start with the tri-state area (New York, New Jersey, and Connecticut). Here are the years and the number of cases in each state:

years = [2003,2004,2005,2006,2007,2008,2009,2010,2011]
ny = [5399,5100,5565,4460,4165,5741,4134,2385,3118]
nj = [2887,2698,3363,2432,3134,3214,4598,3320,3398]
ct = [1403,1348,1810,1788,3058,2738,2751,1964,2004]
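Before plotting, it can help to see how Python lines up these parallel lists. A minimal sketch using the built-in zip function: it walks the four lists in step, so each pass through the loop sees one year and that year's count for each state. The combined tri-state total is just an illustrative statistic computed from the numbers above, not a separate CDC figure.

```python
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]
ny = [5399, 5100, 5565, 4460, 4165, 5741, 4134, 2385, 3118]
nj = [2887, 2698, 3363, 2432, 3134, 3214, 4598, 3320, 3398]
ct = [1403, 1348, 1810, 1788, 3058, 2738, 2751, 1964, 2004]

# zip pairs up the i-th entry of each list, so each pass is one year
totals = []
for year, a, b, c in zip(years, ny, nj, ct):
    total = a + b + c            # combined cases in NY, NJ and CT that year
    totals.append(total)
    print(year, a, b, c, total)
```

For 2003, for example, the total is 5399 + 2887 + 1403 = 9689.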

To plot the New York data as a "scatter plot" (a dot at each (x,y) point), we add the commands:

import matplotlib.pyplot as plt  # Library of plotting functions
plt.scatter(years, ny)
plt.show()
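The same approach extends to all three states on one set of axes. In this sketch (which repeats the data so it runs on its own), each plt.scatter call draws one state's points, the label argument names that series for plt.legend() just as it did for plt.plot earlier, and plt.xlabel, plt.ylabel and plt.title add axis labels and a title. The label and title text is our own choice.

```python
import matplotlib.pyplot as plt

years = [2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]
ny = [5399, 5100, 5565, 4460, 4165, 5741, 4134, 2385, 3118]
nj = [2887, 2698, 3363, 2432, 3134, 3214, 4598, 3320, 3398]
ct = [1403, 1348, 1810, 1788, 3058, 2738, 2751, 1964, 2004]

# One scatter call per state; each label becomes a legend entry
plt.scatter(years, ny, label='New York')
plt.scatter(years, nj, label='New Jersey')
plt.scatter(years, ct, label='Connecticut')

# Describe the axes so the plot can be read on its own
plt.xlabel('Year')
plt.ylabel('Lyme disease cases')
plt.title('Lyme disease cases in the tri-state area')
plt.legend()
plt.show()
```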

Challenges:

Plotting Data from Files:

Often there is too much data to type into your program. In these cases, it is easier to read the information from a file. Below is a mixture of new and previously used commands for accessing data from files and strings. Try to puzzle out each one on paper, and then try it in Python.

The data file statesSummary.csv is from the CDC. Before starting the program, open the CSV file and see what it looks like.
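The contents of statesSummary.csv are not reproduced here, so the sketch below reads a tiny sample in a hypothetical layout (a header row of years, then one row per state) purely to illustrate the technique; compare it with the real file and adjust the column positions to match. It uses Python's built-in csv module, whose csv.reader splits each line on commas and yields one list of strings per row, and int() to convert the counts into numbers that can be plotted. The function name read_state_counts is our own.

```python
import csv
import io

# HYPOTHETICAL sample in an assumed layout; the real statesSummary.csv may differ.
sample = """State,2003,2004,2005
New York,5399,5100,5565
New Jersey,2887,2698,3363
"""

def read_state_counts(f):
    """Return (years, {state: counts}) from a CSV file object in the assumed layout."""
    reader = csv.reader(f)
    header = next(reader)                  # first row holds the column names
    years = [int(y) for y in header[1:]]   # skip the 'State' column
    counts = {}
    for row in reader:
        if not row:                        # ignore blank lines
            continue
        counts[row[0]] = [int(v) for v in row[1:]]
    return years, counts

# With the real file you would open it instead of the sample string:
#     with open('statesSummary.csv', newline='') as f:
#         years, counts = read_state_counts(f)
years, counts = read_state_counts(io.StringIO(sample))
print(years)
print(counts['New York'])
```

Once the counts are lists of integers, they can be passed to plt.scatter exactly like the typed-in lists.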

Challenges:

Harder Challenges: