Voronoi Diagrams Lab

CMP 464/788: Data Science Spring 2017


Original lab by Katherine St. John.

More on GeoJSON

In addition to reading CSV files directly, pandas has a built-in function for reading JSON files. This section looks more closely at the format. To illustrate the concepts, we will work through an example that builds a map of CitiBike stations from their real-time data feed.

New York City's bike share program, CitiBike, provides extensive data about their system. This includes a real-time feed of the status of stations in the system. The feed is in JSON. Here's the beginning of the file:

{"executionTime":"2017-03-21 10:37:12 PM",
"stationBeanList":[
{"id":72,"stationName":"W 52 St & 11 Ave","availableDocks":18,"totalDocks":39,"latitude":40.76727216,"longitude":-73.99392888, "statusValue":"In Service",
"statusKey":1,"availableBikes":19,"stAddress1":"W 52 St & 11 Ave","stAddress2":"","city":"","postalCode":"","location":"","altitude":"","testStation":false,
"lastCommunicationTime":"2017-03-21 10:36:30 PM","landMark":""},
{"id":79,"stationName":"Franklin St & W Broadway","availableDocks":31,"totalDocks":33,"latitude":40.71911552,"longitude":-74.00666661,"statusValue":"In Service",
"statusKey":1,"availableBikes":0,"stAddress1":"Franklin St & W Broadway","stAddress2":"","city":"","postalCode":"","location":"","altitude":"","testStation":false,
"lastCommunicationTime":"2017-03-21 10:33:23 PM","landMark":""},
{"id":82,"stationName":"St James Pl & Pearl St","availableDocks":23,"totalDocks":27,"latitude":40.71117416,"longitude":-74.00016545,"statusValue":"In Service",
"statusKey":1,"availableBikes":4,"stAddress1":"St James Pl & Pearl St","stAddress2":"","city":"","postalCode":"","location":"","altitude":"","testStation":false,
"lastCommunicationTime":"2017-03-21 10:34:12 PM","landMark":""},

It begins with the time the file was created, followed by information about each station in a field marked stationBeanList. Each entry is organized as a dictionary of (key:value) pairs. Let's look at the bean list entry for the first station:

   {"id":72,
   "stationName":"W 52 St & 11 Ave",
   "availableDocks":18,
   "totalDocks":39,
   "latitude":40.76727216,
   "longitude":-73.99392888, 
   "statusValue":"In Service",
   "statusKey":1,
   "availableBikes":19,
   "stAddress1":"W 52 St & 11 Ave",
   "stAddress2":"",
   "city":"",
   "postalCode":"",
   "location":"",
   "altitude":"",
   "testStation":false,
   "lastCommunicationTime":"2017-03-21 10:36:30 PM",
   "landMark":""}

Which ones are useful for making a map? We need latitude and longitude. For our popup message, it would also be good to give the station name as well as the number of bikes available and the total number of docks. If we call the stationBeanList entry for a station beanList, these entries would be:

   beanList['latitude']
   beanList['longitude']
   beanList['stationName']
   beanList['availableBikes']
   beanList['totalDocks']
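
As a minimal sketch of how these lookups behave, the snippet below parses a shortened copy of the feed with Python's standard json module (rather than pandas) and prints the five entries for each station. The sample string is an assumption made for illustration: it keeps only two stations and a subset of the keys shown above.

```python
import json

# Shortened sample in the same shape as the feed; values are copied from the
# excerpt above, but most keys are left out to keep the example small.
sample = '''
{"executionTime": "2017-03-21 10:37:12 PM",
 "stationBeanList": [
   {"id": 72, "stationName": "W 52 St & 11 Ave", "availableDocks": 18,
    "totalDocks": 39, "latitude": 40.76727216, "longitude": -73.99392888,
    "statusValue": "In Service", "availableBikes": 19},
   {"id": 79, "stationName": "Franklin St & W Broadway", "availableDocks": 31,
    "totalDocks": 33, "latitude": 40.71911552, "longitude": -74.00666661,
    "statusValue": "In Service", "availableBikes": 0}
 ]}
'''

# json.loads turns the text into nested Python dictionaries and lists.
feed = json.loads(sample)

# Each element of stationBeanList is a dictionary for one station.
for beanList in feed["stationBeanList"]:
    print(beanList["stationName"],
          beanList["latitude"],
          beanList["longitude"],
          beanList["availableBikes"],
          beanList["totalDocks"])
```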

To build our map, we need to:

  1. Read in the json file.
  2. Create a map object (variable).
  3. Extract the location, name, and bike information for each station.
  4. Add a marker for each station to our map.
  5. Save our map.

The new part is reading in a json file, but pandas has a built-in method to do that for us: read_json():

import folium
import pandas as pd

stations = pd.read_json('https://feeds.citibikenyc.com/stations/stations.json')

The second step, creating a map object, is the same as before:

mapBikes = folium.Map(location=[40.75, -73.99],tiles="Cartodb Positron",zoom_start=14)

Extracting the information from each row has an added level, since the information is stored in the dictionary, stationBeanList. To make the lines more readable, we will save row['stationBeanList'] as the variable beanList. The rest of the for loop is the same as in previous programs:

for i,row in stations.iterrows():
    beanList = row['stationBeanList']
    lat = beanList['latitude']
    lon = beanList['longitude']
    #Stations that are out of service get a gray marker:
    if beanList['statusValue'] == 'Not In Service':
        name = beanList['stationName'] + ": Not In Service"
        icon = folium.Icon(color='lightgray')
    else:
        name = beanList['stationName'] + ": " + str(beanList['availableBikes']) + " bikes available of " + str(beanList['totalDocks']) + " total docks"
        #Red marks stations with fewer than 2 bikes available:
        if beanList['availableBikes'] < 2:
            icon = folium.Icon(color='red')
        else:
            icon = folium.Icon(color='green')
    print(name)
    folium.Marker([lat,lon],popup = name,icon = icon).add_to(mapBikes)


#Create the html file with the map:
mapBikes.save(outfile='bikeLocations.html')

Putting this all together gives the Python program, cbStations.py.
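
One possible refactoring, offered only as a sketch, is to move the popup and color decisions from the loop into a helper function so they can be checked without drawing a map. The function name describe_station is hypothetical and is not part of cbStations.py; the thresholds and colors mirror the loop above.

```python
def describe_station(beanList):
    """Return (popup text, marker color) for one station dictionary.

    describe_station is a hypothetical helper; it is not part of folium
    or of the lab's program.
    """
    name = beanList["stationName"]
    # Out-of-service stations get a gray marker and a short message.
    if beanList["statusValue"] == "Not In Service":
        return name + ": Not In Service", "lightgray"
    text = "{}: {} bikes available of {} total docks".format(
        name, beanList["availableBikes"], beanList["totalDocks"])
    # Red flags stations with fewer than 2 bikes available.
    color = "red" if beanList["availableBikes"] < 2 else "green"
    return text, color


# Values taken from the second station in the feed excerpt.
station = {"stationName": "Franklin St & W Broadway",
           "statusValue": "In Service",
           "availableBikes": 0,
           "totalDocks": 33}
text, color = describe_station(station)
print(text, color)
```

Inside the loop, the returned color could then be passed to folium.Icon(color=...) and the text used as the marker's popup.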

Note: You can directly read from a URL, as we did in this program, or if you would like to work off-line, you can download the JSON file, save it locally, and use it as before.
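
For the off-line route, here is a small sketch of saving a copy of the feed and reading it back. It uses the standard json module instead of pandas (pd.read_json also accepts a local file name), and the file name stations_sample.json, the temporary folder, and both helper functions are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_feed(feed, path):
    """Write the feed dictionary to a local JSON file."""
    with open(path, "w") as f:
        json.dump(feed, f)


def load_station_list(path):
    """Read a saved feed and return its list of station dictionaries."""
    with open(path) as f:
        return json.load(f)["stationBeanList"]


# A tiny stand-in for the downloaded feed (one station, a few keys).
feed_copy = {"executionTime": "2017-03-21 10:37:12 PM",
             "stationBeanList": [{"id": 72,
                                  "stationName": "W 52 St & 11 Ave",
                                  "latitude": 40.76727216,
                                  "longitude": -73.99392888}]}

# Save into a temporary folder so the example leaves no files behind
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), "stations_sample.json")
save_feed(feed_copy, path)

stations_offline = load_station_list(path)
print(stations_offline[0]["stationName"])
```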

Challenges

Voronoi Diagrams

A Voronoi diagram divides a region based on the distance to a set of input points: each input point gets the part of the region that is closer to it than to any other input point. This simple idea has many applications. One of the most famous is John Snow's study of the 1854 cholera outbreak. Our focus will be on mapping access to public resources and transit.
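
The defining rule can be sketched in a few lines: a location belongs to the region of whichever input point is nearest. The function name nearest_site and the coordinates below are hypothetical, and the sketch uses plain straight-line distance on the coordinates, which is only an approximation when they are latitudes and longitudes.

```python
import math


def nearest_site(point, sites):
    """Return the index of the site closest to point (straight-line distance)."""
    def distance(k):
        return math.hypot(point[0] - sites[k][0], point[1] - sites[k][1])
    return min(range(len(sites)), key=distance)


# Three made-up input points (for example, three libraries).
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

# Each query location falls in the Voronoi region of its nearest site.
queries = [(1.0, 1.0), (3.0, 0.5), (0.5, 3.5)]
owners = [nearest_site(q, sites) for q in queries]
print(owners)  # [0, 1, 2]
```

Libraries such as scipy compute the region boundaries directly instead of testing locations one at a time, but the regions they produce satisfy this same nearest-point rule.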

Our first Voronoi diagram will be for libraries across the city (using the CSV library data set that can be downloaded from https://data.cityofnewyork.us/Business/Library/p4pf-fyc4). Our map highlights the regions closest to each library.

To make our map, we

  1. First take library locations and calculate the Voronoi diagram (using the scipy package).
  2. While going row by row to get the library locations, add markers to the map (we're adding them as separate markers, not a MarkerCluster, since we don't want them to collapse when zooming out).
  3. Show the plot in a matplotlib window to make sure it's working (note: if you have a separate matplotlib window you may need to close it, since the rest of the program is waiting until it's done to continue).
  4. Then write out the locations as a geojson file (using the geojson package).
  5. Load the regions in as a geoJSON layer in folium.
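
Step 4 can be sketched with plain dictionaries and the standard json module, used here in place of the geojson package. This is not the code from makeVor.py. The function name regions_to_geojson is hypothetical, and the sketch assumes that vertices are stored as (longitude, latitude) pairs and that each region lists its vertex indices in order around the polygon, with -1 marking a vertex at infinity.

```python
import json


def regions_to_geojson(vertices, regions):
    """Build a GeoJSON FeatureCollection from Voronoi vertices and regions.

    Regions that are empty or contain -1 (unbounded) are skipped.
    """
    features = []
    for region in regions:
        if len(region) < 3 or -1 in region:
            continue
        ring = [[float(x) for x in vertices[k]] for k in region]
        # GeoJSON polygon rings must end on the point they start from.
        ring.append(ring[0])
        features.append({"type": "Feature",
                         "properties": {},
                         "geometry": {"type": "Polygon",
                                      "coordinates": [ring]}})
    return {"type": "FeatureCollection", "features": features}


# Hand-made inputs: one bounded square region, one empty region,
# and one unbounded region that should be dropped.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
regions = [[], [0, 1, 2, 3], [-1, 0, 1]]

collection = regions_to_geojson(vertices, regions)
geojson_text = json.dumps(collection)
print(len(collection["features"]))  # 1
```

With scipy, the vertices and regions attributes of a scipy.spatial.Voronoi object have this shape and could be passed in directly. Skipping the unbounded regions is one simple choice; it leaves the outermost points without a polygon.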

If you are using anaconda (either spyder, idle3, or jupyter), the scipy and matplotlib packages are included. To install geojson, type at a terminal window:

	pip install geojson

The program, makeVor.py, is a bit rambling, but contains all the steps above (a better design would be to split into separate functions or files for the different tasks). Try running it on the library dataset.

Note that it does very well in dense regions but behaves oddly at the edges of the map, since we didn't include any point at infinity in the .json file and didn't clip the map to the city boundaries.

Challenges