Geopandas Basics

Geopandas is a library for manipulating spatial data. The difference between geopandas and pandas is that a GeoDataFrame contains a GeoSeries with spatial data. The name of this GeoSeries is often 'geometry'. This spatial data has a coordinate reference system (CRS), typically EPGS: 4326 unprojected geographic coordinates, i.e. latitude/longitude. The CRS must be changed to a projected CRS for any spatial analysis or measurements. In this class, we will use EPSG:3857 WGS84 Web Mercator (Auxiliary Sphere) with units in meters.

The Geopandas library is NOT part of the anaconda application so you must install it in your environment before using it.

In the terminal window, enter conda install -c conda-forge geopandas.

There are two common formats for spatial datasets, namely GeoJSON and SHP. The easiest way to determine if an open dataset is a spatial dataset (contains coordinates) is to determine if it can be exported in either of these formats.

The open datasets in this example are GeoJSON format. In addition, the platform for the open datasets in this example is SOCRATA. Later exercises will work with ArcGIS REST services and the Census WMS.

The City of Buffalo, NY uses Socrata as its platform to provide open data. The web address is https://data.buffalony.gov/.

Buffalo Open Data

Geospatial Data provide location and ID information of geographic features and events. Attributes in these datasets are fairly limited. Most often, these datasets would be used in combination with other datasets. Notice that geospatial datasets are type Map.

Buffalo Geospatial Open Data

When you click on the name of a geospatial dataset, you will see a map of the geographic features. The image below shows the results of clicking on the Buffalo Police Districts map. The district polygons are shown over the Google basemap. If you click on a feature (polygon) a pop-up shows the attributes in the database.

Buffalo Police Districts

When you click on the EXPORT tab you will see that this dataset may be exported in shapefile or geoJson formats.

Export Tab

If you hover your mouse over either of these formats, right click - copy link address -you can paste the url into a text string to be read into a geodataframe.

Export shapefile

SEE BELOW

In [8]:
import os
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
In [9]:
shp_url="https://data.buffalony.gov/api/geospatial/hggk-983r?method=export&format=Shapefile"
shp_gdf=gpd.read_file(shp_url)
shp_gdf.plot(column='name', figsize=(9,9));
In [10]:
shp_gdf.tail() #look at geodataframe, geometry column contains spatial coordinate data
Out[10]:
bdy_left bdy_right city_left city_right direction fromleft fromright globalid hi_x_name hi_x_pre ... shape_star shape_stle st_code st_prefix st_suffix st_type street toleft toright geometry
0 None None None None None 0.0 0.0 {F6B399B4-524B-45CF-83A0-06C06BCCA628} None None ... 1.528497e+08 62635.398739 None None None None None 0.0 0.0 POLYGON ((-78.90197 42.89666, -78.90113 42.897...
1 None None None None None 0.0 0.0 {83F2A179-549E-40B5-ACC4-5A0ED9DBF195} None None ... 1.881763e+08 71419.982741 None None None None None 0.0 0.0 POLYGON ((-78.83168 42.94597, -78.83023 42.947...
2 None None None None None 0.0 0.0 {323FC3F5-857C-4E2E-A95D-4A8CC6112099} None None ... 1.789123e+08 58574.689857 None None None None None 0.0 0.0 POLYGON ((-78.83762 42.87883, -78.83794 42.878...
3 None None None None None 0.0 0.0 {FF525ECB-B916-4644-9F18-A71150F86DAD} None None ... 3.479409e+08 121348.349294 None None None None None 0.0 0.0 POLYGON ((-78.83838 42.83207, -78.85385 42.832...
4 None None None None None 0.0 0.0 {AB3A0367-7B20-4827-B06F-B41EC47B998B} None None ... 2.744325e+08 89540.395915 None None None None None 0.0 0.0 MULTIPOLYGON (((-78.90172 42.91518, -78.90168 ...

5 rows × 33 columns

Change CRS from unprojected lat/long to projected coordinates

In [11]:
shp_gdf.crs
Out[11]:
{'init': 'epsg:4326'}
In [12]:
shp_gdf=shp_gdf.to_crs('epsg:3857')
shp_gdf.geometry
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))
Out[12]:
0    POLYGON ((-8783327.052 5296255.090, -8783233.4...
1    POLYGON ((-8775502.471 5303751.845, -8775340.9...
2    POLYGON ((-8776164.159 5293546.540, -8776199.1...
3    POLYGON ((-8776248.761 5286445.574, -8777970.1...
4    MULTIPOLYGON (((-8783299.317 5299070.664, -878...
Name: geometry, dtype: geometry
In [13]:
shp_gdf.plot(column='name', figsize=(9,9)); #notice that the axes are in meters, not degrees lat/long

Use SODA API

Another approach to reading a geospatial dataset into a geopandas dataframe is to use the SODA API. In the export window, click on SODA API. Select the API Endpoint and copy (ctrl c) it to a string variable in your notebook. Change .json to geojson then read the file.

SODA API Endpoint

SEE BELOW

In [14]:
endpt_url="https://data.buffalony.gov/resource/j266-6xj4.geojson"
gj_gdf=gpd.read_file(endpt_url)
gj_gdf.plot(column='name', figsize=(9,9));
In [15]:
gj_gdf=gj_gdf.to_crs('epsg:3857')
gj_gdf.loc[gj_gdf.name=='District A'].plot(figsize=(6,6)); #same .loc as in pandas!
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))

Other (non-map) Datasets

Many datasets contain spatial information (mostly point coordinates) but they are not presented as maps. Usually these tables contain a lot of attribute information about specific geographic events like vehicle crashes, pothole locations, or crime locations. There may be latitude and longitude fields (columns) in the table along with a location data type field. If a location datatype field exists, the data can be read as a geojson.

There are many ways to check if the dataset has geospatial location data. In the image below, the view Types has been set to Datasets. You can scroll through the datasets that are available or you can use a search tool to find datasets on a specific theme.

SODA API Endpoint

In the image below, the front page of the (housing) Code Violations dataset is shown. You may quickly determine if the dataset has location information by clicking on the API button, then the extension of the endpoint. If you see GeoJSON as a extension option, the dataset has location information. Change the extension to GeoJSON and click on the copy button to the right of the SODA API endpoint.

SODA API Endpoint

SEE BELOW

In [16]:
api_url="https://data.buffalony.gov/resource/ivrf-k9vm.geojson"
hcv_gdf=gpd.read_file(api_url)
hcv_gdf.tail()
Out[16]:
location_state code_section city location_zip police_district latitude zip state description :@computed_region_kwzn_pe6v ... census_block_group census_tract address :@computed_region_fk4y_hpmh :@computed_region_eziv_p4ck case_number council_district comments case_type geometry
995 None Rubbish and Garbage BUFFALO None None None 14206 NY Accumulation of rubbish or garbage None ... None None 265 BABCOCK None None GEN19-9457843 None <STRONG><FONT size=5>REPAIR THE FIRE DAMAGED I... GENERAL None
996 Swimming pools, spas and hot tubs BUFFALO District D 42.94080034474154 14207 NY Swimming pools 10 ... 2 59 345 EAST 5 72 GEN17-9428186 NORTH <EM>Owner fails to completely surround the poo... GENERAL POINT (-78.90336 42.94080)
997 Fire Protection Systems BUFFALO District D 42.9543728935067 14207 NY Smoke detection system 10 ... 1 58.02 166 CROWLEY 5 5 248227 NORTH Install smoke detectors in and outside all sle... GENERAL POINT (-78.90337 42.95437)
998 Buffalo Ordinance BUFFALO District E 42.94424807903508 14215 NY Rental Registry 11 ... 7 43 518 SHIRLEY 2 7 GEN19-9458254 UNIVERSITY <P class=MsoNormal style="MARGIN: 0in 0in 8pt"... GENERAL POINT (-78.80575 42.94425)
999 Exterior Property Areas BUFFALO District E 42.94888292539797 14215 NY Rodent harborage 11 ... 5 43 385 WINSPEAR 2 7 GEN18-9442488 UNIVERSITY Rubbish, deteriorated garage and wooden fence ... GENERAL POINT (-78.81236 42.94888)

5 rows × 37 columns

RESET LIMIT

Notice that only 1,000 records were downloaded. Most data servers set a limit of 1,000 records or features. If you want more than 1,000 features from a Socrata dataset, append the query ?$limit=100000 to the url string.

You can see how many records are in the dataset either by scrolling down the home (metatdata) page of the dataset or by clicking on the VIEW DATA button and looking at the bottom right side of the window.

data table size

or

SODA API Endpoint

SEE BELOW

In [17]:
api_url="https://data.buffalony.gov/resource/ivrf-k9vm.geojson?$limit=1000000"
hcv_gdf=gpd.read_file(api_url)
hcv_gdf.tail()
Out[17]:
location_state code_section city location_zip police_district latitude zip state description :@computed_region_kwzn_pe6v ... census_block_group census_tract address :@computed_region_fk4y_hpmh :@computed_region_eziv_p4ck case_number council_district comments case_type geometry
84980 Exterior Structure BUFFALO District B 42.91133479357011 14222 NY Doors 12 ... 2 66.02 255 UTICA WEST 1 61 GEN19-9457761 NIAGARA FRONT DOOR HANDLE MISSING. GENERAL POINT (-78.87599 42.91133)
84981 Fire-Resistance Ratings BUFFALO District B 42.911334398329544 14222 NY Fire-resistance-rated assemblies 12 ... 2 66.02 257 UTICA WEST 1 61 GEN19-9457767 NIAGARA None GENERAL POINT (-78.87607 42.91133)
84982 Buffalo Ordinance BUFFALO District E 42.94812417574177 14215 NY Sanitary Drainage 11 ... 5 43 419 HIGHGATE 2 7 GEN18-9442106 UNIVERSITY Upper bathroom toilet seal is compromised and ... GENERAL POINT (-78.81223 42.94812)
84983 Buffalo Ordinance BUFFALO District A 42.86253710992083 14210 NY License/Permit 19 ... 3 11 41 AVONDALE 4 10 GEN18-9443164 SOUTH <P class=MsoNormal style="MARGIN: 0in 1in 10pt... GENERAL POINT (-78.81793 42.86254)
84984 Rubbish and Garbage BUFFALO District D 42.91598254276741 14213 NY Accumulation of rubbish or garbage 2 ... 3 61 5 ARNOLD 5 43 GEN19-9457774 NIAGARA <P>Exterior property and premises, and the int... GENERAL POINT (-78.89027 42.91598)

5 rows × 37 columns

In [18]:
hcv_gdf=hcv_gdf.to_crs('epsg:3857')
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-14788bda72ca> in <module>
----> 1 hcv_gdf=hcv_gdf.to_crs('epsg:3857')

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\geopandas\geodataframe.py in to_crs(self, crs, epsg, inplace)
    543         else:
    544             df = self.copy()
--> 545         geom = df.geometry.to_crs(crs=crs, epsg=epsg)
    546         df.geometry = geom
    547         df.crs = geom.crs

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\geopandas\geoseries.py in to_crs(self, crs, epsg)
    424             proj_out = pyproj.Proj(crs, preserve_units=True)
    425             project = partial(pyproj.transform, proj_in, proj_out)
--> 426         result = self.apply(lambda geom: transform(project, geom))
    427         result.__class__ = GeoSeries
    428         result.crs = crs

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3847             else:
   3848                 values = self.astype(object).values
-> 3849                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3850 
   3851         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\geopandas\geoseries.py in <lambda>(geom)
    424             proj_out = pyproj.Proj(crs, preserve_units=True)
    425             project = partial(pyproj.transform, proj_in, proj_out)
--> 426         result = self.apply(lambda geom: transform(project, geom))
    427         result.__class__ = GeoSeries
    428         result.crs = crs

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\shapely\ops.py in transform(func, geom)
    222     also satisfy the requirements for `func`.
    223     """
--> 224     if geom.is_empty:
    225         return geom
    226     if geom.type in ('Point', 'LineString', 'LinearRing', 'Polygon'):

AttributeError: 'NoneType' object has no attribute 'is_empty'

ERROR - NULL GEOMETRY CELLS

These null values must be removed from the dataset. Determine how many records are missing location information.

In [19]:
orig_rows = hcv_gdf.shape[0] 
hcv_gdf = hcv_gdf.loc[hcv_gdf.geometry.notnull()]
hcv_gdf=hcv_gdf.to_crs('epsg:3857')
hcv_gdf.plot(column='council_district', figsize=(9,9));
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))
In [20]:
print('Records with missing location information = {:,.0f}'.format(orig_rows-hcv_gdf.shape[0]))
Records with missing location information = 1,034

Examples from class

Below are the other examples of reading spatial datasets from open data sources done in class.

Buffalo Bike Lanes (Map)

Line data.

Bike Lanes

In [21]:
url2="https://data.buffalony.gov/resource/525g-icq5.geojson"
bikes=gpd.read_file(url2)
bikes.plot(column='facility',figsize=(9,9),legend=True);
In [22]:
bikes=bikes.to_crs('epsg:3857')
bikes.plot(column='facility',figsize=(9,9),legend=True);
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))

Calculate line length

using Geopandas .length method. Since the units of the projection is meters, line length is in kilometers.

In [23]:
bikes['lengthkm']=bikes.geometry.length/1000
bikes.head()
Out[23]:
objectid_1 objectid road_name facility shape_stlength geometry lengthkm
0 52 52 N Division St On Road - Dedicated Bike Lanes 1003.6427666313491 MULTILINESTRING ((-8780347.218 5294202.505, -8... 0.416907
1 78 78 Genesee St On Road - Dedicated Bike Lanes 1415.1931220849217 MULTILINESTRING ((-8780022.491 5295063.489, -8... 0.588323
2 70 70 Elmwood Ave On Road - Dedicated Bike Lanes 1555.1689861388118 MULTILINESTRING ((-8780559.613 5298117.885, -8... 0.648566
3 3 3 Shoreline Trail Off Street - Multi-Use Path 93589.381027630065 MULTILINESTRING ((-8777976.506 5286480.160, -8... 38.969682
4 79 79 Broadway On Road - Dedicated Bike Lanes 6932.1224240204519 MULTILINESTRING ((-8776346.951 5295699.366, -8... 2.880304

Buffalo 311 Service Requests (Dataset)

Point data.

311 Service Requests

311 SR size

In [24]:
url311="https://data.buffalony.gov/resource/whkc-e5vr.geojson?$limit=1000000"
SR311=gpd.read_file(url311)
SR311.tail()
Out[24]:
location_state zip_code city location_zip police_district x_coordinate subject latitude state :@computed_region_kwzn_pe6v ... census_tract open_date address_number :@computed_region_fk4y_hpmh type :@computed_region_eziv_p4ck address_line_1 closed_date council_district geometry
773228 None 14215 Buffalo None District E -8774180.272 Dept of Public Works 42.93641679665284 NY 11 ... 42 2018-08-07T13:28:00 26 2 Pot Hole (Req_Serv) 39 ELMER 2018-08-21T07:31:00 MASTEN POINT (-78.81955 42.93642)
773229 None 14201 Buffalo None District B None Dept of Public Works 42.90007408886808 NY 15 ... 68 2018-08-07T11:10:00 118 1 Totes Replace (Req_Serv) 13 MARINER 2018-09-18T14:19:00 FILLMORE POINT (-78.87820 42.90007)
773230 None 14220 Buffalo None District A -8774516.167 Dept of Public Works 42.84527394758654 NY 9 ... 8 2018-08-07T12:45:00 21 4 Totes Combo (Req_Serv) 71 BLOOMFIELD 2018-09-24T13:34:00 SOUTH POINT (-78.82288 42.84527)
773231 None 14220 Buffalo None District A -8774649.155 Dept of Public Works 42.854087528086424 NY 9 ... 2 2018-08-07T15:22:00 194 4 Totes Replace (Req_Serv) 1 SOUTHSIDE 2018-09-24T12:54:00 SOUTH POINT (-78.82371 42.85409)
773232 None 14215 Buffalo None District E -8773139.368 Dept of Public Works 42.94732819699567 NY 11 ... 43 2018-08-07T13:19:00 83 2 Totes Combo (Req_Serv) 7 ROUNDS 2018-10-11T12:55:00 UNIVERSITY POINT (-78.81039 42.94733)

5 rows × 38 columns

In [25]:
SR311=SR311.to_crs('epsg:3857')
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-bcbf9075f4d4> in <module>
----> 1 SR311=SR311.to_crs('epsg:3857')

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\geopandas\geodataframe.py in to_crs(self, crs, epsg, inplace)
    543         else:
    544             df = self.copy()
--> 545         geom = df.geometry.to_crs(crs=crs, epsg=epsg)
    546         df.geometry = geom
    547         df.crs = geom.crs

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\geopandas\geoseries.py in to_crs(self, crs, epsg)
    424             proj_out = pyproj.Proj(crs, preserve_units=True)
    425             project = partial(pyproj.transform, proj_in, proj_out)
--> 426         result = self.apply(lambda geom: transform(project, geom))
    427         result.__class__ = GeoSeries
    428         result.crs = crs

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3847             else:
   3848                 values = self.astype(object).values
-> 3849                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3850 
   3851         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\geopandas\geoseries.py in <lambda>(geom)
    424             proj_out = pyproj.Proj(crs, preserve_units=True)
    425             project = partial(pyproj.transform, proj_in, proj_out)
--> 426         result = self.apply(lambda geom: transform(project, geom))
    427         result.__class__ = GeoSeries
    428         result.crs = crs

~\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\shapely\ops.py in transform(func, geom)
    222     also satisfy the requirements for `func`.
    223     """
--> 224     if geom.is_empty:
    225         return geom
    226     if geom.type in ('Point', 'LineString', 'LinearRing', 'Polygon'):

AttributeError: 'NoneType' object has no attribute 'is_empty'
In [26]:
orig_rows = SR311.shape[0]
SR311 = SR311[SR311.geometry.notnull()] #Remove records with missing location data
SR311=SR311.to_crs('epsg:3857')
C:\Users\mixwa\AppData\Local\Continuum\anaconda3\envs\GEG584\lib\site-packages\pyproj\crs.py:77: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method.
  return _prepare_from_string(" ".join(pjargs))
In [27]:
num_miss=orig_rows-SR311.shape[0]
pct_miss= (num_miss/orig_rows)*100
print('Records with missing location information = {:,.0f}\nPercent of database = {:.1f}% '.format(num_miss,pct_miss))
Records with missing location information = 75,753
Percent of database = 9.8% 
In [28]:
SR311.plot(column='type',figsize=(9,9)); #legend too large, too many categories!

Assignment

Using the Open Data Network, read and display a point, line, and polygon dataset.

  1. Document each data set. Where it came from, how many records are in it, etc.
  2. Change the CRS to 3857. (Is location data available for all features/records?)
  3. For polygons, calculate area of each polygon in sq. km. (see geopandas .area).
  4. For lines, calculate the length of each line in km.
  5. Plot each dataset using an attribute field (column) to different features by color. If the legend isn't too large, turn it on.
In [ ]: