Using the US Census Geocoder¶

Introduction
Census Geocoder Features
Overview
- How the Census Geocoder Works
1. Installing the Census Geocoder
- Dependencies
2. Import the Census Geocoder
3. Geocoding
4. Working with Results

Introduction ¶

What is Geocoding?¶

Hint

The act of determining a specific, canonical location based on some input data.

See also

Forward Geocoding

Reverse Geocoding

What we typically know about a specific location or geographical area is fuzzy. We might know part of the address, refer to the address with abbreviations, or only be able to describe a general area. That ambiguity makes it challenging to get specific, canonical, and precise data about the geographic location, which is where the process of geocoding comes into play.

Geocoding is the process of getting a specific, precise, and canonical determination of a geographical location (a place or geographic feature) or of a geographical area (encompassing multiple places or geographic features).

A canonical determination of a geographical location or geographical area is defined by the meta-data that is returned for that location/area. The canonical address and the various characteristics of the geographical area together represent the “canonical” information about that location/area.

The process of geocoding returns exactly that kind of canonical / official / unambiguous meta-data about one or more geographical locations and areas based on a set of inputs. Some inputs may be expected to be imprecise or partial (e.g. addresses, typically used for forward geocoding) while others are expected to be precise but with incomplete information (e.g. longitude and latitude coordinates used in reverse geocoding).

Why the Census Geocoder?¶

Geocoding is used for many things, but the Census Geocoder API in particular is meant to provide the US Census Bureau’s canonical meta-data about identified locations and areas. This meta-data is then typically used when executing more in-depth analysis on data published by the US Census Bureau and other departments of the US federal and state governments.

Because the US government uses a very complicated and overlapping hierarchy of geographic areas, it is essential when working with US government data to start from the precise identification of the geographic areas and locations of interest.

But using the Census Geocoder API to get this information is non-trivial. The API has limited documentation, and its syntax is non-pythonic and requires extensive familiarity with the internals of the (complicated) datasets that the US Census Bureau manages and publishes.

The Census Geocoder library is meant to simplify all of that, by providing an easy-to-use, batteries-included, pythonic wrapper around the Census Geocoder API.

Census Geocoder vs. Alternatives ¶

While we’re partial to the US Census Geocoder as our primary means of interacting with the Census Geocoder API, there are obviously alternatives for you to consider. Some might be better for your specific use cases, so here’s how we think about them:

The Census Geocoder API is a straightforward RESTful API, which means that you can execute your own HTTP requests against it, retrieve the JSON results, and work with the resulting data entirely yourself. This is what I did for years, until I got tired of repeating the same patterns over and over again and decided to build the Census Geocoder instead.

For a super-simple use case, this is probably the most expedient approach. More robust use cases, however, require your own scaffolding with built-in retry logic, object representation, error handling, and so on, which quickly becomes non-trivial.
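To give a sense of what that scaffolding looks like, here is a minimal sketch that builds a one-line-address request and retries on failure. The base URL, the Public_AR_Current benchmark name, and the helper names build_url and fetch_json are my assumptions based on the public API rather than anything defined in this guide, so check them against the Census Bureau’s own documentation before relying on them.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

# Assumed base URL for the public Census Geocoder API (verify before use).
BASE_URL = 'https://geocoding.geo.census.gov/geocoder'


def build_url(address, benchmark='Public_AR_Current'):
    """Build a one-line-address location lookup URL."""
    query = urllib.parse.urlencode({
        'address': address,
        'benchmark': benchmark,
        'format': 'json',
    })
    return f'{BASE_URL}/locations/onelineaddress?{query}'


def fetch_json(url, attempts=3, delay=1.0):
    """Fetch and decode JSON, retrying on URLError with a growing delay."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                return json.loads(response.read().decode('utf-8'))
        except urllib.error.URLError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)


url = build_url('4600 Silver Hill Rd, Washington, DC 20233')
print(url)
# data = fetch_json(url)  # uncomment to make the network call
```

Even this small sketch leaves out object representation and any handling of malformed responses, which is where the effort starts to add up.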

Why not use a library with batteries included?

Tip

When to use it?

In practice, I find that rolling my own solution is great when it’s an extremely simple use case, or a one-time operation (e.g. in a Jupyter Notebook) with no business logic to speak of. It’s a “quick-and-dirty” solution, where I’m trading rapid implementation (yay!) for less flexibility/functionality (boo!).

Considering how easy the Census Geocoder is to use, however, I find that I never really roll my own scaffolding when working with the Census Geocoder API.

The Census Geocode library is fantastic, and it was what I had used before building the Census Geocoder library. However, it has a number of significant limitations when compared to the US Census Geocoder:

Results are returned as-is from the Census Geocoder API. This means that:
- Results are essentially JSON objects represented as dict, which makes interacting with them in Python a little more cumbersome (one has to navigate nested dict objects).
- Property/field names are as in the original Census data. This means that if you do not have the documentation handy, it is hard to intuitively understand what the data represents.
The library is licensed under GPL3, which may complicate or limit its utilization in commercial or closed-source software operating under different (non-GPL) licenses.
The library requires you to remember / apply a lot of the internals of the Census Geocoder API as-is (e.g. benchmark vintages) which is complicated given the API’s limited documentation.
The library does not support custom layers, and only returns the default set of layers for any request.

The Census Geocoder explicitly addresses all of these concerns:

The library uses native Python classes to represent results, providing a more pythonic syntax for interacting with those classes.
Properties / fields have been renamed to more human-understandable names.
The Census Geocoder is made available under the more flexible MIT License.
The library streamlines the configuration of benchmarks and vintages, and provides extensive documentation.
The library supports any and all layers supported by the Census Geocoder API.
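To make the first two points concrete, here is a minimal sketch contrasting raw dict navigation with attribute access. The payload is a hand-written sample whose field names (matchedAddress, coordinates, tigerLine) follow my understanding of the API’s JSON, and FriendlyMatch is a hypothetical class made up for this illustration. It is not one of the library’s actual classes.

```python
from dataclasses import dataclass

# Hand-written sample shaped like an API response; the values are illustrative.
raw = {
    'result': {
        'addressMatches': [
            {
                'matchedAddress': '4600 SILVER HILL RD, WASHINGTON, DC, 20233',
                'coordinates': {'x': -76.92744, 'y': 38.845985},
                'tigerLine': {'side': 'L'},
            }
        ]
    }
}


@dataclass
class FriendlyMatch:
    """Hypothetical stand-in for a result class with readable names."""
    address: str
    longitude: float
    latitude: float
    side_of_street: str

    @classmethod
    def from_api(cls, match):
        """Map the raw Census-style field names onto readable attributes."""
        return cls(address=match['matchedAddress'],
                   longitude=match['coordinates']['x'],
                   latitude=match['coordinates']['y'],
                   side_of_street=match['tigerLine']['side'])


# Raw dict access: the caller must know the nesting and the raw field names.
print(raw['result']['addressMatches'][0]['coordinates']['x'])

# Attribute access on a class instance.
match = FriendlyMatch.from_api(raw['result']['addressMatches'][0])
print(match.longitude, match.side_of_street)
```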

Tip

When to use it?

Census Geocode has one advantage over the US Census Geocoder: It has a CLI.

I haven’t found much use for a CLI in the work I’ve done with the Census Geocoder API, so I have not implemented one in the US Census Geocoder. I might add it in the future if there are enough feature requests for it.

Given the above, it may be worth using Census Geocode instead of the Census Geocoder if you expect to be using a CLI.

The CensusBatchGeocoder is a fantastic library produced by the team at the Los Angeles Times Data Desk. It is specifically designed to provide a fairly pythonic interface for doing bulk geocoding operations, with great pandas serialization / de-serialization support.

However, it does have a couple of limitations:

Stale / Unmaintained? The library does not seem to have been updated since 2017, leading me to believe that it is stale and unmaintained. There are numerous open issues dating back to 2020, 2018, and 2017 that have seen no activity.
No benchmark/vintage/layer support. The library does not support the configuration of benchmarks, vintages, or layers.
Limited error handling. The library has somewhat limited error handling, judging by the issues that have been reported in the repository.
Optimized for bulk operations. The design of the library has been optimized for geocoding in bulk, which makes transactional one-off requests cumbersome to execute.

The Census Geocoder is obviously fresh / maintained, and has explicitly implemented robust error handling, and support for benchmarks, vintages, and layers. It is also designed to support bulk operations and transactional one-off requests.

Tip

When to use it?

CensusBatchGeocoder has one advantage over the US Census Geocoder: It can serialize results to a pandas DataFrame seamlessly and simply.

This is a useful feature, and one that I have added/pinned as an issue for the US Census Geocoder. If there are enough requests / up-votes on the issue, I may extend the library with this support in the future.

Given all this, it may be worth using CensusBatchGeocoder instead of the US Census Geocoder if you expect to be doing a lot of bulk operations using the default benchmark/vintage/layers.

Census Geocoder Features ¶

Easy to adopt. Just install and import the library, and you can be forward geocoding and reverse geocoding with just two lines of code.
Extensive documentation. One of the main limitations of the Geocoder API is that its documentation is scattered across the different datasets released by the Census Bureau, making it hard to navigate and understand. We’ve tried to fix that.
Location Search
- Using Geographic Coordinates (reverse geocoding)
- Using a One-line Address
- Using a Parametrized Address
- Using Batched Addresses
Geography Search
- Using Geographic Coordinates (reverse geocoding)
- Using a One-line Address
- Using a Parametrized Address
- Using Batched Addresses
Supports all available benchmarks, vintages, and layers.
Simplified syntax for indicating benchmarks, vintages, and layers.
No more hard to interpret field names. The library uses simplified (read: human understandable) names for location and geography properties.

Overview ¶

How the Census Geocoder Works ¶

The Census Geocoder works with the Census Geocoder API by providing a thin Python wrapper around the API’s functionality. Rather than having to construct your own HTTP requests against the API itself, you can instead work with Python objects and functions the way you normally would.

In other words, the process is very straightforward:

Install the Census Geocoder library. (See 1. Installing the Census Geocoder.)
Import the geocoder. (See 2. Import the Census Geocoder.)
Geocode something - either locations or geographies. (See 3. Geocoding.)
Work with your geocoded locations or geographical areas. (See 4. Working with Results.)

And that’s it! Once you’ve done the steps above, you can easily geocode one-off requests or batch many requests into a single transaction.

1. Installing the Census Geocoder ¶

To install the US Census Geocoder, just execute:

$ pip install census-geocoder

Dependencies ¶

Validator-Collection v1.5.0 or higher

Backoff-Utils v1.0.1 or higher

Requests v2.26 or higher
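If you want to confirm what is installed in your environment, a sketch like the following reports the installed version of each dependency using only the standard library. The distribution names are my assumption of how these packages are published on PyPI, and installed_versions is a hypothetical helper, not part of the library.

```python
from importlib import metadata

# Distribution names as I assume they are published on PyPI.
REQUIRED = ('validator-collection', 'backoff-utils', 'requests')


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f'{name}: {version or "not installed"}')
```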

2. Import the Census Geocoder ¶

Importing the Census Geocoder is very straightforward. You can either import its components precisely (see API Reference) or simply import the entire module:

# Import the entire module.
import census_geocoder as geocoder

result = geocoder.location.from_address('4600 Silver Hill Rd, Washington, DC 20233')
result = geocoder.geography.from_address('4600 Silver Hill Rd, Washington, DC 20233')

# Import precise components.
from census_geocoder import Location, Geography

result = Location.from_address('4600 Silver Hill Rd, Washington, DC 20233')
result = Geography.from_address('4600 Silver Hill Rd, Washington, DC 20233')

3. Geocoding ¶

Geocoding a location means retrieving canonical meta-data about that location. Think of it as getting the “official” details for a given place. Using the Census Geocoder, you can geocode locations given:

A single-line address (whole or partial)

A parametrized address where you know its component parts

A set of longitude and latitude coordinates

A batch file in CSV or TXT format

However, the Census Geocoder API provides two different sets of meta-data for any canonical location:

Location Data. Think of it as the canonical address for a given location/place.

Geographic Area Data. Think of it as canonical information about the (different) areas that contain the given location/place.

Using the Census Geocoder library you can retrieve both types of information.

Hint

When retrieving geographic area data, you also get location data.

Getting Location Data ¶

Retrieving data about canonical locations is very straightforward. You have four different ways to get this information, depending on what information you have about the location you want to geocode:

# Using a single-line address
import census_geocoder as geocoder

result = geocoder.location.from_address('4600 Silver Hill Rd, Washington, DC 20233')

See also

Location.from_address()

# Using a parametrized address
import census_geocoder as geocoder

result = geocoder.location.from_address(street = '4600 Silver Hill Rd',
                                        city = 'Washington',
                                        state = 'DC',
                                        zip_code = '20233')

See also

Location.from_address()

# Using longitude and latitude coordinates
import census_geocoder as geocoder

result = geocoder.location.from_coordinates(longitude = -76.92744,
                                            latitude = 38.845985)

See also

Location.from_coordinates()
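Because the two coordinates are easy to mistype, it can help to range-check them before sending a request. The sketch below is a hypothetical helper (check_coordinates is not part of the library) that only enforces the valid ranges for longitude and latitude. It cannot tell you whether the point actually falls within the United States.

```python
def check_coordinates(longitude, latitude):
    """Return (longitude, latitude) as floats, or raise ValueError if out of range."""
    longitude, latitude = float(longitude), float(latitude)
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f'longitude out of range: {longitude}')
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f'latitude out of range: {latitude}')
    return longitude, latitude


print(check_coordinates(longitude=-76.92744, latitude=38.845985))

# A mistyped latitude is rejected before any request is made.
try:
    check_coordinates(longitude=-76.92744, latitude=138.845985)
except ValueError as error:
    print(error)
```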

# Using a batch file
import census_geocoder as geocoder

result = geocoder.location.from_batch(file_ = '/my-csv-file.csv')

Caution

The batch file indicated can have a maximum of 10,000 records.
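If your source file is larger than that, one option is to split it into smaller files first. The following is a minimal sketch using only the standard library. The helper name split_batch and the output file naming are my own, and it assumes the CSV has no header row, so every line is a record.

```python
import csv
import itertools

MAX_RECORDS = 10_000  # limit stated in the caution above


def split_batch(source_path, chunk_size=MAX_RECORDS):
    """Split a CSV into numbered files of at most chunk_size rows each."""
    paths = []
    with open(source_path, newline='') as source:
        reader = csv.reader(source)
        for index in itertools.count(1):
            # Pull the next chunk_size rows; an empty list means we are done.
            rows = list(itertools.islice(reader, chunk_size))
            if not rows:
                break
            chunk_path = f'{source_path}.part{index}.csv'
            with open(chunk_path, 'w', newline='') as chunk:
                csv.writer(chunk).writerows(rows)
            paths.append(chunk_path)
    return paths


# Each chunk can then be geocoded on its own, for example:
# for path in split_batch('/my-csv-file.csv'):
#     result = geocoder.location.from_batch(file_ = path)
```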

Warning

While the Census Geocoder API supports CSV, TXT, XLSX, and DAT formats, the Census Geocoder library only supports CSV and TXT formats so as to avoid dependency bloat (read: Why rely on other libraries to read XLSX format data?).

See also

Location.from_batch()

Getting Geographic Area Data ¶

Retrieving data about the geographic areas that contain a given location/place is just as straightforward as getting location data. In fact, the syntax is almost identical. Just swap out the word 'location' for 'geography' and you’re done!

Here’s how to do it:

# Using a single-line address
import census_geocoder as geocoder

result = geocoder.geography.from_address('4600 Silver Hill Rd, Washington, DC 20233')

See also

GeographicArea.from_address()

# Using a parametrized address
import census_geocoder as geocoder

result = geocoder.geography.from_address(street = '4600 Silver Hill Rd',
                                         city = 'Washington',
                                         state = 'DC',
                                         zip_code = '20233')

See also

GeographicArea.from_address()

# Using longitude and latitude coordinates
import census_geocoder as geocoder

result = geocoder.geography.from_coordinates(longitude = -76.92744,
                                             latitude = 38.845985)

See also

GeographicArea.from_coordinates()

# Using a batch file
import census_geocoder as geocoder

result = geocoder.geography.from_batch(file_ = '/my-csv-file.csv')

Caution

The batch file indicated can have a maximum of 10,000 records.

Warning

While the Census Geocoder API supports CSV, TXT, XLSX, and DAT formats, the Census Geocoder library only supports CSV and TXT formats so as to avoid dependency bloat (read: Why rely on other libraries to read XLSX format data?).

See also

GeographicArea.from_batch()

Benchmarks and Vintages ¶

The data returned by the Census Geocoder API is different from typical geocoding services, in that it is time-sensitive. A geocoding service like the Google Maps API or Here.com only cares about the current location. But the US Census Bureau’s information is inherently linked to the statistical data collected by the US Census Bureau at particular moments in time.

Thus, when making requests against the Census Geocoder API you are always asking for geographic location data or geographic area data as of a particular date. You might think “geographies don’t change”, but in actuality they do: congressional districts, school districts, town lines, county lines, street names, house numbers, etc. are all constantly evolving. To ensure that the statistical data is tied to the locations properly, that alignment is maintained through two key concepts:

Benchmarks

Vintages

The benchmark is the time period when geographic information was snapshotted for use / publication in the Census Geocoder API. This is typically done twice per year, and represents the “geographic definitions as of the time period indicated by the benchmark”.

The vintage is the census or survey data that the geographies are linked to. Thus, the geographic identifiers or statistical data associated with locations or geographic areas within a given benchmark are also linked to a particular vintage of census/survey data. Trying to use those identifiers or statistical data with a different vintage of data may produce inaccurate results.

The Census Geocoder API supports a variety of benchmarks and vintages, and they are unfortunately poorly documented and difficult to interpret. Therefore, the Census Geocoder has been designed to streamline and simplify their usage.

Each vintage is only available for particular benchmarks. The table below provides guidance on the vintages and benchmarks supported by the Census Geocoder:

              BENCHMARKS
              Current       Census2020
VINTAGES      Current       Census2020
              Census2020    Census2010
              ACS2019
              ACS2018
              ACS2017
              Census2010
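If you want to check a benchmark/vintage pair before making a request, the table can be expressed as a small lookup. This is only a sketch under my reading of the table (six vintages under Current, two under Census2020). The names VINTAGES_BY_BENCHMARK and is_supported are hypothetical and not part of the library, and the comparison ignores case to match how the library treats these values.

```python
# Supported combinations, transcribed from the table above.
VINTAGES_BY_BENCHMARK = {
    'Current': ('Current', 'Census2020', 'ACS2019',
                'ACS2018', 'ACS2017', 'Census2010'),
    'Census2020': ('Census2020', 'Census2010'),
}


def is_supported(benchmark, vintage):
    """Check a benchmark/vintage pair, ignoring case."""
    lookup = {
        name.lower(): {value.lower() for value in vintages}
        for name, vintages in VINTAGES_BY_BENCHMARK.items()
    }
    return vintage.lower() in lookup.get(benchmark.lower(), set())


print(is_supported('Current', 'ACS2019'))      # True
print(is_supported('census2020', 'acs2019'))   # False
```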

When using the Census Geocoder, you can supply the benchmark and vintage directly when executing your geocoding request:

# Using a single-line address
import census_geocoder as geocoder

result = geocoder.location.from_address('4600 Silver Hill Rd, Washington, DC 20233',
                                        benchmark = 'Current',
                                        vintage = 'ACS2019')

result = geocoder.geography.from_address('4600 Silver Hill Rd, Washington, DC 20233',
                                         benchmark = 'Current',
                                         vintage = 'ACS2019')

# Using a parametrized address
import census_geocoder as geocoder

result = geocoder.location.from_address(street = '4600 Silver Hill Rd',
                                        city = 'Washington',
                                        state = 'DC',
                                        zip_code = '20233',
                                        benchmark = 'Current',
                                        vintage = 'ACS2019')

result = geocoder.geography.from_address(street = '4600 Silver Hill Rd',
                                         city = 'Washington',
                                         state = 'DC',
                                         zip_code = '20233',
                                         benchmark = 'Current',
                                         vintage = 'ACS2019')

# Using longitude and latitude coordinates
import census_geocoder as geocoder

result = geocoder.location.from_coordinates(longitude = -76.92744,
                                            latitude = 38.845985,
                                            benchmark = 'Current',
                                            vintage = 'ACS2019')

result = geocoder.geography.from_coordinates(longitude = -76.92744,
                                             latitude = 38.845985,
                                             benchmark = 'Current',
                                             vintage = 'ACS2019')

# Using a batch file
import census_geocoder as geocoder

result = geocoder.location.from_batch(file_ = '/my-csv-file.csv',
                                      benchmark = 'Current',
                                      vintage = 'ACS2019')

result = geocoder.geography.from_batch(file_ = '/my-csv-file.csv',
                                       benchmark = 'Current',
                                       vintage = 'ACS2019')


Hint

Several important things to be aware of when it comes to benchmarks and vintages in the Census Geocoder library:

Unless overridden by the CENSUS_GEOCODER_BENCHMARK or CENSUS_GEOCODER_VINTAGE environment variables, the benchmark and vintage default to 'Current' and 'Current' respectively.

The benchmark and vintage are case-insensitive. This means that you can supply 'Current', 'CURRENT', or 'current' and it will all work the same.

If you want to set a different default benchmark or vintage, you can do so by setting CENSUS_GEOCODER_BENCHMARK and CENSUS_GEOCODER_VINTAGE environment variables to the defaults you want to use.
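The fallback behavior described above can be sketched as follows. This mirrors the rule as documented here rather than the library’s actual code, and default_setting is a hypothetical helper. I have not confirmed when the library reads these variables, so setting them before importing it is the safer choice.

```python
import os


def default_setting(variable, fallback='Current'):
    """Return the environment variable's value, or the fallback if unset or empty."""
    return os.environ.get(variable) or fallback


# Set custom defaults for this process.
os.environ['CENSUS_GEOCODER_BENCHMARK'] = 'Census2020'
os.environ['CENSUS_GEOCODER_VINTAGE'] = 'Census2010'

print(default_setting('CENSUS_GEOCODER_BENCHMARK'))
print(default_setting('CENSUS_GEOCODER_VINTAGE'))
```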

Layers ¶

When working with the Census Geocoder API (particularly when getting geographic area data), you have the ability to control which types of geographic area get returned. These types of geographic area are called “layers”.

An example of two different “layers” might be “State” and “County”. These are two different types of geographic area, one of which (County) may be encompassed by the other (State). In general, geographic areas within the same layer cannot and do not overlap. However different layers can and do overlap, where one layer (State) may contain multiple other layers (Counties), or one layer (Metropolitan Statistical Areas) may partially overlap multiple entities within a different layer (States).

When using the Census Geocoder you can easily specify the layers of data that you want returned. Unless overridden by the CENSUS_GEOCODER_LAYERS environment variable, the layers returned will always default to 'all'.

Which layers are available is ultimately determined by the vintage of the data you are retrieving. The following represents the list of layers available in each vintage:

Note

You may notice that there are (logical) duplicate layers in the lists above, for example “2010 Census PUMAs” and “2010 Census Public Use Microdata Areas”. This is because there are multiple ways that users of Census data may refer to particular layers in their work. This duplication is purely for the convenience of Census Geocoder users, since the Census Geocoder API actually uses numerical identifiers for the layers returned.

When geocoding data, you can simply supply the layers you want using the layers keyword argument as below:

# Using a single-line address
import census_geocoder as geocoder

result = geocoder.location.from_address('4600 Silver Hill Rd, Washington, DC 20233',
                                        benchmark = 'Current',
                                        vintage = 'ACS2019',
                                        layers = 'Census Tracts, States, CDPs, Divisions')

result = geocoder.geography.from_address('4600 Silver Hill Rd, Washington, DC 20233',
                                         benchmark = 'Current',
                                         vintage = 'ACS2019',
                                         layers = 'Census Tracts, States, CDPs, Divisions')

# Using a parametrized address
import census_geocoder as geocoder

result = geocoder.location.from_address(street = '4600 Silver Hill Rd',
                                        city = 'Washington',
                                        state = 'DC',
                                        zip_code = '20233',
                                        benchmark = 'Current',
                                        vintage = 'ACS2019',
                                        layers = 'Census Tracts, States, CDPs, Divisions')

result = geocoder.geography.from_address(street = '4600 Silver Hill Rd',
                                         city = 'Washington',
                                         state = 'DC',
                                         zip_code = '20233',
                                         benchmark = 'Current',
                                         vintage = 'ACS2019',
                                         layers = 'Census Tracts, States, CDPs, Divisions')

# Using longitude and latitude coordinates
import census_geocoder as geocoder

result = geocoder.location.from_coordinates(longitude = -76.92744,
                                            latitude = 38.845985,
                                            benchmark = 'Current',
                                            vintage = 'ACS2019',
                                            layers = 'Census Tracts, States, CDPs, Divisions')

result = geocoder.geography.from_coordinates(longitude = -76.92744,
                                             latitude = 38.845985,
                                             benchmark = 'Current',
                                             vintage = 'ACS2019',
                                             layers = 'Census Tracts, States, CDPs, Divisions')

# Using a batch file
import census_geocoder as geocoder

result = geocoder.location.from_batch(file_ = '/my-csv-file.csv',
                                      benchmark = 'Current',
                                      vintage = 'ACS2019')

result = geocoder.geography.from_batch(file_ = '/my-csv-file.csv',
                                       benchmark = 'Current',
                                       vintage = 'ACS2019',
                                       layers = 'Census Tracts, States, CDPs, Divisions')


Hint

When using the Census Geocoder to return geographic area data, you can request multiple layers’ worth of data by passing them in a comma-delimited string. This will return separate data for each layer indicated. The comma-delimited string can include white-space for easy readability, which means that the following two values are considered identical:

layers = 'Census Tracts, States, CDPs, Divisions'

layers = 'Census Tracts,States,CDPs,Divisions'

To retrieve all available layers that have data for a given location, you can submit 'all'. Unless you have set the CENSUS_GEOCODER_LAYERS environment variable to a different value, 'all' is the default set of layers that will be returned.

Note that layer names in the Census Geocoder are case-insensitive.
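The equivalence described above can be illustrated with a small normalization routine. This is a sketch of the stated rules (split on commas, ignore surrounding white-space, ignore case), not the library’s own parsing code, and parse_layers is a hypothetical name.

```python
def parse_layers(layers):
    """Split a comma-delimited layer string into trimmed, lower-cased names."""
    return [name.strip().lower() for name in layers.split(',') if name.strip()]


spaced = parse_layers('Census Tracts, States, CDPs, Divisions')
compact = parse_layers('census tracts,states,cdps,divisions')

print(spaced)             # ['census tracts', 'states', 'cdps', 'divisions']
print(spaced == compact)  # True
```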

4. Working with Results ¶

If all geographical area data is contained within a Location, why differentiate between working with location data and working with geographical area data at all?

The answer is two-fold: use case and performance. The act of geocoding is very simple and occurs at the level of a given Location. This process is done as soon as the Census Geocoder API has determined a canonical location (a MatchedAddress). Typically, use cases that need that geocoded canonical address require it to be very fast, and that’s how the Census Geocoder API has been optimized.

However, pulling geographical area data relies on first determining the canonical location. And then, it has to pull a set of additional geographical area meta-data for that canonical location’s geographical surroundings. That takes time, and the more layers you request, the longer that process will take.

Therefore, both the Census Geocoder API and the Census Geocoder library differentiate between the two so that you can use the more-performant location-only API calls when appropriate, and the less-performant but more robust geographical area API calls as needed.

Now that you’ve geocoded some data using the Census Geocoder, you probably want to work with your data. Well, that’s pretty easy since the Census Geocoder returns native Python objects containing your location or geographical area data.

Shared Methods ¶

Most of what you will do with your results is read properties from them so as to consume or use the canonical location/geographic meta-data in your application. However, there are a number of methods that are shared between both location data and geographic area data that may prove helpful:

inspect(as_census_fields=False)¶

Parameters: as_census_fields (bool) – If True, returns the properties using the Census field name rather than the Census Geocoder (user-friendly) property name. Defaults to False.

Returns a list of the properties that are populated with values in the object.

Return type: list of str

to_dict()¶

Serializes the data for the location/geographic area into a dict that conforms directly to the output from the Census Geocoder API.

Return type: dict

to_json()¶

Serializes the data for the location/geographic area into a str containing a JSON object that conforms directly to the output from the Census Geocoder API.

Return type: str

Location Data ¶

When working with location data, there are two principal sets of meta-data made available:

Input. This is the input that was submitted to the Census Geocoder API, and it includes:
- The address that you submitted.
- The benchmark requested.
- The vintage requested.
Matched Addresses. This is a collection of addresses that the Census Geocoder API returned as the canonical addresses for your inputs.

Each matched address exposes its key meta-data, including:

The address components in a parametrized form.

The address in a single-line form.

The TIGER/Line identifier information for the address.

The side of the street where the address can be found, per the TIGER/Line data.


Geographical Area Data ¶

Geographical area data is always returned within the context of a MatchedAddress instance, which itself is always contained within a Location instance. That matched address will have a .geographies property, which will contain a GeographyCollection. That .geographies property is what contains the detailed geographical area meta-data for all geographical areas returned in response to your API request.

Each layer requested is contained in a property of the GeographyCollection. For example, the relevant regions would be contained in the .regions property, while the relevant census tracts would be contained in the .tracts property.

See also

For a full list of the properties/layers that are available within a GeographyCollection, please see the detailed API reference:

GeographyCollection

If a layer is not requested (or is irrelevant for a given benchmark / vintage), then its corresponding property in the GeographyCollection will be None.
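Since unrequested layers come back as None, code that walks several layers should check for that first. The sketch below uses a SimpleNamespace as a stand-in for a GeographyCollection, so only the property names .regions and .tracts come from the text above. The populated_layers helper and the placeholder value are hypothetical.

```python
from types import SimpleNamespace

# Stand-in for a GeographyCollection where only regions were returned.
collection = SimpleNamespace(regions=['Region placeholder'], tracts=None)


def populated_layers(geographies, layer_names):
    """Return the named layers whose property is not None."""
    found = {}
    for name in layer_names:
        value = getattr(geographies, name, None)
        if value is not None:
            found[name] = value
    return found


print(populated_layers(collection, ['regions', 'tracts']))
# {'regions': ['Region placeholder']}
```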

Within each layer/property, you will find a collection of Geography instances (technically, layer-specific sub-class instances). Each of these instances represents a geographical area returned by the Census Geocoder API, and their properties will contain the meta-data returned by that API.

Because different types of geographical area return different meta-data, there is a useful .inspect() method that will tell you what meta-data properties are available / have data.

The most universal properties (and the ones that are going to prove most useful when working with other Census Bureau datasets) are:

.geoid which contains the GEOID (unique consolidated identifier for the geographical area)

.name which contains the human-readable name of the geographical area

.geography_type which contains a human-readable label for the instance’s geographical area/layer type

.functional_status which contains a human-readable indication of the geographical area’s functional status
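As an example of why the GEOID is so useful, a census tract GEOID is an 11-digit string made up of the 2-digit state code, the 3-digit county code, and the 6-digit tract code, which lets you join tract-level results to state or county data. The sketch below splits one apart. split_tract_geoid is a hypothetical helper, and the tract portion of the sample value is made up.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit census tract GEOID into state, county, and tract codes."""
    # Keep the GEOID as a string: the codes can have leading zeros.
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f'expected 11 digits, got {geoid!r}')
    return {'state': geoid[:2], 'county': geoid[2:5], 'tract': geoid[5:]}


# Illustrative GEOID: state 24, county 033; the tract code is made up.
print(split_tract_geoid('24033000100'))
# {'state': '24', 'county': '033', 'tract': '000100'}
```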

See also