A few things you should know
- This tutorial assumes a basic understanding of data manipulation, working with GIS files, and basic Tableau skills.
- This is not a comprehensive tutorial - I won't be explaining each tool in depth, but I will link to resources to help you if you're unfamiliar.
- There are many ways to accomplish this technique, but this is my preferred method even if it may not be the most efficient.
What are we building?
In this tutorial we will be building a disaggregated view of the racial distribution of Dallas, TX. We will visualize each individual from the US Census Bureau's decennial Census by race and plot them on a map in QGIS.
This kind of visualization technique is useful to show the humans behind the data. It's an effective way to visualize distributions and density on a map in a novel way and has applications outside of visualizing race.
I used this technique recently on my Tableau Visualization of The Day.
- QGIS - a free, open-source, cross-platform desktop geographic information system that supports viewing, editing, and analyzing geospatial data.
- RStudio - an integrated development environment for R, a programming language for statistical computing and graphics.
- Tableau - a visualization and data analysis tool that helps anyone see and understand their data.
- tidyverse - an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
- tidycensus - an R package that lets users interface with the US Census Bureau's decennial Census and five-year American Community Survey APIs and returns tidyverse-ready data frames, optionally with simple feature geometry included.
- Census API key - the free key from the US Census Bureau that authorizes your API requests; it's required for this tutorial.
Okay, so first we need to grab our data.
Start the tutorial by signing up for a Census API key - you'll need it to pull the data from the US Census Bureau, and it's required for the rest of the tutorial.
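Once your key arrives by email, you can register it with tidycensus. Passing install = TRUE (an option of tidycensus's census_api_key()) writes the key to your .Renviron, a sketch:

```r
library(tidycensus)

# Store the key once; install = TRUE saves it to .Renviron
# so future R sessions pick it up automatically
census_api_key("YOUR KEY", install = TRUE)
```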
Once you have your key, open RStudio and create a new R script via File -> New File -> R Script.
Now, we will use tidyverse and tidycensus to pull the population counts for each race in Dallas.
Here's the full code:
```r
library(tidycensus)
library(tidyverse)
library(sf)

options(tigris_use_cache = TRUE)
census_api_key("YOUR KEY")

racevars <- c(White = "P005003",
              Black = "P005004",
              Asian = "P005006",
              Hispanic = "P004003")

dallas <- get_decennial(geography = "block group",
                        variables = racevars,
                        state = "TX",
                        county = "Dallas County",
                        geometry = TRUE,
                        summary_var = "P001001")

head(dallas)

dallasrace <- dallas %>%
  spread(variable, value)

head(dallasrace)

st_write(dallasrace, "widedallas.shp")
```
Let's break this down step by step so we understand what we are doing with the data:
1. Load the Libraries
```r
library(tidycensus)
library(tidyverse)
library(sf)

options(tigris_use_cache = TRUE)
census_api_key("YOUR KEY")  # set our API key
```
First, we need to load our libraries into R Studio so that we can use them to transform and access our data that we'll use for our visualization.
2. Load the Census Data
```r
# Set the names for the different ethnicities we are studying
racevars <- c(White = "P005003",
              Black = "P005004",
              Asian = "P005006",
              Hispanic = "P004003")

# Perform the API call
dallas <- get_decennial(geography = "block group",
                        variables = racevars,
                        state = "TX",
                        county = "Dallas County",
                        geometry = TRUE,
                        summary_var = "P001001")
```
To begin, we need to define what we are actually looking to visualize. For this tutorial, we are looking at the ethnic breakdown in Dallas. There are a large variety of ethnicities that we could study but I've chosen four: White, Black, Asian and Hispanic for this project.
Next, we perform the API call that tidycensus uses to populate the data for us. We choose "block group" to get some of the most granular data available (block is the most granular level), set the variables to our racevars, define our location, and set geometry = TRUE so the block-group geometries come back with the data. summary_var = "P001001" also returns the total population of each block group as a summary column.
Let's look at what the head(dallas) returns:
So, this is what we are looking for. We can see that the race names (variable) and their counts (value) have been returned. We can also see the MULTIPOLYGON geometry referring to our block groups.
However, we still have one last thing to do before we can visualize this info in QGIS. We need to represent each race as its own column.
3. Unstack the data for visualization and export
```r
dallasrace <- dallas %>%
  spread(variable, value)

head(dallasrace)

st_write(dallasrace, "widedallas.shp")
```
The tidyverse gives us many ways to transform data. To unstack our columns for visualization in QGIS, we can use the spread() function, which returns this output:
We then use st_write to export our shapefile to be transformed in QGIS.
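As an aside, spread() has since been superseded in tidyr by pivot_wider(); on a newer tidyverse, this sketch is equivalent (it assumes the dallas data frame from the earlier get_decennial() call):

```r
library(tidyr)

# Equivalent of spread(variable, value) with the newer tidyr API:
# one column per race, one row per block group
dallasrace <- pivot_wider(dallas, names_from = variable, values_from = value)
```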
Next, we transform the data in QGIS
1. Bring the Data into QGIS.
Use the Add Layer tool in QGIS and select "Vector" to add your shapefile to the map.
One thing we need to do is ensure that our map plays nicely with Tableau. To do this, we need to make sure our map is in the correct projection. The differences between projections are outside the scope of this tutorial, but in short:
WGS 84 / Pseudo-Mercator (EPSG:3857) is a variant of the Mercator projection and the de facto standard for web mapping applications.
Our map is natively in WGS 84 (EPSG:4326) and therefore must be converted to render correctly in Tableau.
This is easily done:
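In QGIS you can reproject by exporting the layer with a new CRS (right-click the layer -> Export -> Save Features As, with the CRS set to EPSG:3857). If you'd rather handle it back in R before exporting, sf's st_transform() does the same; a sketch reusing the dallasrace object from earlier (the output filename is my own choice):

```r
library(sf)

# Reproject to Web Mercator (EPSG:3857) so Tableau renders the map correctly
dallas_3857 <- st_transform(dallasrace, crs = 3857)
st_write(dallas_3857, "widedallas_3857.shp")
```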
2. Generate the points representing the people.
In order to generate the points for each individual, you need to follow a three-step process for each race. Let's plot all the Black individuals in Dallas first.
- Navigate to the Vector -> Research Tools -> Generate Random Points in Polygon option
- Use the following settings and run the tool:
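If you prefer to stay in R for this step, sf's st_sample() can play the same role as the QGIS tool: given a vector of sizes, it draws that many random points inside each polygon. A sketch, assuming the dallasrace layer from earlier and treating missing counts as zero:

```r
library(sf)

# One random point per Black resident in each block group,
# mirroring the QGIS random-points tool
counts <- ifelse(is.na(dallasrace$Black), 0, dallasrace$Black)
black_pts <- st_sample(dallasrace, size = counts, type = "random")
```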
3. Rename the Layer to "Black"
Repeat the process for each of the races and you'll end up with something like this:
Finally, ready for visualization in Tableau
1. Merge the files - Before exporting to Tableau, we need to merge all the individual layers into one. This can be done with the Merge Vector Layers tool in QGIS, found under Vector -> Data Management Tools -> Merge Vector Layers. For the Input Layers, select all four race layers and set the Destination CRS to EPSG:3857. Save and export this file.
2. Load the data into Tableau
When you load the data into Tableau you'll see the file looks like this:
Go ahead and drop the "Path" column using Tableau's 'hide' function and open a new worksheet.
3. Visualize in Tableau
In your new worksheet, drag Geometry onto the sheet and you'll see all the points visualized. I suggest adjusting the size of the points, lowering the opacity to 20%, and removing the border now.
Drag the "Layer" field from the Tables onto the color card for Tableau, change the map to "Streets", zoom in a bit, and you should see something like this:
The points are still a little too large and hard to distinguish from one another. One way to solve this is to drag the Id field to Detail and then change the mark type from Automatic to Circle.
Ah, much better. You can clearly see the distribution of people by race in Dallas.
From here it's a matter of styling the map and points or potentially highlighting other cities or areas of the world.
Thank you for reading, and I hope you found this useful. If you have any questions, find me on Twitter and ask me anything. 🙏