As some of you may or may not know, I am one of St. Louis’ Alteryx User Group Leaders! I was (thankfully) allowed to head up the group last summer, and although we have been rather quiet, I was extremely excited to host our first official meet up for this year this past Monday! And while this was co-hosted by Josh Brandt and Tim Bass of Slalom who spoke on Alteryx Certifications and new features like the Python and Visualytics tools, respectively, I thought I would recap just my end-to-end process of what I presented: a ZIP code spatial map counting the different areas where the attendees are from.
Before We Begin
When starting any analysis, I try to follow a typical analytical process. This case being no different, I would need to follow my typical steps:
- Define the Problem
- Make sure I have access to the data
- Cleanse the data as needed
- Explore the data
- Report the data
- Productionize the data
Defining the Problem
In the early talks of putting the user group meeting together, we decided we wanted to do something with spatial analytics. On our conference call, we all agreed it would be cool if we could give an idea of where everyone was from. Sounding like a cool challenge, I stepped up to try my hand at creating a workflow to solve this. In doing so, I defined the problem as:
Taking the registrant’s input, create a spatial map that counts where different user group attendees are from.
Accessing the data
In taking on the challenge, I knew I needed a couple of key components:
- A base spatial map that included all the ZIP codes and counties for the Greater St. Louis area, which includes both Missouri and Illinois counties.
- Some clearly defined ZIP codes, names, and other categorical variables to help put descriptors to the polygons that would make up my spatial map.
- The ZIP codes from the attendees.
The ZIP codes were provided from the registrations of attendees (given to me by the wonderful Alteryx Community crew), so that part I didn’t need to worry about. The other parts would take some digging, however.
In order to get the polygons needed to create a spatial map, I would need a shapefile – a file that contains spatial data that draws the lines, points, or polygons for spatial data. Fortunately, I was able to find a website that contains a shapefile for the entire US! From there, I would just need to trim it down to just the Greater St. Louis area. In order to do this, I free-hand drew a polygon on a Map Input tool over the area that I wanted to capture.
As for naming conventions, I had to really dig to find a good source of ZIP code to county names. This proved to be tougher than first imagined as some ZIP codes are split between counties and counties can contain multiple ZIP codes. So when you look at a county like St. Louis County, you can lose a piece of that like “Maryland Heights”, “Arnold”, etc. I ended up using this website as a resource and simply copy/paste the data into a text input tool.
This was surprisingly easy in Alteryx. With the data components defined, I just had to join up the different data sources one by one in order to get my map all put together. To start with the actual map, I used a Spatial Match tool to join my Map Input drawing against my US shapefile. I made my drawing the universe (all the data that should exist) and made the shapefile the target (the data I am trying to capture). I then changed the setting to capture all targets within the universe. Throw a Browse tool on my match output and voila! I have my base map!
With my base map, I then just needed to attribute it. A slightly sticky point here: The data I pulled from the website for ZIP code attribution wasn’t clean, so I would need to filter out the unnecessary data points to just get a list of ZIP codes and names. From there, the shapefile had ZIP codes to match to the different polygons, so I joined by ZIP codes for both the shapefile and the attribution data sources. I now had a map that could tell me the names of the shapes that were drawn!
Last but not least, I had the ZIP codes as entered by the registrants. After a little bit of clean-up (auto-field and removing NULLs) and summarizing to count how many people were from different area codes, I was ready to join to my map. I then could put it all the data sources together to get my ideal data set created!
Explore the Data
With everything joined properly, I could explore the data in an easy and efficient way. Since I had a very specific question I was trying to answer, there wasn’t much of a need to do too much exploring (however, I welcome you to try different explorations with my same workflow). After summarizing and simplifying the data, I had my answer that shows the different counts by county and a spatial file to go with it. Now, to make it more report-able…
Reporting the Findings
With the ideal data set created and the Exploratory Data Analysis (EDA) done, I now just needed to make the data “pretty.” In order to turn a report, I should really make the data show up on a map so that way the stakeholders using the report can see the relative distance from each represented county. It helps to answer those next questions like “Are they all from one area?“, “How far out did we reach?“, and “What areas could we do better to outreach to?“
Before creating a map report, I highly recommend reading the documentation first. This will save you a lot of headaches as to what fields you need to adjust, how to layer on the counts, and how to separate your legend so you can adjust it easily. In a nutshell, here’s the steps I took to create the report:
- Inputted data into the Report Map tool
- Adjusted the Settings, including Scale and Base Map
- Selected my data on the Data tab. I also selected the Thematic Field to Counts and the Label Field to City to have my counts and cities show up on the map.
- The layers field took the most tweaking on my first go around to understand how to create what I wanted. There were actually two parts to this:
- Base layer: Here I had to create a base layer for the next layer to build on. I adjusted the Base Layers – Polygons settings so that way I could have a base legend and some base polygons to fill.
- Data Input #1: Next I had to create a new data input on the Layers Tab. This is where I adjusted the actual data that showed up – city names coloring, the legend specs and tiling, and also the fill color scheme.
- Last, but not least, I used the Legend tab to make my legend a separate field so I could adjust it as needed.
With the base map made, I used the Map Legend Splitter and Map Legend Builder tools to take my legend and change the format of it. From there, I joined the fields back to my data and I created a temporary PDF file that I could send to anyone who needed it. I lather, rinsed, and repeated this for another report that broke the maps out by county by simply choosing Grouping Field = County under the Data tab of the Map Report tool. And just for fun, I used some of the other neat spatial tools like heat map and trade areas just to show what else could be done once you have an ideal data set.
Now that I have gone from getting the data all the way to reporting my findings, I could take the next step of productionizing the findings given new ZIP code counts. While I didn’t do this in the actual workflow, I discussed how the workflow could be further enhanced with the user group. One example was that we could generate emails to the stakeholders instead of creating a temporary file and sending that out. We could also turn the workflow into an Alteryx app to replace the ZIP code counts input with an updated, similar file. There are countless other examples we could do, we just need a specific question to answer before starting the enhancements!
While the workflow took me maybe 2-4 hours to build (most of this was due to trying to find good data sources and experimenting with the Map Reporting Tools), it really is not that complex when you break it down. And as I am not a spatial expert, I hope this is beam of hope for those of you who have wanted to crack into the spatial/GIS realm but never knew where to start. This workflow is available through the Alteryx Gallery and I welcome you to download it, run it, and tweak it yourself! I have included all my references and failed experiments in the actual workflow, so you can see a true end-to-end analysis!