When starting in data analytics, it can be difficult to know which career path to choose, whether that means the type of analyst you want to be (financial/quant, performance, etc.), the type of company you want to work for, or even the data language you want to use. Well, maybe I can help you with one of those. In this post, I will walk through the same analysis on two different platforms: R and Alteryx.
To start, all of my analyses and images can be found on my GitHub (ADerbak/2018-07-R-vs-Alteryx-Blog-Post). Feel free to fork or clone to your heart's content. Admittedly, these analyses are really just some simple examples of the different features of each platform.
For both of these analyses, I am using the FDIC Failed Bank List that I pulled from Data.gov. This is a relatively simple file that contains failed banks by state with their close date and a timestamp of when the data was updated. I set three challenges for each platform: previewing the data, checking data relevancy (the last time the data was updated), and creating a chart of the top 10 states with the most bank closures.
R Challenge! Go!
For those unfamiliar with R or RStudio, R is a statistical programming language. It is free and open source, which means anyone can download it, create functions and packages, and share them with other analysts. One of the biggest benefits of R is that if you need something (a function, a model, etc.), odds are another developer has already created a package for it. The language has been around for quite some time and it is quite flexible. That being said, it is still a language. So if you want to do anything, you have to learn to “talk” to R. For those who come from Excel, simply filling down or changing column names will take actual programming knowledge in R. RStudio is an integrated development environment (IDE) whose graphical interface makes it easier to write and run R code.
I ran all of my code in R Markdown, which allows me to create an HTML file so you can see not only my notes, but also my code where I’ve chosen to show it. You can view the entire analysis in R here. This will allow you to follow along as I talk through my thoughts while writing the report.
So with this analysis, I imported the data along with the dplyr (data manipulation) and ggplot2 (grammar of graphics) packages. To use a package, you must load it with library() in every new R session. Fortunately, packages aren’t very large and you only need to install them once, though you might want to include an install.packages() call listing each package for anyone else running your code.
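One common way to handle the install-once, load-every-session pattern is sketched below. This is not taken from my report; the `required_packages` name is just a label I chose for illustration, and `install.packages()` may need a CRAN mirror configured if you run it non-interactively.

```r
# Packages the analysis depends on (hypothetical variable name).
required_packages <- c("dplyr", "ggplot2")

for (pkg in required_packages) {
  # Install only when the package is not already available on this machine.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Load the package for the current session.
  library(pkg, character.only = TRUE)
}
```

This way, someone else running the code gets the packages installed automatically, while you skip the install step on every later run.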
Once these were going, I simply ran head() and count() functions to preview my data and get the total number of rows. This solved my first problem of previewing the data.
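As a rough sketch of that preview step, here is the same idea on a tiny stand-in data frame. The rows and the `Bank.Name` column are made up for illustration; only the `banklist_df` name and the `ST` column come from my actual code.

```r
library(dplyr)

# Toy stand-in for the FDIC file; the values are invented for illustration.
banklist_df <- data.frame(
  Bank.Name = c("Bank A", "Bank B", "Bank C", "Bank D"),
  ST = c("GA", "FL", "GA", "IL"),
  stringsAsFactors = FALSE
)

# Preview the first few rows.
head(banklist_df, 3)

# count() with no grouping returns a one-row data frame whose column n
# holds the total number of rows.
row_count <- count(banklist_df)

# nrow() gives the same total as a plain integer.
row_total <- nrow(banklist_df)
```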
For relevancy, I had to convert the dates from factors to actual dates – it’s very important to understand data types in any language! – and then I could pull the max date value from the different columns to get data relevancy.
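A minimal sketch of that conversion, assuming the dates arrive as factors in a "dd-Mon-yy" layout and that the columns are named `Closing.Date` and `Updated.Date` (both names and all values here are assumptions for illustration). Note that `%b` matches month abbreviations in the current locale, so this assumes an English locale.

```r
# Toy data with dates stored as factors; values are invented.
banklist_df <- data.frame(
  Closing.Date = factor(c("15-Dec-17", "13-Oct-17", "05-May-17")),
  Updated.Date = factor(c("20-Feb-18", "20-Feb-18", "26-Jul-17"))
)

# Convert factor -> character -> Date. Going through as.character()
# makes sure the labels, not the underlying integer codes, are parsed.
date_format <- "%d-%b-%y"
banklist_df$Closing.Date <- as.Date(as.character(banklist_df$Closing.Date),
                                    format = date_format)
banklist_df$Updated.Date <- as.Date(as.character(banklist_df$Updated.Date),
                                    format = date_format)

# The most recent value in each column indicates how current the data is.
latest_close <- max(banklist_df$Closing.Date)
latest_update <- max(banklist_df$Updated.Date)
```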
I then grouped my states and got a count of failed banks using the following code:
State_counts <- banklist_df %>%
  group_by(ST) %>%
  summarise(count = n()) %>%
  arrange(desc(count))
From there I created a colorful chart using the following code:
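# Keep the ten states with the most failures
Top10 <- head(State_counts, 10)
# Refer to columns by name inside aes() rather than with Top10$count
p <- ggplot(Top10, aes(x = ST, y = count, fill = ST)) +
  geom_col() +
  ylab("Count of Failed Banks") +
  xlab("States")
# Add a centered title
p + labs(title = "Top 10 Failed Banks by State") +
  theme(plot.title = element_text(hjust = 0.5))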
I won’t walk through all of this code, but basically know that I created the top 10 states as an object, plotted it using count and states, and then labelled and adjusted the chart as needed.
Finally, I put in my notes and knitted the whole thing using the knitr package and voilà! I have created a fancy little reproducible report!
Okay Alteryx, it’s your turn!
I have talked about Alteryx before in this blog post. However, as a quick recap, Alteryx is a drag-and-drop GUI mega-tool. It is very visual and requires little-to-no coding to run from the user’s side. What’s more, there are tools that allow you to write code if you are already familiar with a coding language.
While I cannot share the behind-the-scenes code with you like I could in R, you can see everything I did in the image below.
Another bonus about Alteryx is that if you need to explain your methodology as to how you got to your insights, you can visually run through everything (with annotations as well!) and you won’t have to worry about reading one lick of code. This is especially handy if you are trying to explain that to someone who is not familiar with programming languages.
Now for the challenges. For previewing the data, I could bring in the file using the input tool and actually see a preview before running anything (pretty sweet!). To fight fair, though, I sampled the first 20 rows and put a browse on it. You can see the results of that browse tool below:
For data relevancy, I again had to convert the string data to a date format using the DateTime parse tool. For this particular tool, there are several default options I could pick to convert the strings, so I selected the one that matched the “dd-Mon-yy” format, and then used the summarize tool to pull the max dates from my two date columns. You can see those results below:
Finally, to get the Top 10 list, I simply summarized by state/count combination, sorted descending on count, sampled the first 10 records, and then re-sorted by state name. I then put this in the charting tool and added some of my own labeling and graphics to create a pretty little chart.
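For comparison, those four Alteryx steps (summarize, sort descending, sample, re-sort by state) map onto dplyr verbs roughly as follows. This is only a sketch on invented data, and it keeps the top 2 instead of the top 10 so the toy example stays small.

```r
library(dplyr)

# Toy stand-in for the failed bank list; values are invented.
banklist_df <- data.frame(
  ST = c("GA", "FL", "GA", "IL", "GA", "FL", "CA"),
  stringsAsFactors = FALSE
)

top_states <- banklist_df %>%
  group_by(ST) %>%              # summarize tool: group by state
  summarise(count = n()) %>%    # ...and count the records in each group
  arrange(desc(count)) %>%      # sort tool: descending on count
  head(2) %>%                   # sample tool: first N records (2 here, 10 in the post)
  arrange(ST)                   # sort tool: re-sort by state name
```

The final arrange(ST) only changes the display order, so the chart lists states alphabetically rather than by count.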
As you can see, both of these tools give you the same results and have their own pros and cons. For R, I like the customization, visualization, and reporting features, but it does have a hefty learning curve. If you’re brave enough and can use it regularly, you will pick up R in no time. For Alteryx, I like the visual aspect to everything and setting up the tools is pretty intuitive. As for cons, it can be a bit tricky to figure out how to make a chart or report, especially for a platform that is suited for analysts that do not know code or the grammar of graphics. That being said, I know they are working on better visualization tools, and a lot of Alteryx users output their dataset to be picked up by other platforms like Qlik or Tableau for reporting.
So which platform should you pick? There is no single right answer; it just depends on your goals. I, personally, found R to be both challenging and rewarding, and learning Alteryx was like being 16 and getting a Ferrari! If I had to pick for you, I would make these simple statements:
- If you love customization and have extra time to learn code, pick R
- If you need insights now and will learn code later, pick Alteryx
Bonus: If you know both R and Alteryx, you can write R code in Alteryx!