Very early on in my analytics/MDM career (and it is still early), I had a series of questions about data sources that I could not answer easily. I could get in the ballpark, but never get the answer or right place to find the data I needed. And this wasn’t because people didn’t want to give me the answer, but it was due to the data being so scattered that there was no centralized, easy way to find the answer (hence why Master Data Management is so important). But, using a couple of amazing system calls and some Alteryx magic, I have been able to create my own navigation system. These two macros I call Searchlight and Glass. And while I still have a ton of work in front of me, these macros have been pivotal in helping me quickly answer any question that gets tossed my way.
Note: These macros run system calls on Oracle. Everyone’s systems are different, but I think that with any system, you can work on a similar macro to help you find your way in the dark.
Before we delve into the who, what, when, where, and why (5 W’s) of these macros, it is important to quickly talk about where this falls on our analytical process. For those new to the concept, there are 5 distinct buckets of an analytical process: Collection, Cleansing, Exploring, Modeling, and Reporting. Collection – which is what we will be talking about – is the process of clearly defining the question you are trying to answer (5 W’s), knowing what data you need, defining your tidy data set, and knowing whether you have access to this data.
Note, again: Due to the data outputs of these being somewhat sensitive in nature, I cannot share all the nitty-gritty. However, the system calls and general workflows are not sensitive and were, in fact, rather easy to build. I have changed the workflows to make them generic and to protect my company’s data.
Searchlight Macro
For any question that was being asked, I had to go back to my notes on collection understand what all was needed. I need to know all products with x attributes at n number of stores. I need to know how many products have x cost from y vendor. For both of these questions, I need to know where to find product data, attribute data, store data, costing data, and vendor data. Five (or more) different sources from just two somewhat simple questions. And, if you have a ton of de-aggregated tables, you will need to know how to aggregate up the answers to these questions through a series of joins, summation, and other tidy data tools. Before any of that data magic, you need to know how to find the data.
Fortunately, using two Oracle system calls, I was able to dig into the metadata of tables to help navigate to what I need. These to two calls are ALL_TAB_COLUMNS and ALL_COL_COMMENTS (you can read more about them at the links). Essentially ALL_TAB_COLUMNS is pulling back all the schemas and tables within a data warehouse, and ALL_COL_COMMENTS is pulling back the comments on those tables and schemas. Before I built any macros, I would frequently pull this data down, blend, and filter to a table or schema I was looking for. Over time, though, I wanted to create a way to do this more quickly. That’s when I macro-tized these calls for a few main data warehouses, blended, cleansed, and filtered to get the workflow below:
By running this macro, you get two outputs. You can search for a term (ex. product) and get the all the columns that contain that term, or you can – if you don’t know what you’re looking for – simply get all the schemas and tables associated with the data warehouses. The output has many fields, but the main ones you get are the Warehouse, Owner, and Column Names. These are essential to being able to know the location and actually pull the data.
By turning this process into something quickly repeatable and customized, no one has to remember the system calls or to blend the data a certain way, and it prepares us for the next macro: Glass.
Glass Macro
So the next step I would typically do before after finding the Warehouse, Owner, and Column Names is drop in another Input Tool in Alteryx and write a “Select * From ….” statement. Well what if you need to pull back 2-3 tables and compare them? What if you need to pull back 15 tables to compare? What if you’re not really sure this is the table you need, but you don’t want to write out all that SQL just to painstakingly realize this wasn’t what you wanted? (I’m speaking from experience on that one). With the Glass Macro (and Select Records and Browse tools) you can search a table at a time and analyze the results. Note: You will have to run Searchlight at least once to know which record you want to select. Searchlight on my machine takes under 30 seconds to run, so I usually throw it on the canvas and drink some coffee as I patiently await the results.
This Macro tests for just one record coming in, which is why you will need to select the line of data from the Searchlight tool using the Select Records tool. If there is more than one record, it immediately fails, saving you some time. Once it has the single record, it will concatenate and prepare fields to write an SQL statement for you! (There is a warehouse selection drop-down, but this macro can run with no input from the user as it will automatically identify the warehouse from your input data.) Finally, it will output the first 1000 records for your viewing pleasure. Throw a Browse on the other end of this and Bam! You can run a quick analysis on what kind of records you might see and if this is the table you may or may not be looking for.
By chaining these two macros together, I took a process from 10-20 minutes each time I wanted to find a table and view its data, to mere minutes for any end-to-end search, albeit 1 table or 10 tables. And by tweaking searchlight, you can also search through new data warehouses if you gain access to more. So, while I cannot actually share the macros or the data outputs, the builds of these are generic enough to where you can take the idea of them and run with them on your own!
If you would like to try to re-create these macros for your own use, please feel free to send me a message at andrewderbak@gmail.com!