
20170414

Rant on analysis 1 of many

There's nothing I enjoyed more in my old job than a fresh steaming pile of raw data. Give me a month's data and a quarter to compare it to and I'm like a pig in shit. Nothing teaches you more about statistics than wading into it up to the knees. From fucking around with SQL to force a database into giving you what you need, to performing black magic with a vlookup holding a concatenate referencing a pivot table, it all makes my dick tingle. It's not just the joy of a good script coming together but the overview at the end, showing you the picture in clear numbers, or in charts for the dyscalculic.

Then you must do the hard work: RCA (Root Cause Analysis), comparing the results to the reality. That's when the horror kicks in. When you look into the causes of contact, often by listening in to calls, you slowly realise that the data gets skewed by laziness: 'billing' isn't the top call driver, it's just the top of the list. So you need to start a new table listing the user, the reason given and the actual cause. From that you get what I call "frequent fuckers", essentially an exclude list of users who misuse logging and cause anomalous data. Go back to the original piece of data and create a new tab with a table of the "frequent fuckers". In your raw data, put in a vlookup against the new tab so that anyone on the table gets a mark, and throw in an ifna to keep it clean. Rebuild your pivot table, but this time add a filter to exclude the "frequent fuckers", and then things look different.
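
If spreadsheets aren't your thing, the same trick can be roughed out in a few lines of Python. This is only a sketch under made-up assumptions: the call log, the user names and the `top_drivers` helper are all hypothetical, and the flag column stands in for the vlookup-plus-ifna mark.

```python
from collections import Counter

# Hypothetical call log: (user who logged the call, reason they picked).
calls = [
    ("alice", "billing"),
    ("bob", "billing"),
    ("bob", "billing"),
    ("bob", "billing"),
    ("carol", "fault"),
    ("dave", "fault"),
    ("erin", "fault"),
    ("alice", "upgrade"),
]

# The exclude list, built by hand after listening to calls (hypothetical).
exclude = {"bob"}

# Equivalent of the vlookup/ifna column: True if the user is on the list,
# False (the "clean" value) if they are not.
flagged = [(user, reason, user in exclude) for user, reason in calls]


def top_drivers(rows, skip_flagged=False):
    """Count calls per logged reason, optionally filtering out flagged users."""
    return Counter(
        reason for _user, reason, mark in rows if not (skip_flagged and mark)
    )


raw = top_drivers(flagged)
cleaned = top_drivers(flagged, skip_flagged=True)

print("raw:    ", raw.most_common())
print("cleaned:", cleaned.most_common())
```

In this toy log 'billing' tops the raw count only because one user picks it every time. Filter that user out and 'fault' takes over, which is the "things look different" moment.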

The worst thing someone can do is overuse averages. Averages are often the bit in the middle that no one uses, and providing solutions based on averages gives the best solution for no one.
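
A quick way to see the problem with averages, sketched in Python with invented numbers: the call durations below are hypothetical, picked so that calls are either short or long with nothing in between, and the 5 minute window is an arbitrary choice for illustration.

```python
from statistics import mean

# Hypothetical call lengths in minutes: quick queries and long slogs.
durations = [2, 2, 3, 3, 3, 17, 18, 18, 19, 20]

avg = mean(durations)

# How many calls actually land within 5 minutes of the average?
window = 5
near_average = sum(1 for d in durations if abs(d - avg) <= window)

print(f"average call length: {avg} minutes")
print(f"calls within {window} minutes of the average: {near_average}")
```

Plan your staffing around the average here and you've planned for a call that never happens.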

Try it yourself: http://www.pewresearch.org/data/ has lots of data sets. See if you can get to the same results they do. More interesting is if you don't, because then you get to figure out why not. Any questions, leave a comment. Who knows, I might even get back to you.
