Data Traps

KPIs. Machine learning. Data-driven. AI. Metrics. These 21st-century buzzwords brought to the masses an idea that’s older than most people realize.

Put simply, the idea is information: gathering facts and putting them to use. Modern science and technology provide powerful tools for manipulating information. One way to view this is simply as an extension of my favorite technology: language.

The three biggest hurdles to making the buzzwords really work are the same three hurdles we’ve faced for thousands of years. When writing things down (recording data) and interpreting written things (building models), we should be very careful about these three traps because they aren’t always immediately apparent. Attacking them head-on will be very helpful in the long run.

First, we must understand our overall objectives. If an organization has a laundry list of priorities, with no clarity on ranking or tradeoffs, then it has, in effect, no priorities at all. What is my organization actually focusing on and trying to accomplish over the next few days, weeks, or months? Let’s define which puzzle the team is solving.

Second, we must set ourselves up with useful questions, such that answering those questions will make progress towards the organization’s objectives. The first trap is about having institutional priorities, and this one is the practical counterpart. How do we relate our data effort to the overall goals? Let’s define which piece of the puzzle we’re slotting into place.

Third, we must obtain clean data and do a bunch of other busywork to produce trustworthy results. It’s been said that such tasks are 95% of the work, and the magical ML or other buzzwordy thing is less than 5% of the effort. So what do we actually require in order for the data effort to succeed? Let’s plan and execute how we’re going to get the puzzle piece dusted off, painted, and into place.
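To make that 95% concrete, here is a minimal sketch of the kind of cleaning that typically dominates a data effort. The field names ("name", "age") and validation rules are hypothetical, chosen purely for illustration:

```python
# Hypothetical example: cleaning raw records before any analysis or modeling.
# The schema and rules below are illustrative, not from any real system.

def clean_rows(raw_rows):
    """Normalize, validate, and deduplicate raw records."""
    seen = set()
    cleaned = []
    for row in raw_rows:
        # Normalize whitespace and casing in the name field.
        name = (row.get("name") or "").strip().title()
        if not name:
            continue  # drop records missing a required field
        try:
            age = int(row.get("age", ""))
        except ValueError:
            continue  # drop unparseable ages
        if not 0 < age < 120:
            continue  # drop implausible values
        key = (name, age)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

raw = [
    {"name": "  alice smith ", "age": "34"},
    {"name": "Alice Smith", "age": "34"},     # duplicate after normalization
    {"name": "", "age": "22"},                # missing name
    {"name": "Bob Jones", "age": "unknown"},  # unparseable age
]
print(clean_rows(raw))  # [{'name': 'Alice Smith', 'age': 34}]
```

Even this toy version forces decisions (what counts as a duplicate? what range is plausible?) that only make sense once the objectives and questions from the first two traps are settled.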

These are all really simple and obvious-sounding. Yet, in my experience across multiple organizations including those of my own creation, these traps are almost never adequately addressed. Thankfully, certain organizations have gotten aspects of this correct, and we now look to emulate those success stories everywhere.

There’s still a funny dichotomy between the wannabes and the legit folks. Say a company recognizes which facts it needs to collect in order to achieve its goals. Then it already has the first two things: concrete goals and a breakdown of goals into deliverables. All that’s left is to do the busywork, which still must be planned and executed carefully à la point three. But this company isn’t just interested in being data-driven. It is already working, with buy-in and alignment from the board and C-suite down, in a data-driven way.

Let’s talk science for a minute. The scientific method relies on gathering a bunch of data and interpreting them. On one hand, we want to expend effort on studies that will likely give us useful results. On the other hand, we want to avoid biased collection of evidence in support of an unproven hypothesis. This tension causes a number of issues in academia, and it’s worth being aware of when dealing with data in the corporate sphere as well.

My overall perspective here is that we are really only in the infancy of true adoption of the mindset the buzzwords represent. It’s going to take most of the 21st century if not longer for the culture to catch up with our tools. We have to deal with so many ancient things, including our language, cognitive biases, and institutions. Many folks in positions of power just aren’t used to doing things rigorously enough, and it’s going to take time for those folks to age out or decide to seek feedback effectively.

The nice thing about our modern age of flowing and abundant information is that powerful tools bring everything to the fore more quickly. Our inadequacies are revealed in painful and undeniable ways. Iteration cycles are ever more rapid, and if the truth wins out—as I believe it does—then we should converge in good time.

