Many years ago, the company I worked for produced a business/daytime population report for radius studies. We did not have a complete business list; instead, we relied on County Business Patterns data, which is unedited (dirty), available only at the ZIP code level (not detailed), and published for the previous year (not particularly timely). The notion that we could accurately estimate the number and size of businesses at the block group level from aggregate statistics was, in retrospect, rather audacious, but the $25 report was generally well received by clients.

One particular client had bought a report and called to complain about its accuracy. The report stated that there were 11 gas stations in the trade area, but he personally drove the area and enumerated 12. He demanded his money back since the report was, well, wrong. By the time he became my problem, he had made his plea to sales, production, and my data team, so clearly the conversation did not begin well. Rather than do “the right thing” and refund his money, I asked him a series of increasingly pointed questions which culminated in:

  • Was the report, correct or not, a good value?
  • Does the error change your business decision?

While he was adamant that the report was wrong, he did have to admit that it was delivered quickly and – this is key – at a low price relative to his costs of enumerating the area himself. He still wanted his money back, of course, until I asked the second question. It was here that he finally understood the purpose of the report in the first place, as he admitted that no, it did not change his business decision.

I then rather pointedly asked him why the heck – confession, it was probably not the actual word I used – he was wasting my time when we had provided, at minimal cost, data sufficiently accurate to make a business decision. The error in this case was less than his tolerance for uncertainty, thus the report was a good value. It could have gone either way, but in this case the client became one of our largest report customers in short order.

But why tell this story? It neatly captures the essence of why we bother with data-driven analytics and how it should be conducted. The primary purpose is to cost-effectively reduce our uncertainty in decision-making.

Imagine for a moment that you have been tasked with finding the best location for a new restaurant concept in a county that you have never heard of, does not appear on Google Maps, and even Wikipedia doesn’t seem to know about. You have zero information on who the customers might be, how many there are, where they live, and what characteristics would work in a site. Zero information. Or, put differently, you have 100% uncertainty. The cost of providing that information was zero.

Now imagine the opposite case where we have complete knowledge of where each person lives, works, and travels, and we know their shopping patterns with such absolute certainty that even for a new concept, we could look at any particular location and know exactly what its sales would be. We would be absolutely certain to find the correct site to maximize the sales and profitability of the restaurant. That is, we would have 0% uncertainty, and the cost of obtaining that data would theoretically be infinite. We would need to track and model every single movement, purchase, conversation, and even the unrevealed preferences of every single person.

At the risk of oversimplifying, in the site location world, we can construct a chart that shows our dilemma –


As the information content increases or improves in quality, the gray zone of uncertainty in the middle will be squeezed, and for any potential site we look at, the probability that it falls into the uncertain zone declines. Ideally, we want to push the entire uncertainty zone left, and narrow it if at all possible.

But while we want to reduce our uncertainty zone, we must do so at a reasonable cost. That is, we want good value for our investment. Ideally, the goal of applying information to the problem is twofold – to reduce the width of the uncertainty band, and to shift it as far left as possible so that we rarely face decisions in that gray zone. The problem, of course, is that the marginal gain in information per dollar spent rapidly diminishes – so we must find that happy intersection of reduced uncertainty and rising cost that maximizes the value to us.
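The trade-off can be sketched numerically. Below is a toy model – not AGS methodology, and all numbers are hypothetical – where uncertainty decays exponentially with spend, so each additional dollar buys less information than the last. The "happy intersection" is simply the spend level that maximizes the decision's value net of data costs.

```python
import math

# Hypothetical parameters for illustration only.
DECISION_VALUE = 100_000.0  # assumed value of a fully informed decision
DECAY = 0.0005              # assumed rate at which spend buys information

def uncertainty(spend: float) -> float:
    """Fraction of uncertainty remaining after spending `spend` dollars.

    Exponential decay captures diminishing marginal returns: the first
    dollars remove a lot of uncertainty, later dollars remove very little.
    """
    return math.exp(-DECAY * spend)

def net_value(spend: float) -> float:
    """Expected value of the decision, net of what we paid for data."""
    return DECISION_VALUE * (1.0 - uncertainty(spend)) - spend

# Scan spend levels to find the sweet spot: the point where the next
# dollar of data no longer buys a dollar's worth of reduced uncertainty.
best_spend = max(range(0, 50_001, 100), key=net_value)
print(f"best spend: ${best_spend:,}, net value: ${net_value(best_spend):,.0f}")
```

Under these made-up parameters, the optimum lands at a modest fraction of the maximum budget – spending nothing leaves the full uncertainty in place, while spending everything buys information long after its marginal value has dropped below a dollar.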

Over the coming weeks, we will be focusing on a wide range of AGS analytics data and tools that are information dense, meaning that they can dramatically reduce and shift your uncertainty band with high marginal value.