Throwing Spaghetti at the Wall

In the traditional realm of parametric statistics, we were taught that the correct approach to building a statistical model of any sort was to follow the scientific method. You know, formulate a set of hypotheses about what independent factors might contribute to the value of the dependent variable in a model, then test them out. If the model doesn’t fit well, make some more hypotheses. Continue until you are happy with the results or have exhausted the list of hypotheses.

Causality was king, and while that approach might work well in the physics lab, things undeniably go sideways when we step outside and toss in the spatial dimension, competitors, and seemingly unpredictable consumers. Add to that the fact that the client was promised a model by Friday because they are presenting it to their board of directors at 4:00 p.m.

Reality sets in, and most models ultimately are built by throwing everything but the kitchen sink at the wall and seeing what sticks. While this should not be openly stated in polite company, the reality is that while we don’t want counter-intuitive variables in the model, we often care more about model fit than theoretical esthetics. If earthquake risk pops up as a strong signal for sales performance of our sporting goods chain, we probably shrug it off and say that we are really measuring something else that is spatially correlated with earthquake risk. We might have to craft some creative jargon-laced statements to make it sound plausible, but ultimately we will do it and still sleep relatively well.

But the analyst has a serious issue here: to get enough pasta to stick to the wall, the pot must be overflowing with different sizes and shapes of the stuff, most of which is expected to fall to the ground to be consumed by their dog, who is now quite sadly being fat shamed at the veterinarian’s office.

The essence of the problem is that each modeling project will need different variables to make the model work. Sure, there will be a core set of demographics that are in most every model, but to get better performance you will need to bolster the model with other data. Clients like to know how much the model is going to cost them, up front and without surprises. And they don’t want to end up as the proud owners of a database that includes the summer nesting range of the purple snarter-darter, even though you really, really thought it would help the model. Worse yet, the data vendor sells the data only as a complete set.

Since most data vendors insist that you can’t try the data and then decline to pay for it, the sales team quoting the project will try to minimize the data costs to make the total project cost palatable. At the same time, the analyst desperately wants to buy absolutely every shape of pasta known to man in the hopes of building a good model for the client. The quote will almost always be light on the data side.

So what happens when not enough pasta sticks to the wall? The client is asked to pay for an additional database. If they agree, the analyst is pretty much obligated to make sure that some of that data sticks, even if they staple it to the wall. The result? The client now has more data than they need, the model could probably work better than it does because we didn’t dare ask to test other data, and worst of all, they must buy that same data every year to keep a model running that doesn’t benefit from its presence.

Why does this happen? Because most of the creators of demographic data are not themselves users of it. Even if it comes from a large company that has spatial modeling gurus, the data itself is likely to come from a small group who confine their interests to the data. Ask the demographers how to build a retail site model and their eyes will glaze over. In effect, they might be good at demographics, but more often than not haven’t got the foggiest notion of how you would actually use their product.

At AGS, we are radically different. Our roots are in modeling, and we continue to engage in modeling projects on a regular basis, covering a broad range of areas. To name just a few:

Site location and network performance models for retailers and restaurants, including some of the world’s largest retailers
Bank branch deposit models in multiple countries
Credit scoring
Direct response models including direct mail and internet search keyword models
Commercial insurance underwriting
Multi-office location optimization
Missing data estimation models

How does this experience guide us as a data company? Very simple. We want your models to work as effectively for your clients as possible. We want your analysts to have access to as much data as possible for every project. And we want your sales team to be able to offer a fixed quote which is more stable than the pre-renovation quotes on your favorite HGTV show.

Our innovative statistical model package covers all three needs. For a flat per-model fee, your analysts already have our entire library of over 40,000 variables in-house and are free to use any or all of them. The client has access to every variable used in the model and a broad range of standard demographics which can be customized to their specific requirements. Best of all? We freely offer our modeling experience to anybody who asks. Want to talk to us about spatial interaction models, trade area forecasting, or supervised learning? Don’t worry, our eyes won’t glaze over. Just know that we give honest advice, even when ears might not be keen on hearing it.
