Over the years, the subject of the assignment of demographics to trade areas – whether radius, drive time, or any old polygon – has been prominent in retail location research. The debate exists because demographic data is collected for one set of geographic areas, and most trade area definitions are based on geometric shapes which don’t match. Methodologies have improved with computing speed, so while the problem is now of minor importance, it remains. The geographic overlay of two sets of boundaries will always have some estimation error, even if we were to go to a building level, as inevitably our shape will cut buildings in half.

Beyond the data itself and its inherent errors, there are really three sources of error that affect trade area reports –

  • Edge error, which affects the narrow band along the trade area polygon
  • Size errors, where the trade area chosen may have significant impact on the report because of population clusters either just within, or just outside, the study area boundary
  • Positional or location errors, where the geocoded location of the site may be slightly off, or where the property in question is large and a range of valid coordinates could be chosen

Being fussy about these sorts of things, we set off to consider these sources of uncertainty and whether we could quantify them. Our Snapshot engine uses weighted block centroids (by type of content) to extract portions of block groups, and now includes a sensitivity function which considers each of the three types of errors, and even allows users to tinker with the buttons and dials.
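To make the centroid idea concrete, here is a minimal sketch of apportioning a block group by its block centroids. It is an illustration under stated assumptions, not the Snapshot implementation: the function names, the `(lat, lon, weight)` tuple layout, and the use of a haversine distance with an approximate Earth radius are all hypothetical choices made for the example.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def block_group_share(site, radius_miles, blocks):
    """Share of a block group's weight whose block centroids fall within the radius.

    site:   (lat, lon) of the trade area center
    blocks: list of (lat, lon, weight), one entry per block centroid;
            the weight could be population, households, etc. (hypothetical layout)
    """
    total = sum(weight for _, _, weight in blocks)
    if total == 0:
        return 0.0
    inside = sum(
        weight
        for lat, lon, weight in blocks
        if haversine_miles(site[0], site[1], lat, lon) <= radius_miles
    )
    return inside / total
```

The block group's demographic values would then be multiplied by this share. Note that each centroid is either fully in or fully out, which is exactly where edge error comes from.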

Size and location errors will be covered in a separate discussion, and for the time being, we will focus on what we refer to as “edge errors”.

To look at the issue, we chose a retail zone along Charlotte Pike (US 70) on the west side of Nashville and used a 2-mile radius centered at (36.136456, -86.884182). Many trade areas have similar properties – mixed land uses, discontinuous population, rivers, and other nuisances. The trade area is shown below over a satellite image base.

If we map instead the block groups, blocks, and block centroids, we can see that the distribution of block centroids within the blocks is very uneven, and that there are large areas of the radius without centroids and, just to the northeast of the radius, there is a large area of dense population.

The area to the northeast is our area of particular interest – as several well-populated blocks are located along this line. The following map highlights the census blocks in this zone whose polygons are partially inside and partially outside the radius –

Note the very uneven distribution of block centroids within these block groups. The block group at the center of the radius line has one centroid near the line, and a number of them well away from the line to the east. This is a particularity related to the definition of census blocks and occurs frequently – the largest part of the block group is a golf course. Note that estimating the block group share based on area would result in serious overestimation in this case, since only one block centroid is within the zone. A zoomed view of two problem blocks showing the building outlines highlights the issue –

Here, one centroid lies just outside the trade area and would be entirely excluded, and the other just inside and entirely included. Both have buildings which are relatively evenly distributed. We expect to find both conditions in most trade areas, and because blocks are generally similar in population size and shape, things tend to all come out in the wash.

An area-based percentage overlay might seem more precise here, but it actually introduces considerable error in blocks where the population is very unevenly distributed, and it is computationally very expensive.

The approach we have taken to analyze this is to define a zone of uncertainty on each side of the radius line. Based on average block sizes nationwide, the default parameter is 0.05 miles on each side. With a 2-mile radius, this results in an “uncertain” zone, running from 1.95 to 2.05 miles, whose area is 10% of the total trade area size. The simple approach taken here is as follows –

  • If a block centroid falls inside the radius, and outside the zone of uncertainty, it is assigned a probability of 1.0 and its share of the block group is assumed to be all inside the radius
  • If the centroid falls outside the radius, and outside the uncertainty zone, it is excluded entirely (given a probability of 0.0)
  • Points falling in the uncertainty zone are assigned a probability that declines linearly with distance from 1.0 at the inner edge of the zone to 0.0 at the outer edge, so that a point exactly on the radius line would be given a probability of 0.5
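
The three rules above can be sketched as a single function. This is a minimal illustration, not the Snapshot code: the function and parameter names are hypothetical, and distances are assumed to be measured in miles from the site to each block centroid. The second helper checks the arithmetic behind the size of the uncertainty zone.

```python
def inclusion_probability(distance, radius=2.0, band=0.05):
    """Probability that a block centroid counts as inside the trade area.

    distance: miles from the site to the block centroid
    radius:   trade area radius in miles
    band:     half-width of the uncertainty zone on each side of the radius line
    """
    inner = radius - band
    outer = radius + band
    if distance <= inner:
        return 1.0  # inside the radius and clear of the uncertainty zone
    if distance >= outer:
        return 0.0  # outside the radius and clear of the uncertainty zone
    # Linear ramp from 1.0 at the inner edge down to 0.0 at the outer edge
    return (outer - distance) / (outer - inner)


def band_area_share(radius, band):
    """Area of the uncertainty ring as a fraction of the full circle's area."""
    inner = radius - band
    outer = radius + band
    # pi cancels: (pi*outer^2 - pi*inner^2) / (pi*radius^2)
    return (outer ** 2 - inner ** 2) / radius ** 2
```

With these defaults, a centroid exactly on the line gets 0.5, one 0.01 miles inside the line gets 0.6, and `band_area_share(2.0, 0.05)` works out to 0.10, matching the 10% figure quoted for a 2-mile radius.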

For our two blocks in the map above, in our normal methods, one would be in, the other out. Here, the one just inside the line is given a probability of about 0.6 (60%) of being inside, and the one just outside a value of about 0.49.

By using the average size of populated census blocks, the method should yield numbers which, at the very least, will warn of significant edge effects when the two methods are compared, and yet it avoids the computational burden of polygon overlays.

The results –

With population, the difference between the two methods was just under 200 people, or under 1% of the total. The minimum value excludes all population in the uncertainty zone, and the maximum value includes all of it.
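
As a rough sketch of how such figures could be tallied, the snippet below takes per-block populations with their inclusion probabilities and returns the minimum, the probability-weighted estimate, the maximum, and the plain in-or-out count. The function name, the input layout, and the choice to treat a centroid exactly on the line as inside are assumptions for illustration only, and the sample numbers in the usage note are made up.

```python
def population_summary(blocks):
    """Summarize population under the edge-sensitivity method.

    blocks: iterable of (population, probability) pairs, where probability
            is the block centroid's chance of being inside the trade area
    """
    blocks = list(blocks)
    # Minimum: only blocks that are certainly inside (zone population excluded)
    minimum = sum(pop for pop, p in blocks if p >= 1.0)
    # Maximum: every block with any chance of being inside (zone population included)
    maximum = sum(pop for pop, p in blocks if p > 0.0)
    # Weighted estimate: each block contributes in proportion to its probability
    weighted = sum(pop * p for pop, p in blocks)
    # Plain centroid method: a block is all in when its centroid is on or inside the line
    cutoff = sum(pop for pop, p in blocks if p >= 0.5)
    return {
        "minimum": minimum,
        "weighted": weighted,
        "maximum": maximum,
        "cutoff": cutoff,
        "difference": weighted - cutoff,
    }
```

For example, `population_summary([(1000, 1.0), (200, 0.6), (150, 0.49), (500, 0.0)])` uses invented populations; the `difference` entry is the gap between the two methods, and a large gap relative to the total would flag a trade area with unusual edge properties.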

So, having chosen a trade area which we thought might be problematic, how does it stack up to normal trade areas? Using a 5% sample of block groups, we ran a series of trade areas at each centroid, and it turns out that this trade area is by no means abnormal – in fact, it had a below average difference between the two methods – with the average absolute deviation at 1.15% for this size trade area. A chart of average deviation by radius size is as you would expect – the larger the radius, the smaller the expected percentage difference:

Our conclusions are several –

  • Error in the “fuzzy” zone, measured as a percentage of the total, is dependent upon trade area size, with error decreasing with larger trade areas
  • The level of error is small, and certainly within range of the expected error in simple population counts at a block level (+/- 3% based on census studies over the years)
  • The edge sensitivity method is useful in determining whether a particular trade area has unusual edge properties which may warrant detailed field investigation

In the coming weeks, we will continue this conversation around enhancing trade areas as it relates to radius and locational effects. We think that this function will enhance site selection and assist in research efforts. Expect this new feature to hit Snapshot when 2021B data releases in early November.