At a user conference some years ago, the host introduced me by saying that there are two things you should never see being made: sausage and demographics. He went on to tell the audience that I wasn’t there to talk about sausage. It was a great introduction, especially since the topic he specifically wanted me to address was understanding and using error as an analytical tool. Let’s take a look at what is happening on the factory floor for our latest database creation, gentrification.

Gentrification has long been a hot topic in urban geography circles. It is often considered a mixed blessing: a decaying area is revitalized by middle- and upper-income people wishing to live relatively close to the city center, but in the process the existing lower-income tenants of the area, often from a minority group, are displaced. One of the better works on the topic that we found, albeit one with a clear negative bias, is from the National Community Reinvestment Coalition (NCRC), entitled Shifting Neighborhoods: Gentrification and Cultural Displacement in American Cities.

We know what gentrification looks like when we see it: some houses are newly renovated while others are somewhat worse for wear, with driveways that look like the beginnings of a museum for 1990s economy cars. But the reality is that we can’t visit every street in the nation on a regular basis, so we must rely on data.

Sadly, we found no gentrification questions on the American Community Survey. This occurs with alarming regularity – with no direct measure, we must cobble together something that is plausible and testable against known cases.

Gentrification is a particular form of neighborhood demographic shift that occurs in areas which:

  • Are of at least moderate urban density and are within the spatial context of an urban area
  • Are older neighborhoods with largely single-family dwellings (attached or detached) with relatively low real estate prices
  • Are of modest or low income

It is recognized, after the fact, by considering:

  • Growth in relative income
  • Decline in vacant housing
  • Increasing home ownership
  • Appreciating real estate values
  • Declining unemployment
  • Increasing educational attainment levels
  • Significant race/ethnicity changes

Immediately, we have two components to the model – a selection phase, and a scoring phase.

The selection phase is used to weed out block groups in which the demographic shifts would not be considered gentrification. For example, we might find all seven factors in an area which was recently farmland and is now a newly minted affluent suburb. But what constitutes a lower-income neighborhood? What might be considered low income in one city can be solidly middle income in another. The solution is to index the income of each neighborhood to the metropolitan area average, then do the same with the other variables. Selection results in a pool of eligible block groups which satisfy each of our criteria and will be scored.
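To make the selection step concrete, here is a minimal sketch, not our production code. The field names, the use of a simple unweighted mean of block groups as the "metropolitan average", and every threshold are assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical field names and thresholds, for illustration only.

def metro_index(block_groups, field):
    """Index a field to its metro mean, so that 100 means 'metro average'."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for bg in block_groups:
        sums[bg["metro"]] += bg[field]
        counts[bg["metro"]] += 1
    return {
        bg["geoid"]: 100.0 * bg[field] / (sums[bg["metro"]] / counts[bg["metro"]])
        for bg in block_groups
    }

def select_eligible(block_groups, max_income_index=100.0, max_value_index=100.0,
                    min_density=1000.0, min_single_family=0.5,
                    max_year_built=1970):
    """Return geoids of block groups that pass every selection criterion."""
    income_idx = metro_index(block_groups, "median_income")
    value_idx = metro_index(block_groups, "median_home_value")
    eligible = []
    for bg in block_groups:
        gid = bg["geoid"]
        if (income_idx[gid] < max_income_index          # modest or low income
                and value_idx[gid] < max_value_index    # relatively cheap housing
                and bg["density"] >= min_density        # at least moderately urban
                and bg["pct_single_family"] >= min_single_family
                and bg["median_year_built"] <= max_year_built):  # older housing
            eligible.append(gid)
    return eligible

# Three made-up block groups in one metro area.
sample = [
    {"geoid": "a1", "metro": "A", "median_income": 30000, "median_home_value": 100000,
     "density": 5000, "pct_single_family": 0.8, "median_year_built": 1950},
    {"geoid": "a2", "metro": "A", "median_income": 90000, "median_home_value": 400000,
     "density": 6000, "pct_single_family": 0.7, "median_year_built": 1960},
    {"geoid": "a3", "metro": "A", "median_income": 45000, "median_home_value": 150000,
     "density": 200, "pct_single_family": 0.9, "median_year_built": 2005},
]
print(select_eligible(sample))
```

In this toy data only the first block group survives: the second is too affluent relative to its metro, and the third is a low-density, recently built area of the kind the selection phase is meant to weed out.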

Measuring change requires time series data. For geographic areas, this requires consistently defined areas in a world where every ten years the block groups are significantly modified. The older the census data, the more times it has been abused by being estimated on new boundaries, so we chose 1990 as the starting point and pulled the appropriate variables from 1990, 2000, 2010, and the current year.

But how do you use them? After all, they are measuring different objects (dwellings, households, and people) on incompatible scales. The solution is to index each against the metropolitan average, which standardizes the data and allows us to compute a trend line through the four time points. The magnitude of the slope indicates the strength of the shift, while its sign indicates direction.
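The indexing and trend line can be sketched as follows. This assumes an ordinary least-squares fit through the indexed values, which is one reasonable reading of "trend line" rather than a statement of the exact method. The year 2020 stands in for "the current year", and the income figures are invented.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def indexed_trend(years, local_values, metro_values):
    """Index each local value to the metro average (100 = average),
    then return the slope in index points per year."""
    indexed = [100.0 * v / m for v, m in zip(local_values, metro_values)]
    return ols_slope(years, indexed)

# Hypothetical block group whose income climbs from half the metro
# average to parity with it.
years = [1990, 2000, 2010, 2020]
local_income = [20000, 30000, 45000, 70000]
metro_income = [40000, 50000, 60000, 70000]
slope = indexed_trend(years, local_income, metro_income)
print(round(slope, 2))  # 1.65 index points per year
```

Because every variable is expressed as an index against its own metro average, the slopes for dwellings, households, and people end up in the same units and can be combined.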

These scores were then weighted: we considered relative income growth to be much more important than a relative increase in educational attainment. The benefit of this approach is that it does not require all seven measures to show the expected trend in order to produce a high score, which indicates that gentrification has occurred. It also allows us to differentiate between slight and substantial shifts quite readily.
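A weighted score along these lines might look like the sketch below. The weights are hypothetical; the only thing taken from the text is that income counts for much more than education. Slopes for vacancy and unemployment are flipped because a decline is the expected direction, and the race/ethnicity slope is assumed to be supplied already oriented so that positive means "changing in the expected direction".

```python
# Hypothetical weights, for illustration only.
WEIGHTS = {
    "income": 3.0,
    "home_value": 2.0,
    "ownership": 1.5,
    "vacancy": 1.0,
    "unemployment": 1.0,
    "education": 1.0,
    "race_ethnicity": 1.0,
}

# Measures where a falling index is the expected sign of gentrification.
DIRECTION = {"vacancy": -1, "unemployment": -1}

def gentrification_score(slopes):
    """Weighted sum of trend slopes, oriented so that positive = gentrifying."""
    return sum(
        WEIGHTS[name] * DIRECTION.get(name, 1) * slopes[name]
        for name in WEIGHTS
    )

# Made-up slopes: unemployment moves the "wrong" way and education is flat,
# yet the block group still scores well on the strength of income growth.
slopes = {
    "income": 1.65, "home_value": 0.5, "ownership": 0.2, "vacancy": -0.3,
    "unemployment": 0.1, "education": 0.0, "race_ethnicity": 0.4,
}
score = gentrification_score(slopes)
print(round(score, 2))  # 6.85
```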

At the end of the process, any block group with a positive score was assumed to have some degree of gentrification. The high scores were, of course, way off the scale, so we took a logarithmic transformation to give the results a more reasonable distribution. This was termed the “historic gentrification index”.
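As a rough illustration of that last step, the sketch below uses log(1 + score) for positive scores and zero otherwise. The specific form of the logarithm and the treatment of non-positive scores are assumptions; all that is described above is that a logarithmic transformation was applied.

```python
import math

def historic_index(score):
    """Compress positive scores with log(1 + score); non-positive scores
    are treated as 'no gentrification' and mapped to zero (an assumption)."""
    if score <= 0:
        return 0.0
    return math.log1p(score)

# Made-up raw scores spanning several orders of magnitude.
raw_scores = [-3.0, 0.0, 0.5, 7.0, 120.0, 4500.0]
for s in raw_scores:
    print(s, round(historic_index(s), 2))
```

Using log(1 + score) rather than a bare logarithm keeps small positive scores from turning negative, which preserves the rule that any positive score indicates some degree of gentrification.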

And what about projections? The criteria for selection remain the same, but obviously these are a different set of block groups than in 1990. By comparing the initial conditions on the seven variables to the gentrification index, we were able to determine that areas closer to the average income were more likely to have gentrified 30 years later than those which were extremely low income.

Each block group can then be given a potential score which is based on observed change. But we also know that this process tends to occur on one street, followed by an adjacent street, and so forth. In other words, this is a classic spatial contagion model, which we implemented as a simple gravity model that incorporates both the current conditions of each neighborhood and its spatial context in relation to nearby areas where gentrification has occurred. As you would expect, the potential index is generally high in low-income areas which are adjacent to areas that have recently gentrified. The result is often sharp boundaries, which is what we expect to see in many areas.
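A simple gravity formulation of this idea is sketched below. The functional form (own potential plus neighbors' historic index divided by distance raised to a power), the parameters `alpha`, `beta`, and `min_dist`, and the coordinates are all hypothetical, since the text above does not specify the model's details.

```python
import math

def spatial_potential(own, historic, coords, alpha=1.0, beta=2.0, min_dist=0.1):
    """Combine each block group's own potential with a distance-decayed
    'pull' from the historic index of every other block group.

    own, historic: {geoid: value}; coords: {geoid: (x, y)} in projected units.
    """
    result = {}
    for i, (xi, yi) in coords.items():
        pull = 0.0
        for j, (xj, yj) in coords.items():
            if j == i:
                continue
            # Floor the distance so near-coincident centroids don't blow up.
            d = max(math.hypot(xi - xj, yi - yj), min_dist)
            pull += historic[j] / d ** beta
        result[i] = own[i] + alpha * pull
    return result

# Three made-up block groups on a line: "a" has already gentrified,
# "b" is next door, and "c" is ten units away.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0)}
own = {"a": 1.0, "b": 1.0, "c": 1.0}
historic = {"a": 4.0, "b": 0.0, "c": 0.0}
potential = spatial_potential(own, historic, coords)
print(potential)
```

With identical starting conditions, the neighbor adjacent to the gentrified block group ends up with a much higher potential than the distant one. The steep distance decay is what produces sharp boundaries in a formulation like this.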

So how good is the sausage? Time will tell, but in cities we know well, the historical index appears to capture the neighborhoods we expected to find, and the patterns of potential properly identify areas locally known as the next up-and-coming areas. Like a good sausage, there is always room for tinkering with the recipe.