Centroids: The Good, The Bad, and the Highly Imprecise
As discussed in our previous blog post, Blis is incredibly selective about the data we allow into our pool. We apply many filters to ensure data is both accurate and precise, and not all of it makes the cut. Of the data that isn’t up to snuff, nearly 85 percent fails one of these three important filters:
- Centroids: When data visualized on a map falls into grids, straight lines or symmetrical shapes, it is the result of broad-reaching centroids, which may deliver generally accurate but highly imprecise data.
- Precision: Blis considers lat/long data with three decimal places or fewer too imprecise to deploy in programmatic campaigns.
- Uniques: When curiously large volumes of data originate from a single lat/long (a point representing a space only square millimeters in size), that’s a red flag.
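To make the last two filters more concrete, here is a minimal Python sketch. It assumes coordinates arrive as strings (so trailing digits survive) and that "volume" at a single lat/long is measured as the number of distinct device IDs reporting it. The function names and the 50-device limit are hypothetical illustrations, not Blis’s production thresholds.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical threshold from the rule above: three decimal places or fewer is too imprecise.
MIN_DECIMAL_PLACES = 4


def decimal_places(value: str) -> int:
    """Count the digits after the decimal point, as the coordinate was sent."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def passes_precision(lat: str, lon: str) -> bool:
    """Both coordinates must carry at least MIN_DECIMAL_PLACES decimal places."""
    return min(decimal_places(lat), decimal_places(lon)) >= MIN_DECIMAL_PLACES


def suspicious_uniques(records, max_devices_per_point=50):
    """Return lat/long pairs reported by more distinct devices than the (hypothetical) limit.

    records is an iterable of (device_id, lat, lon) tuples.
    """
    devices_at = defaultdict(set)
    for device_id, lat, lon in records:
        devices_at[(lat, lon)].add(device_id)
    return {coord for coord, ids in devices_at.items() if len(ids) > max_devices_per_point}


if __name__ == "__main__":
    print(passes_precision("51.5074", "-0.1278"))  # True: four decimal places each
    print(passes_precision("51.507", "-0.128"))    # False: only three decimal places
    sample = [(f"device-{i}", "40.7128", "-74.0060") for i in range(60)]
    print(suspicious_uniques(sample))               # 60 devices on one point gets flagged
```

Reading coordinates as strings matters here: once a value like "51.5000" is parsed into a float, the trailing zeros that signal its precision are gone.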
Today’s post will focus on that first one, centroids.
What is a centroid?
As the name suggests, a centroid is a central location used to estimate a user’s location. The term “centroid” itself is a catchall for many things. Most often, it refers to latitude and longitude data returned from a central IP location in a metropolitan area. Centroids are often used when a user denies a publisher permission to use their GPS data. In these cases, the publisher estimates the location based on loose assumptions, tying the user to the location associated with their IP address at the time they’re on their device and engaging with that publisher.
The problem with centroid data is that, while it’s technically accurate, it’s also imprecise. Centroids effectively draw a grid over an entire metro area, with grid points several kilometers apart. To illustrate, London, a city with a population of 14 million, has only about 12 grid points. So while the nearest grid point to a user will likely be within London, it won’t pin them down any more specifically than a district such as Westminster or Brixton, which is not nearly precise enough for behavioral targeting.
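The arithmetic behind that imprecision can be sketched by modeling centroids as a regular grid and snapping each user to the nearest grid point. This is an illustrative assumption: real IP-based centroids are not necessarily laid out on a perfect lat/long lattice, and the 0.05-degree spacing (roughly 5.5 km north-south, about 3.5 km east-west at London’s latitude) is a hypothetical value chosen to match "several kilometers apart".

```python
import math

EARTH_RADIUS_KM = 6371.0
GRID_SPACING_DEG = 0.05  # hypothetical spacing, roughly 5.5 km north-south


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def snap_to_grid(lat, lon, spacing=GRID_SPACING_DEG):
    """Replace a true position with the nearest point on a regular lat/long grid."""
    return round(lat / spacing) * spacing, round(lon / spacing) * spacing


if __name__ == "__main__":
    true_lat, true_lon = 51.4613, -0.1156  # approximately Brixton
    grid_lat, grid_lon = snap_to_grid(true_lat, true_lon)
    error = haversine_km(true_lat, true_lon, grid_lat, grid_lon)
    print(f"reported ({grid_lat:.2f}, {grid_lon:.2f}), off by {error:.1f} km")
```

Every user inside the same grid cell receives identical coordinates, so two people a kilometer apart in Brixton become indistinguishable, and the reported point can sit a few kilometers from where either of them actually is.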
How does Blis identify centroid data?
A lot of bad data gets labeled centroid data. At Blis, we can easily and specifically identify true centroid data by visualizing it and scanning for grid patterns. Centroid data places people in predictable patterns and shapes on the map, suggesting that they live and move in straight lines along a grid. This clearly doesn’t match real human behavior, which is far more random and unpredictable by comparison. A heatmap of actual people in an inhabited area would reveal clusters in shopping malls or coffee shops, and large voids over football fields or empty office blocks.
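The visual check described above can be roughly approximated in code. The sketch below is our own heuristic, not Blis’s method: it flags a batch of (lat, lon) points as grid-like when a handful of distinct coordinates accounts for most of the traffic and those coordinates sit at evenly spaced latitudes and longitudes. The function names and thresholds are hypothetical.

```python
import random


def _evenly_spaced(values, rel_tol=1e-3):
    """True if the sorted distinct values sit at integer multiples of the smallest gap."""
    vals = sorted(set(values))
    if len(vals) < 2:
        return False  # a single repeated point is a "uniques" problem, not a grid
    gaps = [b - a for a, b in zip(vals, vals[1:])]
    step = min(gaps)
    return all(abs(g / step - round(g / step)) < rel_tol for g in gaps)


def looks_like_grid(points, min_points=20, max_distinct_ratio=0.1):
    """Heuristic: few distinct coordinates carry most points, and they form a lattice."""
    if len(points) < min_points:
        return False
    distinct = {(round(lat, 6), round(lon, 6)) for lat, lon in points}
    if len(distinct) / len(points) > max_distinct_ratio:
        return False  # many distinct positions: looks like real, scattered behavior
    return _evenly_spaced([lat for lat, _ in distinct]) and _evenly_spaced([lon for _, lon in distinct])


if __name__ == "__main__":
    rng = random.Random(0)
    # 200 requests snapped onto a 3 x 3 grid of centroids.
    grid = [(51.45 + 0.05 * (i % 3), -0.15 + 0.05 * (i // 3 % 3)) for i in range(200)]
    # 200 requests scattered across the same area.
    scattered = [(rng.uniform(51.45, 51.55), rng.uniform(-0.15, -0.05)) for _ in range(200)]
    print(looks_like_grid(grid), looks_like_grid(scattered))
```

Repeated but irregularly placed hotspots, such as a few busy shopping centers, fail the even-spacing test, which is the distinction the heatmap comparison above is drawing.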
The concerns about centroid data highlight the issue we raised in our last post about precision versus accuracy. Centroid data is often accurate. If you buy device IDs labeled as “New Yorkers” that were captured via centroids, you can be fairly confident the data is accurate: most of those users will be in New York City, although some may be upstate or in Connecticut or New Jersey.
For Blis, however, this data is not nearly precise enough on its own, though we can often use it in conjunction with other sources. When centroid data is identified, we simply remove the lat/long data from the bid request and treat it as if the request came into our platform with no lat/long attached. We then make an effort to find the right location based on other signals within the bid request. For example, if the request carries a device ID we have seen in a previous bid request, we can identify that particular user from the earlier request.
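A minimal sketch of that fallback, assuming bid requests shaped like OpenRTB 2.x (`device.geo.lat`, `device.geo.lon`, and the advertising ID in `device.ifa`). The `strip_centroid_geo` function, the `centroid_coords` set, and the `last_known` lookup from device ID to a previously seen location are hypothetical stand-ins for whatever a real platform would use.

```python
import copy


def strip_centroid_geo(bid_request, centroid_coords, last_known):
    """Return a copy of an OpenRTB-style bid request with centroid lat/long removed.

    centroid_coords: set of (lat, lon) pairs already identified as centroids (hypothetical).
    last_known: dict mapping device ID -> (lat, lon) seen in an earlier request (hypothetical).
    """
    req = copy.deepcopy(bid_request)  # never mutate the caller's request
    device = req.get("device", {})
    geo = device.get("geo")
    if not geo or (geo.get("lat"), geo.get("lon")) not in centroid_coords:
        return req  # no geo, or geo that is not a known centroid: leave untouched
    # Treat the request as if it arrived with no lat/long attached.
    geo.pop("lat", None)
    geo.pop("lon", None)
    # Fallback: reuse a location previously seen for the same device ID, if any.
    previous = last_known.get(device.get("ifa"))
    if previous is not None:
        geo["lat"], geo["lon"] = previous
    return req


if __name__ == "__main__":
    request = {"device": {"ifa": "abc-123", "geo": {"lat": 51.5, "lon": -0.1}}}
    cleaned = strip_centroid_geo(request, {(51.5, -0.1)}, {"abc-123": (51.4613, -0.1156)})
    print(cleaned["device"]["geo"])
```

Keying the fallback on device ID mirrors the example in the text; a real system would also have to decide how old a previous location can be before it stops being trustworthy.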
On its own, centroid data is adequate for targeting broad geographic areas, but without additional data appended to it, it can’t be used to target location precisely. At Blis, we require much more precise information to deliver the accurate behavioral targeting our clients expect.
In our next post, we’ll look at how Blis filters for precision, and what that means in the context of programmatic advertising.
Tags: Amy Fox, Centroids, Debunking Data Series, precision data