Why Being Unique Matters in Location Data
At Blis, we generally approve only a small percentage of the location data that comes in to us. On a recent day in the US, we passed just 22 percent of the publishers that sent us data: 15,000 passed, while another 54,000 failed. Of that failed data, 40 percent came from centroids, 16 percent was not precise enough, and another 28 percent was “unique.”
Before we get any deeper into this post, let’s clarify what we mean by “uniques” here. We’re not talking about unique devices or device IDs as you may assume. We’re actually talking about unique latitude/longitude data.
Uniques are important in this context because, as you may recall from our last post, we get pretty granular in our lat/long data – down to the location of a single person or tree, and sometimes even more precise than that. That translates to five or six decimal places in each lat/long data point, so while there may be a handful of mobile devices accessing the internet at any given moment at a single lat/long, there usually aren’t more than that.
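To get a feel for why five or six decimal places is so precise, here is a rough back-of-the-envelope sketch. The 111 km figure for one degree of latitude is an approximation we're assuming here, not a Blis-specific value:

```python
# Approximate ground resolution of each decimal place in a latitude
# value. One degree of latitude spans roughly 111 km (assumed
# approximation); each extra decimal place divides that by ten.
DEGREE_METERS = 111_000

for places in range(1, 7):
    resolution_m = DEGREE_METERS / (10 ** places)
    print(f"{places} decimal place(s): ~{resolution_m:,.3f} m")
```

At five decimal places a point is about 1.1 m across, and at six it is about 0.11 m, which is why a single lat/long pin lands on an individual tree or person.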
With that in mind, imagine seeing two million impressions coming from a single lat/long point in a day. Even if you’re not a data scientist or analyst, you’d probably find that data suspect, right?
In fact, this maps back to the reason why Blis developed our SmartPin technology. Years ago, our team was running a campaign for a mobile brand in the UK that was targeting consumers within a five-minute walk of certain big phone stores. The campaign set up geofences of about 500 meters around ten stores in major cities. After four or five days of collecting data, we noticed that 95 percent of the data was coming from a single store in Manchester. That seemed strange because, although Manchester is a major city, its population is not 20 times that of the other cities in the campaign. It just didn’t make any sense.
We dug in and analyzed the data. We visualized it, and once it was mapped out, we could really see the problem: a full 99.5 percent of all the data was coming from a single point on the map. That’s suspicious enough on its own, but consider that a point on the map for us is less than a square meter – down to the “individual tree” or “individual human” level. It’s literally impossible to replicate in the real world: to have this many mobile devices (millions, in this case) in one spot, they’d have to:
- Be stacked in a pyramid in a precise area of less than one square meter
- All be accessing the internet at the same time
- All be sending bid requests at the same time
It hardly seems likely. What does seem likely is that all these users were not actually within the geofence. They were likely users elsewhere in Manchester, and some less-than-respectable publisher had randomly picked a spot within the city to assign to these users – and it happened to be within our campaign area. Unfortunately, we had bought all of that bad data, and we obviously couldn’t use it.
This was the first time the team had seen this happen, and we began to realize how much bad data could potentially be out there. We knew we had to find a solution to protect our clients, since data this inaccurate could negatively impact or even ruin their campaigns. We also didn’t want to buy data like this ever again.
We still see scenarios like this frequently, where data from a single lat/long appears again and again. Today, however, we have SmartPin technology in place, so we never pass that data on to clients. SmartPin makes it easy for us to spot suspicious uniques and remove them from the pool.
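SmartPin itself is proprietary, but the core idea of flagging a single lat/long that dominates traffic can be sketched in a few lines. The `suspicious_points` function, its `max_share` threshold, and the sample coordinates below are all hypothetical illustrations, not the actual implementation:

```python
from collections import Counter

def suspicious_points(impressions, max_share=0.5):
    """Flag lat/long points that account for an implausible share of traffic.

    impressions: iterable of (lat, lon) tuples, rounded to 5-6 decimals.
    max_share: hypothetical threshold; no single sub-meter point should
    plausibly contribute more than this fraction of a day's impressions.
    Returns a dict of {point: count} for points exceeding the threshold.
    """
    counts = Counter(impressions)
    total = sum(counts.values())
    return {pt: n for pt, n in counts.items() if n / total > max_share}

# Example: one point dominates, much like the Manchester campaign above.
data = ([(53.48095, -2.23743)] * 9_500    # 95% from one point: suspect
        + [(51.50735, -0.12776)] * 500)   # a normal minority of traffic
print(suspicious_points(data))  # only the dominant point is flagged
```

A real pipeline would apply this kind of check per publisher and per day, and combine it with the centroid and precision filters mentioned earlier, but the grouping-and-thresholding step is the heart of spotting “uniques.”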