The more, the better? What we can learn from Foursquare’s open-source 100M POI dataset
Foursquare has launched an extensive open-source POI dataset—but is it a case of quality versus quantity?
Last week, Foursquare announced an open-source dataset with over 100 million POIs. In the past two years, open data has been rising in popularity with initiatives such as the Overture Maps Foundation and the EU’s High-Value Datasets Regulation, making various datasets publicly accessible. With Foursquare’s recent announcement, the company has joined this data democratization movement, which is set to change the accessibility and understanding of geospatial insights.
Since the announcement, Foursquare’s Open Source Places has been making the rounds on LinkedIn and garnering significant attention. For example, consider this viral post by the co-founder and CEO of Fused.io:
The size of Foursquare's POI dataset has generated excitement, and for good reason. Businesses depend on geospatial insights to drive strategic, data-informed decisions. In an ideal setting, larger datasets provide more comprehensive information about an area and its places. This richness can power decisions in retail expansion, ad campaign optimization, or trend analysis. When data is detailed and accurate, it offers a clearer reflection of real-world activity.
However, as we mentioned in a recent article, data quality matters a great deal. A massive dataset loses much of its value if the information it contains is unreliable or flawed: a classic case of quality versus quantity.
Why data quality matters more than quantity
Foursquare once competed with social media giants like Twitter and even received a $120M acquisition offer from Facebook. The platform initially relied on users “checking in” at locations to share their activities with friends and family. Foursquare's social media ambitions didn’t fully materialize—even though its app, Swarm, still exists—but the company discovered the true value lay in the data it had accumulated. Years of user activity provided rich insights into points of interest (POIs), which Foursquare now offers in an open-source dataset.
One significant advantage of open-source data is the collective contribution from a larger number of people, which can dramatically increase the potential quantity of data. However, this advantage is a double-edged sword: the lack of control over filtering and quality assurance can lead to subpar datasets.
This is where the question of quality comes into the discussion. As the Director of Geospatial at 4M Analytics observed, alongside this valuable data lies a trove of less meaningful content: casual messages, profanities, and trivial location updates shared over the years. For instance, Swarm allows users to create their own POIs and categorize them as they choose, with no one verifying the entries.
Within the static point dataset, you’ll find logged activities such as tourists describing their experiences in their native languages and complaints about traffic in an area at a certain time. The result is skewed, inconsistent insights with a dash of trolling. Collectively, this non-serious content undermines the purpose of such a dataset. Essentially, this portion of “raw” data needs good ol’-fashioned processing and cleaning before it can provide reliable information.
There’s no better way to illustrate this point than by using Foursquare’s Open Source Places as an example:
It’s a rather interesting-looking map. Has Foursquare found the lost city of Atlantis? No, unfortunately, we’re looking at a void around Null Island, the point at 0° latitude and 0° longitude. You’ll notice a bunch of POIs scattered around it. You might wonder, how did that happen? The presence of POIs near Null Island and the surrounding void reflects a combination of data anomalies, defaults, and a lack of cleaning processes rather than meaningful geographic information.
Why POIs appear around Null Island
Here are some reasons why the Null Island Void has occurred on their map:
- Default coordinates for missing data: Some systems default to 0,0 (Null Island) when geospatial data lacks proper coordinates or location information. This results in incorrect points being displayed in this location.
- Mistaken data logging: Devices such as GPS trackers, when unable to determine accurate location data due to poor signal or hardware issues, may log their location as (0,0).
- Misconfigured systems or databases: Errors during data entry, transmission, or storage (e.g., placeholder fields not updated) can cause POIs or activity logs to resolve to Null Island incorrectly.
It all comes down to misattributed data. POIs in this area are almost certainly artifacts of the data-logging process rather than actual locations, and yet they are present in the dataset.
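The failure modes listed above suggest a simple pre-processing check. Here is a minimal sketch in Python, assuming each POI has already been loaded into a record with a name, a latitude, and a longitude. The `Poi` class, the helper names, and the 0.01-degree tolerance are all hypothetical choices for illustration, not part of Foursquare’s schema or tooling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Poi:
    # Hypothetical record shape; a real dataset will have more fields.
    name: str
    lat: Optional[float]
    lon: Optional[float]


def is_suspect(poi: Poi, tolerance_deg: float = 0.01) -> bool:
    """Return True if the coordinates look like a logging artifact."""
    # Missing coordinates cannot be placed on a map at all.
    if poi.lat is None or poi.lon is None:
        return True
    # Values outside the valid latitude/longitude ranges are errors.
    if not (-90.0 <= poi.lat <= 90.0 and -180.0 <= poi.lon <= 180.0):
        return True
    # Points sitting on or very near (0, 0) are likely defaults, not places.
    # The tolerance is an assumed value and should be tuned per dataset.
    return abs(poi.lat) <= tolerance_deg and abs(poi.lon) <= tolerance_deg


def split_pois(
    pois: List[Poi], tolerance_deg: float = 0.01
) -> Tuple[List[Poi], List[Poi]]:
    """Split records into (clean, suspect) lists, preserving input order."""
    clean: List[Poi] = []
    suspect: List[Poi] = []
    for poi in pois:
        (suspect if is_suspect(poi, tolerance_deg) else clean).append(poi)
    return clean, suspect


if __name__ == "__main__":
    # Made-up sample records, for demonstration only.
    sample = [
        Poi("Cafe in Paris", 48.8566, 2.3522),
        Poi("Placeholder entry", 0.0, 0.0),
        Poi("No coordinates", None, None),
    ]
    clean, suspect = split_pois(sample)
    print(len(clean), "clean,", len(suspect), "suspect")
```

Keeping the suspect records in a separate list, rather than dropping them silently, makes it possible to review them and to measure how much of a dataset is affected.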
Prioritize data quality with Echo
At Echo Analytics, trust in data is vital to our business. We place a real emphasis on prioritizing quality over quantity. We’ve built our methodology around confidence scores, ensuring the data we process meets the high standards businesses need to make informed decisions. It’s our goal to mitigate the risks of poor delivery and unreliable insights.
In a world where the reliance on location data is growing rapidly, integrating and managing diverse datasets to uncover meaningful insights can be daunting. We address this challenge with a unified data schema and a rigorous in-house methodology, which together ensure high standards of consistency and quality. For example, to establish a confidence score on a country level, we focus on four key pillars: volume, brand accuracy, completeness, and freshness. These pillars are evaluated using detailed metrics, including POI density, brand alignment, attribute robustness, and data recency, providing actionable insights across markets. (Want to know more? You can learn all about our methodology here.)
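To make the idea of a pillar-based score concrete, here is a purely illustrative sketch in Python. It is not Echo Analytics’ actual formula, which this article does not describe. It assumes each of the four pillars has already been reduced to a score between 0 and 1, and it combines them with a weighted average. The function name, the key names, and the equal default weights are all assumptions.

```python
from typing import Dict, Optional

# The four pillars named in the article; the key spellings are assumed.
PILLARS = ("volume", "brand_accuracy", "completeness", "freshness")


def confidence_score(
    pillar_scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-pillar scores (each in [0, 1]) into one weighted average."""
    if weights is None:
        # Equal weighting is an assumption made for illustration only.
        weights = {pillar: 1.0 for pillar in PILLARS}

    for pillar in PILLARS:
        if pillar not in pillar_scores:
            raise ValueError(f"missing score for pillar: {pillar}")
        if not 0.0 <= pillar_scores[pillar] <= 1.0:
            raise ValueError(f"score for {pillar} must be between 0 and 1")

    total_weight = sum(weights[pillar] for pillar in PILLARS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    weighted_sum = sum(pillar_scores[p] * weights[p] for p in PILLARS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up pillar scores for a hypothetical country.
    example = {
        "volume": 0.9,
        "brand_accuracy": 0.8,
        "completeness": 0.7,
        "freshness": 0.6,
    }
    print(round(confidence_score(example), 3))
```

A weighted average is only one possible way to aggregate such scores; a stricter alternative would be to take the minimum, so that one weak pillar caps the overall result.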
Insights for businesses exploring open-source data
If you’re delving into open-source data, such as the offerings from the likes of Foursquare, be prepared for significant mining and in-house analysis. While open-source datasets can offer comprehensive information, they demand substantial effort, time, and resources to transform raw data into actionable insights. In other words, it’s a lot of heavy lifting.
That’s where Echo Analytics steps in. We specialize in turning geospatial data into actionable intelligence. By delivering datasets that are accurate, comprehensive, and derived from reliable sources, we simplify decision-making for businesses. Staying true to our quality- and accuracy-first methodology ensures the geospatial intelligence we provide won’t lead you astray. Or to Null Island for that matter.