Millions of Users’ Website Traffic Exposed in Data Breach

Website Planet Security Team
Company name and location: Unknown
Size (in GB and amount of records): Around 359M records, 579.4 GB of data
Data Storage Format: ElasticSearch
Countries Affected: Worldwide

The Website Planet research team discovered a critical data exposure affecting an organization that uses open-source data analytics software, which lets organizations gather and analyze information about their websites' visitors.

Two ElasticSearch servers owned by an unknown organization using this software were left unsecured, exposing data related to website visitors.

Web analytics tools collect vast amounts of data to build a detailed profile of each website visitor. Websites often collect this information without users' knowledge, and these servers show that such detailed profiles can also be exposed. Users should therefore take adequate steps to protect their privacy online.

Customer Data Exposed

The ElasticSearch servers were misconfigured, left without any user authentication controls or encryption in place. As a result, they exposed 359,019,902 records in total, equating to 579.4 GB of data.
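The core failure here was that the servers answered requests from anyone. As a rough sketch of how an administrator might check their own cluster for the same problem, the Python below sends a request to a node's root endpoint without any credentials: a node with security enabled should reject it with HTTP 401, while an open node returns its cluster information. The helper name `requires_auth` is hypothetical, and this is only an illustration; probe only servers you are responsible for.

```python
import urllib.error
import urllib.request


def requires_auth(base_url: str, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if the node rejects an anonymous request.

    Sends a plain GET to the node's root endpoint with no credentials.
    Connection failures (urllib.error.URLError) propagate to the caller.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            # A successful response means anyone can read from this node.
            return False
    except urllib.error.HTTPError as err:
        # 401 Unauthorized is the expected answer when security is enabled.
        if err.code == 401:
            return True
        # Any other HTTP error is unexpected, so surface it to the caller.
        raise
```

For example, `requires_auth("http://localhost:9200/")` (9200 is Elasticsearch's default HTTP port) returning `False` would indicate an open node like the ones described in this report.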

Both of the unknown organization’s ElasticSearch servers contained detailed logs of website user traffic — information that belongs to users of various websites collecting data with the open-source technology.

Website user traffic found on the server included:

  • Geolocation data
  • Web pages visited
  • Referrer pages
  • Timestamps
  • IP addresses
  • User agent data of website visitors

The servers contained user information collected over two months in 2021. The first server contained September 2021 data and the second server featured December 2021 data.

The September 2021 server consisted of 242,728,328 records, totaling 389.7 GB of data, and this data was collected between September 2nd, 2021, and October 1st, 2021.

The December 2021 server featured 116,291,574 records, totaling 189.7 GB of data, collected between December 1st, 2021, and December 27th, 2021.
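As a quick consistency check, the per-server figures above add up to the totals reported at the top of this article. The short Python sketch below simply restates the numbers given here and sums them.

```python
# Per-server figures as reported in this article.
servers = {
    "September 2021": {"records": 242_728_328, "gb": 389.7},
    "December 2021": {"records": 116_291_574, "gb": 189.7},
}

total_records = sum(s["records"] for s in servers.values())
# Round to one decimal place to absorb floating-point error in the GB sum.
total_gb = round(sum(s["gb"] for s in servers.values()), 1)

print(f"{total_records:,} records, {total_gb} GB")  # 359,019,902 records, 579.4 GB
```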

Individual people could be traced through each server's logs. Records could be filtered by IP address, and from there the servers disclosed extensive details about each user's passive digital footprint: information collected from internet users without their knowledge, such as web-browsing activity.

Each exposed user typically appeared in between 4 and 100 records across the two servers. Taking these multiple logs per user into account, we estimate that around 15 million people were affected by the misconfigured ElasticSearch servers.
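To illustrate how an IP-based estimate like this works, the sketch below groups log records by IP address and treats each distinct address as one visitor. The record layout (dictionaries with `ip` and `url` keys), the helper names, and the sample addresses are assumptions made for illustration; the actual field names in the exposed logs are not listed here. The last line shows that the headline figures imply roughly 24 records per person on average.

```python
from collections import Counter


def records_per_ip(records):
    """Count how many log records share each IP address.

    Hypothetical layout: each record is a dict with an "ip" key.
    """
    return Counter(r["ip"] for r in records)


def estimate_visitors(records):
    """Treat each distinct IP as one visitor; return (visitors, avg records each)."""
    counts = records_per_ip(records)
    distinct = len(counts)
    avg = len(records) / distinct if distinct else 0.0
    return distinct, avg


# Illustrative sample using documentation-only IP ranges.
sample = [
    {"ip": "203.0.113.5", "url": "/home"},
    {"ip": "203.0.113.5", "url": "/pricing"},
    {"ip": "198.51.100.7", "url": "/home"},
    {"ip": "203.0.113.5", "url": "/checkout"},
]
print(estimate_visitors(sample))  # (2, 2.0)

# Total records divided by the ~15 million estimate.
print(round(359_019_902 / 15_000_000, 1))  # 23.9
```

Counting by IP is simple but imperfect, which is why the estimate is subject to the caveats below.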

Two factors could affect our estimate. First, some of each server's user profiles belonged to bots crawling the websites, including Googlebot and Pinterest Bot. The presence of bots could lower our estimate, though only by a small amount. Second, exposed users were distinguished by their IP addresses, so website visitors using a VPN or Tor may be counted in our "people affected" estimate without actually being exposed. In practice, these two factors likely balance out to some degree.
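Filtering out crawlers before counting IP addresses is one way to reduce the bot effect described above. The following is a minimal sketch, assuming each record carries a `user_agent` string; the marker list is a crude, hypothetical heuristic rather than the filter used in this investigation, and since user agents can be spoofed it cannot catch every bot.

```python
# Hypothetical, deliberately simple markers; "bot" covers Googlebot and
# most crawlers that link a bot.html page in their user agent.
BOT_MARKERS = ("bot", "crawler", "spider", "pinterest")


def is_probable_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the marker list."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_ips(records):
    """Distinct IPs whose records do not look like crawler traffic."""
    return {r["ip"] for r in records if not is_probable_bot(r.get("user_agent", ""))}


# Illustrative sample using documentation-only IP ranges.
sample = [
    {"ip": "192.0.2.10",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"ip": "203.0.113.5",
     "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/96.0"},
    {"ip": "198.51.100.7",
     "user_agent": "Pinterest/0.2 (+http://www.pinterest.com/bot.html)"},
]
print(sorted(human_ips(sample)))  # ['203.0.113.5']
```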

The servers were live and were being updated at the time of discovery. ElasticSearch is not at fault for this data exposure, and neither is the company providing the open-source web-analytics technology that was used to harvest the data.

You can see evidence of server logs that exposed website user traffic in the screenshots below.

Exposed data belonging to a website visitor

Who was Affected?

An estimated 15 million users were exposed on the two open ElasticSearch servers. These people visited various websites analyzed by the organization at fault, which was using web analytics software provided by Snowplow Analytics.

What's more, this data exposure has a global impact: users from around the world had data stored on the unsecured servers.

How Was the Exposed Data Collected?

The unknown organization’s dataset was collected using software from Snowplow Analytics.

Founded in 2012, Snowplow Analytics Ltd. provides a suite of web analytics products that websites and companies can tailor to their needs.

The Snowplow Open-Source software gathers information about visitors’ traffic on websites and apps and gives users the functionality to control and customize their data collection. Organizations can use Snowplow to help analyze visitors’ passive digital footprints and gain insights into these visitors.

Snowplow Analytics is based in London, England, and turns over around US$10 million in annual revenue. Snowplow's software is popular with large corporations, including Strava, The Wall Street Journal, AutoTrader, Capital One, and ABC.

It’s important to note that we aren’t accusing Snowplow Analytics of this data exposure. While the unknown organization used Snowplow’s technology, the data was seemingly exposed due to the unknown organization’s misconfigured servers.

We know the ElasticSearch servers belong to a Snowplow user because Snowplow’s website URL appears as a source in server records. Upon further correspondence, Snowplow told us that the servers’ owner was using an open-source installation of Snowplow’s software.

Snowplow’s URL shows up in server records

Impact on End Users

We do not and cannot know whether malicious parties have accessed the ElasticSearch servers. If bad actors have read or downloaded the servers’ records, exposed users could face the threat of cybercrime.

Privacy Violation

The exposed ElasticSearch servers violate users' privacy, though this may not be obvious from the kinds of details exposed. IP addresses, used together with geolocation data, ISP, device type, operating system and browser details, timestamps, and website visit history, could be used to locate and identify specific individuals. Identifying users from this information is not easy. Nonetheless, any user a hacker managed to identify on one of the servers could become the target of various cybercrimes.

Impersonation

Malicious actors could conduct illegal activities online while posing as another user's device, using the exposed user agent data. Attackers spoof user agents to appear as legitimate clients to web applications, which can help them avoid being blocked during attacks on websites or user accounts.

Impact on the Server’s Owner

The organization that owns the ElasticSearch servers could also be affected by this data leak. Because users from around the world were exposed on its servers, it could be investigated by several data protection authorities.

Data Privacy Violations

Prominent data protection regulations in the United States, the EU, and the United Kingdom may have been breached when the ElasticSearch servers exposed website users’ traffic. This is because users from around the world are identifiable through their leaked data.

Status of the Data Exposure

The Website Planet research team discovered the misconfigured ElasticSearch servers on January 18th, 2022. It was easy to trace the servers to a Snowplow user because of references to Snowplow's website URL in the logs and our later correspondence with Snowplow. However, we have not been able to identify the servers' owner.

The Website Planet research team emailed Snowplow after discovering the open ElasticSearch servers, and sent a second message on January 20th, 2022. The company's head of engineering replied that the servers belonged to an organization using an open-source installation of Snowplow's software, and said Snowplow would reach out to the organization to close the breach. We followed up with Snowplow on January 26th, 2022, and the misconfigured ElasticSearch servers were secured between January 27th and January 31st, 2022.

Protecting Your Data

The open ElasticSearch servers demonstrate that personal data collected through on-site tracking, something many of us do not even consider while browsing the internet, can be exposed.

There are steps users can take to limit on-site tracking and prevent this kind of data exposure from happening in the future.

A Virtual Private Network (VPN) hides the user's real IP address from the websites they visit, making it harder for on-site tracking to tie their online activity back to them. A VPN should be the first option for internet users who want to protect their online activity. People can also use the Tor Browser to access the internet anonymously and keep their data private.

Internet users should think carefully before accepting "cookies" on a website. Cookies are used to track our online activity. They can improve our experience of a website, brand, or service, but we ultimately hand over more data in exchange for that improved experience.

Finally, most web browsers let users disable cookies and ad tracking in their settings. Users can turn on these options to cut down on tracking as much as possible.

How and Why We Report on Data Breaches

We want to help our readers stay safe when using any website or online product.

Unfortunately, most data breaches are never discovered or reported by the companies responsible. So, we decided to do the work and find the vulnerabilities putting people at risk.

We follow the principles of ethical hacking and stay within the law. We only investigate open, unprotected databases that we find randomly, and we never target specific companies.

By reporting these leaks, we hope to make the internet safer for everyone.

What is Website Planet?

Website Planet is the number one resource for web designers, digital marketers, developers, and businesses with an online presence. You’ll find tools and resources for everyone, from beginners to experts — and honesty is our top priority.

We have an experienced team of ethical security research experts who uncover and disclose serious data leaks as part of a free service for the online community at large. This has included a breach in a medical AI platform, as well as a breach in a French real estate agency leaking sensitive data.

You can also read about how we tested five popular web hosts to see how easily they could be hacked.
