On March 21st, 2021, the WebsitePlanet research team, in cooperation with security researcher Jeremiah Fowler, discovered a non-password-protected database that contained over 1 billion records. Upon further research, it was apparent that the data was connected to CVS Health. We immediately sent a responsible disclosure notice to CVS Health, and public access was restricted the same day.
Here is what the database contained:
Total Size: 204.0 GB
Total Records: 1,148,327,940
The database contained the following files:
Indices: Aggregate Data and Event Data.
Types: add to cart, configuration, dashboard, index-pattern, more refinements, order, remove from cart, search, server.
Production records that exposed Visitor ID, Session ID, and device information (e.g., iPhone, Android, iPad).
A sampling search query revealed emails that could be targeted in a phishing attack for social engineering or potentially used to cross reference other actions.
The files gave a clear understanding of configuration settings, where data is stored, and a blueprint of how the logging service operates from the backend.
CVS Health acted fast and professionally to secure the data. A member of their Information Security Team contacted me the following day, confirmed my findings, and confirmed that the data was indeed theirs. I was informed that a contractor or vendor managed this dataset on behalf of CVS Health, but the vendor’s identity was confidential.
The exposed records were marked “production”. When searching for potentially identifiable information, we performed several search queries for common email domains such as Gmail, Hotmail, and Yahoo. Each query returned results, which indicated that the records contained email addresses. It is well known that many personal email addresses are formatted using portions or all of the user’s name. In addition, I was able to identify a small sampling of individuals by simply searching Google for the publicly exposed email address.
The records also contained a “Visitor ID” and “Session ID”. I saw multiple records that indicated visitors searching for a range of items including medications, Covid 19 vaccines, and other CVS products. Hypothetically, it could have been possible to match the Session ID with what they searched for or added to the shopping cart during that session and then try to identify the customer using the exposed emails.
According to the CVS representative, these emails were not from CVS customer account records and were entered into the search bar by visitors themselves. The search bar captures and logs everything that is entered into the website’s search function, and these records were stored as log files.
A review of the mobile version of the CVS site suggests one possible explanation: visitors may have believed they were logging into their account when they were actually entering their email address into the search bar. In the records, these searches had an “event type” parameter set to “search”, and the email addresses appeared as values of a parameter named “query”. This could explain how so many email addresses ended up in a database of product searches that was not intended to identify visitors. The records also show what device was used; the majority of the searches I saw came from phones and other mobile devices, but some came from desktop computers.
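To make the record structure described above more concrete, here is a minimal Python sketch of how a log owner might flag search events whose query looks like an email address. Only the “event type” and “query” parameters are named in the findings; the key spellings (event_type, query, session_id, device), the sample values, and the helper name email_like_searches are assumptions made for illustration and are not taken from the exposed data.

```python
import re

# Hypothetical records: key names and all values are invented for illustration.
SAMPLE_EVENTS = [
    {"event_type": "search", "query": "vitamin d",
     "session_id": "aB3xK9", "device": "iPhone"},
    {"event_type": "search", "query": "jane.doe@example.com",
     "session_id": "Qz7Lm2", "device": "Android"},
    {"event_type": "add to cart", "query": "",
     "session_id": "Qz7Lm2", "device": "Android"},
]

# Deliberately loose pattern: something@something.tld with no spaces.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def email_like_searches(events):
    """Return the search events whose whole query looks like an email address."""
    flagged = []
    for event in events:
        if event.get("event_type") != "search":
            continue
        query = event.get("query", "").strip()
        if EMAIL_RE.fullmatch(query):
            flagged.append(event)
    return flagged


if __name__ == "__main__":
    for event in email_like_searches(SAMPLE_EVENTS):
        print(event["session_id"], event["device"])
```

The pattern is intentionally permissive; a real audit would likely tune it to avoid both false positives and missed addresses.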
CVS Health provided us with the following statement:
“Thank you again for contacting us about this. We were able to reach out to our vendor and they took immediate action to remove the database. Protecting the private information of our customers and our company is a high priority, and it is important to note that the database did not contain any personal information of our customers, members or patients.”
Activity Logging: A Necessary Evil
Tracking all activity from a website or ecommerce platform helps build valuable insights about visitors and customers. This logging and tracking can often contain metadata or error logs that inadvertently expose more sensitive records. In this case these were search logs from everything that visitors searched for and contained references to both CVS Health and CVS.com. This would have provided valuable analytical data to see what customers are looking for and if they are finding the products they want.
The logging system used a mixed-case alphanumeric Visitor ID that appears intended to keep shoppers anonymous. It should be noted that email addresses from visitors’ profiles or shopping carts were not collected in this database. Unfortunately, only human error can be blamed for both the misconfiguration that publicly exposed the database and the website visitors who entered their own email addresses in the search bar. I recommended to CVS that in the future they block any searches that match email address patterns or domain names from being executed or logged. This could help prevent unwanted data from being collected or stored.
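The recommendation above could be sketched roughly as follows. This is a hedged illustration in Python, not a description of how CVS or its vendor implemented anything: the function names should_log_query and log_search, the in-memory list used as a log sink, and the short list of top-level domains are all assumptions chosen for the example.

```python
import re

# Matches an email address anywhere in the query text.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Matches bare domain names; the TLD list is an arbitrary assumption
# and would need to be extended for real use.
DOMAIN_PATTERN = re.compile(
    r"\b[A-Za-z0-9-]+\.(?:com|net|org|edu)\b", re.IGNORECASE
)


def should_log_query(query: str) -> bool:
    """Return False when the query contains an email address or a domain name."""
    return not (EMAIL_PATTERN.search(query) or DOMAIN_PATTERN.search(query))


def log_search(query: str, sink: list) -> bool:
    """Append the query to the sink only if it passes the filter."""
    if should_log_query(query):
        sink.append(query)
        return True
    return False


if __name__ == "__main__":
    sink = []
    for q in ["allergy medicine", "jane.doe@gmail.com", "gmail.com"]:
        print(q, "->", "logged" if log_search(q, sink) else "blocked")
```

Filtering before the write, rather than scrubbing logs afterwards, means the sensitive value never reaches storage in the first place.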
The Visitor ID and Session ID alone contained no identifiable data; only when combined with the email addresses was there even a remote possibility of identifying the user. Theoretically, a search containing an email address would be logged with a “Session ID” that might not change during that visit, which would make it possible to link that email with that “Session ID”. This exposure could potentially have identified activity in the unknown number of sessions where users entered their emails in the search bar and then went on to perform other actions, such as further searches and adding or removing products from their online shopping cart. The Session IDs with emails were unique and the time stamps were not consecutive, which indicates that these were not likely to be automated search queries.
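For a log owner who wants to measure this kind of exposure in their own data, the linkage described above amounts to grouping events by Session ID and checking which groups contain an email-like search. The Python sketch below shows the idea under stated assumptions: the field names (session_id, event_type, query), the sample events, and the function name sessions_with_emails are hypothetical and do not reflect the actual schema of the exposed database.

```python
import re
from collections import defaultdict

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

# Invented sample events for illustration only.
EVENTS = [
    {"session_id": "s1", "event_type": "search", "query": "jane.doe@example.com"},
    {"session_id": "s1", "event_type": "search", "query": "allergy medicine"},
    {"session_id": "s1", "event_type": "add to cart", "query": ""},
    {"session_id": "s2", "event_type": "search", "query": "vitamin d"},
]


def sessions_with_emails(events):
    """Map each session that contains an email-like search to a short summary."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)

    exposed = {}
    for session_id, session_events in by_session.items():
        emails = [
            e["query"]
            for e in session_events
            if e["event_type"] == "search"
            and EMAIL_RE.fullmatch(e["query"].strip())
        ]
        if emails:
            # Count the remaining events that could be tied to those emails.
            others = [e for e in session_events if e["query"] not in emails]
            exposed[session_id] = {"emails": emails, "other_events": len(others)}
    return exposed


if __name__ == "__main__":
    print(sessions_with_emails(EVENTS))
```

A count like this tells the owner how many sessions would need to be purged or redacted, without requiring anyone to read the individual searches.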
We do not download or extract the publicly accessible data we find and only take a limited number of screenshots for verification purposes. It is always a race against the clock to help secure exposed data before it is exploited or wiped out by ransomware. We were unable to review all 1.1 billion records due to the urgency we put into responsibly reporting this exposure and how fast the CVS vendor restricted public access. We were only able to review a limited sampling of records and not the entire dataset.
Other Risks to Consider
When any database is exposed there is a possibility to see configuration, applications, software, operating systems, and build information that could identify potential vulnerabilities if they were unpatched or outdated. Cyber criminals and Nation States alike use complex methods to collect and exploit the data they find. Often they use the same methods as legitimate security researchers to identify publicly exposed data. While we work daily to protect the data we discover there are cyber criminals looking to exploit the data for nefarious purposes. Each record of information serves as a puzzle piece to provide a larger picture of an organization’s network or data storage methods.
According to Wikipedia: CVS Health is an American healthcare company that owns CVS Pharmacy, a retail pharmacy chain; CVS Caremark, a pharmacy benefits manager; Aetna, a health insurance provider, among many other brands. The company’s headquarters is in Woonsocket, Rhode Island.
We are not implying any wrongdoing by CVS Health, their contractors, or vendors. We are also not implying that customers, members, patients or website visitors were at risk. The theories expressed here are based on hypothetical possibilities of how this data could be used. We are only highlighting our discovery to raise cyber security awareness of how something as simple as search logging and a misconfigured database could potentially capture and expose data.
Jeremiah Fowler is a Security Researcher and co-founder of Security Discovery. Jeremiah began his career in security research in 2015 and has a mission of data protection. He has helped identify and secure the data of millions of people around the world. His discoveries have been covered in Forbes, BBC, Gizmodo, among others. Security and responsible disclosure are not only a passion, but a way of protecting our digital lives.