Report: Medical AI Company Exposed Millions of Records Online


First published: October 27, 2021

Number of Updates: 6

Security researcher Jeremiah Fowler, together with the Website Planet research team, discovered a non-password-protected database containing 886,521,320 records. The dataset totaled 68.53 GB and contained medical-related data. Further research revealed multiple references to Deep6.AI, including internal emails and usernames. We immediately sent a responsible disclosure notice, and public access was restricted shortly afterward. The records appear to relate to individuals based in the United States.

The collected data was divided into the following fields:

Date, document type, physician note, encounter IDs (an encounter being an interaction between a patient and one or more healthcare providers for the purpose of providing healthcare services), patient ID, note, uuid, patient type, noteId, date of service, note type (for example, Nursing/other), and detailed note text. Some of this information was encrypted, but the notes and physician information were in plain text. The danger is that if the patient IDs were decrypted and the patients' identities exposed, their medical issues and diagnoses would be plainly visible.

Deep6 takes raw medical data and tries to manage or organize it.

According to its website: “Deep6 AI’s software also identifies patients with conditions not explicitly mentioned in medical records. As a result, Deep6 AI’s software finds more patients who better match trial criteria in a fraction of the time”. Deep6 is located in Pasadena, California, USA.

The exposed records revealed physician notes that provided intimate details of patients' illnesses, treatments, medications, families, and social and even emotional issues. The descriptions were very thorough, and it was surprising how many small details these notes included. They offer a rare behind-the-scenes look at how such notes are written and the kind of information medical workers collect.

Example of Physician Note:

“Sobbing and unable to stop, sat with pt at long intervals. Pt has never spoke to a therapist or been prescribed medications. Social Service Consult visited with pt and will follow during this admission. Received ativan 2mg IVB x2 during procedure with fair effect. Her home pain control for lupus takes Vicodin 2 tabs and Tylenol #3 2 tabs at HS.\nCV-VSS awaiting for PA catheter insertion before Flolan starts by MICU service”.

In a sample of 10,000 records, the word “patient” appeared 8,672 times, some of these in the doctors’ notes. The word “Note” appeared 5,914 times. There were references to .csv documents, and we can only speculate that these might have contained additional information. In theory, anyone who gained access to these .csv documents could potentially match the detailed notes with the patient data, diagnoses, medicines, and treatments.
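A keyword tally like the one above can be reproduced with a simple term count over a random sample of records. The Python sketch below is illustrative only: it assumes each record's text is available as a plain string, and the helper names draw_sample and count_terms are hypothetical, not the research team's actual tooling. It counts case-insensitive, whole-word matches, so “patients” is not counted as “patient”; the original sampling may have counted differently.

```python
import random
import re
from collections import Counter


def draw_sample(records, k, seed=0):
    """Return up to k records chosen uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(records, min(k, len(records)))


def count_terms(records, terms):
    """Count case-insensitive, whole-word occurrences of each term across records."""
    counts = Counter({term: 0 for term in terms})  # keep zero counts, e.g. for "demo"
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    for text in records:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    # Made-up example records, not data from the exposed database.
    records = [
        "Pt is a patient with lupus. Patient note follows.",
        "Nursing note: patient stable overnight.",
    ]
    sample = draw_sample(records, k=10_000)
    print(count_terms(sample, ["patient", "note", "demo"]))
```

With these two made-up records, “patient” is counted 3 times, “note” 2 times, and “demo” 0 times. A zero count is how a check such as “the word Demo did not appear” would show up.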

Details of the Discovery:

Total Size: 68.53 GB
Total Records: 886,521,320
Concept-index – 21.0 million records, exposing lab results and medicine details.
Patient-index – 422 million records. Note: Although the patients’ names were not in plain text, the index provides a clear picture of where this information is stored. What was exposed, however, were the internal patient logging and tracking processes.
Provider-index – 89 thousand records exposing physician names, internal patient ID numbers (internal tracking numbers that reveal the logging format), document locations and .CSV files, and other potentially sensitive information.
The files also show where data is stored and contain references to “Production” data.
The database was at risk of a ransomware attack and was publicly accessible to anyone with an internet connection.

As security researchers, we never circumvent password-protected areas of a network or attempt to decrypt records. In the sample, the word “Demo” did not appear, and the only mentions of “test” were in the medical context of patient needs. The doctors all appear to be US-based and to belong to multiple provider networks. The medical workers’ information was in plain text and included their full names, areas of medical specialization, and roles in the patients’ care. The physicians’ names belonged to real individuals, and I was able to validate a number of unique names using only a search engine.

According to a press release from June 29, 2021:

With the recent addition of four additional health systems, the Deep6 Ecosystem now contains: Dozens of leading research institutions including 6 NCI-designated comprehensive cancer centers, 30,000 healthcare physicians and other providers, 30 million patients, and thousands of active trials.

Healthcare is a Target

In the last few years, the healthcare industry has been plagued by never-ending cybersecurity threats and problems. Healthcare cyber attacks in the US rose nearly 55% last year. Cyber attacks on healthcare are a massive problem that is not going away anytime soon. On the dark web, medical data is far more valuable than any other type of record. According to some sources, the average data breach cost per medical record is now estimated to be as high as $499, up from $250 two years earlier.

Doctors are also targets of cyber criminals and scammers. During the COVID-19 pandemic, doctors and nurses have been in close contact with infected patients, and scammers are now contacting doctors while posing as contact tracers and then asking for sensitive patient medical data. Hypothetically, this exposure could have provided scammers with a list of 89,143 medical professionals to target, using insider information and the doctors' own notes to gain their trust.

How Safe is Artificial Intelligence?

As a security researcher, I know very well how difficult it can be to search through massive amounts of data and identify what is sensitive and what is not. The basic idea of AI is for a machine to learn from large amounts of data, become smarter, and quickly produce accurate predictions from that data. The same reliance on large volumes of data that makes artificial intelligence a functional solution also puts AI at risk from a cyber security standpoint.

It is an open secret that defensive technology often lags far behind that of cyber criminals and nation-states. As machine learning in security grows more complex, so do cyber attacks. Each time an intrusion fails, the machine learns and becomes more intelligent, making the next attack harder to predict, prevent, or stop.

Artificial intelligence in medicine and healthcare is a revolutionary use of technology that is already having a positive impact on the lives of real people. Organizations must do more to protect their AI applications, their machine learning models, and, most importantly, the massive amounts of data they collect. One of the best cyber security measures for AI is to ensure that data is stored in a secure environment and encrypted.

Data protection must go hand in hand with the development of AI applications. Even accidental exposures can leak more than data: they can also give cyber criminals a behind-the-scenes look at configurations, the middleware that bridges applications, or operational infrastructure. Armed with this information, criminals can launch a more informed and coordinated attack on the database or infrastructure.

Years of research, testing, and clinical trials are needed to help sick individuals, and collecting massive amounts of medical and patient data is part of that process. Our objective is to raise awareness about safeguarding valuable medical data and to promote its protection and security for all individuals.

It is unclear how long the database was publicly exposed or who else may have gained access to these records. This discovery raises questions about what privacy protections patients have and how much control they have over how their data is used by healthcare providers, researchers, and third-party companies. It is unknown whether these individuals opted in to be selected for a clinical trial or what roles the doctors or healthcare networks played in providing this data.

Deep6.ai acted quickly and professionally to close public access to the dataset. We are not implying any wrongdoing by Deep6.ai or its partners; we are only highlighting the security vulnerabilities surrounding medical technology. We are not indicating that any patient, doctor, or medical data was ever at risk, and we publish our findings for educational purposes, to promote cyber security best practices.

Deep6’s reply:

“In August, a security researcher accessed a test environment that contained dummy data from MIT’s Medical Information Mart of Intensive Care (MIMIC) system, an industry standard source for de-identified health-related test data. To confirm, no real patient data or records were included in this ephemeral test environment, and it was completely isolated from our production systems. Based on current reporting, we have confirmed that the recent claims reference MIMIC data, and there was no access to real patient records. When the researcher notified us in August, we immediately secured the test environment to ensure there was no further concern. Data security and privacy is a top priority at Deep 6 AI, and the responsibility to protect data is at the core of our business and top-of-mind for all our people.”