How ChatGPT Could Be a Data Security Problem

In just a short period of time, ChatGPT and other large language models (LLMs) have become immensely popular. They are now used by millions of people daily.

Along with this popularity have come concerns about vulnerabilities, such as the critical security flaws in ChatGPT plugins reported on March 14, 2024.

ChatGPT’s ability to provide human-like responses and generate content has proven valuable to businesses and people across customer service, education, healthcare, and other industries. However, what do these businesses and people give up to get access to OpenAI’s powerful AI tool? In short, the answer is their data.

When Is Your Data Not Your Data?

Andrew Lewis famously observed: “If you are not paying for it, you're not the customer; you're the product being sold.”

While ChatGPT has both free and paid options, the fact remains that any data used as input to ChatGPT can be used by OpenAI for its own commercial purposes. The more ChatGPT is used, the more it benefits, as any data entered into ChatGPT helps OpenAI train new models and improve future output.

Take section 3 of OpenAI’s Terms of Service as an example of how OpenAI uses input data:

“You may provide input to the Services (‘Input’), and receive output generated and returned by the Services based on the Input (‘Output’). Input and Output are collectively ‘Content.’ …OpenAI may use Content to provide and maintain the Services, comply with applicable law, and enforce our policies.”

Under these terms, OpenAI can use any data you input into ChatGPT for any of its services, related or unrelated to ChatGPT. For customers using ChatGPT in an embedded analytics implementation, this could have disastrous consequences.

In a recent, and real, “data leak” incident, a Samsung employee used ChatGPT to identify potential fixes for a bug in Samsung code. In doing so, the employee provided Samsung’s proprietary code to ChatGPT as part of the query. Because that code became part of the training data set ChatGPT uses to generate responses, this essentially gave anyone using ChatGPT access to Samsung’s proprietary code: any future query to ChatGPT runs the risk of exposing part or all of it to non-Samsung users.

Read More: Data Security and Compliance: 5 Essential Considerations

The Risks of ChatGPT with Embedded Analytics

While every organization should take note of the data leak risks associated with employees using ChatGPT, organizations that use ChatGPT in an embedded analytics implementation need to be even more careful.

Consider a fictitious wellness company with embedded analytics and an LLM like ChatGPT integrated into its application. The application tracks customers’ health indicators through wearables and manual customer input. Customers can use the embedded analytics capabilities to analyze health trends, and can even ask the application’s ChatGPT integration questions about their own health data and receive personalized responses.

In this case, all of the customer health data used to derive those personalized responses becomes Content that OpenAI has access to, and can use, when creating output for other ChatGPT queries, or even when building new products and services. Granting ChatGPT this kind of access to end-user data is particularly problematic in jurisdictions with strict data privacy regulations.
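
To make the data flow concrete, here is a minimal, hypothetical sketch of what such a naive integration might look like using the OpenAI Python client. The metric names, values, prompt wording, and model choice are all illustrative assumptions, not the wellness company’s actual code; the point is that the raw personal data is interpolated straight into the prompt and leaves the application.

```python
# Hypothetical sketch of a naive embedded-analytics LLM integration.
# All metric names, values, and prompt wording here are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# End-user health data gathered from wearables and manual input
health_metrics = {
    "resting_heart_rate_bpm": [62, 64, 61, 66],
    "sleep_hours": [7.2, 6.5, 8.0, 5.9],
    "daily_steps": [8432, 10210, 6120, 9875],
}

# The raw personal data is embedded directly in the prompt...
prompt = (
    "Analyze the following health metrics and summarize any trends "
    f"for this user: {health_metrics}"
)

# ...and sent to OpenAI, where it becomes "Input" under the Terms of Service
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Nothing in this flow strips or anonymizes the metrics before they leave the application, which is exactly the gap described above.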

If the customer asks the fictitious wellness company to remove their personal data, there is no mechanism in place to recover or delete the data that has already been provided to ChatGPT. More importantly, OpenAI is already free to use that data for future outputs, so the customer’s privacy has been breached.

Learn More: What is Data Governance? Accountability and Quality Control in Analytics

Taking Data Security Seriously

While LLMs like ChatGPT are impressive, it’s important to have a strong data security policy in place before rushing to adopt these integrations into embedded analytics applications.

A few factors to consider are:

  • Proprietary vs. public: Will your application use proprietary data as input to the LLM? ChatGPT shines at generating content and human-like responses based on input data; however, any data used as input should be treated the same way as data published to a public website. If you wouldn’t publish the data to a public Internet site, you likely should not be using it as input to ChatGPT (a simple pre-send check along these lines is sketched after this list).

  • GDPR and CCPA: Data privacy regulation is a hot topic globally, and countries and states continue to evolve their rules around customer data. By adopting ChatGPT into your embedded application, you may be violating GDPR or CCPA and exposing yourself to costly litigation. It is important to note that ChatGPT’s terms of service explicitly state that users of the service agree to be responsible for any legal consequences associated with their Content, including any litigation against OpenAI.

  • Usability: Will implementing ChatGPT in your embedded analytics application provide a useful feature to your end users, or are you implementing it to capture part of the buzz associated with ChatGPT? ChatGPT and other LLM implementations are impressive, but within the embedded analytics space they have few use cases that have not already been solved by other solutions. It’s important to consider the value something like ChatGPT can bring and weigh it against the risks of giving data to OpenAI.
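
As a minimal illustration of the “treat input like public data” rule above, the sketch below shows a hypothetical pre-send check that refuses to forward prompts containing obviously sensitive patterns. The `is_safe_to_send` name and the patterns themselves are illustrative assumptions, not a complete data-loss-prevention solution.

```python
import re

# Hypothetical patterns for data you would never publish publicly.
# A real deployment would use a proper DLP or classification service.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),             # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),               # US SSN-style numbers
    re.compile(r"(?i)\b(proprietary|confidential)\b"),  # internal markings
]

def is_safe_to_send(prompt: str) -> bool:
    """Return True only if the prompt matches none of the sensitive patterns."""
    return not any(p.search(prompt) for p in SENSITIVE_PATTERNS)

prompt = "Summarize this CONFIDENTIAL revenue report for me..."
if is_safe_to_send(prompt):
    pass  # forward the prompt to the LLM
else:
    print("Blocked: prompt appears to contain non-public data.")
```

A check like this is deliberately crude; the broader point is that sending data to an external LLM should be an explicit, policy-driven step rather than a default.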

At Yellowfin, we know the importance of data and data security. The platform has been designed from the ground up with data governance and segregation as key features of the product. Companies that use Yellowfin to embed BI and analytics into their applications can provide their end users with valuable insights without exposing internal data to the public.

While we continue to follow innovations in LLMs and ChatGPT, we find that the risks of integrating a technology like this into Yellowfin far outweigh the benefits. We already offer accessibility features like Guided NLQ, which enables end users to perform data analytics through an easy-to-comprehend, human-like questioning process. And all of this is accomplished without copying user data outside its silo.

Try Yellowfin For Yourself

Learn how Yellowfin can provide your customers and users with a sophisticated, AI-powered, highly secure analytics experience for your unique use cases.