Home    |

Learn    |

Listen    |
Read    |

Experience    |

|     Contact

Every impenetrable LLM can be jailbroken. And every service agreement that guarantees the safety of your data, once entered into a prompt window, will promise not to use it to train future models. All of this can be broken, loopholed or hacked. Once you give your information, texts, emails, poetry into a Large Language Model, or post anything onto the web, it’s no longer yours. AI makes it more real and complex. A guide to keeping data safe in the AI landscape.



If you have chosen for an off-the-shelf solution, you will necessarily be sending your data and personal information about your customers and employees to a place outside your four walls. It’s possible that, depending on the IT maturity level of your organization, the state of your own house is in such disarray that anywhere your data resides is safer than where it is right now. But if you have reasonable control over what comes in and out of your servers, consider the pros and cons of going off the grid.

The big AI vendors make great promises that your data will never become training data for new models nor find its way into an output field for some other user in some far away country. But promises can be cheap. These companies have legal departments that occupy entire floors of their buildings, and service agreements can be full of loopholes that preclude future models or new tech and there is no way to verify any of it.

Consider the following excerpt from the OpenAI website. Here, they discuss your data, and what they promise to do and not do with it. Notice how they make no guarantees about the future, and are quick to make the distinction about “sensitive info” and “knowledge files” as different from run-of-the-mill data, thus giving themselves permission to use non-sensitive for whatever they want. Who decides? They do.

As a part of OpenAI’s privacy settings, you can choose to keep your account from being used for training the models. This way, OpenAI won’t touch anything with sensitive info, such as your chat history and knowledge files, to improve their models[i].

X’s Grok model is trained on years of your and my Twitter posts (another reason to avoid this platform), once having promised its users not to use your information for various things that had nothing to do with AI, before it was even a thing. Amazon’s mammoth Olympus is likely trained on all the books in its possession, whether the authors like it or not. And we don’t know what new tech will arise in the future, whether AI is self-propagating, self-replicating and re-writing its own training data.

With any AI model that generates anything, there’s a lot of data going in and out all the time. And that comes with the chance of data leaks, security breaches and folks finding their way into your knowledge sources. Be careful what you upload.

Part of the reason why your data is safer – even if your IT department doesn’t have its own zip code – is due to your low profile. It’s every hacker’s dream to break into OpenAI’s sanctuary, or Google’s or Meta’s or xAI. And as a result, every hacker on earth is currently trying.

Your organization is an OEM suppling an obscure for a particular medical device. Your data runs with the pack, one wildebeest running among a million, invisible by sheer numbers from the lions at the water’s edge.

Every impenetrable LLM can be jailbroken. In November 2023, a team of researchers studying model memorization asked ChatGPT to repeat one word endlessly – “poem”, oddly – to see whether repetition affected performance. After generating this word a few thousand times, the model “snapped”. It suddenly began coughing up portions of its own training data. Entire pages of what seemed like random text came issuing out, followed by reams of prose that appeared to be articles published somewhere, out there, by a Human[ii]. They published the paper and sure enough, it appeared to be a repeatable error, repeated gleefully by developers and The Curious everywhere.

This bizarre accidental discovery reveals that there is no such thing as a secure system that will never break to the stress tests of global hackers, researchers or the lucky prompter.





Reach out to me for advice – I have a few nice tricks up my sleeve to help guide you on your way, as well as a few “insiders’ links” I can share to get you that free trial version you need to get started.

No eyeballs to read or watch? Just listen.

Keeping Data Safe

Every impenetrable LLM can be jailbroken. And every service agreement that guarantees the safety of your data, once entered into a prompt window, will promise not to use it to train future models. All of this can be broken, loopholed or hacked. Once you give your information, texts, emails, poetry into a Large Language Model, or post anything onto the web, it’s no longer yours. AI makes it more real and complex. A guide to keeping data safe in the AI landscape.

About Fiona Passantino

Fiona is an AI Integration Specialist, coming at it from the Human approach; via Culture, Engagement and Communications. She is a frequent speaker, workshop facilitator and trainer.

Fiona helps leaders and teams engage, inspire and connect; empowered through our new technologies, to bring our best selves to work. She is a speaker, facilitator, trainer, executive coach, podcaster blogger, YouTuber and the author of the Comic Books for Executives series. Her next book, “AI-Powered”, is due for release soon.