ChatGPT Watermarking And How The ChatGPT Watermark Works
Headlines
- A cryptographic watermark is said to be coming that will make it easy to catch ChatGPT-generated content
- An OpenAI researcher reveals how the ChatGPT watermark could be defeated
- Computer scientist Scott Aaronson discusses AI Safety and Alignment work at OpenAI
ChatGPT Watermark News Sources
“We want it to be much harder to take [an AI system’s] output and pass it off as if it came from a human.
This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda... you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine without even a building full of trolls in Moscow. Or impersonating someone’s writing style in order to incriminate them.”
ChatGPT watermarking
Online publishers are afraid of the prospect of AI content flooding the search results, supplanting expert articles written by humans.
Consequently, news of a watermarking feature that would make ChatGPT-authored content detectable is awaited with both anxiety and hope.
Reason for ChatGPT Watermarking
Why the need for a watermark? ChatGPT is a strong example. The chatbot developed by OpenAI has taken the internet by storm, showing an aptitude not only for answering challenging questions but writing poetry, solving programming puzzles, and waxing poetic on any number of philosophical topics.
While ChatGPT is highly amusing — and genuinely useful — the system raises obvious ethical concerns. Like many of the text-generating systems before it, ChatGPT could be used to write high-quality phishing emails and harmful malware, or cheat on school assignments. And as a question-answering tool, it’s factually inconsistent — a shortcoming that led programming Q&A site Stack Overflow to ban answers originating from ChatGPT until further notice.
To grasp the technical underpinnings of OpenAI’s watermarking tool, it’s helpful to know why systems like ChatGPT work as well as they do. These systems understand input and output text as strings of “tokens,” which can be words but also punctuation marks and parts of words. At their cores, the systems are constantly generating a mathematical function called a probability distribution to decide the next token (e.g. word) to output, taking into account all previously outputted tokens.
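To make the idea of a probability distribution over the next token concrete, here is a minimal Python sketch. The prompt, the candidate tokens and their probabilities are invented for illustration; a real model computes a distribution over tens of thousands of tokens at every step.

```python
import random

# Hypothetical next-token probabilities for the prompt "The cat sat on the".
# These numbers are made up; a real model would compute them.
next_token_probs = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, ".": 0.05}


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]

# Likely tokens are chosen often, unlikely ones only occasionally.
for token in next_token_probs:
    print(token, samples.count(token))
```

Because the next token is drawn at random rather than fixed, there is room to steer that choice with a hidden signal, which is the opening a statistical watermark uses.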
Cryptographic Watermark
A watermark is a semi-transparent mark (a logo or text) that is embedded in an image. The watermark signals who is the original author of the work.
It’s largely seen in photographs and increasingly in videos.
Watermarking text in ChatGPT relies on cryptography: a pattern of words, letters and punctuation is embedded in the output as a secret code.
How Does ChatGPT Watermarking Work?
ChatGPT watermarking is a system that embeds a statistical pattern, a code, into the choices of words and even punctuation marks.
Content created by artificial intelligence is generated with a fairly predictable pattern of word choice.
The words written by humans and AI follow a statistical pattern.
Changing the pattern of the words used in generated content is a way to “watermark” the text to make it easy for a system to detect if it was the product of an AI text generator.
The trick that makes AI content watermarking undetectable is that the distribution of words still has a random appearance similar to normal AI-generated text.
This is referred to as a pseudorandom distribution of words.
A pseudorandom series of words or numbers appears statistically random but is actually produced by a deterministic process.
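A small Python sketch can show what pseudorandom means in practice. The function name, the key and the example words below are hypothetical and chosen only for illustration; the use of HMAC-SHA256 as the keyed function is an assumption, not something the source says OpenAI uses.

```python
import hashlib
import hmac

KEY = b"demo-secret-key"  # hypothetical secret, known only to the watermarker


def pseudorandom_score(secret_key, context, candidate):
    """Deterministically map (key, context, candidate) to a number in [0, 1)."""
    message = "\x1f".join(context + (candidate,)).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    # Keep 53 bits so the result is an exact float strictly below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


# Same key and same words: always the same number.
a = pseudorandom_score(KEY, ("the", "cat"), "sat")
b = pseudorandom_score(KEY, ("the", "cat"), "sat")

# A different key gives an unrelated number for the same words.
c = pseudorandom_score(b"another-key", ("the", "cat"), "sat")

print(a, b, c)
```

To anyone without the key the numbers look random, but whoever holds the key can recompute them exactly. That property is what lets a hidden pattern be checked later.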
ChatGPT watermarking is not currently in use. However, Scott Aaronson at OpenAI is on record stating that it is planned.
Right now ChatGPT is in preview, which allows OpenAI to discover “misalignment” through real-world use.
Watermarking may be introduced in a final version of ChatGPT, or sooner than that.
Scott Aaronson wrote about how watermarking works:
“My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT.
Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT.”
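The following Python sketch illustrates how a secret signal in word choices could be embedded and later checked. It is a toy under stated assumptions, not OpenAI's implementation: the vocabulary, the stand-in "model", the key, the context length and the 1.5 threshold are all invented, and the selection rule (pick the candidate that maximizes r ** (1 / p), where r is a keyed pseudorandom number and p is the model's probability) is one publicly discussed construction assumed here for illustration.

```python
import hashlib
import hmac
import math
import random

SECRET = b"demo-secret-key"            # hypothetical key held by the provider
VOCAB = [f"w{i}" for i in range(50)]   # toy vocabulary
CONTEXT_LEN = 3                        # previous tokens fed to the keyed function
PAD = ["<s>"] * CONTEXT_LEN            # fixed context for the first tokens


def prf(secret, context, token):
    """Keyed pseudorandom number in [0, 1) for a (context, token) pair."""
    msg = "\x1f".join(list(context) + [token]).encode("utf-8")
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def toy_distribution(rng):
    """Stand-in for a language model: 4 random candidates with probabilities."""
    candidates = rng.sample(VOCAB, 4)
    weights = [rng.random() + 0.1 for _ in candidates]
    total = sum(weights)
    return {t: w / total for t, w in zip(candidates, weights)}


def generate(n_tokens, rng, secret=None):
    """Generate toy text; if a secret is given, embed the watermark."""
    tokens = list(PAD)
    for _ in range(n_tokens):
        probs = toy_distribution(rng)
        context = tokens[-CONTEXT_LEN:]
        if secret is None:
            # Ordinary sampling from the model's distribution.
            choice = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        else:
            # Watermarked choice: favor candidates with a high keyed score,
            # scaled by how likely the model already considers them.
            choice = max(probs, key=lambda t: prf(secret, context, t) ** (1 / probs[t]))
        tokens.append(choice)
    return tokens[CONTEXT_LEN:]


def detection_score(tokens, secret):
    """Average of ln(1 / (1 - r)); close to 1 for text without the watermark."""
    padded = PAD + list(tokens)
    total = 0.0
    for i in range(CONTEXT_LEN, len(padded)):
        r = prf(secret, padded[i - CONTEXT_LEN:i], padded[i])
        total += math.log(1 / (1 - r))
    return total / len(tokens)


marked = generate(300, random.Random(1), secret=SECRET)
plain = generate(300, random.Random(2))

marked_score = detection_score(marked, SECRET)
plain_score = detection_score(plain, SECRET)
print("watermarked:", round(marked_score, 2), "plain:", round(plain_score, 2))
print("flagged as watermarked:", marked_score > 1.5, plain_score > 1.5)
```

In this toy the watermarked text scores clearly higher than ordinary text when checked with the right key, while a reader without the key sees nothing unusual in the word choices. The signal lives in the exact sequence of tokens, so the score is only meaningful for reasonably long passages.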
How to Detect ChatGPT or GPT Watermarking
One interesting point that does not seem to be well known yet is that Scott Aaronson noted there is a way to defeat the watermarking.
He did not say that it might be possible to defeat the watermarking; he said that it can be defeated.
“Now, this can all be defeated with enough effort.
For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that.”
It seems the watermarking can be defeated, at least as of November, when the above statements were made.