ChatGPT Watermarking And How The ChatGPT Watermark Works

A watermark will make it easy to detect ChatGPT-generated content. This is what it is and why it might be easy to defeat.

ChatGPT is a large language model developed by OpenAI that has been trained on a massive dataset of text from the internet. It generates human-like text by predicting the next word in a sequence based on the words that come before it. GPT has been used to build a variety of natural language processing (NLP) applications, including chatbots, language translation systems, and content generation systems.

Headlines

  • A cryptographic watermark is said to be coming that will make it easy to catch ChatGPT-generated content
  • An OpenAI researcher reveals how the ChatGPT watermark could be defeated
  • Computer scientist Scott Aaronson discusses AI Safety and Alignment work at OpenAI
  • OpenAI’s attempts to watermark AI text hit limits: it’s proving tough to rein in systems like ChatGPT

ChatGPT Watermark News Sources

In a lecture at the University of Texas at Austin, computer science professor Scott Aaronson, currently a guest researcher at OpenAI, revealed that OpenAI is developing a tool for “statistically watermarking the outputs of a text [AI system].” Whenever a system, say ChatGPT, generates text, the tool would embed an “unnoticeable secret signal” indicating where the text came from.

OpenAI engineer Hendrik Kirchner built a working prototype, Aaronson says, and the hope is to build it into future OpenAI-developed systems.

“We want it to be much harder to take [an AI system’s] output and pass it off as if it came from a human,” Aaronson said in his remarks.

“This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda... you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine without even a building full of trolls in Moscow. Or impersonating someone’s writing style in order to incriminate them.”

ChatGPT watermarking 

OpenAI’s ChatGPT introduced a way to automatically create content, but plans to add a watermarking feature that makes such content easy to detect are making some people nervous. This is how ChatGPT watermarking works and why there may be a way to defeat it.

ChatGPT is an incredible tool that online publishers, affiliates and SEOs simultaneously love and dread.

Some marketers love it because they’re discovering new ways to use it to generate content briefs, outlines and complex articles.

Online publishers are afraid of the prospect of AI content flooding the search results, supplanting expert articles written by humans.

Consequently, news of a watermarking feature that unlocks detection of ChatGPT-authored content is likewise anticipated with anxiety and hope.

Reason for ChatGPT Watermarking

Why the need for a watermark? ChatGPT itself is a strong example. The chatbot developed by OpenAI has taken the internet by storm, showing an aptitude not only for answering challenging questions but also for writing poetry, solving programming puzzles, and waxing poetic on any number of philosophical topics.


While ChatGPT is highly amusing — and genuinely useful — the system raises obvious ethical concerns. Like many of the text-generating systems before it, ChatGPT could be used to write high-quality phishing emails and harmful malware, or cheat on school assignments. And as a question-answering tool, it’s factually inconsistent — a shortcoming that led programming Q&A site Stack Overflow to ban answers originating from ChatGPT until further notice.


To grasp the technical underpinnings of OpenAI’s watermarking tool, it’s helpful to know why systems like ChatGPT work as well as they do. These systems understand input and output text as strings of “tokens,” which can be words but also punctuation marks and parts of words. At their cores, the systems are constantly generating a mathematical function called a probability distribution to decide the next token (e.g. word) to output, taking into account all previously outputted tokens.
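The idea of a probability distribution over the next token can be illustrated with a few lines of Python. This is only a minimal sketch: the four candidate tokens and their probabilities are made up for illustration and do not come from ChatGPT, and `sample_next_token` is a hypothetical helper name.

```python
import random

# Made-up probabilities a model might assign to the token that follows
# "The cat sat on the" (illustrative values, not real model output).
next_token_probs = {" mat": 0.55, " sofa": 0.25, " roof": 0.15, ",": 0.05}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(next_token_probs, rng) for _ in range(1000)]

# Count how often each candidate token was chosen.
counts = {token: samples.count(token) for token in next_token_probs}
print(counts)
```

Likely tokens are chosen often and unlikely ones rarely, but the choice is still random. That randomness is the room a watermark has to work with.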


Cryptographic Watermark

A watermark is a semi-transparent mark (a logo or text) that is embedded in an image. The watermark signals who is the original author of the work.

It’s largely seen in photographs and increasingly in videos.

Watermarking text in ChatGPT involves cryptography: a pattern of words, letters and punctuation is embedded in the text as a secret code.

How Does ChatGPT Watermarking Work?

ChatGPT watermarking is a system that embeds a statistical pattern, a code, into the choices of words and even punctuation marks.

Content created by artificial intelligence is generated with a fairly predictable pattern of word choice.

The words written by humans and AI follow a statistical pattern.

Changing the pattern of the words used in generated content is a way to “watermark” the text to make it easy for a system to detect if it was the product of an AI text generator.

The trick that makes AI content watermarking undetectable is that the distribution of words still has a random appearance, similar to normal AI-generated text.

This is referred to as a pseudorandom distribution of words.

A pseudorandom series of words or numbers appears statistically random but is not actually random, because it is produced by a deterministic process.
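One way such a pseudorandom pattern could be embedded and later checked is sketched below. It loosely follows the kind of scheme Aaronson has described publicly, where a secret key turns the preceding words into pseudorandom numbers that steer each word choice. Everything concrete here is an assumption made for illustration: the 50-word vocabulary, the stand-in model that treats every word as equally likely, the key, the context length `N`, and all function names. It is not OpenAI's implementation.

```python
import hashlib
import hmac
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary (assumption)
SECRET_KEY = b"demo-key"              # hypothetical key known only to the provider
N = 3                                 # previous tokens fed to the keyed function (assumption)

def prf(key, context, token):
    """Keyed pseudorandom number strictly between 0 and 1.

    It looks random without the key but is fully reproducible with it.
    """
    message = "|".join(context + (token,)).encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52

def toy_model(context):
    """Stand-in for a language model: every token is equally likely."""
    return {token: 1 / len(VOCAB) for token in VOCAB}

def last_n(tokens):
    """The previous N tokens, padded at the start of the text."""
    return tuple((["<s>"] * N + tokens)[-N:])

def generate_watermarked(length, key):
    """Choose each token by maximizing r ** (1 / p), computed as log(r) / p."""
    tokens = []
    for _ in range(length):
        context = last_n(tokens)
        probs = toy_model(context)
        tokens.append(max(probs, key=lambda t: math.log(prf(key, context, t)) / probs[t]))
    return tokens

def detection_score(tokens, key):
    """Average of ln(1 / (1 - r)); close to 1 for text made without the key."""
    total = 0.0
    for i, token in enumerate(tokens):
        r = prf(key, last_n(tokens[:i]), token)
        total += math.log(1 / (1 - r))
    return total / len(tokens)

marked = generate_watermarked(200, SECRET_KEY)
rng = random.Random(0)
unmarked = [rng.choice(VOCAB) for _ in range(200)]  # ordinary random text

print("watermarked score:", round(detection_score(marked, SECRET_KEY), 2))
print("unmarked score:   ", round(detection_score(unmarked, SECRET_KEY), 2))
```

In this sketch the watermarked text scores well above 1 while ordinary text scores close to 1, so whoever holds the key can tell them apart. To anyone without the key, the word choices still look like normal random sampling from the model.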

ChatGPT watermarking is not currently in use. However, Scott Aaronson of OpenAI is on record stating that it is planned.

Right now ChatGPT is in preview, which allows OpenAI to discover “misalignment” through real-world use.

Watermarking may be introduced in a final version of ChatGPT, or possibly sooner.

Scott Aaronson wrote about how watermarking works:

“My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT.

Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT.”

How to Detect ChatGPT or GPT Watermarking

Something that does not seem to be widely known yet is that Scott Aaronson himself noted there is a way to defeat watermarking.

He didn’t say it might be possible to defeat the watermarking; he said that it can be defeated.

“Now, this can all be defeated with enough effort.


For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that.”
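Why paraphrasing works against this kind of watermark can be seen in a toy simulation. The sketch below assumes a keyed-hash watermark over a made-up 50-word vocabulary in which every word is equally likely, and it stands in for a paraphrasing AI by randomly swapping about half of the words. The key, the context length, and the function names are all hypothetical; none of this is OpenAI's actual scheme.

```python
import hashlib
import hmac
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary (assumption)
KEY = b"demo-key"                     # hypothetical secret key
N = 3                                 # previous tokens fed to the keyed function

def prf(context, token):
    """Keyed pseudorandom value strictly between 0 and 1."""
    message = "|".join(context + (token,)).encode()
    digest = hmac.new(KEY, message, hashlib.sha256).digest()
    return ((int.from_bytes(digest[:8], "big") >> 12) + 0.5) / 2**52

def context_of(tokens, i):
    """The N tokens before position i, padded at the start."""
    return tuple((["<s>"] * N + list(tokens[:i]))[-N:])

def watermarked_text(length):
    """With equally likely words, the watermark picks the word with the largest r."""
    out = []
    for i in range(length):
        ctx = context_of(out, i)
        out.append(max(VOCAB, key=lambda t: prf(ctx, t)))
    return out

def score(tokens):
    """Average of ln(1 / (1 - r)); close to 1 when no watermark is present."""
    terms = [math.log(1 / (1 - prf(context_of(tokens, i), t))) for i, t in enumerate(tokens)]
    return sum(terms) / len(terms)

def paraphrase(tokens, rng, rate=0.5):
    """Crude stand-in for a paraphrasing AI: swap roughly `rate` of the words."""
    return [rng.choice(VOCAB) if rng.random() < rate else t for t in tokens]

original = watermarked_text(200)
reworded = paraphrase(original, random.Random(1))

print("original score:", round(score(original), 2))
print("reworded score:", round(score(reworded), 2))
```

Because each word's hidden signal depends on the words before it, changing one word disturbs the signal for the next few as well. In this sketch the reworded text ends up scoring much closer to unwatermarked text.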

It seems the watermarking can be defeated, at least as of November, when the above statements were made.

 
