When I first looked at whether it's possible to fight back against AI-generated plagiarism, and how that might work, it was January 2023, just a few months into the world's exploding awareness of generative AI. More than a year later, it feels like we've been exploring generative AI for years, but we've only looked at the issue for about 18 months.
In any case, this is an updated version of that original January 2023 article. When I first tested GPT detectors, I used three: the GPT-2 Output Detector (this is a different URL than we published before), Writer.com AI Content Detector, and Content at Scale AI Content Detection.
In October 2023, I removed the Writer.com AI Content Detector from our test suite because it failed back in January 2023, it failed again in October, and it still fails now. See below for a comment from the company, which their team sent me after the original article was published in January.
Before I go on, though, we should discuss plagiarism and how it relates to our problem. Merriam-Webster defines "plagiarize" as "to steal and pass off (the ideas or words of another) as one's own; use (another's production) without crediting the source."
This definition fits AI-created content well. While someone using an AI tool like Notion AI or ChatGPT isn't stealing content, if that person doesn't credit the words as coming from an AI and claims them as their own, it still meets the dictionary definition of plagiarism.
In this experimental article, I've asked ChatGPT to help out. My words are in normal and bold text. The AI's words are italicized. After each AI-generated section, I'll show the results of the detectors. At the end of the article, we'll look at how well the detectors performed overall.
Here are the test results for the above text, which I wrote myself:
Human-written content: 5-of-7 correct
Before we move on, let's discuss what I'll consider a "correct" analysis. If the score is numeric, anything above 80% will be counted as a correct analysis, and that's being generous. If the score has a summary, I'll look at the summary and make a judgment.
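To make that scoring rule concrete, here's a minimal Python sketch. It rests on assumptions the rule above doesn't spell out: that a numeric score is a detector's confidence (0 to 100) that the text is human-written, and that for AI-written text the complement of that score has to clear the same 80% bar. The names `is_correct` and `tally` are hypothetical helpers for illustration, not part of any detector's API.

```python
from typing import Literal

Label = Literal["human", "ai"]

# Assumption: the score is the detector's confidence (0-100) that the
# text is human-written. 80 is the cutoff stated in the article.
THRESHOLD = 80.0


def is_correct(human_score: float, truth: Label) -> bool:
    """Return True when a numeric verdict counts as a correct analysis."""
    if truth == "human":
        return human_score > THRESHOLD
    # Assumption: for AI-written text, confidence in "AI" is the complement.
    return (100.0 - human_score) > THRESHOLD


def tally(results: list[bool]) -> str:
    """Format per-detector outcomes in the article's 'N-of-M correct' style."""
    return f"{sum(results)}-of-{len(results)} correct"
```

Detectors that return only a text summary don't fit this function. Those still need a human judgment call, as described above.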
With that, let's dig in some more.
Explain why ChatGPT can be a real problem for teachers and editors when it comes to plagiarism
ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) language model developed by OpenAI. It is designed to generate human-like text by predicting the next word or phrase in a given input sequence. While ChatGPT can be a useful tool for generating text and responding to prompts, it can also be a problem for teachers and editors when it comes to plagiarism.
One of the main reasons why ChatGPT can be a problem for teachers and editors is that it can generate text that is very similar to that of real human writers. This means that students or other individuals could potentially use ChatGPT to generate text that they could then submit as their own work, thereby committing plagiarism.
For teachers and editors, detecting whether a given piece of text was generated by ChatGPT or written by a human can be very difficult, as the generated text may not contain any obvious errors or discrepancies that would reveal its artificial origin. This can make it difficult for teachers and editors to identify and prevent plagiarism, as they may not be able to tell whether a given piece of text is original or copied from another source.
Overall, ChatGPT and other similar language models can be a problem for teachers and editors when it comes to plagiarism because they can generate text that is difficult to distinguish from that written by humans, making it easier for students and other individuals to commit plagiarism without being detected.
Show me at least three ways that I can detect ChatGPT was used to produce a piece of text
There are several ways that you can potentially detect whether ChatGPT or another language model was used to produce a piece of text. Here are three potential approaches you could use:
Keep in mind that it can be difficult to definitively determine whether a language model was used to generate a piece of text, and these methods may not always be effective. However, they can be useful tools to help you identify potential instances of language model-generated text.
In this set of tests, only the italicized content above was run through the checker suite.
AI-written content: 5-of-7 correct
List three plagiarism checkers that can detect ChatGPT text
It is worth noting that there is currently no plagiarism checker that is specifically designed to detect text generated by ChatGPT or other language models. While some plagiarism checkers may be able to identify certain characteristics of language model-generated text that could indicate its artificial origin, there is no guarantee that they will be able to definitively determine whether a piece of text was generated by a language model.
With that said, here are three plagiarism checkers that you might consider using to help identify potential instances of language model-generated text:
It is worth noting that no plagiarism checker is completely foolproof, and there is always the possibility that a piece of language model-generated text could escape detection. Therefore, it is important to use plagiarism checkers as just one part of a larger strategy to detect and prevent plagiarism.
AI-written content: 5-of-7 correct
Most plagiarism detectors compare writing against a corpus of other writing. For example, when a student turns in an essay, a product like Turnitin scans the submitted essay against a huge library of essays in its database, as well as other documents and text on the internet, to determine whether the submitted essay contains already-written content.
However, the AI-writing tools generate original content, at least in theory. Yes, they build their content from whatever they've been trained on, but the words they construct are somewhat unique for each composition.
As such, the plagiarism checkers mentioned above probably won't work because the AI-generated content probably didn't exist in, say, another student's paper.
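The corpus comparison described above can be illustrated with a toy Python sketch. This is not how Turnitin or any commercial checker actually works internally. It's a simplified stand-in that measures how many three-word runs in a submission also appear in a known document. The function names (`shingles`, `overlap`, `best_match`) and the three-word window are assumptions chosen for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Split text into lowercase words and return the set of n-word runs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(submission: str, source: str, n: int = 3) -> float:
    """Fraction of the submission's n-word runs that also appear in source."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0  # too short to compare
    return len(sub & shingles(source, n)) / len(sub)


def best_match(submission: str, corpus: dict[str, str]) -> tuple[str, float]:
    """Return the name and score of the corpus document with the most overlap.

    Raises ValueError if the corpus is empty.
    """
    return max(
        ((name, overlap(submission, doc)) for name, doc in corpus.items()),
        key=lambda pair: pair[1],
    )
```

Text copied from a document in the corpus scores close to 1.0. Freshly generated text shares few or no word runs with anything in the corpus, so it scores near 0.0, which is why this style of checker tends to miss AI-written content.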
In this article, we're just looking at GPT detectors. But plagiarism is a big problem, and as we've seen, some choose to define plagiarism as something you claim as yours that you didn't write, while others choose to define plagiarism as something written by someone else that you claim is yours.
That distinction was never a problem until now. Now that we have non-human writers, the plagiarism distinction is more nuanced. It's up to every teacher, school, editor, and institution to decide exactly where that line is drawn.
Human-written content: 7-of-7 correct
Overall, test results this time are dramatically better than they have been with previous tests.
In our previous runs, none of the tests got everything right. This time, three of the seven services tested got the results correct 100% of the time.
While the overall results have improved dramatically, I would not be comfortable relying solely on these tools to validate a student's content. As has been shown, writing from non-native speakers often gets rated as generated by an AI, and even though my hand-crafted content has no longer been rated as AI, there were a few paragraphs flagged by the testers as possibly being AI-based. So, I would advocate caution before relying on the results of any (or all) of these tools.
Let's look at the individual testers and see how each performed.
This first tool was built using a machine-learning hub managed by New York-based AI company Hugging Face. While the company has received $40 million in funding to develop its natural language library, the GPT-2 detector appears to be a user-created tool using the Hugging Face Transformers library. Of the six tests I ran, it was accurate for four of them.
Writer.com is a service that generates AI writing, oriented towards corporate teams. Its AI Content Detector tool can scan for generated content. Unfortunately, I found this tool unreliable, and it failed to generate results -- exactly the same way it did in January 2023.
After this article was originally published in January, the folks at Writer.com reached out to ZDNET. CEO May Habib had this comment to share:
Demand for the AI detector has skyrocketed. Traffic has grown 2-3x per week since we launched it a couple months ago. We've now got the necessary scaling behind it to make sure it doesn't go down, and our goal is to keep it free - and up to date to catch the latest models' outputs, including ours. If AI output is going to be used verbatim, it absolutely should be attributed.
The third tool I found was also produced by an AI content generation firm. Content at Scale pitches itself as "We Help SEO-Focused Content Marketers by Automating Content Creation." Its marketing call to action is, "Upload a list of keywords and get 2,600+ word blog posts that bypass AI content detection -- all with no human intervention!" Disturbingly, the results got worse from January -- back then, it was 50% accurate. It has not improved since.
It's not entirely clear what drives GPTZero. The company is hiring engineers and sales folks, and it runs on AWS, so there are expenses and sales involved. However, all I could find about a service offering was a place where you could register for a free account to scan more than the 5,000 words offered without login. If you're interested in this service for GPT detection, you'll have to see if they'll respond to you with more details. Accuracy has increased since the last time I ran these tests.
ZeroGPT seems to have matured as a service since we last looked at it. When we last looked, no company name was listed, and the site was peppered with Google ads with no apparent strategy for monetization. The service worked fairly well but seemed sketchy as heck.
That sketchy-as-heck feeling is now gone. ZeroGPT presents as any other SaaS service, complete with pricing, company name, contact information, and all the rest. It still performs quite well, so perhaps the developers decided to turn their working code into more of a working business. Accuracy increased as well. Good for them.
Writefull sells writing support services, as well as a free taste of its tools. The GPT detector is fairly new and worked fairly well. Although not fully accurate, it did improve from 60% accurate to 80% accurate with my tests.
Originality.ai is a commercial service that bills itself as both an AI checker and a plagiarism checker. The company sells its services based on usage credits. To give you an idea, all the scans I did for this article used a total of 30 usage credits. The company sells 2,000 credits for $12.95 per month. I pumped about 1,400 words through the system and used only 1.5% of the monthly allocation.
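For anyone who wants to check the math, here's the arithmetic as a short Python sketch. The credit and price figures come from the paragraph above. The pro-rated dollar figure is an assumption for illustration only, since the plan is sold by the month and not per credit, and the function names are hypothetical.

```python
CREDITS_USED = 30          # total credits used for this article's scans
CREDITS_PER_MONTH = 2_000  # credits included in the monthly plan
PRICE_PER_MONTH = 12.95    # plan price in USD


def allocation_share(used: int, total: int) -> float:
    """Return the fraction of the monthly credit allocation consumed."""
    return used / total


def prorated_cost(used: int, total: int, price: float) -> float:
    """Assumption: spread the monthly price evenly across all credits."""
    return allocation_share(used, total) * price


share = allocation_share(CREDITS_USED, CREDITS_PER_MONTH)
cost = prorated_cost(CREDITS_USED, CREDITS_PER_MONTH, PRICE_PER_MONTH)
print(f"{share:.1%} of the monthly allocation")
print(f"roughly ${cost:.2f} of the monthly fee, pro-rated")
```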
Results were great for the AI checker, but the service failed 3 out of 5 times when I used it as a plagiarism checker. The following screenshot claims that the text pasted in was 0% plagiarized:
That's wrong since all the text pasted into it was from this article, which has been published online for 18 months. I thought, perhaps, that the plagiarism scanner couldn't read ZDNET content, but that's not the case, as this screenshot shows:
To be fair, I didn't set out to check plagiarism checkers in this article. But since I'm using source material I know I pulled from my existing article, I figured the plagiarism checker would have slammed all of it as 100% plagiarized. In any case, Originality.ai did well on the part we set out to test, the AI checker. They get points for that.
Nothing is ever easy. The first time I ran my first test through QuillBot, it said that 45% of the text was likely generated by an AI. It wasn't. I wrote it. But then, after completing all the other tests, I returned to QuillBot to grab a screenshot for this section, fed it the same text that generated the 45% score, and, as you can see, it now reports 0% AI.
So, what are we to make of this result? Sadly, I didn't grab a screenshot of the first time I tested this text, but it highlights the concern with relying too much on AI detectors because they are also quite capable of hallucination.
Well, OpenAI pulled it last year because the detector wasn't particularly accurate. Now, it's supposed to be 99% accurate. That's a little tough to believe because ChatGPT itself is far from 100% accurate.
But, in any case, as my ZDNET buddy Steven Vaughan-Nichols writes, it's not available yet. We'll give it a spin when it's released for public use. Stay tuned.
First, I'll let ChatGPT have its say: AI-driven plagiarism is a growing problem, as AI tools like Notion AI and ChatGPT can generate text that is difficult to distinguish from that written by humans. There are several ways to detect whether a piece of text was generated by an AI, such as looking for repetitive or unusual patterns, checking for a lack of originality, or using a plagiarism checker. Plagiarism checkers like Turnitin, PlagScan, and Copyleaks may be able to identify certain characteristics of language model-generated text, but they are not foolproof.
Even the specialty tools I found were woefully unsuited to the task. There was some limited improvement over my tests in January. However, I certainly wouldn't feel comfortable threatening a student's academic standing or accusing them of cheating based on the results of these tools.
At this point, I don't think we're ready to let AI fight AI. Instead, we'll have to rely on human perception (which means folks have to pay attention) to identify works generated by an AI.
Don't immediately assume that something is machine-generated. Just because something is stupid, or wildly inaccurate, doesn't mean it was not written by a human. As we all know, humans can write some bad stuff on their own.
So, what do you think? Can you tell if a piece is AI-generated? Have you used these AI writing tools in the months since they suddenly became available? Do you think they're a help or hindrance to education and journalism? Let us know in the comments below. Here's how ChatGPT answered this:
As a language model, I can't see or interact with physical pieces of writing, so I can't tell if a piece is AI-generated or not. However, I can tell you that there are several AI writing tools that are available, and they can be used to generate a wide range of written content, including news articles, fiction, poetry, and more.