ChatGPT Can Fake A Receipt But Can’t Spot Its Own Work
A single number on a receipt can now be changed in under one second for a few cents. The tool is OpenAI's GPT-Image-2, and the technique is called AI inpainting. Think of it as digital surgery. The AI removes one small part of a photograph, generates a replacement that matches the lighting, texture, font and paper grain, and stitches it back in so cleanly that no seam is visible.
A research team that includes founders of Scam.ai, a synthetic-media detection company, published a study on April 28 to see how well existing defenses hold up. The results are bad news for anyone who relies on document authenticity in fraud investigations, insurance claims, expense auditing or litigation.
The team built a dataset of 3,066 document forgeries created by GPT-Image-2, each paired with its authentic original. The source documents include Indonesian retail receipts, English receipts, mobile-phone receipt photographs and multilingual business forms in seven languages. The forgeries target the fields that matter most in fraud: monetary totals, line-item prices, dates and document IDs.
Through a public testing site at CanUSpotAI.com, 120 participants viewed 365 side-by-side pairs, one authentic and one forged, and tried to identify which was AI-edited. Their accuracy was 50.1 percent. That is a coin flip. Even with both images in front of them at the same time, people could not reliably spot the forgery.
The Best Forensic Tools Barely Beat Guessing
The researchers tested TruFor, a leading forensic detector that looks for mismatches in camera sensor noise, along with DocTamper, a detector designed specifically for document forgeries. On the GPT-Image-2 forgeries, TruFor scored 0.599 and DocTamper scored 0.585, only slightly above random chance at 0.500.
The same tools performed much better on traditional copy-paste forgeries built from the same documents: TruFor scored 0.962 and DocTamper scored 0.852 on those older-style edits. In other words, the tools still work on classic tampering. They struggle specifically when the forgery is created with AI inpainting.
The AI Cannot Recognize Its Own Work
The researchers also asked GPT-Image-2 itself whether each image had been AI-edited. The model classified its own forgeries as real 84.7 percent of the time. The team tried five different prompting strategies, and none fixed the problem. Step-by-step reasoning actually made things worse because the model began hallucinating forensic clues in authentic images.
For anyone building AI-powered fraud detection, the lesson is straightforward. You cannot rely on the generating model to police its own output.
Why Current Detection Fails
Traditional document forgery is basically cut and paste. It tends to leave seams: differences in noise, compression history or local image structure that forensic tools are designed to find. AI inpainting works differently. It generates brand-new pixels that are statistically consistent with the surrounding image, which removes many of the discontinuities those tools expect to see.
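The difference can be illustrated with a toy statistical check. The sketch below is not a real forensic tool; it is a minimal, assumed model in which "sensor noise" is Gaussian, a copy-paste patch carries noise from a different source, and an inpainted patch matches the local statistics of its surroundings. The seam-detection idea is the same one noise-based detectors rely on.

```python
# Toy illustration: classic splicing leaves a local-noise mismatch that
# simple statistics can flag; an inpainted patch that matches the
# surrounding noise leaves no such mismatch. All parameters are assumptions.
import random
import statistics

random.seed(0)

def make_region(n, sigma):
    """Flat gray region plus Gaussian sensor-style noise."""
    return [128 + random.gauss(0, sigma) for _ in range(n)]

def noise_ratio(background, patch):
    """Ratio of patch noise to background noise (1.0 means consistent)."""
    return statistics.stdev(patch) / statistics.stdev(background)

background = make_region(10_000, sigma=5.0)  # original photo's noise level
pasted     = make_region(10_000, sigma=1.0)  # copy-paste from a cleaner source
inpainted  = make_region(10_000, sigma=5.0)  # AI patch matching local statistics

print(round(noise_ratio(background, pasted), 2))     # far from 1.0: a detectable seam
print(round(noise_ratio(background, inpainted), 2))  # near 1.0: nothing for the detector to find
```

A detector built on this kind of signal flags the pasted patch easily and has essentially nothing to work with on the inpainted one, which is the pattern the study's numbers show at much larger scale.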
OpenAI’s Safety Filter Is Not Enough
In the researchers’ tests, OpenAI’s safety filter blocked about 10 percent of document-editing requests. Many blocked requests could be made to succeed with minor rephrasing, and the requests that did pass produced forgeries that none of the tested detectors could reliably flag. A modest rejection rate that is easy to bypass slows attackers down a little. It does not stop them.
The economics of document fraud have changed almost overnight. Creating a convincing forgery used to require skill, time and specialized software. Today it can be done on any consumer-grade smartphone or laptop. In forensic practice, many of the signals experts rely on to authenticate documents, including local compression inconsistencies, pixel seams and other classic tampering artifacts, are much weaker or entirely absent when AI inpainting is used.
Visual inspection is no longer enough. A better first step is to verify documents against original source systems. If someone submits a receipt, match it against the actual point-of-sale transaction record instead of trusting the image on its own.
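In code, that verification step is a field-by-field comparison against the system of record rather than any analysis of the image. The sketch below is hypothetical: the field names, record shapes and zero-cent tolerance are illustrative assumptions, not any particular vendor's API.

```python
# Hypothetical sketch: trust the source-of-record transaction, not the image.
# Field names and the tolerance parameter are assumptions for illustration.
from datetime import date

def matches_pos_record(claimed: dict, pos_record: dict, cents_tolerance: int = 0) -> bool:
    """True only if the submitted receipt's key fields match the
    point-of-sale system's record of the same transaction."""
    return (
        claimed["transaction_id"] == pos_record["transaction_id"]
        and claimed["date"] == pos_record["date"]
        and abs(claimed["total_cents"] - pos_record["total_cents"]) <= cents_tolerance
    )

pos    = {"transaction_id": "T-1001", "date": date(2025, 4, 28), "total_cents": 4599}
honest = {"transaction_id": "T-1001", "date": date(2025, 4, 28), "total_cents": 4599}
forged = {"transaction_id": "T-1001", "date": date(2025, 4, 28), "total_cents": 9599}

print(matches_pos_record(honest, pos))  # True
print(matches_pos_record(forged, pos))  # False: the image may look perfect, but the total does not
```

The point of the design is that an inpainted total, however seamless visually, cannot alter the merchant's own transaction log.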
Provenance standards such as C2PA are helpful, but they are not the full solution. They can show where an image came from, but they do not resolve the hardest cases when provenance is missing, stripped away or disputed.
What companies need is a full stack: top-level screening at intake, verification against source systems and, when the stakes are high or the dispute is headed to court, the ability to escalate all the way to device-level forensics on the original phone, camera or storage media. Trust in digital documents will come from that kind of end-to-end capability, not from any single detector, model or provenance standard on its own.