The Algorithm Arms Race: When AI Creates Watermarks and Another AI Tries to Erase Them

Is unbreakable watermarking a genuine achievement — or just the lead in a race that never ends?
In 2024, a team of researchers submitted a paper to NeurIPS, one of the most competitive venues in machine learning, with a title that cut straight to the point: “Invisible Image Watermarks Are Provably Removable Using Generative AI.” The paper did not claim that all watermarks were easy to defeat. It claimed something more troubling: that given a sufficiently capable generative model and a watermarked image, an attacker could reconstruct a clean version of that image without the embedded marker, and do so without any visible degradation in quality. The research won the challenge it was designed for. Studios, broadcasters, and anti-piracy vendors took notice. The question the paper raised, whether invisible watermarking offers genuine, durable protection or merely a temporary technical edge, has since become one of the most contested problems in digital media security.
To understand why the stakes are so high, it helps to understand what invisible watermarking actually accomplishes and why it exists alongside its more obvious cousin, visible watermarking. When you see a logo burned into the corner of a news clip or a broadcaster’s bug overlaid on a sporting event, you are looking at the visible half of a pair that often works in tandem: the visible element asserts ownership in plain sight, while the invisible component does something different and more forensically valuable. It embeds a hidden identifier (session-specific, user-specific, or distribution-channel-specific) deep inside the structure of the video or image file itself, in a way that is designed to survive compression, re-encoding, cropping, and format conversion. Visible marks can be cropped away or covered. The invisible forensic watermarking layer cannot be located or removed without knowing how and where it was embedded. At least, that was the assumption that held for most of the last decade.
How Invisible Watermarks Are Built to Last
The technical foundation of invisible watermarking techniques has shifted dramatically over the past ten years, moving from classical signal processing methods to deep learning architectures. In the earlier generation of systems, watermarks were embedded in the frequency domain of an image or video frame — regions of the signal where human perception is least sensitive — using mathematical transforms such as discrete wavelet transforms or discrete cosine transforms. These methods were robust against basic compression attacks and could survive JPEG re-encoding reasonably well, but they had a structural weakness: because the embedding logic followed fixed mathematical rules, anyone who understood those rules could design a targeted removal attack.
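To make the classical approach concrete, here is a minimal sketch of frequency-domain embedding on a single 8x8 block, using only the Python standard library. It hides one bit by forcing an order relation between two mid-frequency DCT coefficients. The coefficient positions `(3, 4)` and `(4, 3)`, the `margin` value, and the function names are illustrative assumptions, not the parameters of any specific commercial system.

```python
import math

N = 8  # block size, as in JPEG-style 8x8 blocks

def _alpha(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# Orthonormal DCT-II basis matrix C, so that coeffs = C * block * C^T
C = [[_alpha(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def _transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2D DCT of an 8x8 block."""
    return _matmul(_matmul(C, block), _transpose(C))

def idct2(coeffs):
    """Inverse 2D DCT (C is orthonormal, so its inverse is its transpose)."""
    return _matmul(_matmul(_transpose(C), coeffs), C)

def embed_bit(block, bit, margin=8.0, a=(3, 4), b=(4, 3)):
    """Hide one bit: bit 1 makes coefficient a exceed b by `margin`, bit 0 the reverse."""
    coeffs = dct2(block)
    mean = (coeffs[a[0]][a[1]] + coeffs[b[0]][b[1]]) / 2
    hi, lo = mean + margin / 2, mean - margin / 2
    if bit:
        coeffs[a[0]][a[1]], coeffs[b[0]][b[1]] = hi, lo
    else:
        coeffs[a[0]][a[1]], coeffs[b[0]][b[1]] = lo, hi
    # Round and clip back to 8-bit pixels; heavy clipping in saturated
    # blocks could weaken the mark, which this sketch does not handle.
    return [[min(255, max(0, round(v))) for v in row] for row in idct2(coeffs)]

def extract_bit(block, a=(3, 4), b=(4, 3)):
    """Recover the bit by comparing the same two coefficients."""
    coeffs = dct2(block)
    return 1 if coeffs[a[0]][a[1]] > coeffs[b[0]][b[1]] else 0

# Usage: a smooth gradient block, marked with bit 1 and read back.
block = [[100 + 3 * r + 2 * c for c in range(N)] for r in range(N)]
marked = embed_bit(block, 1)
recovered = extract_bit(marked)
```

The sketch also shows the structural weakness described above: anyone who knows which two coefficients carry the bit can overwrite them directly.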
The shift to neural network-based invisible digital image watermarking changed the nature of the problem. Modern systems use encoder-decoder architectures, typically built around generative adversarial networks, where the encoder learns to distribute a hidden identifier across the full content of a frame in a way that minimizes perceptual impact while maximizing recoverability. The GAN’s discriminator component plays a crucial role here: it is trained to reject watermarked images that look different from clean ones, which forces the encoder to find embedding strategies that are genuinely invisible at the pixel level. The resulting watermarks are not located in a predictable region of the file. They are distributed across the content in patterns that depend on the specific model weights, which are kept secret.
The performance of forensic video watermarking systems has become a measurable science. Researchers evaluate how invisible a watermark is using two primary metrics: PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index). A well-engineered system maintains PSNR values above 40 decibels and SSIM scores above 0.98, meaning the difference between the watermarked version and the original is small by objective measurement and typically imperceptible even under trained human review. Netflix, Disney+, and Amazon Prime Video all operate systems of this class for their pre-release screener workflows, where content moves through post-production houses, VFX vendors, and distribution partners before public release. The forensic watermark embedded at each handoff point can be extracted from any subsequent leak and traced to the specific session, device, or account responsible.
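For readers who want to see what the 40-decibel figure measures, the following is a small sketch of the standard PSNR formula for 8-bit images, written in plain Python. The function name and the list-of-rows image representation are assumptions made for illustration. SSIM is left out because a faithful version needs local windowed statistics.

```python
import math

def psnr(original, marked, peak=255.0):
    """PSNR in decibels between two equally sized grayscale images (lists of rows)."""
    flat_o = [p for row in original for p in row]
    flat_m = [p for row in marked for p in row]
    if len(flat_o) != len(flat_m):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_m)) / len(flat_o)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)

# Usage: every pixel shifted by one gray level gives an MSE of 1.
original = [[100] * 4 for _ in range(4)]
marked = [[101] * 4 for _ in range(4)]
print(round(psnr(original, marked), 2))  # 48.13
```

A uniform one-level shift already lands near 48 dB, which gives a sense of how small the pixel changes are in a system that stays above 40 dB.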
The Attack Surface: What Removal Actually Looks Like
The NeurIPS 2024 paper framed its attack around a concept called regeneration: feeding a watermarked image through a generative diffusion model that has learned the distribution of natural images. The model, during its denoising process, implicitly discards information that does not fit its learned prior of what images should look like. Since an invisible watermark consists of carefully engineered deviations from the natural image distribution — small but structured perturbations in pixel values — a sufficiently capable generative model can recognize these deviations as noise and remove them. The resulting image retains the visual content of the original but has the watermark’s signal attenuated to the point where standard detection methods fail.
This attack is black-box in a precise technical sense: it does not require the attacker to know where the watermark is or how it was generated. It works by exploiting the same property that makes invisible watermarking possible in the first place, namely the gap between what humans can perceive and what computational analysis can detect. Earlier, cruder removal attempts using Gaussian blur, JPEG recompression, or noise injection would degrade image quality noticeably in the process of attacking the watermark. The diffusion-based regeneration attack preserves quality. That is what makes it a genuine threat rather than an inconvenient tradeoff.
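The noise-then-denoise idea behind regeneration can be illustrated with a deliberately simplified toy. In the sketch below, a one-dimensional smooth signal stands in for an image, a keyed pattern of +1/-1 values stands in for the watermark, and a moving average stands in for the diffusion model’s learned prior. None of this is the method from the paper: the detector is assumed to have access to the original signal, and every name and parameter is hypothetical.

```python
import math
import random

def make_pattern(key, n):
    """Keyed pseudo-random +1/-1 pattern (a toy spread-spectrum watermark)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(signal, pattern, strength=2.0):
    """Add the pattern to the signal at low amplitude."""
    return [s + strength * w for s, w in zip(signal, pattern)]

def detect(received, original, pattern):
    """Informed detector: correlate the residual with the keyed pattern."""
    return sum((r - o) * w
               for r, o, w in zip(received, original, pattern)) / len(pattern)

def regenerate(marked, noise_std=4.0, radius=3, seed=0):
    """Toy regeneration: add Gaussian noise, then denoise with a moving average.

    The moving average is a crude stand-in for a generative prior that
    keeps smooth structure and discards fine perturbations.
    """
    rng = random.Random(seed)
    noisy = [m + rng.gauss(0.0, noise_std) for m in marked]
    out = []
    for i in range(len(noisy)):
        window = noisy[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

n = 256
original = [128.0 + 50.0 * math.sin(2 * math.pi * i / 64) for i in range(n)]
pattern = make_pattern(key=1234, n=n)
marked = embed(original, pattern)
attacked = regenerate(marked)

score_before = detect(marked, original, pattern)   # close to the strength, 2.0
score_after = detect(attacked, original, pattern)  # much weaker correlation
mean_abs_error = sum(abs(a - o) for a, o in zip(attacked, original)) / n
```

The attacker in this toy never uses `pattern` or the key. The detector score falls sharply while the regenerated signal stays close to the original, which is the same qualitative behaviour the paragraph above describes for diffusion-based regeneration.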
Beyond outright removal, researchers have identified a second category of attack: watermark forgery. A 2025 paper introduced WForge, a system that inverts the removal process — studying the residual patterns left behind when a watermark is stripped from an image, then using those patterns to inject false watermarks into clean content. The implication is significant: if watermark forgery becomes reliable, the traceability that invisible watermarking provides could be weaponized. An attacker could theoretically embed another party’s watermark identifier into pirated content, making the leak appear to originate from a legitimate account.
How Watermark Developers Are Fighting Back
The response from the watermarking research and vendor community has not been to deny the validity of these attacks. It has been to design against them explicitly. The current generation of robust invisible watermarking systems incorporates adversarial training — the watermark encoder is trained not just against known compression and noise attacks, but against regeneration attacks from diffusion models. The system learns, in effect, to embed its identifier in regions of the content that generative models are least likely to modify during reconstruction. ROBIN, a watermarking framework developed in 2024, injects its marker into an intermediate state during the diffusion sampling process itself and uses adversarial optimization to find embedding strategies that survive the exact type of regeneration attack that NeurIPS researchers described.
The deeper technical response involves rethinking what a watermark is for. Early invisible watermarking techniques treated the watermark as a passive signal: embed it, detect it later. More recent approaches treat the watermark as an active adversary against removal systems. Content-binding methods tie the embedded identifier to a cryptographic hash of the content itself, so that any regeneration process that modifies the content — even imperceptibly — breaks the binding and renders the watermark unverifiable. This does not prevent removal, but it does change the consequences: a successfully regenerated image can no longer carry a fake watermark that traces back to a real account, because the binding check fails.
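The binding check can be sketched with a keyed hash. The example below is a minimal illustration using Python’s standard `hashlib` and `hmac` modules. The function names, the byte-string inputs, and the choice of HMAC-SHA256 are assumptions for illustration, not the construction of any particular product. Because it hashes raw bytes, it would also fail after benign re-encoding, so a deployed system would presumably need a more tolerant content fingerprint.

```python
import hashlib
import hmac

def bind(content: bytes, identifier: bytes, secret: bytes) -> bytes:
    """Tie an identifier to the exact content with a keyed tag."""
    digest = hashlib.sha256(content).digest()  # fixed 32-byte content hash
    return hmac.new(secret, digest + identifier, hashlib.sha256).digest()

def verify(content: bytes, identifier: bytes, secret: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(bind(content, identifier, secret), tag)

# Usage: the tag verifies only for the original content and identifier.
secret = b"hypothetical-vendor-key"
tag = bind(b"frame-bytes", b"account-42", secret)
ok_original = verify(b"frame-bytes", b"account-42", secret, tag)       # True
ok_regenerated = verify(b"frame-bytes*", b"account-42", secret, tag)   # False
ok_forged_id = verify(b"frame-bytes", b"account-43", secret, tag)      # False
```

The last two checks correspond to the two failure cases in the text: content that has been altered by regeneration, and an identifier that was not the one originally bound to that content.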
The Provenance Problem Beyond Entertainment
The watermarking arms race has acquired a dimension that extends well beyond film studios and broadcasting rights. The rise of AI-generated imagery has created a new category of invisible digital image watermarking deployment: marking synthetic content at the point of generation, so that deepfakes and AI-produced media can be identified as such even after distribution. Google’s SynthID system, deployed across Imagen and other Google generative tools, embeds watermarks directly during the generation process rather than as a post-production step. The C2PA (Coalition for Content Provenance and Authenticity) standard, backed by Adobe, Microsoft, and major news organizations, combines watermarking with cryptographically signed metadata to create a verifiable chain of custody for digital media from creation through distribution.
These applications raise the stakes of the removal problem in a different way. In the entertainment context, a successfully removed forensic watermark means a piracy trail goes cold. In the provenance context, a successfully removed AI-generation watermark means a synthetic image can circulate without any flag identifying it as machine-made. The regulatory dimension is already moving: the European Union’s AI Act requires that AI-generated content be labeled, and invisible watermarking is one of the primary technical mechanisms proposed for compliance. Whether those watermarks can survive adversarial removal attempts is no longer purely an academic question.
The Fundamental Tension That Cannot Be Resolved
Research from NeurIPS 2024 identified a mathematical constraint that sits underneath the entire field: there is an inherent tradeoff between the invisibility of a watermark and its resistance to removal. If a watermark is perfectly invisible, introducing zero perceptible difference in the image, then a generative model can by definition reconstruct a perceptually identical image that does not contain it. A watermark that survives all possible removal attacks must alter the image in some way that is detectable to a sufficiently sensitive system. The gap between what human eyes can perceive and what computational systems can detect is where watermarking lives, and that gap is narrowing as both generative models and perceptual quality metrics become more capable.
This does not mean the technology is futile. The practical question is not whether a sufficiently determined, well-resourced attacker with access to state-of-the-art diffusion models can defeat a given watermarking system. Given enough computational power, the answer is probably yes, eventually. The practical question is whether that attack is feasible for the operators running pirated streaming services or distributing leaked screeners — and for most of those operators, today’s robust invisible watermarking systems represent a deterrent that is simply too expensive and technically demanding to overcome at scale. The arms race continues because neither side can declare permanent victory, but it continues on terms that currently favor the defenders in the commercially relevant threat environment. Whether that balance holds as generative AI becomes cheaper and more widely accessible is the question the industry cannot yet answer.
The Race Has No Finish Line — Only a Moving Lead
What the current state of invisible watermarking research reveals is not a technology in crisis but a technology in honest confrontation with its own limits. The researchers who built the NeurIPS removal attack were not adversaries of the watermarking community — several of them work on watermarking defenses. The challenge they ran was designed to stress-test existing systems under worst-case conditions, so that the next generation of systems could be built to survive those conditions. That is how adversarial machine learning is supposed to function: not as a race between separate camps trying to defeat each other, but as a single research community probing the boundaries of what is possible and pushing those boundaries outward on both sides simultaneously. The studios, platforms, and broadcasters deploying these systems today are not operating under the illusion that their watermarks are unbreakable. They are betting that their watermarks are hard enough to break that the economics of piracy tip against the attempt — and so far, in most markets, that bet is holding.
