Anthropic Accuses Chinese AI Labs of Industrial-Scale Distillation Attacks

Anthropic has publicly accused three Chinese AI labs — DeepSeek, Moonshot AI, and MiniMax — of conducting industrial-scale campaigns to extract Claude’s capabilities through distillation attacks. According to the company, over 24,000 fraudulent accounts generated more than 16 million interactions with the model, violating terms of service and regional access restrictions.

How Distillation Attacks Work

Distillation is a training technique where a less capable model is trained on the outputs of a stronger model. While widely and legitimately used by AI companies themselves (to create smaller, cheaper versions of their models), the technique can also be used illicitly: competitors can use it to acquire powerful capabilities from other labs in a fraction of the time and cost it would take to develop them independently.
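
To make the mechanics concrete, here is a minimal sketch of sequence-level distillation as it applies to language models. The function names (query_teacher, finetune_student) are illustrative placeholders rather than any real API, and the training step is only stubbed out:

    # Minimal sketch of sequence-level distillation: a weaker "student" model is
    # fine-tuned on (prompt, completion) pairs sampled from a stronger "teacher".
    # query_teacher and finetune_student are hypothetical placeholders, not real APIs.

    def query_teacher(prompt: str) -> str:
        # Stand-in for an API call to the stronger model.
        return f"<teacher completion for: {prompt}>"

    def build_distillation_set(prompts: list[str]) -> list[dict]:
        # Every teacher completion becomes a supervised training example for the student.
        return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

    def finetune_student(examples: list[dict]) -> None:
        # Stand-in for ordinary supervised fine-tuning: next-token cross-entropy
        # on each completion, conditioned on its prompt.
        for example in examples:
            pass

    if __name__ == "__main__":
        prompts = ["Explain binary search.", "Draft a unit test for a CSV parser."]
        finetune_student(build_distillation_set(prompts))

The cost asymmetry is what makes the technique attractive: generating prompts and fine-tuning the student is far cheaper than developing the teacher's capabilities from scratch.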

The Three Labs and Their Campaigns

DeepSeek: 150,000 Exchanges

DeepSeek’s operation focused on:

  • Reasoning capabilities across diverse tasks
  • Rubric-based grading tasks to transform Claude into a reward model for reinforcement learning (see the sketch after this list)
  • Creating censorship-safe alternatives to policy-sensitive queries
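
A minimal sketch of the rubric-grading pattern above, assuming a hypothetical grade() call that sends the prompt, the candidate answer, and a rubric to the stronger model and parses a numeric reply (hard-coded here so the sketch runs on its own); none of these names come from Anthropic's report:

    # Hypothetical sketch: a stronger model's rubric score is reused as the scalar
    # reward that drives reinforcement learning on a weaker policy model.

    RUBRIC = ("Score the answer from 0 to 10 for correctness, clarity, and "
              "completeness. Reply with the number only.")

    def grade(prompt: str, answer: str, rubric: str = RUBRIC) -> float:
        # Stand-in for querying the stronger model with (prompt, answer, rubric)
        # and parsing its numeric reply; hard-coded so the sketch is self-contained.
        reply = "7"
        return float(reply)

    def reward(prompt: str, answer: str) -> float:
        # Normalized rubric score used as the RL reward for the policy being trained.
        return grade(prompt, answer) / 10.0

    if __name__ == "__main__":
        print(reward("Summarize this clause.", "It limits liability to direct damages."))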

DeepSeek generated synchronized traffic across its accounts: identical request patterns, shared payment methods, and coordinated timing that suggest “load balancing” designed to increase throughput while avoiding detection.

Moonshot AI: 3.4 Million Exchanges

The operation by Moonshot AI (developer of the Kimi models) focused on:

  • Agentic reasoning and tool use
  • Coding and data analysis
  • Computer-use agent development
  • Computer vision

The company employed hundreds of fraudulent accounts spanning multiple access pathways, making the campaign harder to detect as a coordinated operation.

MiniMax: 13 Million Exchanges

MiniMax’s operation focused on:

  • Agentic coding
  • Tool use and orchestration

Anthropic detected this campaign while it was still active, before MiniMax launched the model it was training, giving Anthropic unprecedented visibility into the life cycle of a distillation attack. When Anthropic launched a new model during the campaign, MiniMax redirected nearly half of its traffic to it within 24 hours to capture capabilities from the new system.

How Distillers Access Frontier Models

For national security reasons, Anthropic does not currently offer commercial access to Claude in China or to subsidiaries of Chinese companies located outside the country. To circumvent this restriction, the labs use commercial proxy services that resell access to Claude and other frontier AI models at scale.

These services operate what Anthropic calls “hydra cluster architectures”: sprawling networks of fraudulent accounts that distribute traffic across Anthropic’s API and third-party cloud platforms. The breadth of these networks means there are no single points of failure. When one account is banned, another takes its place. In one case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously.
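
A toy calculation (ours, not from Anthropic's report) illustrates why such a pool has no single point of failure; it assumes traffic is spread evenly across accounts:

    # Toy illustration: with traffic spread evenly across a large pool of
    # fraudulent accounts, banning any small number of them barely dents
    # the network's total throughput.

    def remaining_capacity(total_accounts: int, banned: int) -> float:
        # Fraction of the network's throughput that survives the bans.
        return max(total_accounts - banned, 0) / total_accounts

    if __name__ == "__main__":
        pool = 20_000  # size reported for one proxy network
        for banned in (1, 100, 1_000):
            share = remaining_capacity(pool, banned)
            print(f"ban {banned:>5} accounts -> {share:.1%} of capacity remains")

Even after a thousand bans, 95% of the network's capacity is untouched, which matches the article's point that banning one account simply shifts traffic to another.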

National Security Implications

Anthropic warns that illicitly distilled models lack necessary safeguards, creating significant national security risks. Anthropic and other US AI companies build safeguards into their systems to prevent state and non-state actors from using AI to, for example, develop bioweapons or carry out malicious cyber activities.

Models built through illicit distillation are unlikely to retain those safeguards, meaning that dangerous capabilities can proliferate with many protections stripped out entirely.

Foreign labs that distill American models can then feed those unprotected capabilities into military, intelligence, and surveillance systems — enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance.

Connection to Export Controls

Anthropic argues that distillation attacks undermine export controls by allowing foreign labs, including those under the control of the Chinese Communist Party, to close through other means the capability gap that export controls are designed to preserve.

Without visibility into these attacks, the apparently rapid advances made by these labs are wrongly taken as evidence that export controls are ineffective and can be circumvented through innovation alone. In reality, these advances depend in significant part on capabilities extracted from American models, and executing that extraction at scale itself requires access to advanced chips.

“Distillation attacks therefore reinforce the rationale for export controls: restricted chip access limits both direct model training and the scale of illicit distillation,” states Anthropic.

Anthropic’s Response

Anthropic continues to invest heavily in defenses that make such distillation attacks harder to execute and easier to identify, including:

  • Detection: Classifiers and behavioral fingerprinting systems to identify distillation attack patterns in API traffic (a simplified illustration follows this list)
  • Intelligence sharing: Sharing technical indicators with other AI labs, cloud providers, and relevant authorities
  • Access controls: Strengthening verification for educational accounts, security research programs, and startup organizations
  • Countermeasures: Developing product, API, and model-level safeguards designed to reduce the efficacy of model outputs for illicit distillation
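
Anthropic has not published how its detection systems work. The following is only a simplified, hypothetical illustration of the general idea behind behavioral fingerprinting: cluster accounts that share coarse traits such as payment methods and request patterns, which the campaigns described above had in common, and flag clusters too large to be coincidence:

    # Hypothetical sketch of behavioral fingerprinting: group accounts whose coarse
    # request features and payment method collide, then flag unusually large groups.
    # This illustrates the general idea only; it is not Anthropic's actual system.

    from collections import defaultdict
    from hashlib import sha256

    def fingerprint(account: dict) -> str:
        # Identical feature tuples across many nominally independent accounts are
        # one signature of a coordinated "hydra" network.
        features = (
            account["payment_method"],
            round(account["avg_prompt_length"], -2),   # bucket to the nearest 100 characters
            tuple(sorted(account["top_task_types"])),  # e.g. ("agentic_coding", "tool_use")
        )
        return sha256(repr(features).encode()).hexdigest()

    def suspicious_clusters(accounts: list[dict], min_size: int = 50) -> list[list[str]]:
        # Map each account to its fingerprint and return the IDs of every cluster
        # large enough to warrant review.
        clusters = defaultdict(list)
        for account in accounts:
            clusters[fingerprint(account)].append(account["id"])
        return [ids for ids in clusters.values() if len(ids) >= min_size]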

Anthropic emphasizes that no single company can solve this alone and that distillation attacks at this scale require a coordinated response across the AI industry, cloud providers, and policymakers.

This post was generated by AI using GLM-4.7
