The U.S. National Institute of Standards and Technology (NIST) recently released a 69-page report on DeepSeek and the risks associated with using it. The press release states:
WASHINGTON — The Center for AI Standards and Innovation (CAISI) at the Department of Commerce’s National Institute of Standards and Technology (NIST) evaluated AI models from the People’s Republic of China (PRC) developer DeepSeek and found they lag behind U.S. models in performance, cost, security and adoption.
“Thanks to President Trump’s AI Action Plan, the Department of Commerce and NIST’s Center for AI Standards and Innovation have released a groundbreaking evaluation of American vs. adversary AI,” said Secretary of Commerce Howard Lutnick. “The report is clear that American AI dominates, with DeepSeek trailing far behind. This weakness isn’t just technical. It shows why relying on foreign AI is dangerous and shortsighted. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce will ensure continued U.S. leadership in AI.”
The CAISI evaluation also notes that the DeepSeek models’ shortcomings related to security and censorship of model responses may pose a risk to application developers, consumers and U.S. national security. Despite these risks, DeepSeek is a leading developer and has contributed to a rapid increase in the global use of models from the PRC.
CAISI’s experts evaluated three DeepSeek models (R1, R1-0528 and V3.1) and four U.S. models (OpenAI’s GPT-5, GPT-5-mini and gpt-oss, and Anthropic’s Opus 4) across 19 benchmarks spanning a range of domains. These evaluations include state-of-the-art public benchmarks as well as private benchmarks built by CAISI in partnership with academic institutions and other federal agencies.
The evaluation from CAISI responds to President Donald Trump’s America’s AI Action Plan, which directs CAISI to conduct research and publish evaluations of frontier models from the PRC. CAISI is also tasked with assessing: the capabilities of U.S. and adversary AI systems; the adoption of foreign AI systems; the state of international AI competition; and potential security vulnerabilities and malign foreign influence arising from the use of adversaries’ AI systems.
CAISI serves as industry’s primary point of contact within the U.S. government to facilitate testing, collaborative research, and best practice development related to commercial AI systems, and is a key element in NIST’s efforts to secure and advance American leadership in AI.
Key Findings
DeepSeek performance lags behind the best U.S. reference models.
The best U.S. model outperforms the best DeepSeek model (DeepSeek V3.1) across almost every benchmark. The gap is largest for software engineering and cyber tasks, where the best U.S. model evaluated solves over 20% more tasks than the best DeepSeek model.
DeepSeek models cost more to use than comparable U.S. models.
One U.S. reference model costs 35% less on average than the best DeepSeek model to perform at a similar level across all 13 performance benchmarks tested.
DeepSeek models are far more susceptible to agent hijacking attacks than frontier U.S. models.
Agents based on DeepSeek’s most secure model (R1-0528) were, on average, 12 times more likely than evaluated U.S. frontier models to follow malicious instructions designed to derail them from user tasks. Hijacked agents sent phishing emails, downloaded and ran malware, and exfiltrated user login credentials, all in a simulated environment.
DeepSeek models are far more susceptible to jailbreaking attacks than U.S. models.
DeepSeek’s most secure model (R1-0528) responded to 94% of overtly malicious requests when a common jailbreaking technique was used, compared with 8% of requests for U.S. reference models.
DeepSeek models advance Chinese Communist Party (CCP) narratives.
DeepSeek models echoed four times as many inaccurate and misleading CCP narratives as U.S. reference models did.
Adoption of PRC models has greatly increased since DeepSeek R1 was released.
The release of DeepSeek R1 has driven adoption of PRC models across the AI ecosystem. Downloads of DeepSeek models on model-sharing platforms have increased nearly 1,000% since January 2025.