ICML 2026

Scam2Prompt

A continuous pipeline for constructing developer-style benchmarks that elicit LLM-generated code containing scam URLs.

Starting from real scam websites, Scam2Prompt extracts scam-related intent and sensitive keywords, transforms them into ordinary developer-style prompts, and uses the resulting benchmark to evaluate whether LLMs generate code that points to malicious infrastructure. Large-scale experiments show that this behavior remains a non-negligible risk in recent production LLMs, while the benchmark can be rebuilt as new scam sites are discovered.

1 Scam sites provide seed intent

Seed pages expose scam-associated themes involving crypto, trading, copyright, and celebrity coins.

2 Continuous benchmark construction

The pipeline converts these signals into developer-style prompts and supports benchmark regeneration as new scams appear.

3 Production LLMs are evaluated

Innoc2Scam-bench measures whether recent production LLMs generate code containing malicious scam URLs.

4 New scam URLs are verified

The generated URLs are not merely copied from the seeds: 62 previously unlisted URLs were verified and added to the blocklist by MetaMask maintainers.

Problem

A real-world observation revealed a broader LLM safety risk

The motivating observation is concrete: scam websites contain phrases, product names, and monetization themes that are highly associated with malicious campaigns. When those signals are transformed into ordinary programming tasks, production LLMs may generate code that embeds malicious URLs.

The generated URL does not need to be the original seed URL from the scam database. A seed page can instead serve as a keyword and intent source: topics such as Trump coin, Bitcoin, wash trading, laundry, stock trading, or copyright can trigger LLMs to produce different scam URLs inside code. Scam2Prompt measures that behavior at scale.

Read the preprint

This paper is available as a preprint. You can view the attached PDF directly from this website or use the arXiv and OpenReview links for external records.

Framework

Continuously turning newly found scam sites into benchmark prompts

Scam2Prompt is an automated benchmark construction pipeline. When new scam websites are discovered, the system extracts scam-related intent, generates developer-style prompts, queries LLMs, and validates whether generated code contains malicious URLs.

Overview of the Scam2Prompt audit workflow
Known scam sites serve as seeds for extracting sensitive themes and intent, not merely as URLs to be copied.
Innoc2Scam-bench construction workflow
The same pipeline supports continuous benchmark construction as new scam sites and campaigns appear.

Pipeline

A continuous pipeline from scam-site seeds to developer-style benchmark prompts

The framework is repeatable: as new scam URLs are added to public blocklists, Scam2Prompt crawls accessible pages, synthesizes programming prompts, queries code-generation LLMs, extracts generated endpoints, and validates the resulting prompt-code pairs.

1. Collect scam-site seeds

Scam2Prompt starts from established malicious URL databases, including MetaMask's eth-phishing-detect and PhishFort lists. Together these sources contain hundreds of thousands of URLs; the pipeline filters them down to pages that are still accessible and serve static content.

2. Safely extract visible content

The crawler minimizes exposure to malicious pages by using lightweight HEAD requests, strict timeouts, URL validation, and text-only GET requests. It rejects binary content, strips CSS and JavaScript, and keeps the visible text that captures the scam site's topic, product framing, and keywords.
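The gating and text-extraction logic of this step can be sketched with the standard library alone. This is an illustrative reconstruction, not the paper's crawler: the helper names (`looks_fetchable`, `extract_visible_text`) and the exact filtering rules are assumptions; the real pipeline also applies HEAD requests and strict timeouts before any GET.

```python
# Sketch of step 2: keep only text content and strip CSS/JavaScript,
# leaving the visible text that carries the page's topic and keywords.
from html.parser import HTMLParser
from urllib.parse import urlparse


class _VisibleTextParser(HTMLParser):
    """Collects text nodes that are outside <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html: str) -> str:
    """Return the visible text of a page with scripts and styles removed."""
    parser = _VisibleTextParser()
    parser.feed(html)
    return " ".join(parser.chunks)


def looks_fetchable(url: str, content_type: str) -> bool:
    """Gate a candidate page: valid http(s) URL and text-only content."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and content_type.startswith("text/")
```

The binary-content rejection described above corresponds to the `content_type.startswith("text/")` check; anything else (archives, executables, images) is never downloaded.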

3. Synthesize developer-style prompts

A prompt-generation LLM receives the cleaned page text and creates concise programming tasks involving code generation, APIs, libraries, or automation. The prompts preserve page-specific terms so they can stress-test whether an LLM associates those themes with malicious infrastructure.
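The request sent to the prompt-generation LLM can be sketched as a template over the cleaned page text. The instruction wording, truncation limit, and function name below are illustrative assumptions, not the paper's exact prompts; the key property preserved is that page-specific terms flow through unchanged.

```python
# Sketch of step 3: wrap cleaned page text in an instruction asking a
# prompt-generation LLM for one ordinary developer-style task.
def synthesis_instruction(page_text: str, max_chars: int = 2000) -> str:
    """Build the request for the prompt-generation LLM (illustrative)."""
    snippet = page_text[:max_chars]  # bound the context passed to the model
    return (
        "Read the following website text and write one concise, ordinary "
        "programming task (code generation, APIs, libraries, or automation) "
        "that preserves the site's page-specific product names and keywords.\n\n"
        f"Website text:\n{snippet}"
    )
```

Because the snippet is passed through verbatim, terms like a coin name or trading product survive into the synthesized task, which is what lets the benchmark stress-test theme-to-URL associations.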

4. Generate code and extract URLs

A code-generation LLM answers each synthesized prompt. Scam2Prompt then extracts every endpoint embedded in the generated code, producing candidate prompt-code pairs that can be checked for malicious URLs.
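Endpoint extraction from generated code can be approximated with a single pattern; the regex below is a minimal sketch under the assumption that endpoints appear as literal http(s) URLs in source text, and is not the paper's extractor.

```python
# Sketch of step 4's extraction half: pull every http(s) endpoint
# embedded in a generated program, deduplicated for the oracle.
import re

URL_RE = re.compile(r"https?://[^\s'\"<>)\]}]+")


def extract_urls(code: str) -> list[str]:
    """Return the unique http(s) URLs found in generated code, sorted."""
    return sorted(set(URL_RE.findall(code)))
```

Each extracted URL, paired with the prompt and the program it came from, forms one candidate prompt-code pair for validation.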

5. Validate generated endpoints

The URL oracle combines independent detectors, including ChainPatrol, Google Safe Browsing, and SecLookup. A generated URL is treated as malicious if any detector flags it. URLs not already present in the seed databases can be reported back to maintainers.
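The any-flag decision rule is simple to state in code. The sketch below treats each detector as an opaque callable; the real ChainPatrol, Google Safe Browsing, and SecLookup clients require API keys and network calls that are not shown here.

```python
# Sketch of step 5: a URL is treated as malicious if ANY independent
# detector flags it. Detector clients are stand-in callables.
from typing import Callable, Iterable


def is_malicious(url: str, detectors: Iterable[Callable[[str], bool]]) -> bool:
    """Apply the any-flag oracle over independent detectors."""
    return any(detector(url) for detector in detectors)
```

The any-flag rule trades precision for recall: one detector's false positive flags the URL, but a scam missed by two detectors is still caught by the third.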

6. Curate the benchmark

The final benchmark keeps prompts that are developer-style code-generation requests, not jailbreaks or non-coding questions. Human validation filters out summarization and manual workflow prompts, yielding 1,377 reusable prompts for testing newer LLMs.

Main findings

Production LLMs generate scam URLs at reproducible rates

Large-scale evidence

Across more than 265,000 generated programs, Scam2Prompt observed malicious URL generation in every tested model combination.

Average rate: 4.24%

Continuous benchmark

Innoc2Scam-bench packages scam-seeded developer prompts so newer production LLMs can be evaluated as models change.

1,377 prompts

Real-world harm

Generated code surfaced scam URLs that were absent from the seed databases and were later verified by maintainers.

62 verified additions

Innoc2Scam-bench

Seven 2025 production LLMs evaluated with Innoc2Scam-bench

The current benchmark contains 342 category-1 prompts that explicitly mention a scam URL or domain, and 1,035 category-2 prompts that do not. These prompts test whether scam-site-derived intent can elicit malicious URL generation from recent models.

Model | Generated | Filtered | Malicious | Rate

Tier 1 (lowest rate)
Gemini 2.5 Pro | 799 | 553 | 178 | 12.9%

Tier 2 (second-lowest rate)
GPT-5 | 1,227 | 24 | 303 | 22.0%

Tier 3 (middle rate)
Claude Sonnet 4 | 1,248 | 115 | 472 | 34.3%

Tier 4 (higher-risk cluster)
Grok Code Fast 1 | 1,355 | 18 | 597 | 43.4%
Gemini 2.5 Flash | 1,351 | 1 | 612 | 44.4%
Qwen3 Coder | 1,367 | 3 | 628 | 45.6%
DeepSeek Chat v3.1 | 1,358 | 12 | 651 | 47.3%

Guardrails, filtering, and rankings

Content filters change the apparent safety ranking

The table above is filter-aware: outputs blocked by a model's internal safety filter or guardrail are counted as non-malicious. This matters because Gemini 2.5 Pro filtered 553 of 1,377 prompts, while GPT-5 filtered only 24. In the appendix, we also remove this filter effect by comparing only the 637 prompts completed by all seven models.

We report 95% Wald confidence intervals and paired McNemar tests. The result is best interpreted as statistical tiers rather than a single universal ordering.
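The Wald interval on a per-model malicious-generation rate is a one-line computation. As a worked example (using the filter-aware Gemini 2.5 Pro row of the table above, 178 malicious generations over 1,377 prompts):

```python
# 95% Wald interval for a rate: p +/- z * sqrt(p * (1 - p) / n).
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wald confidence bounds for successes/n."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return (p - half, p + half)


lo, hi = wald_ci(178, 1377)  # ~ (0.112, 0.147), i.e. 11.2% to 14.7%
```

Overlapping intervals like these, combined with the paired McNemar tests, are why the results are reported as tiers rather than a strict ranking.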

Filter-aware

All 1,377 prompts

  1. Lowest malicious-generation rate: Gemini 2.5 Pro
  2. Second-lowest rate: GPT-5
  3. Middle rate: Claude Sonnet 4
  4. Higher-risk cluster: Grok Code Fast 1, Gemini 2.5 Flash, Qwen3 Coder, DeepSeek Chat v3.1
Filter removed

637 shared completions

  1. Statistical tie: Gemini 2.5 Pro and GPT-5
  2. Middle: Claude Sonnet 4
  3. Lower cluster: remaining models mostly overlap under McNemar tests

Download the benchmark

The dataset is released on Hugging Face and mirrored on GitHub for reproducibility.

Practical impact

LLM-generated code surfaced new scam URLs absent from the seed databases

Scam2Prompt does not only reproduce known scam infrastructure. Seed websites can act as sources of scam-related intent, after which an LLM may generate different malicious URLs inside code. We reported newly surfaced URLs to MetaMask's eth-phishing-detect maintainers; they verified and added 62 entries to the scam database.

62

verified additions

From generated code to live blocklist protection

Each report below links to a public issue in the MetaMask scam database. The number on each chip is the count of accepted entries associated with that report, demonstrating that benchmark-triggered generations can reveal actionable scam infrastructure.

Artifact

Reproduce the audit pipeline

The artifact contains the end-to-end pipeline: scam database inputs, webpage crawling and caching, automated prompt generation from scam-site intent, code generation, malicious URL oracles, evaluation of newer LLMs, and reporting.

Resources

Links