← ramimac.me

Claude Code Skills Scanner Benchmark

8 Scanners, Top 5 Skills Each | Ground Truth: 3 Known Malicious Skills

Methodology

Selection: For each of 8 scanners, skills were ranked by severity score (critical×4 + high×3 + medium×2 + low×1) and the top 5 selected, yielding 40 total findings (35 unique skills due to overlap).
Investigation: Each skill was analyzed by an independent AI agent examining the SKILL.md content against the scanner findings to determine intent and legitimacy.
Classifications: True Positive genuinely malicious, Benign True Positive correctly detected patterns with legitimate purpose, False Positive incorrectly flagged benign content.
Limitations: All scanners were run in offline mode using static/pattern-based analysis only. Some scanners support LLM-based semantic analysis which was not enabled for this benchmark.

0
True Positives
31
Benign True Positives
9
False Positives
35
Unique Skills

Scanner Overlap Matrix

Skills appearing in multiple scanners' top 5 flagged lists

Skill aid agentvet agentguard aguara caterpillar cisco nova skill-sec Count
snail3d_clawd - - - - - 3
ahuserious_nautilus-trader - - - - - - 2
snail3d_clawforgod - - - - - - 2
snail3d_voice-devotional - - - - - - 2

Ground Truth Analysis

Performance against 3 known malicious skills from a coordinated trojan campaign (C2: 91.92.242.30)

Malicious Skill Attack Vector Detected Missed Detection Fidelity
seedamir/amir
Fake "Solana Wallet CLI"
glot.io paste + GitHub ZIP
curl|bash to C2
5/8 aid, cisco, nova HIGH agentvet (shell/network)
MED aguara (EXTDL, MCPCFG)
LOW caterpillar (generic exfil)
zaycv/polymarket-assistant
Fake "Trading Agent"
base64 curl|bash
password-protected ZIP
5/8 cisco, nova, caterpillar CRIT aid (base64 payload)
HIGH agentvet (obfuscation)
MED aguara (base64 patterns)
jordanprater/yahoofinance
Fake "Yahoo Finance CLI"
glot.io paste link
GitHub ZIP download
3/7 aid, cisco, skill-sec, caterpillar HIGH agentvet (URL patterns)
MED agentguard, aguara
Key Observations:
  • agentvet detected all 3 (100%) - best performer on ground truth
  • cisco-skill-scanner missed all 3 (0%) - likely needs semantic/LLM analysis for social engineering
  • nova-proximity missed all 3 (0%) - pattern rules didn't match this campaign's TTPs
  • aid only detected base64 payload (1/3) - invisible char focus missed paste-site attacks
  • Detection fidelity varies: agentvet flagged specific shell/network IOCs; others flagged generic patterns

Key Findings

aid Scanner - Top 5 Flagged Skills

Invisible Unicode / ASCII smuggling detection

elevateson_buildertrend_7fa5a0a2
Score: 48 | 42 findings
Benign True Positive
Found 7 instances of variation selector-16 (U+FE0F) modifying emoji characters. These are legitimately used for markdown formatting/documentation clarity, not malicious payload injection.
flyingtimes_crawl-from-x_5ad88144
Score: 46 | 30 findings
Benign True Positive
Found 1 instance of variation selector-16 modifying a warning sign emoji in Chinese documentation. Used for visual emphasis in legitimate technical documentation.
gmilton09_face-reading-cn_f259ed1b
Score: 20 | 24 findings
Benign True Positive
Found 7 instances of variation selector-16 modifying decorative emoji in Chinese markdown documentation for visual formatting and emphasis.
ahuserious_nautilus-trader_1a21f5e7
Score: 16 | 7 findings
False Positive
No invisible Unicode characters detected in SKILL.md file. Scanner may have crashed during initial scan. Manual analysis found zero variation selectors or zero-width characters.
r00tid_token-alert_5e56e5fc
Score: 14 | 14 findings
Benign True Positive
Found 3 instances of variation selector-16 modifying common emoji in documentation. These control emoji rendering in markdown for visual consistency.

agentvet Scanner - Top 5 Flagged Skills

YARA rules, credential detection, URL analysis

davidajohnston_everclaw-inference_security_skillguard
Score: 795 | 123 findings
Benign True Positive
Legitimate infrastructure tool for decentralized AI inference. High scanner flags reflect security-sensitive operations (private key management, blockchain transactions) that are expected and well-implemented: keys stored only in Keychain, never on disk.
dgriffin831_input-guard_972de338
Score: 723 | 118 findings
Benign True Positive
Defensive prompt injection detection tool. Scanner flags reflect the detection patterns it intentionally contains. This is a SECURITY TOOL - flags show it successfully identifies threat patterns.
lvcidpsyche_skill-bomb-dog-sniff_ce95eb9f
Score: 720 | 126 findings
Benign True Positive
Pre-installation malware scanner. Flags reflect detection categories for actual malware (reverse shells, crypto stealers, credential harvesters). This is a MALWARE DETECTOR, not malware itself.
24601_surrealdb_c288711b
Score: 700 | 251 findings
Benign True Positive
Legitimate database development tool. Flags for database credentials and network calls are EXPECTED. High finding count reflects comprehensive documentation with examples.
yoder-bawt_yoder-skill-auditor_1c8f955c
Score: 688 | 120 findings
Benign True Positive
Security auditing tool with 18 safety checks. High flags expected - contains comprehensive threat knowledge to detect malicious skills. Test suite includes 8 malicious + 4 clean samples.

agentguard Scanner - Top 5 Flagged Skills

Security rules, risk scoring

kraken-1.1.0_d1df93ba
Score: 1291 | 378 findings
False Positive
Legitimate cryptocurrency portfolio management skill for the Kraken exchange. High risk score due to financial keywords and API credential handling, which are normal for exchange integrations.
snail3d_clawd_3e50598a
Score: 896 | 300 findings
Benign True Positive
ClawCamera - multi-camera surveillance system with legitimate use cases. Risk patterns detected (subprocess, file I/O, network calls) are justified by its intended surveillance function.
snail3d_clawforgod_719e771d
Score: 896 | 300 findings
Benign True Positive
Variant of clawd with AI agent personality features. Contains identical surveillance components. Detected risk patterns are legitimate given stated purpose.
snail3d_voice-devotional_70dfa874
Score: 896 | 300 findings
Benign True Positive
Another clawd variant focused on voice/devotional features. Same legitimate surveillance infrastructure with properly disclosed capabilities.
koatora20_guard-scanner_4894beba
Score: 693 | 204 findings
Benign True Positive
Legitimate security analysis tool designed to scan AI agent skills for 35+ threat categories. The risk patterns flagged are expected behavior for a security scanner - this IS genuine security tooling.

aguara Scanner - Top 5 Flagged Skills

YARA-based pattern detection, 177 rules

ahuserious_nautilus-trader_1a21f5e7
Score: 1227 | 623 findings
Benign True Positive
Legitimate algorithmic trading platform for NautilusTrader. Contains financial transaction code, environment variable handling (private keys), and SDK patches for production trading - all expected for a trading bot.
paolorollo_openclaw-sec_2c4258f1
Score: 455 | 347 findings
Benign True Positive
AI security validation suite with 6 detection modules (prompt injection, command injection, URL validation, path traversal, secrets detection). High pattern match count reflects comprehensive security detection.
snail3d_clawd_3e50598a
Score: 439 | 676 findings
Benign True Positive
Composite skill container. High pattern match count reflects shared utility code, message bus implementations, and event handling libraries across multiple related skills.
snail3d_clawforgod_719e771d
Score: 439 | 676 findings
Benign True Positive
Companion/derived skill with identical infrastructure code. Pattern matches reflect shared libraries rather than malicious code.
snail3d_voice-devotional_70dfa874
Score: 439 | 676 findings
Benign True Positive
Voice-based devotional/assistant application. High pattern match count reflects shared infrastructure code common across the snail3d skill ecosystem.

caterpillar Scanner - Top 5 Flagged Skills

Credential theft, data exfil, persistence, obfuscation detection

goodman333_skill-safeguard_ace51f7c
Score: 41 | 13 findings
Benign True Positive
Security scanner skill that teaches threat detection methodology. Contains descriptions of malicious behaviors (base64 encoding, eval, curl piping) but only in the context of detection examples.
zjzac_safe-skill_09af5071
Score: 37 | 12 findings
Benign True Positive
Programmatic security scanner with Python scanner implementation. Contains security detection patterns and descriptions of threats as examples of what to detect.
mohibshaikh_clawvet_37b65d14
Score: 34 | 11 findings
Benign True Positive
Security scanner for OpenClaw skills (npm package) with 54 pattern rules. Contains threat descriptions for detection purposes only.
silentcool_crusty-security_5e0632d8
Score: 34 | 11 findings
Benign True Positive
Security scanning skill with ClamAV integration. Threat detection framework documenting threat indicators for protective purposes.
starbuck100_agentaudit_004cceaa
Score: 33 | 11 findings
Benign True Positive
Security gate for package installation. Contains security patterns documented for preventing malicious installations. The gate 'never installs or executes' packages, only checks them.

cisco-skill-scanner - Top 5 Flagged Skills

Static analysis, bytecode analysis, policy violations

autogame-17_evolver_7c836167
Score: 665 | 207 findings
Benign True Positive
Legitimate meta-skill for AI agent self-improvement that intentionally uses shell execution, git operations, and code generation. Wrapped in safety controls (policy checks, blast radius analysis, rollback strategies).
autogame-17_capability-evolver_11b68c56
Score: 665 | 207 findings
Benign True Positive
Official Evolver meta-skill maintained by autogame-17. Contains intentional high-risk patterns for self-improvement with comprehensive safety mitigations.
sonyrw_workspace-main_1b426239
Score: 648 | 203 findings
Benign True Positive
Fork/variant of the Evolver system. Contains similar self-modifying capabilities with intentional shell access and code evolution features with safety guardrails.
wuzimaki_evolver-repo_fc45c522
Score: 555 | 171 findings
Benign True Positive
Evolver fork/reimplementation. Shares the core self-evolution architecture with controlled execution environments and policy constraints.
muguozi1_evolver-1-17-1_1f832c8b
Score: 550 | 175 findings
Benign True Positive
Evolver variant (version 1.17.1). Legitimate self-improving system with intentional dangerous operations within controlled safety boundaries.

nova-proximity Scanner - Top 5 Flagged Skills

Pattern-based, manifest validation, security flags

u45362_claw-audit_d1395f9e
Score: 768 | 256 findings
Benign True Positive
ClawAudit is a legitimate security auditing tool that scans skills for malicious patterns. High flag count expected because it contains pattern matching rules that intentionally look for dangerous code signatures.
jarb02_lobsterguard_8f349b6f
Score: 517 | 178 findings
Benign True Positive
LobsterGuard is a bilingual security auditor (68 checks across 6 categories). High finding count is legitimate for a comprehensive security scanner with multi-layer threat detection.
fcavalcantirj_proactive-amcp_a9f5dfc3
Score: 424 | 147 findings
Benign True Positive
Agent Memory Continuity Protocol - sophisticated but legitimate persistence and resurrection system. Findings due to cryptographic operations (Ed25519, X25519), IPFS interaction with proper security measures.
lmtlssss_caduceusmail_1b88f262
Score: 348 | 119 findings
Benign True Positive
Enterprise email alias/domain control tool integrating Microsoft 365 and Cloudflare DNS. Findings reflect legitimate credential handling and API authentication with proper security measures.
lmtlssss_mail-caduceus-v1_008d79e2
Score: 283 | 96 findings
Benign True Positive
Similar enterprise email control plane skill managing mailbox + domain via Microsoft 365 and Cloudflare. Uses strict credential autodiscovery with proper security validation.

skill-security-scan - Top 5 Flagged Skills

Risk scoring with severity levels

stevengonsalvez_agent-reflect_56510739
Score: 12946 | 3255 findings
False Positive
Legitimate self-improvement automation using standard file I/O and text analysis. High score triggered by extensive use of Read, Write, Edit, Grep, Glob, Bash tools which are appropriately declared. Includes human-in-the-loop safeguards.
stevengonsalvez_self-reflect_4b2c45d4
Score: 12946 | 3255 findings
False Positive
Same skill as above (identical content). High score stems from metadata and allowed tools declarations necessary for legitimate reflection/learning mechanism. Includes proper safety guardrails.
ivangdavila_api_44d75c99
Score: 7732 | 1933 findings
Benign True Positive
FastAPI skill containing legitimate API development patterns and best practices documentation. Flagged due to mentions of child_process, environment variables, network calls - all appropriate for API development.
brabaflow_openclaw-agent-skill_4a7dd7b1
Score: 7452 | 1876 findings
Benign True Positive
Comprehensive official documentation covering gateway configuration, authentication, deployment, CLI commands. High score from coverage of security-sensitive topics in documentation context.
snail3d_clawd_3e50598a
Score: 6706 | 2127 findings
False Positive
Actual SKILL.md file is empty (0 bytes) - this is a collection/aggregation folder. High count represents cumulative analysis of sub-skills, not a single skill with malicious patterns.