
Using LLMs for Security Advisory Investigations: How Far Are We?

Assessing the Role of Large Language Models in Security Advisory Investigations: Current Capabilities and Limitations

In the rapidly evolving landscape of cybersecurity and artificial intelligence, recent research sheds light on the potential and pitfalls of leveraging Large Language Models (LLMs) such as ChatGPT for security-related tasks. A notable study titled “Using LLMs for Security Advisory Investigations: How Far Are We?” authored by Bayu Fedra Abdullah, Yusuf Sulistyo Nugroho, Brittany Reid, Raula Gaikovina Kula, Kazumasa Shimari, and Kenichi Matsumoto, offers critical insights into this pressing topic.

Key Findings from the Research

  • Impressive Plausibility, Still Flawed Discrimination: The study found that ChatGPT generates security advisories that appear highly credible, producing plausible-looking advisories for 96% of prompts containing authentic Common Vulnerabilities and Exposures (CVE) identifiers and for 97% of prompts containing fabricated ones. The near-identical rates show that the model has difficulty differentiating authentic vulnerabilities from made-up entries.

  • Challenges in Validation: When asked to verify whether given CVE-IDs were genuine, ChatGPT misclassified fake entries as real in 6% of instances. This indicates that, without external validation against an authoritative source such as the NVD (a minimal lookup is sketched after this list), relying solely on LLM outputs could introduce inaccuracies into security workflows.

  • Quality and Accuracy of Generated Advisories: Further analysis revealed that approximately 95% of the advisories produced by ChatGPT diverged substantially from the original CVE descriptions. Such discrepancies highlight the model's tendency to generate misleading information, raising concerns about its reliability for critical security reporting.

  • Implications for Automation in Cybersecurity: While automating the creation of security advisories could streamline cybersecurity operations, the inability of LLMs to reliably verify the authenticity of CVE-IDs presents a substantial risk of errors, which could have serious consequences in vulnerability management.

  • Emphasizing Human Oversight: The researchers stress that, given current limitations, human experts must remain actively involved in cybersecurity investigations involving AI-generated content. Ongoing efforts are essential to enhance the accuracy and dependability of these models before they can be fully trusted for security-critical applications.
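
Because the validation gap described above makes unverified LLM output risky, one practical safeguard is to cross-check any CVE-ID an LLM mentions against an authoritative database before acting on it. The sketch below is not from the paper; it assumes the public NVD REST API 2.0 endpoint and its totalResults field, and the helper name cve_exists is illustrative. Production use would also need API-key handling and rate limiting.

```python
import json
import re
import urllib.request

# Public NVD 2.0 endpoint (assumption: unauthenticated queries are allowed, subject to rate limits).
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"
CVE_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")


def cve_exists(cve_id: str, timeout: float = 10.0) -> bool:
    """Return True if the CVE-ID is registered in the NVD, False otherwise."""
    if not CVE_PATTERN.match(cve_id):
        return False  # malformed identifiers cannot be genuine CVE-IDs
    url = f"{NVD_API}?cveId={cve_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    # The 2.0 API reports how many records matched the query; 0 means the ID is unknown.
    return data.get("totalResults", 0) > 0


if __name__ == "__main__":
    for candidate in ["CVE-2021-44228", "CVE-2099-99999"]:
        status = "found in NVD" if cve_exists(candidate) else "not found / possibly fabricated"
        print(candidate, "->", status)
```

A lookup like this does not judge the quality of a generated advisory, but it removes one failure mode the study highlights: treating a fabricated CVE-ID as real.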

Conclusion

The investigation offers a balanced perspective: while LLMs like ChatGPT show promise in generating plausible security advisories, significant challenges remain in ensuring their outputs are accurate and verifiable. Caution and human oversight therefore remain essential before such models can be entrusted with security-critical tasks.
