Question: What do we really know about large language model (LLM) security? And are we willingly opening the front door to chaos by using LLMs in business?

Rob Gurzeev, CEO, CyCognito: Picture it: Your engineering team is harnessing the immense capabilities of LLMs to “write code” and rapidly develop an application. It’s a game-changer for your businesses; development speeds are now orders of magnitude faster. You’ve shaved 30% off time-to-market. It’s win-win — for your org, your stakeholders, your end users.

Six months later, your application is reported to leak customer data; it has been jailbroken and its code manipulated. You’re now facing SEC violations and the threat of customers walking away.

Efficiency gains are enticing, but the risks cannot be ignored. While we have well-established standards for security in traditional software development, LLMs are black boxes that require rethinking how we bake in security.

New Kinds of Security Risks for LLMs

LLMs are rife with unknown risks and prone to attacks previously unseen in traditional software development.

  • Prompt injection attacks involve manipulating the model to generate unintended or harmful responses. Here, the attacker strategically formulates prompts to deceive the LLM, potentially bypassing security measures or ethical constraints put in place to ensure responsible use of the artificial intelligence (AI). As a result, the LLM’s responses can deviate significantly from the intended or expected behavior, posing serious risks to privacy, security, and the reliability of AI-driven applications.

  • Insecure output handling arises when the output generated by an LLM or similar AI system is accepted and incorporated into a software application or Web service without undergoing adequate scrutiny or validation. This can expose back-end systems to vulnerabilities, such as cross-site scripting (XSS), cross-site request forgery (CSRF), server-side request forgery (SSRF), privilege escalation, and remote code execution (RCE).

  • Training data poisoning occurs when the data used to train an LLM is deliberately manipulated or contaminated with malicious or biased information. The process of training data poisoning typically involves the injection of deceptive, misleading, or harmful data points into the training dataset. These manipulated data instances are strategically chosen to exploit vulnerabilities in the model’s learning algorithms or to instill biases that may lead to undesired outcomes in the model’s predictions and responses.

A Blueprint for Protection and Control of LLM Applications

While some of this is new territory, there are best practices you can implement to limit exposure.

  • Input sanitization involves, as the name suggestion, the sanitization of inputs to prevent unauthorized actions and data requests initiated by malicious prompts. The first step is input validation to ensure input adheres to expected formats and data types. The next is input sanitization, where potentially harmful characters or code are removed or encoded to thwart attacks. Other tactics include whitelists of approved content, blacklists of forbidden content, parameterized queries for database interactions, content security policies, regular expressions, logging, and continuous monitoring, as well as security updates and testing.

  • Output scrutiny is the rigorous handling and evaluation of the output generated by the LLM to mitigate vulnerabilities, like XSS, CSRF, and RCE. The process begins by validating and filtering the LLM’s responses before accepting them for presentation or further processing. It incorporates techniques such as content validation, output encoding, and output escaping, all of which aim to identify and neutralize potential security risks in the generated content.

  • Safeguarding training data is essential to prevent training data poisoning. This involves enforcing strict access controls, employing encryption for data protection, maintaining data backups and version control, implementing data validation and anonymization, establishing comprehensive logging and monitoring, conducting regular audits, and providing employee training on data security. It’s also important to verify the reliability of data sources and ensure secure storage and transmission practices.

  • Enforcing strict sandboxing policies and access controls can also help mitigate the risk of SSRF exploits in LLM operations. Techniques that can be applied here include sandbox isolation, access controls, whitelisting and/or blacklisting, request validation, network segmentation, content-type validation, and content inspection. Regular updates, comprehensive logging, and employee training are also key.

  • Continuous monitoring and content filtering can be integrated into the LLM’s processing pipeline to detect and prevent harmful or inappropriate content, using keyword-based filtering, contextual analysis, machine-learning models, and customizable filters. Ethical guidelines and human moderation play key roles in maintaining responsible content generation, while continuous real-time monitoring, user feedback loops, and transparency ensure that any deviations from desired behavior are promptly addressed.

Source: www.darkreading.com