Microsoft announced several new capabilities in Azure AI Studio that the company says should help developers build generative AI apps that are more reliable and resilient against malicious model manipulation and other emerging threats.

In a March 29 blog post, Microsoft’s chief product officer of responsible AI, Sarah Bird, pointed to growing concerns about threat actors using prompt injection attacks to get AI systems to behave in dangerous and unexpected ways as the primary driving factor for the new tools.

“Organizations are also concerned about quality and reliability,” Bird said. “They want to ensure that their AI systems are not generating errors or adding information that isn’t substantiated in the application’s data sources, which can erode user trust.”

Azure AI Studio is a hosted platform that organizations can use to build custom AI assistants, copilots, bots, search tools and other applications, grounded in their own data. Announced in November 2023, the platform hosts Microsoft’s machine learning models and also models from several other sources including OpenAI. Meta, Hugging Face and Nvidia. It allows developers to quickly integrate multi-modal capabilities and responsible AI features into their models.

Other major players such as Amazon and Google have rushed to market with similar offerings over the past year to tap into the surging interest in AI technologies worldwide. A recent IBM-commissioned study found that 42% of organizations with more than 1,000 employees are already actively using AI in some fashion with many of them planning to increase and accelerate investments in the technology over the next few years. And not all of them were telling IT beforehand about their AI usage.

Protecting Against Prompt Engineering

The five new capabilities that Microsoft has added—or will soon add—to Azure AI Studio are: Prompt Shields; groundedness detection; safety system messages; safety evaluations; and risk and safety monitoring.  The features are designed to address some significant challenges that researchers have uncovered recently—and continue to uncover on a routine basis—with regard to the use of large language models and generative AI tools.

Prompt Shields for instance is Microsoft’s mitigation for what are known as indirect prompt attacks and jailbreaks. The feature builds on existing mitigations in Azure AI Studio against jailbreak risk. In prompt engineering attacks, adversaries use prompts that appear innocuous and not overtly harmful to try and steer an AI model into generating harmful and undesirable responses. Prompt engineering is among the most dangerous in a growing class of attacks that try and jailbreak AI models or get them to behave in a manner that is inconsistent with any filters and constraints that the developers might have built into them.  

Researchers have recently shown how adversaries can engage in prompt engineering attacks to get generative AI models to spill their training data, to spew out personal information, generate misinformation and potentially harmful content, such as instructions on how to hotwire a car.

With Prompt Shields developers can integrate capabilities into their models that help distinguish between valid and potentially untrustworthy system inputs; set delimiters to help mark the beginning and end of input text and using data marking to mark input texts. Prompt Shields is currently available in preview mode in Azure AI Content Safety and will become generally available soon, according to Microsoft.

Mitigations for Model Hallucinations and Harmful Content

With groundedness detection, meanwhile, Microsoft has added a feature to Azure AI Studio that it says can help developers reduce the risk of their AI models “hallucinating”. Model hallucination is a tendency by AI models to generate results that appear plausible but are completely made up and not based—or grounded—on the training data. LLM hallucinations can be hugely problematic if an organization were to take the output as factual and act upon it in some way. In a software development environment for instance, LLM hallucinations could result in developers potentially introducing vulnerable code into their applications.

Azure AI Studio’s new groundedness detection capability is basically about helping detect—more reliably and at greater scale—potentially ungrounded generative AI outputs.  The goal is to give developers a way to test their AI models against what Microsoft calls groundedness metrics, before deploying the model into product. The feature also highlights potentially ungrounded statements in LLM outputs, so users know to fact check the output before using it. Groundedness detection is not available yet, but should be available in the near future, according to Microsoft.

The new system message framework offers a way to developers to clearly define their model’s capabilities, it’s profile and limitations in their specific environment. Developers can use the capability to define the format of the output and provide examples of intended behavior, so it becomes easier for users to detect deviations from intended behavior. It’s another new feature that isn’t available yet but should be soon.

Azure AI Studio’s newly announced safety evaluations capability and its risk and safety monitoring feature are both currently available in preview status. Organizations can use the former to assess the vulnerability of their LLM model to jailbreak attacks and generating unexpected content. The risk and safety monitoring capability allows developers to detect model inputs that are problematic and likely to trigger hallucinated or unexpected content, so they can implement mitigations against it.

“Generative AI can be a force multiplier for every department, company, and industry,” Microsoft’s Bird said. “At the same time, foundation models introduce new challenges for security and safety that require novel mitigations and continuous learning.”

Source: www.darkreading.com