Microsoft’s AI red team is excited to share our whitepaper, “Lessons from Red Teaming 100 Generative AI Products.”
The AI red team was formed in 2018 to address the growing landscape of AI safety and security risks. Since then, we have expanded the scope and scale of our work significantly. We are one of the first red teams in the industry to cover both security and responsible AI, and red teaming has become a key part of Microsoft’s approach to generative AI product development. Red teaming is the first step in identifying potential harms and is followed by important initiatives at the company to measure, manage, and govern AI risk for our customers. Last year, we also announced PyRIT (The Python Risk Identification Tool for generative AI), an open-source toolkit to help researchers identify vulnerabilities in their own AI systems.
With a focus on our expanded mission, we have now red-teamed more than 100 generative AI products. The whitepaper we are now releasing provides more detail about our approach to AI red teaming and includes the following highlights:
Discover more about our approach to AI red teaming.
Over the years, the AI red team has tackled a wide assortment of scenarios that other organizations have likely encountered as well. We focus on vulnerabilities most likely to cause harm in the real world, and our whitepaper shares case studies from our operations that highlight how we have done this in four scenarios including security, responsible AI, dangerous capabilities (such as a model’s ability to generate hazardous content), and psychosocial harms. As a result, we are able to recognize a variety of potential cyberthreats and adapt quickly when confronting new ones.
This mission has given our red team a breadth of experiences to skillfully tackle risks regardless of:
AI red teaming is a practice for probing the safety and security of generative AI systems. Put simply, we “break” the technology so that others can build it back stronger. Years of red teaming have given us invaluable insight into the most effective strategies. In reflecting on the eight lessons discussed in the whitepaper, we can distill three top takeaways that business leaders should know.
The integration of generative AI models into modern applications has introduced novel cyberattack vectors. However, many discussions around AI security overlook existing vulnerabilities. AI red teams should pay attention to cyberattack vectors both old and new.
Red team tip: AI red teams should be attuned to new cyberattack vectors while remaining vigilant for existing security risks. AI security best practices should include basic cyber hygiene.
While automation tools are useful for creating prompts, orchestrating cyberattacks, and scoring responses, red teaming can’t be automated entirely. AI red teaming relies heavily on human expertise.
Humans are important for several reasons, including:
Red team tip: Adopt tools like PyRIT to scale up operations but keep humans in the red teaming loop for the greatest success at identifying impactful AI safety and security vulnerabilities.
Numerous mitigations have been developed to address the safety and security risks posed by AI systems. However, it is important to remember that mitigations do not eliminate risk entirely. Ultimately, AI red teaming is a continuous process that should adapt to the rapidly evolving risk landscape and aim to raise the cost of successfully attacking a system as much as possible.
Red team tip: Continually update your practices to account for novel harms, use break-fix cycles to make AI systems as safe and secure as possible, and invest in robust measurement and mitigation techniques.
The “Lessons From Red Teaming 100 Generative AI Products” whitepaper includes our AI red team ontology, additional lessons learned, and five case studies from our operations. We hope you will find the paper and the ontology useful in organizing your own AI red teaming exercises and developing further case studies by taking advantage of PyRIT, our open-source automation framework.
Together, the cybersecurity community can refine its approaches and share best practices to effectively address the challenges ahead. Download our red teaming whitepaper to read more about what we’ve learned. As we progress along our own continuous learning journey, we would welcome your feedback and hearing about your own AI red teaming experiences.
To learn more about Microsoft Security solutions, visit our website. Bookmark the Security blog to keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity) for the latest news and updates on cybersecurity.
¹ Phi-3 Safety Post-Training: Aligning Language Models with a “Break-Fix” Cycle
The post 3 takeaways from red teaming 100 generative AI products appeared first on Microsoft Security Blog.
Source: Microsoft Security