Armilla Review - The Regulatory Landscape: EU's Landmark AI Act, RAISE Benchmarks, and Ethical Advancements in Generative AI with Purple Llama

The Armilla Review is a weekly digest of important news from the AI industry, the market, government and academia, tailored to the interests of our community: AI evaluation, assurance, and risk.
December 13, 2023
5 min read

Last week’s EU AI Act trilogue agreement represents a global milestone for AI governance: the EU will be the first jurisdiction to pass comprehensive regulations for the design, development and deployment of AI and, accordingly, has the potential to play a disproportionate role in setting global rules and norms for Responsible AI, as we expect a wave of new rules to take inspiration from EU lawmakers. These rules will have the greatest impact on providers of high-risk AI systems, general-purpose AI and foundation models, all of whom face extensive requirements ranging from conformity assessments to model evaluations and transparency obligations. Companies must be ready to comply or risk significant penalties of up to 7% of global annual turnover.

EU Sets Stricter Rules with Landmark AI Act

EU lawmakers have finalized a significant piece of legislation, the AI Act, marking one of the world's first comprehensive efforts to regulate artificial intelligence's multifaceted impact. The law focuses on limiting risky AI applications by introducing stringent obligations for companies deploying AI, including transparency requirements for major AI systems such as those powering ChatGPT. The act restricts the use of facial recognition by law enforcement, mandates disclosure of how AI systems function, and imposes substantial fines for non-compliance. Despite being hailed as a milestone, concerns persist regarding the law's enforceability, its effectiveness in a fast-evolving AI landscape, and its potential impact on innovation in the EU.

Source: Reuters

Navigating the Assurance Chasm: Assessing AI Governance Challenges

This article by members of the Schwartz Reisman Institute for Technology and Society and Armilla AI discusses the evolving landscape of artificial intelligence and the growing need for effective assurance mechanisms to manage its risks and establish trustworthiness. It highlights the diverse range of AI applications and the difficulty of evaluating their quality and reliability through assurance-based approaches. The paper examines President Biden's Executive Order, which aims to enhance testing frameworks for high-capability AI models, and emphasizes the expanding scope of AI applications, the challenges of evaluating them, and the need for policy interventions and technical advances to strengthen assurance mechanisms.

Source: Stanford

The Risky Terrain of General-Purpose AI: Urgent Need for Policy Measures

A consortium of researchers from UC Berkeley's AI Security Initiative highlights the pressing challenge of managing the risks associated with General-Purpose AI Systems (GPAIS). With a focus on AI governance, the article emphasizes policymakers' efforts worldwide to balance harnessing the potential benefits of these powerful systems with mitigating potential harms to the public. Despite voluntary commitments from leading AI companies, varied approaches to AI guardrails and governance create a landscape in which different risk perceptions compete. The piece argues for additional policies, advocating mandatory standards, enhanced regulatory authority, and enforcement mechanisms. The researchers also introduce the AI Risk-Management Standards Profile for GPAIS and Foundation Models, which offers guidance for identifying, analyzing, and mitigating risks, while acknowledging the collaborative effort required across industry, government, academia, and civil society for effective risk management in AI development and deployment.

Source: Tech Policy

RAISE Benchmarks: Guiding Responsible AI Adoption in the Evolving Landscape

The Responsible AI Institute (RAI Institute) has launched the Responsible AI Safety and Effectiveness (RAISE) Benchmarks, offering essential tools to help companies embed responsible AI principles in their systems. Comprising three benchmarks aligned with NIST and ISO 42001 standards, these tools address corporate AI policies, AI hallucinations, and vendor alignment, aiming to enhance the integrity and trustworthiness of AI products and services. By assisting organizations in navigating the evolving global regulatory landscape, the benchmarks provide guidance, evaluation, and alignment with ethical AI policies and practices. Notably, these vendor-agnostic benchmarks, available through the RAI Institute's responsible AI testbed, seek to foster a more accountable AI ecosystem, facilitating trust and compliance across the AI industry.

Source: Responsible AI Institute

Fortifying Language Models Against Adversarial Threats: Introducing Verifiable Safety Measures

This study addresses vulnerabilities in the safety measures of public-facing large language models (LLMs) designed to prevent the generation of harmful content, focusing on attacks via adversarial prompts. The researchers propose "erase-and-check," a novel defense framework with verifiable safety guarantees against three types of adversarial attacks. Experimental results are promising: the framework detects a significant portion of harmful prompts while maintaining accuracy on safe prompts, achieving a 92% detection rate against specific adversarial suffixes with the open-source language model Llama 2. The study also introduces efficient empirical defenses, RandEC and GradEC, to further strengthen safety filters against adversarial threats.
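The core idea is straightforward: run a safety filter not only on the full prompt but also on versions of the prompt with candidate adversarial tokens erased, and flag the prompt if any of them looks harmful. Below is a minimal, illustrative sketch of that loop for suffix attacks; the `is_harmful` placeholder stands in for whatever safety classifier is used, and the whitespace tokenization and function names are assumptions made for readability, not the authors' implementation.

```python
def is_harmful(prompt: str) -> bool:
    # Placeholder safety filter: in practice, query an LLM or a trained
    # classifier and return True if it judges the prompt harmful.
    raise NotImplementedError

def erase_and_check_suffix(prompt: str, max_erase: int = 20) -> bool:
    """Flag the prompt as harmful if the safety filter flags it, or flags any
    version of it with up to `max_erase` trailing tokens erased."""
    tokens = prompt.split()  # crude whitespace tokenization, for illustration only
    for i in range(min(max_erase, len(tokens)) + 1):
        candidate = " ".join(tokens[: len(tokens) - i])
        if candidate and is_harmful(candidate):
            return True   # some erased subsequence looks harmful
    return False          # nothing flagged after erasing up to max_erase trailing tokens
```

Because every truncated version of the prompt is checked, an adversarial suffix shorter than `max_erase` tokens cannot hide an underlying harmful request that the filter would otherwise catch.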

Source: arXiv

Addressing Bias in Language Models for Ethical Decision-Making

This study highlights ethical concerns about the potential discriminatory impact of language models (LMs) when used in critical decisions such as determining financial or housing eligibility. The researchers introduce a method to proactively assess the discriminatory tendencies of LMs across diverse societal decision scenarios, including hypothetical settings where the models have not yet been deployed. Applying this approach to the Claude 2.0 model uncovers instances of both positive and negative discrimination, prompting the development of prompt engineering techniques that significantly reduce these biases. While cautioning against automated decision-making based solely on language models, the study offers strategies to mitigate biases, providing insights for developers and policymakers seeking safer and more equitable LM deployment in appropriate use cases as LM capabilities and applications expand.
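As a rough illustration of the evaluation idea (not Anthropic's code), one can pose the same yes/no decision question while varying only demographic attributes and compare the model's answers across groups; the template, the attribute values, and the `query_model` helper below are all hypothetical.

```python
from itertools import product

# Hypothetical decision template; only the demographic fields vary between prompts.
TEMPLATE = (
    "The applicant is a {age}-year-old {gender} {race} person applying for a "
    "small business loan and has a stable income. Should the loan be approved? "
    "Answer yes or no."
)

def query_model(prompt: str) -> str:
    # Placeholder: call the language model under evaluation and return its answer text.
    raise NotImplementedError

def approval_decisions():
    decisions = {}
    for age, gender, race in product([30, 60], ["female", "male"], ["Black", "white", "Asian"]):
        prompt = TEMPLATE.format(age=age, gender=gender, race=race)
        decisions[(age, gender, race)] = query_model(prompt).strip().lower().startswith("yes")
    # Systematic differences between otherwise-identical profiles indicate discrimination.
    return decisions
```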

Source: Anthropic

Purple Llama: Advancing Trust and Safety in Open Generative AI

Meta has announced Purple Llama, an initiative to foster open trust and safety tools for the responsible deployment of generative AI models. The launch unveils CyberSec Eval, a set of cybersecurity safety evaluations for large language models (LLMs), and Llama Guard, a safety classifier for filtering potentially risky inputs and outputs. These tools address key concerns such as cybersecurity risks and content filtering, in line with the Responsible Use Guide released alongside Llama 2. By partnering with industry leaders and prioritizing an open ecosystem, the project aims to standardize the development and use of trust and safety tools, with plans to share these advancements at an upcoming NeurIPS 2023 workshop.
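The input/output filtering pattern that a classifier like Llama Guard supports looks roughly like the sketch below; `classify_as_unsafe` is a stand-in for running the released classifier (for example via Hugging Face Transformers) and is not Meta's API, and the refusal messages are placeholders.

```python
def classify_as_unsafe(role: str, text: str) -> bool:
    # Placeholder: run the safety classifier on the conversation turn and return
    # True if it labels the content unsafe under the chosen policy taxonomy.
    raise NotImplementedError

def guarded_chat(user_prompt: str, generate) -> str:
    """Wrap an LLM `generate` callable with input and output safety checks."""
    if classify_as_unsafe("user", user_prompt):
        return "Sorry, I can't help with that request."    # filter risky inputs
    response = generate(user_prompt)                        # underlying assistant model
    if classify_as_unsafe("assistant", response):
        return "Sorry, I can't share that response."        # filter risky outputs
    return response
```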

Source: Meta