-
Collective Constitutional AI: Aligning a Language Model with Public Input
Authors:
Saffron Huang,
Divya Siddarth,
Liane Lovitt,
Thomas I. Liao,
Esin Durmus,
Alex Tamkin,
Deep Ganguli
Abstract:
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs, from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from an LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons of the models suggest that the models differ on the basis of their respective constitutions, e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively instead of a refusal. These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
Submitted 11 June, 2024;
originally announced June 2024.
-
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Authors:
Markus Anderljung,
Joslyn Barnhart,
Anton Korinek,
Jade Leung,
Cullen O'Keefe,
Jess Whittlestone,
Shahar Avin,
Miles Brundage,
Justin Bullock,
Duncan Cass-Beggs,
Ben Chang,
Tantum Collins,
Tim Fist,
Gillian Hadfield,
Alan Hayes,
Lewis Ho,
Sara Hooker,
Eric Horvitz,
Noam Kolt,
Jonas Schuett,
Yonadav Shavit,
Divya Siddarth,
Robert Trager,
Kevin Wolf
Abstract:
Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety. Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model's capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed: (1) standard-setting processes to identify appropriate requirements for frontier AI developers, (2) registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and (3) mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models. Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them. We consider several options to this end, including granting enforcement powers to supervisory authorities and licensure regimes for frontier AI models. Finally, we propose an initial set of safety standards. These include conducting pre-deployment risk assessments; external scrutiny of model behavior; using risk assessments to inform deployment decisions; and monitoring and responding to new information about model capabilities and uses post-deployment. We hope this discussion contributes to the broader conversation on how to balance public safety risks and innovation benefits from advances at the frontier of AI development.
Submitted 7 November, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Model evaluation for extreme risks
Authors:
Toby Shevlane,
Sebastian Farquhar,
Ben Garfinkel,
Mary Phuong,
Jess Whittlestone,
Jade Leung,
Daniel Kokotajlo,
Nahema Marchal,
Markus Anderljung,
Noam Kolt,
Lewis Ho,
Divya Siddarth,
Shahar Avin,
Will Hawkins,
Been Kim,
Iason Gabriel,
Vijay Bolina,
Jack Clark,
Yoshua Bengio,
Paul Christiano,
Allan Dafoe
Abstract:
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through "alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Submitted 22 September, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Democratising AI: Multiple Meanings, Goals, and Methods
Authors:
Elizabeth Seger,
Aviv Ovadya,
Ben Garfinkel,
Divya Siddarth,
Allan Dafoe
Abstract:
Numerous parties are calling for the democratisation of AI, but the phrase is used to refer to a variety of goals, the pursuit of which sometimes conflict. This paper identifies four kinds of AI democratisation that are commonly discussed: (1) the democratisation of AI use, (2) the democratisation of AI development, (3) the democratisation of AI profits, and (4) the democratisation of AI governance. Numerous goals and methods of achieving each form of democratisation are discussed. The main takeaway from this paper is that AI democratisation is a multifarious and sometimes conflicting concept that should not be conflated with improving AI accessibility. If we want to move beyond ambiguous commitments to democratising AI, to productive discussions of concrete policies and trade-offs, then we need to recognise the principal role of the democratisation of AI governance in navigating tradeoffs and risks across decisions around use, development, and profits.
Submitted 7 August, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Generative AI and the Digital Commons
Authors:
Saffron Huang,
Divya Siddarth
Abstract:
Many generative foundation models (or GFMs) are trained on publicly available data and use public infrastructure, but 1) may degrade the "digital commons" that they depend on, and 2) do not have processes in place to return value captured to data producers and stakeholders. Existing conceptions of data rights and protection (focusing largely on individually-owned data and associated privacy concerns) and copyright or licensing-based models offer some instructive priors, but are ill-suited for the issues that may arise from models trained on commons-based data. We outline the risks posed by GFMs and why they are relevant to the digital commons, and propose numerous governance-based solutions that include investments in standardized dataset/model disclosure and other kinds of transparency when it comes to generative models' training and capabilities, consortia-based funding for monitoring/standards/auditing organizations, requirements or norms for GFM companies to contribute high quality data to the commons, and structures for shared ownership based on individual or community provision of fine-tuning data.
Submitted 20 March, 2023;
originally announced March 2023.
-
Vaccine Credential Technology Principles
Authors:
Divya Siddarth,
Vi Hart,
Bethan Cantrell,
Kristina Yasuda,
Josh Mandel,
Karen Easterbrook
Abstract:
The historically rapid development of effective COVID-19 vaccines has policymakers facing evergreen public health questions regarding vaccination records and verification. Governments and institutions around the world are already taking action on digital vaccine certificates, including guidance and recommendations from the European Commission, the WHO, and the Biden Administration. These could be encouraging efforts: an effective system for vaccine certificates could potentially be part of a safe return to work, travel, and daily life, and a secure technological implementation could improve on existing systems to prioritize privacy, streamline access, and build for necessary interoperability across countries and contexts. However, vaccine credentials are not without potential harms, and, particularly given major inequities in vaccine access and rollout, there are valid concerns that they may be used in ineffective or exclusionary ways that exacerbate inequality, allow for discrimination, violate privacy, and assume consent. While the present moment calls for urgency, we must also acknowledge that choices made in the vaccine credentialing rollout for COVID-19 are likely to have long-term implications, and must be made with care. In this paper, we outline potential implementation and ethical concerns that may arise from tech-enabled vaccine credentialing programs now and in the future, and discuss the technological tradeoffs implicated in these concerns. We suggest a set of principles that, if adopted, may mitigate these concerns, forestall preventable harms, and point the way forward; the paper is structured as a deep dive into each of these principles.
Submitted 27 May, 2021;
originally announced May 2021.
-
Who Watches the Watchmen? A Review of Subjective Approaches for Sybil-resistance in Proof of Personhood Protocols
Authors:
Divya Siddarth,
Sergey Ivliev,
Santiago Siri,
Paula Berman
Abstract:
Most current self-sovereign identity systems may be categorized as strictly objective, consisting of cryptographically signed statements issued by trusted third party attestors. This failure to provide an input for subjectivity accounts for a central challenge: the inability to address the question of "Who verifies the verifier?". Instead, these protocols outsource their legitimacy to mechanisms beyond their internal structure, relying on traditional centralized institutions such as national ID issuers and KYC providers to verify the claims they hold. This reliance has been employed to safeguard applications from a vulnerability previously thought to be impossible to address in distributed systems: the Sybil attack problem, which describes the abuse of an online system by creating many illegitimate virtual personas. Inspired by the progress in cryptocurrencies and blockchain technology, there has recently been a surge in networked protocols that make use of subjective inputs such as voting, vouching, and interpreting, to arrive at a decentralized and sybil-resistant consensus for identity. In this article, we will outline the approaches of these new and natively digital sources of authentication -- their attributes, methodologies, strengths, and weaknesses -- and sketch out possible directions for future developments.
Submitted 13 October, 2020; v1 submitted 26 July, 2020;
originally announced August 2020.
-
COVID, BLM, and the polarization of US politicians on Twitter
Authors:
Anmol Panda,
Divya Siddarth,
Joyojeet Pal
Abstract:
We mapped the tweets of 520 US Congress members, focusing on analyzing their engagement with two broad topics: first, the COVID-19 pandemic, and second, the recent wave of anti-racist protest. We find that, in discussing COVID-19, Democrats frame the issue in terms of public health, while Republicans are more likely to focus on small businesses and the economy. When looking at the discourse around anti-Black violence, we find that Democrats are far more likely to name police brutality as a specific concern. In contrast, Republicans not only discuss the issue far less, but also keep their terms more general, as well as criticizing perceived protest violence.
Submitted 7 August, 2020;
originally announced August 2020.