IAB Workshop on AI-CONTROL (aicontrolws)

Team Name: IAB Workshop on AI-CONTROL
Acronym: aicontrolws
State: Active
Personnel: Suresh Krishnan (Chair)

Group description

Large Language Models and other machine learning techniques require voluminous input data, and one common source of such data is the Internet -- usually, "crawling" Web sites for publicly available content, much in the same way that search engines crawl the Web.

This similarity has led to an emerging practice of allowing the Robots Exclusion Protocol (RFC 9309) to control the behavior of AI-oriented crawlers.

This emerging practice raises many design and operational questions. It is not yet clear whether robots.txt (the mechanism specified by RFC 9309) is well-suited to controlling AI crawlers. A content creator or host may not be able to distinguish a crawler used for search indexing from a crawler used for LLM ingest – and indeed some crawlers may be used for both purposes. Potential use cases may extend across many different units of content, policies to be signaled, and types of content creators. Before robots.txt becomes a de facto solution to AI crawling opt-out, it is necessary to examine whether it is an appropriate mechanism: in particular, whether the creator of a particular unit of content can realistically and fully exercise their right to opt out, and the scope of data ingest to which that opt-out applies.
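The difficulty described above can be illustrated with a minimal sketch using Python's standard-library robots.txt parser. The crawler names (ExampleAIBot, ExampleSearchBot, ExampleDualBot), the robots.txt contents, and the URL are hypothetical and chosen only for illustration; they do not correspond to any real crawler or site. The sketch assumes a site that wants to stay in search indexes while opting out of LLM ingest.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI-oriented crawler, allow everyone else.
# All crawler names here are invented for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.com/articles/1"

# A crawler that identifies itself as AI-only can be excluded by name.
ai_allowed = allowed("ExampleAIBot", URL)

# A search-only crawler falls through to the wildcard group and is allowed.
search_allowed = allowed("ExampleSearchBot", URL)

# A crawler that feeds both search indexing and LLM ingest presents a single
# name, so the rules cannot permit one purpose while refusing the other.
dual_allowed = allowed("ExampleDualBot", URL)

print(ai_allowed, search_allowed, dual_allowed)
```

In this sketch the rules are keyed on the name a crawler announces, not on what the fetched content is later used for. The site can exclude ExampleAIBot, but it has no way to say "index for search, do not use for training" to a dual-purpose crawler, and it only learns of a new AI crawler's name after the fact.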

This workshop aims to explore practical opt-out mechanisms for AI, and build an understanding of use cases, requirements, and other considerations in this space. The workshop will focus on mechanisms to communicate the opt-out choice and their associated data models. Technical enforcement of opt-out signals is not in scope.

The IAB is looking for short position papers on the following topics; however, this list is non-exhaustive and should be interpreted broadly:

  • User stories, use cases, and requirements for opting content out of inclusion in large language models, from a variety of sources including but not limited to the Web
  • Interactions between opt-out mechanisms and different use cases for AI
  • Advantages and/or deficiencies of reusing robots.txt for controlling AI crawlers on the Web
  • Comparisons of use cases for crawling opt-out
  • Desired properties of an AI opt-out mechanism
  • Potential developments in AI that may require adjustments in opt-out mechanisms
  • Implications of legal/policy frameworks (e.g., copyright, privacy, research ethics) and requirements on the design of opt-out mechanisms
  • Evolution of opt-out signals

Because robots.txt is emerging as a solution in this space, the discussion will be anchored on it as a starting point, but not limited to that mechanism. Proposals for alternative solutions may be made, but time will not be available for a detailed presentation or discussion.

Interested participants are invited to submit position papers on the workshop topics. Participants can choose their preferred format, including Internet-Drafts, text- or word-based documents, or papers formatted similarly to those used by academic publication venues. Submission as PDF is preferred. Paper size is not limited, but brevity is encouraged. By default, submissions that are considered relevant will be published on the workshop website. If you wish for your submission to be anonymised or withheld from such publication, please indicate that clearly in the submission.

The organizers will issue invitations based on the submissions received. Sessions will be organized according to the submissions received, and not every accepted submission or invited attendee will have an opportunity to present; the intent is to foster an active discussion and not simply to have a sequence of presentations.

Discussion at the workshop will be held under the Chatham House Rule, and therefore will not be recorded or minuted. However, a workshop report will be published afterwards. It is anticipated that the workshop report will include:

  • A list of participants (unless they request to be withheld)
  • Documentation of use cases and requirements discussed
  • Recommendations for IETF standards work to be considered (if any)
  • Recommendations for non-IETF standards work to be considered (if any)

The workshop will be by invitation only. Those wishing to attend should submit a position paper to ai-control-workshop-pc@iab.org. Position papers from those not planning to attend the workshop themselves are also encouraged.

Logistics:

  • Submissions Due: 2 August 2024
  • Invitations Issued by: 15 August 2024
  • Workshop Dates: 19-20 September 2024
  • Workshop Location: Wilkinson Barker Knauer, 1800 M Street NW Suite 800N, Washington, DC 20036, USA

Feel free to contact the Program Committee with any further questions: ai-control-workshop-pc@iab.org.

Program Committee

  • Alissa Cooper, IAB, Knight-Georgetown Institute
  • Alvaro Retana, IAB, Futurewei
  • Derek Slater, Proteus Strategies
  • Dhruv Dhody, IAB, Huawei
  • Farzaneh Badiei, Digital Medusa
  • Gary Illyes, Google
  • Mark Nottingham, Cloudflare
  • Mirja Kühlewind, IAB, Ericsson
  • Neil Lawrence, U. of Cambridge
  • Paul Ohm, Georgetown Law
  • Qin Wu, IAB, Huawei
  • Suresh Krishnan, IAB, Cisco
  • Thom Vaughan, Common Crawl