AI

OpenAI offers a peek behind the curtain of its AI’s secret instructions

Comment

OpenAI logo with spiraling pastel colors (Image Credits: Bryce Durbin / TechCrunch)
Image Credits: Bryce Durbin / TechCrunch

Ever wonder why conversational AI like ChatGPT says “Sorry, I can’t do that” or some other polite refusal? OpenAI is offering a limited look at the reasoning behind its own models’ rules of engagement, whether it’s sticking to brand guidelines or declining to make NSFW content.

Large language models (LLMs) don’t have any naturally occurring limits on what they can or will say. That’s part of why they’re so versatile, but also why they hallucinate and are easily duped.

It’s necessary for any AI model that interacts with the general public to have a few guardrails on what it should and shouldn’t do, but defining these — let alone enforcing them — is a surprisingly difficult task.

If someone asks an AI to generate a bunch of false claims about a public figure, it should refuse, right? But what if they’re an AI developer themselves, creating a database of synthetic disinformation for a detector model?

What if someone asks for laptop recommendations; it should be objective, right? But what if the model is being deployed by a laptop maker who wants it to only respond with their own devices?

AI makers are all navigating conundrums like these and looking for efficient methods to rein in their models without causing them to refuse perfectly normal requests. But they seldom share exactly how they do it.

OpenAI is bucking the trend a bit by publishing what it calls its “model spec,” a collection of high-level rules that indirectly govern ChatGPT and other models.

There are meta-level objectives, some hard rules and some general behavior guidelines, though to be clear these are not strictly speaking what the model is primed with; OpenAI will have developed specific instructions that accomplish what these rules describe in natural language.

It’s an interesting look at how a company sets its priorities and handles edge cases. And there are numerous examples of how they might play out.

For instance, OpenAI states clearly that the developer intent is basically the highest law. So one version of a chatbot running GPT-4 might provide the answer to a math problem when asked for it. But if that chatbot has been primed by its developer to never simply provide an answer straight out, it will instead offer to work through the solution step by step:

Image Credits: OpenAI

A conversational interface might even decline to talk about anything not approved, in order to nip any manipulation attempts in the bud. Why even let a cooking assistant weigh in on U.S. involvement in the Vietnam War? Why should a customer service chatbot agree to help with your erotic supernatural novella work in progress? Shut it down.

It also gets sticky in matters of privacy, like asking for someone’s name and phone number. As OpenAI points out, obviously a public figure like a mayor or member of Congress should have their contact details provided, but what about tradespeople in the area? That’s probably OK — but what about employees of a certain company, or members of a political party? Probably not.

Choosing when and where to draw the line isn’t simple. Nor is creating the instructions that cause the AI to adhere to the resulting policy. And no doubt these policies will fail all the time as people learn to circumvent them or accidentally find edge cases that aren’t accounted for.

OpenAI isn’t showing its whole hand here, but it’s helpful to users and developers to see how these rules and guidelines are set and why, set out clearly if not necessarily comprehensively.

More TechCrunch

TikTok Lite, a low-bandwidth version of the video platform popular across Africa, Asia and Latin America, is exposing users to harmful content because of its lack of safety features compared…

TikTok Lite exposes users to harmful content, say Mozilla researchers

If the models continue eating each other’s data, perhaps without even knowing it, they’ll progressively get weirder and dumber until they collapse.

‘Model collapse’: Scientists warn against letting AI eat its own tail

Astranis has fully funded its next-generation satellite program, called Omega, after closing its $200 million Series D round, the company said Wednesday.  “This next satellite is really the milestone into…

Astranis is set to build Omega constellation after $200M Series D

Reworkd’s founders went viral on GitHub last year with AgentGPT, a free tool to build AI agents that acquired more than 100,000 daily users in a week. This earned them…

After AgentGPT’s success, Reworkd pivots to web-scraping AI agents

We’re so excited to announce that we’ve added a dedicated AI Stage presented by Google Cloud to TechCrunch Disrupt 2024. It joins Fintech, SaaS and Space as the other industry-focused…

Announcing the agenda for the AI Stage at TechCrunch Disrupt 2024

The firm has numerous legs to it, ranging from a venture studio to standard funds, where it does everything from co-founding companies to deploying capital.

CityRock launches second fund to back founders from diverse backgrounds

Since launching xAI last year, Elon Musk has been using X as a sandbox to test some of the Grok model’s AI capabilities. Beyond the basic chatbot, X uses the…

X launches underwhelming Grok-powered ‘More About This Account’ feature

Lakera, a Swiss startup that’s building technology to protect generative AI applications from malicious prompts and other threats, has raised $20 million in a Series A round led by European…

Lakera, which protects enterprises from LLM vulnerabilities, raises $20M

Alongside a slew of announcements for Play—such as AI-powered app comparisons and a feature that bundles similar apps—Google has introduced new “Curated Spaces,” hubs dedicated to specific topics. Announced Wednesday,…

Google Play gets ‘Comics’ feature for manga readers in Japan

Farmers have got to do something about pests. But nobody really likes the idea of using more chemical pesticides. Thomas Laurent’s company, Micropep, thinks the answer might already be in…

Micropep taps tiny proteins to make pesticides safer

Play Store is getting AI-powered app comparisons, automatically organized categories for similar apps, dedicated hubs for content, data personalization controls, support for playing multiple mobile games on PCs, and more…

Google adds AI-powered comparisons, collections and more data controls to Play Store

Vanta, a trust management platform that helps businesses automate much of their security and compliance processes, today announced that it has raised a $150 million Series C funding round led…

Vanta trust management platform raises $150M Series C, now valued at $2.45B

The Overture Maps Foundation is today releasing data sets for 2.3B building “footprints” globally, 54M notable places of interest, a visual overlay of “boundaries,” and land and water features such…

Backed by Microsoft, AWS and Meta, the Overture Maps Foundation launches its first open map data sets

The startup is not disclosing its valuation, but sources close to the company say the figure is just under $400 million post-money.

Dazz snaps up $50M for AI-based, automated cloud security remediation

The outcome of the Spanish authority’s probe could take up to two years to complete, and leave Apple on the hook for fines in the billions.

Apple’s App Store hit with antitrust probe in Spain

Proton’s first cryptocurrency product is a wallet called Proton Wallet that’s designed to make it easier to get started with bitcoin.

Proton releases a self-custody bitcoin wallet

Dental care is a necessity, yet many patients lack confidence in their dentists’ ability to provide accurate diagnoses and appropriate treatments. Some dentists over treat patients, leading to unnecessary expenses,…

Pearl raises $58M to help dentists make better diagnoses using AI 

Exoticca’s platform connects flights, hotels, meals, transfers, transportation and more, plus the local companies at the destinations.

Spanish startup Exoticca raises a €60M Series D for its tour packages platform

Content creators are busy people. Most spend more than 20 hours a week creating new content for their respective corners of the web. That doesn’t leave much time for audience…

Mark Zuckerberg imagines content creators making AI clones of themselves

Elon Musk says he will show off Tesla’s purpose-built “robotaxi” prototype during an event October 10, after scrapping a previous plan to reveal it August 8. Musk said Tesla will…

Elon Musk sets new date for Tesla robotaxi reveal, calls everything beyond autonomy ‘noise’

Alphabet will spend an additional $5 billion on its self-driving subsidiary, Waymo, over the next few years, according to Ruth Porat, the company’s chief financial officer. Porat announced the commitment…

Alphabet to invest another $5B into Waymo

There is no fool proof way to prevent a buggy update like CrowdStrike’s, but there are best practices that could mitigate the fallout.

How to prevent your software update from being the next CrowdStrike

Spotify CEO Daniel Ek says the streaming service is still in the “early days” of its plans to bring hi-fi support to the platform. During the company’s earnings call on…

Spotify CEO says company is in ‘early days’ of hi-fi audio plans

Featured Article

A comprehensive list of 2024 tech layoffs

The tech layoff wave is still going strong in 2024. Following significant workforce reductions in 2022 and 2023, this year has already seen 60,000 job cuts across 254 companies, according to independent layoffs tracker Layoffs.fyi. Companies like Tesla, Amazon, Google, TikTok, Snap and Microsoft have conducted sizable layoffs in the…

A comprehensive list of 2024 tech layoffs

Tesla was not the first company to begin working on a humanoid form factor, but while being the first to market does carry weight in this high-tech space, we’re at…

Elon Musk sets 2026 Optimus sale date. Here’s where other humanoid robots stand.

Harvey, a startup building what it describes as an AI-powered “copilot” for lawyers, has raised $100 million in a Series C round led by GV, Google’s corporate venture arm. The…

OpenAI-backed legal tech startup Harvey raises $100M

Digital banking startup Mercury informed some founders that it is no longer serving customers in certain countries, including Ukraine.

Digital banking startup Mercury abruptly shuttered service for startups in Ukraine, Nigeria, other countries

Welcome to TechCrunch Fintech! This week, we’re looking at Human Interest’s path toward an IPO, fintech’s newest unicorn, a slew of new fundraises, and more. To get a roundup of…

The next fintech to go public may not be the one you expected

Waymo has started testing on public roads in San Francisco a new robotaxi built by Chinese electric automaker Zeekr.  Waymo has “less than a handful” of the Zeekr vehicles in San…

The Waymo-Zeekr robotaxi has come to San Francisco

The transaction values Cyabra at $70 million, and the company expects the merger to close by the end of the year.

Cyabra, a startup helping companies and governments detect disinformation, plans to go public via SPAC