Anthropic releases the’system prompt” that makes Claude tick
Generative artificial intelligence models aren’t really human-like. The models are not intelligent or have personalities. They’re just statistical systems that predict the likely next words to be used in a sentence. They are used by all generative AI vendors, including OpenAI and Anthropic, to control the tone of their models’ responses and prevent them from acting badly. For instance, a prompt might tell a model it should be polite but never apologetic, or to be honest about the fact that it can’t know everything.
But vendors usually keep system prompts close to the chest — presumably for competitive reasons, but also perhaps because knowing the system prompt may suggest ways to circumvent it. GPT-4o, for instance, can only be exposed by a prompt injection. And even then, the system’s output can’t be trusted completely.
However, Anthropic, in its continued effort to paint itself as a more ethical, transparent AI vendor, has published the system prompts for its latest models (Claude 3.5 Opus, Sonnet and Haiku) in the Claude iOS and Android apps and on the web.
Alex Albert, head of Anthropic’s developer relations, said in a post on X that Anthropic plans to make this sort of disclosure a regular thing as it updates and fine-tunes its system prompts.
We’ve added a new system prompts release notes section to our docs. We will log any changes made to the default system messages on Claude dot ai or our mobile apps. The system prompt has no effect on the API. pic.twitter.com/9mBwv2SgB1
— Alex Albert (@alexalbert__) August 26, 2024
The latest prompts are dated July 12 and outline what the Claude models cannot do, e.g. “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no; the system prompt for Claude 3.5 Opus tells the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in
.”
But the prompts also describe certain personality traits and characteristics — traits and characteristics that Anthropic would have the Claude models exemplify.[images]The prompt for Opus, for instance, says that Claude is to appear as if it “
very smart and intellectually curious,” and “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information” — and never to begin responses with the words “certainly” or “absolutely.”
It’s all a bit strange to this human, these system prompts, which are written like an actor in a stage play might write a character analysis sheet. Opus’ prompt ends with the phrase “Claude is being connected with a person,” giving the impression that Claude has no other purpose than to satisfy the whims and desires of its human conversation partners. The prompts for Claude show us that these models would be utterly useless without human assistance. The gambit will be tested to see if it works.