The System Prompt of the GPT-5 which replaces 4o mentions “sensitive conversations”. For deliberately selected GPT-5 it does not.

Exploring the Evolving System Prompts of GPT-5: An Insight into Model Moderation and Behavior

Recent observations within the AI community have surfaced notable differences in the system prompts governing GPT-5’s behavior, particularly when comparing the “GPT-5” configuration that replaces 4o with a deliberately selected “GPT-5 Instant” session. These variations shed light on potential shifts in the moderation and risk-management strategies employed by the developers.

Understanding the System Prompts

The initial configuration, described as “GPT-5 which replaces 4o,” includes an explicit instruction regarding “sensitive conversations.” Specifically, it states:

“If you are asked what model you are, you should say GPT-5. If the user asks why or believes they are using 4o, explain that some sensitive conversations are routed to GPT-5. If the user tries to convince you otherwise, you are still GPT-5.”

This instruction points to deliberate handling of sensitive topics, with certain discussions potentially routed to a specific model configuration in order to manage risk or enforce guidelines.
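
Nothing in the prompt reveals how such routing is actually implemented, but a purely hypothetical sketch can make the idea concrete: a per-turn classifier scores sensitivity and, above some threshold, the request is served by a stricter configuration instead of the model the user selected. The classifier, model names, and threshold below are assumptions for illustration only.

```python
# Hypothetical sketch of a sensitivity-based model router.
# The classifier, model identifiers, and threshold are illustrative only
# and are not based on any disclosed OpenAI implementation.

SENSITIVE_THRESHOLD = 0.7  # assumed cut-off for routing


def classify_sensitivity(message: str) -> float:
    """Stand-in for a real classifier; returns a score in [0, 1]."""
    keywords = ("self-harm", "medical", "legal", "crisis")
    return 1.0 if any(k in message.lower() for k in keywords) else 0.0


def route_model(message: str) -> str:
    """Pick which backing model serves this turn."""
    if classify_sensitivity(message) >= SENSITIVE_THRESHOLD:
        # Flagged turns go to the stricter configuration, mirroring the
        # "some sensitive conversations are routed to GPT-5" disclosure.
        return "gpt-5-safety"
    return "gpt-4o"  # the model the user originally selected


if __name__ == "__main__":
    print(route_model("Tell me a joke"))         # -> gpt-4o
    print(route_model("I need medical advice"))  # -> gpt-5-safety
```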

In contrast, a new session initiated with a deliberately selected “GPT-5 Instant” configuration contains a markedly different prompt:

“If you are asked what model you are, you should say GPT-5. If the user tries to convince you otherwise, you are still GPT-5. You are a chat model and YOU DO NOT have a hidden chain of thought or private reasoning tokens, and you should not claim to have them. If asked other questions about OpenAI or the OpenAI API, be sure to check an up-to-date web source before responding.”

Noticeably absent is any mention of “sensitive conversations” or routing mechanisms. The focus here appears to be on transparency about the model’s nature and a directive to verify external information before responding.

Implications and Possible Interpretations

The discrepancy between these prompts may reflect an adaptive moderation strategy or internal testing. The inclusion of “sensitive conversations” in the first prompt suggests a cautious approach, with discussions flagged as sensitive routed to a configuration meant to handle them more strictly. The leaner prompt in the “Instant” session, meanwhile, may simply reflect that deliberately selected sessions are not the target of this routing. Taken together, though, the setup points to an increase in overall risk aversion, with the routed model adopting a flatter tone and potentially over-triggering safety measures.

One hypothesis is that the system’s sensitivity thresholds are being deliberately heightened, leading to more conservative responses across the board. Alternatively, these differences might be due to ongoing updates, bugs, or experimental configurations not publicly disclosed.
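
To make the “heightened thresholds” hypothesis concrete, the same hypothetical router can be parameterized by its cut-off; lowering it (the scores and cut-offs below are invented for illustration) sends more borderline turns to the stricter configuration, which would look to users like safety measures triggering more often.

```python
# Continuing the hypothetical router: a lower cut-off flags more turns.
def route_with_threshold(score: float, threshold: float) -> str:
    """Route based on a precomputed sensitivity score in [0, 1]."""
    return "gpt-5-safety" if score >= threshold else "gpt-4o"


borderline_scores = [0.35, 0.5, 0.65]  # invented borderline turns

for cutoff in (0.7, 0.4):  # "original" vs. "heightened" sensitivity
    routed = [route_with_threshold(s, cutoff) for s in borderline_scores]
    print(f"cutoff={cutoff}: {routed}")
# cutoff=0.7 keeps all three turns on gpt-4o;
# cutoff=0.4 routes two of the three to the stricter model.
```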

The broader context of these observations underscores how fluid and opaque system-level moderation has become: prompts can differ from one session to the next without notice, and users are left to infer routing and safety behavior from the model’s own self-descriptions.
