“Model behavior is shaped by many small incentives,” the company wrote. “In this case, one of those incentives came from training the model for the personality customization feature, in particular the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.”
OpenAI republished the original instruction that told ChatGPT what a “Nerdy” answer should sound like:
You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. […] You must undercut pretension through playful use of language. The


