
Experimenting with the “logit_bias” parameter in the API to eliminate em dashes led me to suppress 106 tokens—here’s what I discovered and the code for your own “dash-free” response test.

Controlling Em Dashes in GPT Output: A Practical Approach Using Logit Bias

If you’ve ever tried to eliminate em dashes from AI-generated text, you know how persistent they can be. Despite various prompts, instructions, and tweaks, models often revert to using em dashes naturally. However, there’s a way to significantly suppress their appearance—by leveraging OpenAI’s logit_bias parameter.

During my experiments, I aimed to prevent GPT-4 from using em dashes altogether. Simply identifying the token ID for the em dash (“—”) wasn’t sufficient: the character appears inside many multi-character tokens, and once it was blocked the model substituted look-alikes such as en dashes and hyphens. To tackle this comprehensively, I found it necessary to suppress over 100 related tokens.
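
A quick way to see this for yourself is to run a few dash variants through the tokenizer. The snippet below is an illustrative check rather than part of the original experiment; it assumes the tiktoken package and the cl100k_base encoding used by GPT-4:

    import tiktoken

    # cl100k_base is the tokenizer used by GPT-4 models.
    enc = tiktoken.get_encoding("cl100k_base")

    # The same dash character can land in several different tokens
    # depending on the surrounding characters.
    samples = ["\u2014", " \u2014 ", "word\u2014word", "\u2013", "--", "---"]
    for s in samples:
        print(repr(s), "->", enc.encode(s))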

Here’s a summary of my process:

  • Initially, I biased out individual tokens representing the em dash.
  • As the model started substituting with en dashes and hyphens, I expanded the bias to include tokens related to these characters.
  • Ultimately, applying a strict bias of -100 to 106 different tokens associated with dashes effectively minimized their use without compromising the coherence of responses (a sketch of this step follows the list).
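
For illustration, the bias map for that last step can be built programmatically. The sketch below assumes tiktoken and uses only a handful of dash variants as stand-ins for the full 106-token list:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    # Illustrative variants only; the full list covers many more
    # spellings and spacings of dashes.
    dash_variants = ["\u2014", " \u2014 ", "\u2013", " \u2013 ", "--", "---"]

    logit_bias = {}
    for variant in dash_variants:
        for token_id in enc.encode(variant):
            # Keep only tokens that actually contain a dash character, so
            # harmless tokens such as a bare space are not swept in.
            if any(ch in enc.decode([token_id]) for ch in "-\u2013\u2014"):
                # -100 is the minimum allowed bias, which effectively bans
                # the token from being sampled.
                logit_bias[str(token_id)] = -100

    print(logit_bias)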

Key Findings:

  • Suppressing these tokens did not noticeably degrade the quality or accuracy of model outputs.
  • The models, especially more advanced ones, shifted towards more conventional punctuation.
  • While a fine-tuned model might handle this better, the brute-force method of token biasing proves surprisingly effective.

Sample Testing:

I evaluated responses to provocative prompts—such as “give your best hot take” or “address the balkanization in the US”—and compared usual responses with those influenced by the bias. The responses showed a clear tendency: the biased models favored more straightforward punctuation, often eliminating the em dash altogether.

Implementation:

If you’re interested in replicating this approach, I’ve prepared a Python script that applies the bias. It’s straightforward:

  • Ensure your OPENAI_API_KEY environment variable is set.
  • Make the script executable.
  • Pass your prompt as an argument.

The comprehensive list of tokens and details can be found in this GitHub Gist.
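
For reference, a minimal sketch of such a script might look like the following. It assumes the official openai Python SDK (v1 or later) plus tiktoken, uses a few placeholder dash variants rather than the full 106-token set from the Gist, and the dash_free.py filename is purely for illustration:

    #!/usr/bin/env python3
    """Minimal sketch of a dash-free prompt runner (placeholder token list).

    Usage:
        export OPENAI_API_KEY=...      # the SDK reads the key from the environment
        chmod +x dash_free.py
        ./dash_free.py "give your best hot take"
    """
    import sys

    import tiktoken
    from openai import OpenAI  # official OpenAI Python SDK, v1+

    # Placeholder variants; the complete 106-token list lives in the Gist.
    DASH_VARIANTS = ["\u2014", " \u2014 ", "\u2013", " \u2013 ", "--", "---"]


    def build_bias(model: str = "gpt-4") -> dict:
        """Ban every dash-bearing token found in the variants above."""
        enc = tiktoken.encoding_for_model(model)
        bias = {}
        for variant in DASH_VARIANTS:
            for token_id in enc.encode(variant):
                # Only suppress tokens that actually contain a dash.
                if any(ch in enc.decode([token_id]) for ch in "-\u2013\u2014"):
                    bias[str(token_id)] = -100
        return bias


    def main():
        if len(sys.argv) < 2:
            sys.exit('usage: dash_free.py "your prompt here"')
        prompt = " ".join(sys.argv[1:])

        client = OpenAI()  # picks up OPENAI_API_KEY automatically
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            logit_bias=build_bias("gpt-4"),
        )
        print(response.choices[0].message.content)


    if __name__ == "__main__":
        main()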

Conclusion:

Although the method is blunt, applying an extreme bias to every token related to dashes effectively suppresses their use without damaging output quality.
