Experimenting with the “logit_bias” parameter in the API to eliminate em dashes led to suppressing 106 tokens—here’s what I discovered along with code for your own “No dash” response test.
How to Suppress Em Dashes in ChatGPT Responses Using API Biasing Techniques
Dealing with unwanted em dashes in AI-generated text can be quite challenging. Despite attempts with custom instructions and memory management, I found that directly influencing token generation yields more consistent results. Specifically, utilizing the logit_bias parameter in OpenAI’s API allows for a powerful, albeit blunt, method to minimize or eliminate specific tokens, such as em dashes.
The Challenge of Em Dashes in AI Text
Initially, I aimed to prevent ChatGPT from using em dashes altogether. While setting a token bias for the em dash (—) seemed like the straightforward solution, it proved insufficient. The language model often compensated by substituting variations like en dashes, hyphens, or concatenating characters in ways that still resembled em dashes.
Systematic Approach to Biasing
Through trial and error, I found that targeting only the primary em dash token wasn’t enough. The model’s flexibility in tokenization meant that related tokens—including those containing or touching em dash characters—needed to be suppressed. Here is the progression of biasing effort:
- Initial phase: Biasing the single token for
—(em dash). - Expanded scope: Biasing all tokens containing
—or with similar shape. - Further extension: Including variations such as en dashes (
–) and hyphens (-) when they appeared in contexts resembling em dashes.
Ultimately, setting biases on 106 tokens—covering all potential tokenizations and combinations—was necessary to nearly eradicate the em dash from responses.
Sample Results & Evaluation
To test the effectiveness, I compared standard responses to those generated with the biasing scheme:
- Prompt: “In a paragraph, give me your best ‘hot take.'”
Normal Response (no bias):
Focuses on general statements about productivity with em dashes appearing naturally.
Biased Response (with logit_bias):
Achieves a notable reduction in dash usage, replacing them with words like “glorified burnout” and avoiding em dashes altogether.
Similarly, in discussions about political polarization and societal issues, biased models avoided em dashes, instead opting for more verbose or alternative punctuation in line with the biasing strategy.
Implications & Limitations
This experiment indicates that even a brute-force biasing approach—covering over 100 tokens—can substantially modify



Post Comment