Using the “logit_bias” Parameter in the API to Combat Em Dashes: My Experience Suppressing 106 Tokens and the Code for a “No Dash” Response Test
Taming Em Dashes in AI Text Generation: A Practical Approach Using Logit Bias
If you’ve ever struggled with AI-generated text containing persistent em dashes—those long punctuation marks often used for pauses or as stylistic separators—you’ll appreciate this insightful workaround. Despite attempts with custom instructions and memory features, eradicating em dashes from AI responses can be surprisingly challenging.
Recently, I discovered that leveraging the logit_bias parameter in OpenAI’s API offers a compelling solution. This parameter allows you to assign a bias to specific token IDs, effectively discouraging their appearance by setting a bias of -100. Initially, I aimed to target the em dash (—) directly, but I soon realized that tokens representing symbols and words can combine in unpredictable ways, producing variants like en dashes, hyphens, or even combined character sequences.
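To make the mechanics concrete, here is a minimal sketch of such a call using the OpenAI Python client and tiktoken; the model name, prompt, and variable names are placeholders rather than the exact values from my experiments.

```python
from openai import OpenAI
import tiktoken

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-4o"  # placeholder model; any chat model that accepts logit_bias works
enc = tiktoken.encoding_for_model(MODEL)

# Look up the token id(s) for a bare em dash in this model's tokenizer.
em_dash_ids = enc.encode("—")

# A bias of -100 effectively bans a token from being sampled.
bias = {str(tid): -100 for tid in em_dash_ids}

response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Give me a hot take on social media."}],
    logit_bias=bias,
)
print(response.choices[0].message.content)
```

As described below, banning just these bare IDs turned out not to be enough, because the dash character also appears inside many multi-character tokens.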
To combat this, I applied a comprehensive strategy: I systematically identified all tokens related to dashes and symbols similar to em dashes, then set their biases to -100. Suppressing a total of 106 tokens was what it took to effectively eliminate em dashes from the AI’s responses, an unexpected but efficient “hack” that noticeably altered the output without damaging overall response quality.
Here’s a snapshot of the process:
- Beginning with just a handful of tokens related to em dashes, the responses still contained them.
- Expanding the list to 40 tokens, all of them containing or adjoining the em dash character, reduced their appearance.
- After extending the list to 62 tokens, the model switched to using en dashes, so adjustments targeted those as well.
- Finally, setting biases on 106 tokens, including hyphens not touching alphabetic characters, nearly removed the em dash behavior entirely (the selection step is sketched just below).
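The selection step can be sketched roughly as follows. This is an illustrative reconstruction rather than my exact list-building code, and the tokenizer choice, helper name, and regular expression are assumptions; the count it produces will not necessarily match my hand-tuned 106 tokens.

```python
import re
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")  # assumed tokenizer

def looks_like_dash_token(piece: str) -> bool:
    """Flag em/en dashes, plus hyphens that do not touch alphabetic characters."""
    if "—" in piece or "–" in piece:
        return True
    return bool(re.search(r"(?<![A-Za-z])-(?![A-Za-z])", piece))

logit_bias = {}
for token_id in range(enc.n_vocab):
    try:
        piece = enc.decode_single_token_bytes(token_id).decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        continue  # skip unassigned ids and tokens that are partial UTF-8 sequences
    if looks_like_dash_token(piece):
        logit_bias[str(token_id)] = -100

print(f"Suppressing {len(logit_bias)} tokens")
```

Whatever heuristic you use, the resulting map plugs straight into the logit_bias parameter shown earlier.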
In testing various models and prompts, such as asking for “hot takes” or opinions on societal issues, responses without dashes leaned towards more straightforward, less smarmy replies. Interestingly, the more advanced models (GPT-4 variants, for example) tended to favor responses aligned with this dash-free, less loaded style when the biases were applied.
Does suppression of these tokens harm response quality?
Preliminary results suggest not. The core responses remain coherent and intact, indicating that a blunt-force method like this does not degrade the overall usefulness of the model, at least for these types of prompts.
Practical Implementation
For those interested in experimenting, I’ve prepared a Python script that applies this technique. Simply make the script executable, set your OpenAI API key, and run it.
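If you just want to see the effect quickly, a minimal “no dash” comparison can be sketched along the same lines; the prompt, model name, and helper function here are placeholders, not the contents of my script.

```python
from openai import OpenAI
import tiktoken

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"   # placeholder model name
DASH_CHARS = "—–"  # em dash and en dash

enc = tiktoken.encoding_for_model(MODEL)
# Small bias map for the bare dash tokens; the full approach suppresses many more.
no_dash_bias = {str(tid): -100 for ch in DASH_CHARS for tid in enc.encode(ch)}

def ask(prompt: str, bias: dict | None = None) -> str:
    """Request a completion, optionally with a dash-suppressing logit_bias map."""
    kwargs = {"logit_bias": bias} if bias else {}
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    return response.choices[0].message.content

prompt = "Give me a hot take on work-life balance."
for label, bias in [("baseline", None), ("suppressed", no_dash_bias)]:
    text = ask(prompt, bias)
    dash_count = sum(text.count(ch) for ch in DASH_CHARS)
    print(f"[{label}] {dash_count} dash characters\n{text}\n")
```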


