
Utilizing the API’s “logit_bias” Feature to Minimize Em Dashes: My Experience with Suppressing 106 Tokens and the Results of a “No Dash” Response Test with Sample Code


How to Suppress Em Dashes in ChatGPT Responses Using OpenAI’s API

Dealing with unwanted em dashes in AI-generated text can be a persistent challenge. Despite attempts with custom instructions and settings, eliminating these characters often seems nearly impossible. However, by leveraging the logit_bias parameter in the OpenAI API, it is possible to significantly reduce or even eradicate the appearance of em dashes and similar punctuation.

Understanding the Challenge

Many users have experienced frustration with ChatGPT frequently inserting em dashes (U+2014) in responses, especially in creative or stylistic prompts. Conventional approaches, such as adjusting system prompts, fine-tuning instructions, or modifying memory, often fall short of suppressing these characters entirely. Their presence is partly due to tokenization complexities: tokens for symbols and words can combine in unexpected ways, making simple suppression techniques less effective.
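For illustration, the short sketch below (assuming the tiktoken package and the GPT-4 tokenizer; the sample strings are made up) prints the token IDs the same em dash receives in different surrounding contexts:

```python
import tiktoken

# \u2014 is the em dash character; the surrounding contexts are illustrative.
enc = tiktoken.encoding_for_model("gpt-4")
samples = ["\u2014", " \u2014", "word\u2014word", "\u2014and"]

for s in samples:
    # The same dash ends up in different token sequences depending on context,
    # which is why suppressing a single token ID is rarely enough.
    print(repr(s), "->", enc.encode(s))
```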

A Solution Through Logit Bias

The logit_bias parameter allows for influencing the probability that certain tokens will be generated. By setting a bias of -100 for specific token IDs, you drastically reduce the likelihood that the model will choose them. The key steps involve:

  1. Identifying the token IDs for various forms of em dashes, en dashes, hyphens, and related punctuation.
  2. Applying a bias of -100 to these tokens.
  3. Experimenting with the number of biased tokens until the desired suppression level is achieved.
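The sketch below shows one way these steps could be wired together in Python. It assumes the openai and tiktoken packages, an OPENAI_API_KEY environment variable, and a small illustrative list of dash strings rather than the full set of 106 tokens described later:

```python
import tiktoken
from openai import OpenAI

# Step 1: collect token IDs for a few dash-like strings.
# \u2014 = em dash, \u2013 = en dash; this list is illustrative, not exhaustive.
enc = tiktoken.encoding_for_model("gpt-4")
dash_strings = ["\u2014", " \u2014", "\u2013", " \u2013"]

logit_bias = {}
for s in dash_strings:
    for token_id in enc.encode(s):
        # Step 2: a bias of -100 effectively removes the token from sampling.
        logit_bias[str(token_id)] = -100

# Step 3: send a normal chat completion request with the bias map attached.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Give me a hot take on remote work."}],
    logit_bias=logit_bias,
)
print(response.choices[0].message.content)
```

Before applying the map for real, it is worth decoding each biased ID (enc.decode([token_id])) to confirm that no ordinary token, such as a bare space, has been caught by accident.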

Implementation Journey

The process starts with pinpointing the exact token IDs. A straightforward approach involves analyzing the tokenizer for the model (e.g., GPT-4 or GPT-3.5) to locate tokens containing the em dash (U+2014), the en dash (U+2013), and the hyphen (-). Because tokens can combine to form different characters, an exhaustive biasing pass is sometimes necessary. In practice, it took setting biases on over 100 tokens, 106 to be exact, to effectively diminish em dash usage.

For instance:

  • Initially biasing tokens that contain the em dash character itself
  • Expanding biasing to cover en dashes
  • Addressing hyphens used as em dashes

This comprehensive biasing ensures the model’s behavior aligns with the goal: responses free from em dashes.
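One way to carry out that kind of exhaustive pass is to scan the tokenizer's vocabulary for tokens whose decoded text contains a dash. The sketch below uses tiktoken; the character set and filtering are assumptions rather than a reconstruction of the exact 106-token list:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
# \u2014 = em dash, \u2013 = en dash. Plain hyphens are left out here because
# they occur in many ordinary tokens and would need more careful filtering.
dash_chars = ("\u2014", "\u2013")

dash_token_ids = []
for token_id in range(enc.n_vocab):
    try:
        text = enc.decode([token_id])
    except Exception:
        continue  # skip IDs that do not decode cleanly (e.g. special tokens)
    if any(ch in text for ch in dash_chars):
        dash_token_ids.append(token_id)

print(f"{len(dash_token_ids)} dash-containing token IDs found")
# Each of these IDs could then be given a -100 bias, as in the earlier sketch.
```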

Testing the Approach

Sample prompts tested include asking ChatGPT for a “hot take” or strategies for resolving political polarization. Responses generated with the logit_bias adjustments show a marked difference:

  • Without biasing, responses often contain em dashes.
  • With extensive biasing, responses come back free of em dashes.
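A rough way to reproduce this comparison end to end, with an illustrative prompt and a simple check for the em dash character (the dash strings are again a small stand-in for the full biased set):

```python
import tiktoken
from openai import OpenAI

EM_DASH = "\u2014"  # the character we are trying to keep out of responses

# Rebuild a small bias map as in the earlier sketch (illustrative strings only).
enc = tiktoken.encoding_for_model("gpt-4")
logit_bias = {
    str(token_id): -100
    for s in [EM_DASH, " " + EM_DASH, "\u2013", " \u2013"]
    for token_id in enc.encode(s)
}

client = OpenAI()
prompt = "Share a hot take on how to reduce political polarization."

for label, bias in [("without bias", {}), ("with bias", logit_bias)]:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        logit_bias=bias,
    )
    text = response.choices[0].message.content
    print(f"{label}: em dash present = {EM_DASH in text}")
```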
