Creating a Live Meeting Assistant: Strategies to Cut Large Language Model Latency from Ten Seconds to One

Optimizing Real-Time Meeting Bots: Seeking Solutions for Latency Reduction

Hello, tech enthusiasts!

We are excited to share our journey in developing an AI-driven meeting assistant designed to enhance live conversations by tracking agenda items in real time. Here's a brief overview of our current setup:

  • Our bot utilizes Deepgram for live audio transcription during meetings.
  • Every 10 seconds, the transcription is processed by Google Gemini to:
      • Identify the ongoing agenda item
      • Assess its status: whether it has been initiated, is underway, or is completed
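To make the classification step concrete, here is a minimal Python sketch of one cycle. It assumes a terse `<item number>|<status>` reply format; the names `build_prompt`, `parse_reply`, `classify`, and the `call_llm` callback are hypothetical placeholders, not part of the Gemini or Deepgram APIs. `call_llm` stands in for whatever client actually sends the prompt to the model and returns its text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    STARTED = "started"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass(frozen=True)
class AgendaUpdate:
    item_number: int  # 1-based position in the agenda
    status: Status


def build_prompt(agenda: List[str], transcript: str) -> str:
    """Build a compact prompt that asks for a one-line, machine-parseable reply."""
    items = "\n".join(f"{n}. {title}" for n, title in enumerate(agenda, start=1))
    return (
        f"Agenda:\n{items}\n\n"
        f"Recent transcript:\n{transcript}\n\n"
        "Which agenda item is being discussed, and what is its status? "
        "Reply with exactly one line: <item number>|<started|in_progress|completed>"
    )


def parse_reply(reply: str, agenda_len: int) -> Optional[AgendaUpdate]:
    """Parse 'N|status'; return None if the reply does not match the format."""
    try:
        number_text, status_text = reply.strip().split("|", 1)
        number = int(number_text)
        status = Status(status_text.strip())
    except ValueError:
        return None
    if not 1 <= number <= agenda_len:
        return None
    return AgendaUpdate(number, status)


def classify(
    agenda: List[str], transcript: str, call_llm: Callable[[str], str]
) -> Optional[AgendaUpdate]:
    # call_llm is a hypothetical hook: send the prompt to the model, return its text.
    return parse_reply(call_llm(build_prompt(agenda, transcript)), len(agenda))
```

Asking for a single short line keeps the model's output small, and returning `None` on a malformed reply lets the bot keep its previous agenda state instead of acting on a bad guess.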

While our current system works reliably, our client wants the delay between something being said and the agenda status updating reduced to under one second.

To achieve this ambitious goal, we are exploring several strategies to shrink the current 10-second cycle to just 1 second:

  1. Implementing Streaming Transcription: Using Deepgram's WebSocket-based streaming API so transcript text arrives continuously as people speak.
  2. Adopting a Sliding Window Buffer: For instance, maintaining a buffer of 2-3 seconds of transcribed text that refreshes every second.
  3. Enhancing Prompt Efficiency: Optimizing our Gemini prompts to reduce LLM response times.
  4. Utilizing Async Workers: Incorporating a lightweight pub/sub system to facilitate parallel processing.
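Strategies 2 and 4 can be combined in a small in-process sketch. It assumes finalized transcript segments arrive with timestamps (for example, from a streaming transcription callback), keeps only the last few seconds of text, and feeds snapshots to a single asyncio worker through a size-1 queue, so a slow model call is never followed by a backlog of stale requests. This "latest wins" queue is a stand-in for the pub/sub system mentioned above, not a substitute for one across processes; `SlidingTranscriptWindow`, `offer_latest`, `classify_latest`, and `fake_classify` are hypothetical names, and `fake_classify` only simulates model latency with `asyncio.sleep`.

```python
import asyncio
import time
from collections import deque
from typing import Awaitable, Callable, Deque, List, Optional, Tuple


class SlidingTranscriptWindow:
    """Keeps only transcript segments newer than `window_seconds`."""

    def __init__(self, window_seconds: float = 3.0) -> None:
        self.window_seconds = window_seconds
        self._segments: Deque[Tuple[float, str]] = deque()

    def add(self, text: str, timestamp: Optional[float] = None) -> None:
        now = time.monotonic() if timestamp is None else timestamp
        self._segments.append((now, text))
        self._evict(now)

    def text(self, now: Optional[float] = None) -> str:
        self._evict(time.monotonic() if now is None else now)
        return " ".join(segment for _, segment in self._segments)

    def _evict(self, now: float) -> None:
        # Drop segments from the front once they fall outside the window.
        while self._segments and now - self._segments[0][0] > self.window_seconds:
            self._segments.popleft()


def offer_latest(queue: asyncio.Queue, item: str) -> None:
    """Put into a size-1 queue, discarding a stale snapshot that is still waiting."""
    if queue.full():
        queue.get_nowait()
    queue.put_nowait(item)


async def classify_latest(
    queue: asyncio.Queue,
    classify: Callable[[str], Awaitable[str]],
    results: List[str],
) -> None:
    """Classify snapshots one at a time until a None sentinel arrives."""
    while True:
        snapshot = await queue.get()
        if snapshot is None:
            return
        results.append(await classify(snapshot))


async def demo() -> List[str]:
    window = SlidingTranscriptWindow(window_seconds=3.0)
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)
    results: List[str] = []

    async def fake_classify(text: str) -> str:
        await asyncio.sleep(0.2)  # simulated model latency
        return text

    worker = asyncio.create_task(classify_latest(queue, fake_classify, results))
    for second in range(6):
        # Simulated timestamps: one new segment per "second" of meeting audio.
        window.add(f"seg{second}", timestamp=float(second))
        offer_latest(queue, window.text(now=float(second)))
        await asyncio.sleep(0.01)  # text arrives faster than the model answers
    await queue.put(None)  # blocking put: waits for room, so nothing is discarded
    await worker
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the simulated 0.2-second model call, most intermediate snapshots are typically dropped and only the first and the most recent windows get classified; the discarded snapshots are the ones whose answers would have been out of date by the time they arrived.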

As we venture deeper into this project, we have a few critical questions we hope the community can help us with:

  • Has anyone successfully employed Gemini or similar LLMs for near-instantaneous classification tasks?
  • What are the proven techniques for low-latency LLM interactions when maintaining context (including the agenda and recent dialogue) is essential?
  • Would it be beneficial to explore a custom fine-tuned model (such as DistilBERT or a comparable model) for our specific needs?

We welcome any advice, insights, or architectural recommendations from those who have tackled similar challenges. Your expertise could be invaluable in helping us refine our approach! Thank you in advance for your contributions! 🙌