
Query Your ChatGPT History with Agentic Research & Local LLM

Enhancing Your ChatGPT Data Management with Local Search and Custom Agents

Many of us who build with AI have wanted a convenient way to revisit our chat histories on platforms like OpenAI's. Recently, I discovered that OpenAI lets users request a complete export of their conversation data in JSON format, which prompted me to develop a personal solution for making that data more accessible and searchable.

In this article, I’ll share a customized approach to ingesting, organizing, and querying your ChatGPT conversation history and other datasets using local machine learning tools. My goal? To create a flexible, efficient, and privacy-conscious system for AI data retrieval and analysis.

Building a Local Searchable Database

The core of this system is a script that takes a folder of documents, whether chat logs, Reddit data, or any other text files, and imports them into ChromaDB. Once stored, you can run iterative, agentic searches that surface meaningful insights through natural language queries.
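To give a concrete picture, here is a minimal ingestion sketch rather than the actual script: it walks a folder of Markdown files and adds each one to a persistent ChromaDB collection, relying on Chroma's default embedding function. The folder path and collection name are placeholders.

```python
from pathlib import Path

import chromadb

# A persistent client keeps the index on disk between runs.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="chat_history")

# Walk the document folder and add each file; Chroma embeds the text
# with its default embedding function unless you supply your own.
for doc_path in Path("./documents").rglob("*.md"):
    text = doc_path.read_text(encoding="utf-8")
    collection.add(
        documents=[text],
        metadatas=[{"source": str(doc_path)}],
        ids=[str(doc_path)],  # the file path doubles as a stable ID
    )
```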

To enhance the querying process, I integrated local inference through Ollama: a locally served model formulates a comprehensive response based on your search results. This setup keeps your data on your machine, preserving privacy while still providing powerful search capabilities.
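As a rough sketch of that retrieve-then-answer step (the model name, query, and prompt wording here are assumptions; any model pulled into Ollama works):

```python
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="chat_history")

query = "What did I ask about vector databases last month?"

# Pull the closest matches from the local index...
results = collection.query(query_texts=[query], n_results=5)
context = "\n\n".join(results["documents"][0])

# ...then have a local model compose an answer grounded in them.
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(response["message"]["content"])
```

The agentic version loops this step: the model inspects the results, rewrites the query, and searches again until it has enough context to answer.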

From JSON to Organized Files

To facilitate easy ingestion and retrieval, I also developed a script that converts your ChatGPT JSON exports into Markdown files organized chronologically. This structure allows for straightforward management of your conversation history and other datasets, including Reddit scrapes or any custom data sources.
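For reference, a simplified version of that conversion might look like the following. OpenAI's export schema has changed over time, so treat the field names (title, create_time, mapping) as assumptions based on the current conversations.json layout:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

out_dir = Path("./documents")
out_dir.mkdir(exist_ok=True)

conversations = json.loads(Path("conversations.json").read_text(encoding="utf-8"))

for convo in conversations:
    created = datetime.fromtimestamp(convo["create_time"], tz=timezone.utc)
    title = convo.get("title") or "untitled"
    lines = [f"# {title}", ""]
    # Note: mapping values are not guaranteed to be in order; a full
    # implementation would walk the parent/child links between nodes.
    for node in convo["mapping"].values():
        msg = node.get("message")
        if not msg or not msg.get("content", {}).get("parts"):
            continue
        role = msg["author"]["role"]
        text = "\n".join(p for p in msg["content"]["parts"] if isinstance(p, str))
        lines.append(f"**{role}:** {text}\n")
    # A date prefix keeps the files sorted chronologically on disk.
    safe_title = "".join(c if c.isalnum() else "-" for c in title)[:40]
    out_path = out_dir / f"{created:%Y-%m-%d}-{safe_title}.md"
    out_path.write_text("\n".join(lines), encoding="utf-8")
```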

Seeking Feedback and Improvements

While the current implementation has already improved upon earlier versions—especially those lacking agentic search—I recognize room for enhancement. For instance, exploring more sophisticated chunking techniques, better data preprocessing methods, or adopting more robust agent frameworks could significantly boost performance.
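As a baseline for comparison, the simplest chunking strategy is fixed-size windows with overlap, so text spanning a boundary survives intact in at least one chunk; anything smarter, such as sentence-aware or semantic splitting, would replace a helper like this:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap`
    characters, so content near a boundary appears whole in one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```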

Next Steps: Automating News Summaries

Looking ahead, I plan to extend this system by integrating an RSS feed scraper. This tool will fetch new articles or updates, generate concise summaries, and store them as Markdown files. These documents can then be ingested into the database, ensuring your knowledge base stays current with minimal manual effort.
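A first cut could be quite small. This sketch leans on the feedparser library and the same local model, with the feed URL, model name, and output folder as placeholders:

```python
from pathlib import Path

import feedparser
import ollama

out_dir = Path("./documents/news")
out_dir.mkdir(parents=True, exist_ok=True)

feed = feedparser.parse("https://example.com/feed.xml")

for entry in feed.entries[:10]:  # cap the work done per run
    prompt = (
        "Summarize this article in a few sentences:\n\n"
        f"{entry.title}\n{entry.get('summary', '')}"
    )
    response = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    summary = response["message"]["content"]
    safe_title = "".join(c if c.isalnum() else "-" for c in entry.title)[:60]
    (out_dir / f"{safe_title}.md").write_text(
        f"# {entry.title}\n\n{entry.link}\n\n{summary}\n", encoding="utf-8"
    )
```

Because these summaries land in the same folder the ingestion script watches, they flow into ChromaDB on the next run with no extra steps.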

Call for Collaboration

My overarching aim is to refine and modularize this approach, making it a versatile tool for personal knowledge management or research. I believe there are even more advanced methods out there—techniques that could elevate this system further.

If you have suggestions, insights, or best practices, I’d love to hear them. Feel free to explore the code and contribute on GitHub.
