The Complex Reality of AI-Generated Code: A Cautionary Tale
Yesterday, I found myself deep in the trenches of debugging AI-generated code, and I felt a compelling need to share my experience. While many are excited by the promises of Artificial Intelligence and large language models (LLMs), my encounter revealed some stark realities that can’t be ignored.
A Personal Journey with AI
I must admit that I haven’t extensively used AI or LLMs in my personal projects. I’ve experimented with ChatGPT for planning vacations and occasionally rely on GitHub Copilot for coding assistance. I appreciate the potential of these evolving technologies but remain aware of their limitations.
My workplace is currently migrating from SAS to a hybrid of SQL and Python. The migration involves converting a substantial amount of code, and a colleague suggested using an LLM to help. Intrigued, we decided to test its capabilities on a straightforward coding assignment.
The Task at Hand
As part of this initiative, I took on the responsibility of reviewing the AI-generated output. What followed was a lengthy and frustrating process where I meticulously compared the original code with the LLM’s version, and I was astounded by the myriad shortcomings I encountered.
The Failings of AI
- Misplaced Logic: The AI created functions intended to replace logic tests but never called them where they were needed. In their place, it injected nonsensical dummy values, so the code would technically execute but yield incorrect results.
- Code Reuse Gone Wrong: Where similar code existed, the AI tried to consolidate it into a single function, blending logic that actually differed and leaving the result confusing.
- Ignored Syntax: The original code contained some poorly formatted yet technically correct SQL statements. The model overlooked them entirely, dropping elements critical to the functionality.
- Erroneous Data Comparisons: One important test compared the sum of a column against a very large number to verify the data load. Instead of using the correct reference value, the model introduced an unrelated, arbitrary one.
- Inconsistent Rewrites: My manager circulated two different AI-generated versions of the code, and they disagreed in logic and each missed lines. This inconsistency raised concerns about relying on the model's output across numerous jobs.
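To make the data-load check from the list above concrete, here is a minimal Python sketch of that kind of test. It is not our actual job: the table name `sales`, the column `amount`, the helper `verify_load`, and the value of `EXPECTED_TOTAL` are all made up for illustration, and an in-memory SQLite database stands in for the real warehouse. The point is that the reference number has to be carried over from the original code, not replaced with an arbitrary value.

```python
import sqlite3

# Hypothetical reference total. In a real migration this number must be
# copied from the original job, never invented by the rewrite.
EXPECTED_TOTAL = 1_250_000


def verify_load(conn, table, column, expected_total):
    """Return True when SUM(column) over the table equals expected_total."""
    # Identifiers cannot be bound as SQL parameters, so table and column
    # are assumed to come from trusted code, not from user input.
    query = f"SELECT COALESCE(SUM({column}), 0) FROM {table}"
    (actual,) = conn.execute(query).fetchone()
    return actual == expected_total


# Illustrative data only: an in-memory table with two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (amount) VALUES (?)",
    [(500_000,), (750_000,)],
)

load_ok = verify_load(conn, "sales", "amount", EXPECTED_TOTAL)
print("load verified" if load_ok else "load check failed")
```

If the rewrite swaps in a different reference value, the check either fails on good data or, worse, passes on bad data, which is why this line deserves a direct comparison against the original during review.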
A Painful Conclusion
Ultimately, we were left with a broken version of the code that requires a complete rewrite. While I understand that