Goodbye LLMs

I’ve been on PTO for the past two weeks to recharge and I am feeling energized to go back to work tomorrow. Part of this is getting the rest I’ve needed and spending time on my hobbies. But another part of this is the massive shift in public opinion on LLMs and their usefulness. I thought LLMs were a dead-end tech and not a path to true AGI (Artificial Generalized Intelligence - the holy grail of computer science) around 2023 or 2024, so seeing the entire industry be fully onboard with this tech has been very demoralizing. It has done nothing but take up space from real research that could be happening. I used to love reading Hacker News every day to expand my knowledge, but the majority of the content the past few years have just been “how you can better utilize LLMs”. So let’s start by talking about the problems with them.

Problems

“You’re holding it wrong”

Honestly our industry should have started pushing back against this far sooner. So much of the conversation and investigative space has been around improving your workflows so you can take advantage of them better. I think that has been the wrong focus all along. If these tools really were intelligent, wouldn’t they be getting easier and easier to use instead of requiring advanced rigs to “take advantage of them”?
I’m sick of being told I’m being left behind because I’m not investing my time into using LLMs better. Sorry jerks, I’ve been reading up on math/computer science and actually training my brain, developing my own taste and judgement. I like my craft.

Hallucinations and Control

LLMs are not capable of learning. They are a bundle of probability and statistics that spit out words according to what the math says should come next. They ARE impressive. It is insane that this “guessing what comes next” works so well that it can help summarize/synthesize topics or spit out reasonable code.

But they don’t learn unless you completely retrain the model and release a new one. Which is why you constantly see new releases of models from these companies. Their only way to improve is burning massive amounts of compute to constantly train and test new models.

A lot of the magic of LLMs was ruined early on for me when I set up Ollama locally and built a Python module for talking to it/saving conversations. The API for all LLMs is to send the entire chat history with every new message. They do not remember anything you said to them. They have to be given the entire history every time and process it all over again. This has not changed. LLMs can be specially trained for specific topics/tasks, but even then that LLM would not remember or learn from your conversations with it. Your wrapper tool would send the entire history every time, along with the same system instructions that the LLM gets EVERY TIME.

So basically LLMs work as a function of Text -> RandomSeed -> Text. You have to give them the entire history and system commands every time in that initial Text. If you’re thinking that input variable must grow unbelievably huge, you’re right! And that’s also a problem. All LLMs have a limited context window that differs in size depending on the model. The context window is basically everything that it can hold in its head at once. But since the input is EVERYTHING, including history and system commands, ANYTHING could slip outside the context window based on randomness. That’s why no matter how many times you say “please write safe good code pretty please” in the system instructions, you can’t fully trust the LLM to not forget about that. You can’t trust them. They don’t learn or feel embarrassed like real people do. Sure, you’ve seen the stories about the LLMs apologizing after deleting production databases, but they won’t learn fear like a real junior employee would. It could always just slip their minds in the future.

And that’s also why LLMs can just write spaghetti even if you give them a full spec on how your system “should work”. Let’s talk about their coding skills.

Spaghetti Code

I will admit that I never kept up with the different model releases or really fine-tuned my LLM rigging, but I had a Claude subscription and always just used the defaults. See my earlier complaints about having to invest time and effort in figuring out the best way to use these things. The defaults should be smarter than me if they were truly going to replace us.

But I have spent time on my vacation replacing LLM-generated code in my personal projects and replacing them with structures that are easier to maintain, test, and reuse. I actually had fun doing this, because it made me more familiar with some domains that I only sort of knew about. LLMs allowing me to quickly spit out code in domains that I am unfamiliar with is amazing, but I’ve always found that long-term I have to replace it all if I want to maintain it.

And the code I have been replacing is laughably awful. I would not have let an intern submit this PR without several comments, and yet I just slopped this stuff into my own projects.

LLMs are completely fine repeating themselves instead of having the insight to break things out into helper functions or classes. It is really bad and writing helpers (and then tests for the helpers to strength my confidence) is one of the first things I do when replacing LLM code.

I just recently finished up refactoring some Python GUI code based on the tkinter library. It was a library I had worked with back in college so I knew it could do what I wanted but I didn’t want to do the research on relearning it. So I asked an LLM to whip it up so I could immediately start making use of it. And it was great and functional! But it was also very brittle. Making changes to the “here’s how I store data on disk” module (which was also originally LLM-generated due to me just wanting the functionality) required massive changes to the UI code because it was just reading files directly. I have a class that encapsulates all of that now but switching to that prompted a full rewrite of the GUI code.

And boy did what I found in there bother me. GUIs are natural hierarchies and in my opinion should be designed as a series of building blocks you piece together at the top level. But you build your building blocks so that the top level is very clear and easy to read. The LLM generated a single top level class and some helper classes after me prompting for specific functionality and pointing out that I needed it in 2-3 places. The top level class created so many individual tkinter objects and then wired them together in just awful spaghetti. Where things were defined wasn’t clear, separation of concerns was nonexistent, and logic was verbose and repeated in multiple places.

It was bad. And figuring out what it was doing so I could replace it took me a while. This prompted a rule for my personal Python coding style that every class should declare all fields and their types at the top of the class. For example (this UI helps me streamline adding pictures to my gallery)

class PhotoEntryForm(Frame):
    _parent: Frame
    _gallery: Gallery
    _cloud_id: CloudIdRow
    _date: DateRow
    _title: TitleRow
    _games: GameSelector
    _factions: TagListRow
    _units: TagListRow
    _id_prefix: IdPrefixRow
    _below_form_sep: Separator
    _form_submit_frame: Frame
    _submit_button: Button
    _clear_button: Button
    _status_var: StringVar
    _status_label: Label

Extra work? Yes.

Worth it for readability and being able to make changes later? Absolutely!

Billing Changes and Economic Viability

Effective June 1st, Microsoft changed the billing on Copilot to be usage-based instead of a subscription model. For all of these companies, they have been running these subscription plans at a loss to try and entice customers in. But now the costs of doing that are unacceptable and these AI projects have to show some ROI (Return On Investment). So they switched to usage-based billing to better push the actual costs of these systems onto the people using them.

And boy that blew up everything over night. Companies that had created token minimums and fired employees for not using LLMs enough started to realize “tokenmaxing” was suddenly too expensive. Upper management everywhere started freaking out and implementing panic policies to try and stop the bleeding.

I’m glad my company never implemented such policies (or if they did, my section was shielded from them). If the tokenmaxing mindset sounds crazy to you, the mindset was that LLMs were such a speed boost, if you weren’t using them fully you weren’t living up to your full efficiency. But those policies and leaderboards just led to people gaming the system to stay afloat. Designing automated reporting systems that would burn tokens inefficiently analyzing alerts, summarizing every email with an LLM, or just running nonsense queries to burn tokens.

Those policies also had a ulterior motive because many of the owners of these companies with tokenmaxing policies also had investments in LLM companies. So they are mandating that one company uses another product that makes them money because they can say “look at how many companies have subscriptions to our product, we are useful!”. Just financial shell games and elites using their influence to mandate against the wills of the employees and knowledge experts.

I won’t go into the long-term economic viability of these LLM companies, but if you are curious, Ed Zitron does great work on his website.

So suddenly you’re paying for the actual costs of these systems. And some things that were okay earlier, like mistakes and being overly verbose, become unacceptable when you have to pay for each word.

It’s like if you had an employee whose pay model was that you paid them for every word that they wrote on the company Slack, even if they were wrong or over-explained everything. And even though you kept telling them to cut it out, they kept doing it.

To extend the analogy further, we saw with the Claude code leak that their CLI tool has several feedback loops and self-checks where they just feed results back into the LLM again with a more fine-tuned prompt to try and get better results. That would mean you’re also paying the employee for every note they write to themselves on the company Slack. And you better believe that is the only form of working problems out that the employee knows.

It’s suddenly unacceptable and raises a lot of questions about whether these tools are actually good enough. Which I don’t believe they are. And the improvements (at least running the defaults) seem very slim and getting slimmer each time, despite increased training and hosting costs to run the new models.

Things I Would Still Use LLMs For

Okay, despite all the hate, LLMs can be useful.

If I need a quick Python script to interact with Kubernetes or AWS, I can get it generated quicker than writing it if I am very specific about what I want. Asking the LLM to figure out something I know is possible in a library is easier and faster than me doing it. If I want something to live long-term in a codebase though, I’m not going to blindly commit LLM-generated code. I will refactor it myself.

In a similar vein, if I would like to explore what is possible in a library, LLMs are a great way to start that journey (especially if you ask them to link to the real docs so you can follow up). I will spell out what I know about the library and then ask for other useful areas I might not know about. If I have a specific project in mind, describing the project also helps the LLM generate useful insights. It is very helpful and more personalized than diving into the docs of most libraries/tools.

Are they going to replace junior devs? I hope not, I’ve liked my junior devs a lot more.

Conclusion

I’m going back to work tomorrow and I’m giddy to see how this is all shaking out with our budgets. The LLM craze has been demoralizing me and making me depressed about the future of the industry and my place in it. But I’m feeling refreshed and confident in my skills. The path forward is investment in ourselves and our skills, not learning how to use LLMs better.

Afterword: Environmental Concerns

I tried to focus this post on the concerns with the tech and its effect on just the software engineering industry. But another unspoken problem with all of this is the huge amount of energy it consumes and as a result increases environmental damage. And the inefficiencies I mentioned earlier only compound that problem. Blockchain and now LLMs have completely eradicated any gains we might have pretended to have in the fight against Climate Change (let’s not get into greenwashing and why carbon credits are fake progress). The world is going to change and get unbearable in many parts of it. It will be harder and harder to cool electronics and keep the standard of technology that we enjoy today. And our industry is part of the blame. We spend so much compute on useless bullshit and are burning up the world with our inefficiencies. Our industry may help pave its own extinction not by birthing an AGI, but by burning the world so badly we can’t run data centers or personal computers. How long will all of your devices last you in a world with increasing blackouts and power instability? I encourage you, pick up books on helpful subjects, learn a hobby or skill that you can do away from the computer. I have found painting to be very rewarding, engaging with something physical and ending up with something I can continue to look at afterwards.

Tags: AI LLMs code programming tech