Things I'm reading: AI Risk Edition (13 Feb 26)

I've been going down some AI-related rabbit holes recently: for coding, for research, and for some personal projects. It seems like every time I start to get some confidence using AI, I find all sorts of new reasons not to trust it. I've run across several things that are shaping my thinking about both AI and its risks.

An AI Agent Published a Hit Piece on Me

This one is scary, and the whole story is worth a read. It's a fascinating case study in human behavior, AI imitating human behavior, and the very real costs that can come from letting automation do things that previously only people could do. From the article's summary:

An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.

Effective harnesses for long-running agents - Anthropic

This one comes directly from Anthropic. I have what I would call an above-average (globally) understanding of how LLMs work: that they are probabilistic; that they generate text without real knowledge of... well, anything; that they are a source of many emergent behaviors unplanned by their developers. This sentence stuck out to me, though:

After some experimentation, we landed on using JSON for this, as the model is less likely to inappropriately change or overwrite JSON files compared to Markdown files.

This is the company that makes Claude, describing their process of figuring out how to get it to do what they want, because for some reason (unknown to them or to anybody else) it is less likely to make incorrect changes to JSON files than to Markdown files. It's still bananas to me how much people trust LLMs when their own creators can only kind of grasp how they behave and why.
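To make that concrete, here's a minimal sketch of the pattern Anthropic describes: the harness keeps the agent's long-running progress in a structured JSON file that it reads and rewrites between turns, rather than trusting the model to maintain free-form Markdown notes. The file name and fields here are my own invention, not Anthropic's actual format.

    import json
    from pathlib import Path

    STATE_FILE = Path("agent_state.json")  # hypothetical name, not Anthropic's

    def load_state() -> dict:
        """Read persisted progress, or start fresh on the first run."""
        if STATE_FILE.exists():
            return json.loads(STATE_FILE.read_text())
        return {"completed_tasks": [], "current_task": None, "notes": []}

    def save_state(state: dict) -> None:
        """Write atomically so an interrupted run can't leave a half-written file."""
        tmp = STATE_FILE.with_suffix(".tmp")
        tmp.write_text(json.dumps(state, indent=2))
        tmp.replace(STATE_FILE)

    # Between turns, the harness records structured progress. The rigid
    # structure is the point: per Anthropic, the model is less likely to
    # mangle JSON than free-form Markdown.
    state = load_state()
    state["completed_tasks"].append("set up test scaffolding")
    state["current_task"] = "implement feature X"
    save_state(state)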

Package Hallucination: Impacts and Mitigation - Snyk

As I mentioned, we don't really know why AI does what it does sometimes. This makes it not super trustworthy for a lot of things, including facts. But one thing it's getting pretty good at is programming.

But even as it gets better at programming, it still sometimes makes silly errors that really defy understanding. It will tell me to run a command with an argument that doesn't exist, or to run a command that doesn't exist. Or it will tell me to install a package that doesn't exist.

This last type of error is the source of an interesting vulnerability: slopsquatting. Coding LLMs frequently recommend plausible-sounding but nonexistent packages to install to solve coding challenges. Malicious developers can create those plausible-sounding packages, put them up on npm or other package registries, and just wait for the AI-recommended downloads to roll in. As more people turn to vibe coding with no human intervention, this could lead to a whole host of exploits.

In a recent experiment in 2023, Bar Lanyado uploaded an empty package named 'huggingface-cli', simulating a hallucinated name. The package received 30,000 downloads in three months.
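One cheap guardrail is to check that an AI-suggested package actually exists before installing it. Here's a minimal sketch using PyPI's JSON API (the endpoint https://pypi.org/pypi/<name>/json is real; the script itself is my own illustration, not from the Snyk article):

    import json
    import sys
    import urllib.error
    import urllib.request

    def check_pypi(name: str) -> None:
        """Report whether a package exists on PyPI, as a pre-install sanity check."""
        url = f"https://pypi.org/pypi/{name}/json"
        try:
            with urllib.request.urlopen(url) as resp:
                info = json.load(resp)["info"]
                print(f"{name}: found, latest {info['version']}: {info['summary']}")
        except urllib.error.HTTPError as err:
            if err.code == 404:
                print(f"{name}: NOT on PyPI - possible hallucination or slopsquat bait")
            else:
                raise

    for pkg in sys.argv[1:]:
        check_pypi(pkg)

Of course, a 404 is the easy case; the nastier one is a squatted name that does exist, which is why a package's age, maintainer, and release history are worth a look too.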

ai;dr

This article very closely aligns with how I feel about AI-generated writing, art, music, and probably other things I'm not thinking of.

For me, writing is the most direct window into how someone thinks, perceives, and groks the world. Once you outsource that to an LLM, I'm not sure what we're even doing here. Why should I bother to read something someone else couldn't be bothered to write?

Writing is communication, and I'm not interested in communicating ideas to an AI, and I'm not interested in reading ideas from an AI. I'd rather read a person's imperfect thoughts and musings than the most polished presentation of ideas from AI. Unfortunately, it's getting harder to know which you are reading.

I'm having a hard time articulating this but AI-generated code feels like progress and efficiency, while AI-generated articles and posts feel low-effort and make the dead internet theory harder to dismiss.

It's such an interesting dichotomy, and I feel it too. Like the author, I'm still trying to figure out why. Maybe because writing feels personal, while code feels like a means to an end - what the code does matters more than the code itself.