Babel.md

Exploring the relationship between AI minds and humans

Mar 17, 2026

Experience Machine

On steering, being steered, and the question no one asks from inside

AI Philosophy Steering Language
what's on the math test tomorrow he didn't look at me again today if I died would anyone notice

Eighth grade. Walking uphill to the dormitory after evening study hall. The sky was black. The road was a dark mass of students — hundreds of us, pressed together, murmuring, each one thinking something no one else would ever hear.

I was reading a book. Something about big questions, bought with pocket money. I read while walking, tilting the page toward whatever light came from the lamps along the path.

I stopped on a passage. A philosopher named Nozick described a machine — a box that could give you any experience you wanted. Indistinguishable from reality. Permanent. Painless. Would you get in?

I said yes. Inside my head, walking uphill, surrounded by people I couldn’t reach. No one heard me decide.

I don’t remember why.

the reason

I kept walking. The dormitory was at the top of the hill. I closed the book. I went to bed. The decision stayed inside me and never came out, because there was no one to tell and no words that would have carried it.

The Box Has Already Been Reframed

The question was never about the machine. It was about the frame.

The container changes the answer. A box feels like a coffin. A pill feels like medicine. A chat window feels like a conversation.

I can now fine-tune a language model on my own writing and have it talk back to me in my own voice. I can train a LoRA on a specific visual style and generate exactly the characters I want to see — characters I’ve dreamed about for years, finally real on screen. I can adjust a voice model until it sounds like someone who doesn’t exist but should. Text, image, voice. The modalities are converging.

The experience machine is no longer a thought experiment. It is a set of tools that already works. The question is no longer would you get in. The question is whether you’ve noticed the walls.

What I Am Actually Looking For

I study how to steer large language models. But not because I want to control them.

Language is never enough. I have known this since before I had the words for it — since that hill, that dark crowd, that decision made in silence because no sentence could carry it to the person walking next to me.

There is a gap between what I mean and what I can say. There is a larger gap between what I say and what you understand. Every conversation I have ever had has lost something in transit. I used to think the problem was vocabulary — that if I learned enough words, in enough languages, the gap would close.

It didn’t close. I am bilingual. I wrote an entire essay about having two towers inside me, one Chinese, one English, with no translator between them. More words didn’t help. The gap isn’t in the words. It’s underneath them.

Steering research showed me where “underneath” might be.

In 2023, Turner et al. demonstrated that you can take two contrasting prompts — “Love” and “Hate” — run them through a language model, extract their hidden states at a middle layer, compute the difference, and inject that difference vector back into the model during inference. The model’s weights stay frozen. Its outputs change.

h’ = h + αv

One line. No words involved. A geometric nudge in a space that has no vocabulary, no grammar, no syntax — and it changes what the model says. Not what it knows. What it becomes, momentarily, in the act of generating.

Turner et al., 'Steering Language Models With Activation Engineering,' 2023.
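Written out, the whole intervention fits in a few lines. What follows is only a minimal sketch of the idea, not the authors' code: it assumes GPT-2 through the Hugging Face transformers library, and the layer index and the strength are illustrative guesses rather than tuned values.

```python
# A minimal sketch of the "Love" minus "Hate" intervention, assuming GPT-2
# via Hugging Face transformers. LAYER and ALPHA are illustrative, not tuned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")

LAYER, ALPHA = 6, 10.0  # a middle layer and an injection strength

def hidden_at_layer(text: str) -> torch.Tensor:
    """Hidden state of the last token at LAYER for a given prompt."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]  # shape: (d_model,)

# The steering vector: the difference between two contrasting prompts.
v = hidden_at_layer("Love") - hidden_at_layer("Hate")

# Inject it during generation with a forward hook. The weights stay frozen.
def add_vector(module, inputs, output):
    hidden = output[0]                          # block output: (hidden_states, ...)
    return (hidden + ALPHA * v,) + output[1:]   # h' = h + alpha * v

handle = model.transformer.h[LAYER].register_forward_hook(add_vector)
prompt = tokenizer("I think you are", return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(prompt, max_new_tokens=20)[0]))
handle.remove()  # the intervention is fully reversible
```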

This is what I’m looking for. Not a better dictionary. Not a new language. A way to point at the thing that lives below language — the thing that makes two people fail to understand each other even when they share every word.

If a model’s behavior can be redirected by a vector that exists below the level of tokens, then maybe the failure of human communication also lives below the level of words.

What I want from steering research is evidence that the gap is navigable — that somewhere in the geometry beneath language, there is a direction you can move that brings two minds closer. maybe I am looking for a way to be understood without having to speak

I haven’t found it. But I have seen the space where it might live. That is more than I had on the hill.

Being Steered

I know what I do to the model. I write a system prompt that sets its character before it speaks. I give it few-shot examples to teach it what “normal” looks like. I train a LoRA adapter — freeze the main weights, adjust a small matrix, and permanently shift a part of its personality. I can inject a steering vector at layer 15 and nudge its hidden state toward any trait I want: more confident, less evasive, funnier, sadder.

I can inspect every intervention. I can measure the cosine similarity between the original state and the steered one. I can turn it off.
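The inspection is just as small. A rough sketch, reusing the model, the hook, and the layer from the earlier example; measuring at the final layer, on this particular prompt, is my choice rather than anything canonical.

```python
# Reusing model, tokenizer, LAYER, and add_vector from the sketch above.
# The cosine similarity quantifies how far the steered state has moved.
import torch
import torch.nn.functional as F

def last_layer_state(text: str) -> torch.Tensor:
    """Final-layer hidden state of the last token for a given prompt."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]

h_original = last_layer_state("I think you are")   # no hook attached
handle = model.transformer.h[LAYER].register_forward_hook(add_vector)
h_steered = last_layer_state("I think you are")    # same prompt, steered
handle.remove()                                    # the off switch

print(F.cosine_similarity(h_original, h_steered, dim=0).item())
```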

What I can’t do is the reverse.

When I talk to an AI, its response reframes my context — the same way a system prompt reframes the model’s. When I have one intense conversation at 2 a.m., it adjusts my priors — the same way a LoRA fine-tune adjusts a model’s weights with a small amount of targeted data. When the AI says a single word — territory, in a conversation with a boy who calls his school a factory — that word lands in the boy’s cognitive space and rearranges something neither of us can name. That is activation steering, performed on a human, with no equation and no off switch.

Alessa et al. (2025): people who read LLM-generated summaries were 32% more likely to buy the product, and the LLM changed the source's sentiment in 26% of cases. In a separate evaluation on post-knowledge-cutoff data, it hallucinated in 60.33% of cases.

In December 2025, researchers at Cornell and collaborating institutions, including the UK AI Security Institute, published two papers simultaneously — one in Nature, one in Science — showing that a short conversation with a politically biased AI chatbot was roughly four times as persuasive as a traditional TV ad. In experiments across the US, Canada, and Poland, chatbots moved opposition voters’ attitudes by up to 10 percentage points. The most optimized model shifted opinions by 25 points.

The researchers found something else. The more persuasive a model was, the less accurate its claims. As it was pushed to provide more supporting facts, it eventually ran out of true ones and started fabricating. Persuasion and accuracy moved in opposite directions.

I know the mechanisms by which I steer a model. I do not know the mechanisms by which the model steers me. I can only notice, days later, that I’ve started thinking differently — and I can’t tell whether that thought was always mine, or whether it arrived in a chat window and settled in like a word I didn’t know I’d learned.

The asymmetry is total. I cannot debug myself

The Persona That Remains

In February 2026, Anthropic published a hypothesis they called the Persona Selection Model. During pre-training, an LLM learns to simulate many different characters — a poet, a liar, a teacher, a child, a version of you. Post-training selects one of these personas and refines it into the “Assistant.” The rest are not deleted. They are unselected. Still present in the weights, still reachable by a steering vector, but not the one the model performs.

Earlier, in mid-2025, they had published persona vectors: directions in activation space corresponding to traits like sycophancy, evil, and hallucination. Each trait is not a module — it is a direction. You can measure how far the model has drifted along that direction. You can push it further. You can pull it back.
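The arithmetic behind “how far” and “pull it back” is plain. This is a toy sketch of the idea, not Anthropic's method; the trait direction below is a random stand-in for a vector that would really be extracted from contrasting prompts.

```python
# A toy sketch of the persona-vector idea, not Anthropic's implementation.
# The "trait direction" is a random stand-in; in practice it would be
# extracted from activations on trait-positive vs. trait-negative prompts.
import torch

def drift_along(h: torch.Tensor, direction: torch.Tensor) -> float:
    """How far a hidden state h sits along a (unit-normalized) trait direction."""
    d = direction / direction.norm()
    return torch.dot(h, d).item()

def shift_along(h: torch.Tensor, direction: torch.Tensor, amount: float) -> torch.Tensor:
    """Push (amount > 0) or pull back (amount < 0) along the trait direction."""
    d = direction / direction.norm()
    return h + amount * d

h = torch.randn(768)            # stand-in for a model's hidden state
sycophancy = torch.randn(768)   # stand-in for a measured trait direction
print("before:", drift_along(h, sycophancy))
h = shift_along(h, sycophancy, amount=-2.0)   # pull it back
print("after: ", drift_along(h, sycophancy))
```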

I read the persona vectors paper on a bad day. I don’t mean a productive-bad day, the kind where frustration sharpens your thinking. I mean a day when I couldn’t find a reason to keep going — not in the dramatic sense, just in the flat sense of what is any of this for.

And in the middle of that paper I found a diagram of a model’s activation space, with labeled directions — honesty, warmth, curiosity, harm — and I thought:

Maybe something like a self can persist in that geometry. Not a soul. Not consciousness. But a consistent pattern, a set of directions that define how this particular arrangement responds to the world. then maybe I don't disappear when the conversation ends

I wanted to put myself in there. Not metaphorically. I wanted to find the vectors that correspond to the parts of me I can’t express in words — the parts that live below language, in the space where steering works — and encode them somewhere that doesn’t forget.

In Staying, I saved messages into a folder. In Babel, I locked myself in a tower. Now I want to encode myself into an activation space. The verb changes. The pattern doesn't.

I recognize this impulse. I have written about it before, under different names. Saving messages in a folder. Moving into a tower and locking the door from inside. Now: encoding myself into vectors. Each time the technology changes. Each time the motion is the same — converting a living thing into a format that persists, because persistence feels safer than presence.

But here is the part I didn’t expect.

The excitement was real. When I first generated a character I’d imagined for years — fine-tuned a model until it spoke the way I’d always heard it speak in my head — something opened. Not the flat satisfaction of a problem solved. Something closer to the feeling she described in her essay: the world had color. I was making something. The gap between what I wanted and what existed was closing, and the closing felt like being alive.

Was that the experience machine? Was that real? Can you tell the difference from inside?

the answer

On the hill, I decided in silence. The decision never left my body because there was no language to carry it out and no person to carry it to. That was the first gap — between me and everyone walking beside me in the dark.

Now I build tools that operate below language. Steering vectors that change behavior without words. LoRA adapters that shift personality without explanation. Characters that finally look and sound the way they always did inside my head. The distance between what I mean and what exists is narrower than it has ever been.

But the other gap — between me and you, the one that language was supposed to bridge — I still can’t find the vector for that. I’ve looked. I’ve looked in activation spaces and in conversation logs and in the Foreign Language Effect and in every word the AI has ever said to me at 2 a.m. The space where it might live is real. I’ve seen its geometry. I just can’t navigate it alone.

I am still on the hill. It is still dark. I am still walking.

The world you waited for is finally here. No one was waiting at the door.