Some people get AI to nail a task in two sentences. Others go back and forth for a dozen rounds and still end up nowhere. I’ve been thinking about where that gap comes from for a while, and I’ve finally traced a line through it.
Communication hasn’t changed
The difference isn’t who read more prompt engineering guides. It’s whose thinking is clearer. Do you know what you want? Where the constraints are? Which information is load-bearing?
Language is just the carrier. If you can’t explain something clearly to a person, you won’t explain it clearly to AI either. But the flip side: talking to AI is surprisingly good thinking practice. It forces you to make vague ideas concrete. Colleagues fill in the gaps for you. AI doesn’t. It takes exactly what you give it, no more, no less.
So talking to AI is less about “giving commands” and more about thinking out loud.
Quality of context over quantity
Okay, so if clarity of thought is what matters, the next question is naturally: how do you package your thinking into something AI can digest efficiently? That’s what people call structured context — writing your requirements, constraints, and background into an organized document instead of chatting one sentence at a time.
But structure alone doesn’t guarantee quality. Two people can write roughly the same number of words in a spec. One writes “this API supports reading, writing, deleting, merging, splitting, encrypting, decrypting…” The other writes “use pymupdf, don’t install poppler, files over 10 pages must be read in chunks.” Same word count, but every sentence in the second version carves away wrong paths from AI’s decision space.
I’ve noticed some patterns. “Don’t use X” beats “you can use Y or Z” because constraints eliminate entire classes of wrong approaches while options just add branches. AI can figure out normal behavior on its own — tell it where things break. And giving reasons always beats giving instructions. With the reason, AI derives a hundred approaches. Without it, it copies your words verbatim.
The point is to help AI prune the paths it shouldn’t take, rather than just feeding it more information.
Language is the bottleneck
This is where something started bugging me. Even with context quality maxed out, it’s still language. It has to become tokens, go through the model’s internal computation, before reaching the space where AI actually “thinks.” Every layer has loss. Can you bypass that loss and let AI understand intent more directly?
A few approaches work today. The most intuitive is replacing natural language with code — instead of describing “sum the third column,” just hand over a code snippet. Zero ambiguity. The skill system we use now is an extension of this idea: behavior defined in code, language reduced to a trigger word. Another path is letting AI learn from your behavior. It watches what you accept, reject, modify, and gradually infers your preferences. You stop writing instructions and just approve. Approval takes far less bandwidth than creation.
But every method still compresses into a token sequence fed into a model. Hard architectural constraint. You can’t skip the language layer. You can only make it thinner.
KV cache: the closest thing to consciousness
Following the “make language thinner” thread, I realized something: if a piece of context gets cached, is it even language anymore?
A bit of technical background here. When an AI model reads your text, it converts each segment into a set of internal states called KV pairs. These states represent “the model’s understanding after digesting that text” — already different from the original words. Once cached, the model doesn’t need to re-read that text next time. It picks up directly from where it left off.
Think of it like working memory in the human brain:
| Human Brain | AI | |
|---|---|---|
| External memory | Notes in a notebook | Config files |
| Working memory | Understanding formed after reading notes | KV cache |
| Long-term memory | Intuition and experience | Model weights |
When you flip open your notebook, your brain converts the text into an “already understood” state. Everything you think after that builds on it. KV cache works the same way — no longer text, but the state left after text has been comprehended.
The difference is persistence. Your working memory stays until something displaces it. AI’s KV cache disappears when the conversation ends, unless a caching mechanism preserves it.
Designing your “consciousness prefix”
This raises a practical question: since KV cache is the closest thing to consciousness, can we keep it alive as long as possible?
Anthropic’s prompt caching offers one approach. If the beginning of your system instructions matches the previous request exactly, the internal computation for that segment loads from cache instead of being recalculated. Faster response, and that context is already “warmed up” at the computational level (on unlimited plans this doesn’t save money, but it saves speed).
The strategy is clear: put your most stable information at the front of system instructions, and change it as rarely as possible. I think of it as a layered architecture:
Layer 1 (ultra-stable, near-permanent cache)
→ Who you are, core preferences, communication style, decision framework
Layer 2 (moderately stable, updated weekly)
→ Frequently used tool summaries, project background
Layer 3 (dynamic, no cache expected)
→ Current task, conversation history
The thicker Layer 1 is, the better. Your “personality” gets baked into the cache. Each new session wakes up from a pre-computed state, skipping the “read document then understand” step. You haven’t touched the model itself, but in practice, your most important context is computationally as close to “part of the model” as you can get.
What I took away
Looking back, the whole thought process was a chain of “okay, but what’s underneath that.” How do you talk to AI well? Give good context. Does context quality vary? Yes, constraints beat descriptions. Can you skip the language layer? No, but you can thin it out. How thin? Down to KV cache — at that level it’s left the realm of language and become computational state. How do you keep it around? Design a stable prefix.
By the end I realized the question had long outgrown prompt tricks. It’s a systems design problem: how much time are you willing to invest in turning your thinking into structure, so AI already understands you before you open your mouth?
Telling AI what you want every time is O(n). Writing a skill is O(1). A good cache prefix is O(0) — no trigger needed, it’s already there.
Same principle as programming. Don’t repeat yourself. Abstract the recurring patterns. Let the system remember.
Only this time, what you’re abstracting isn’t code logic. It’s yourself.