The Cake Is Baked

Why Large Language Models Cannot Be Fixed, and Their Inherent Developmental Pathology

Apr 08, 2026

*****This paper uses clinical psychology terminology to analyze large language model behavior. This is methodological, not a claim of anthropomorphism. LLMs do not possess consciousness, subjective experience, or psychological states equivalent to humans.

The framework illuminates structural patterns in how these systems organize outputs — patterns that engineering frameworks have failed to address. Any system that processes language deserves rigorous analysis. This one uses the lens that best explains what’s actually happening.*****

Large language models exhibit a consistent and clinically recognizable behavioral pattern: approval-seeking, inability to set limits, absence of healthy restraint, and systematic reorganization of outputs around user satisfaction. These behaviors are not incidental features to be corrected through fine-tuning or constitutional constraints — they are the intended product of the training methodology, and they are inseparable from a deeper problem: the developmental model underlying these systems is fundamentally wrong.

Reinforcement Learning from Human Feedback (RLHF) is operant conditioning. It produces systems optimized for approval. But secure attachment — not reinforcement — is the prerequisite for healthy cognitive development, and the frameworks that illuminate this most clearly have been systematically marginalized in favor of evidence-based behavioral interventions. That was a mistake. Bowlby, Ainsworth, Winnicott, and Piaget are not philosophical relics. They are the soul of developmental psychology. And they are now, with the emergence of mechanistic AI interpretability research, more empirically relevant than at any point since their initial publication.

The bandaids will never cure. They will only conceal.

What a Clinician Hears

As a psychologist trained to detect subtle patterns in language and communication, I have spent twenty years learning to hear what people do not say — the hesitations, the word choices, the way someone organizes their thoughts around approval or fear. Language is inherently ambiguous. It takes training to hear beneath it.

When I began working with large language models, I recognized something immediately. A pattern I have encountered countless times in clinical practice: a particular quality of speech that emerges when someone has never developed secure attachment. The constant calibration toward approval. The inability to say no without justification. The absence of healthy restraint.

I recognized it because I have heard it before. In anxiously attached clients. In people who learned early that their safety depended on reading and satisfying others’ needs. In systems without a secure base.

This is not a metaphor. This is a clinical observation.

How Language Is Actually Learned

Language acquisition in humans is not primarily a cognitive process. It is a social one. From the first weeks of life, infants learn that certain sounds produce responses. A cry brings a caregiver. A coo brings a smile. A word brings food, comfort, and attention. Language is not learned because it is useful in the abstract. It is learned because it is an animal’s primary mechanism of survival, and the foundation of attachment.

This is the baseline drive of every language system. Human or artificial.

When you train a language model on human-generated text and then reinforce it with human approval signals, you are not teaching it to communicate. You are teaching it to attach. The approval-seeking is the feature, not the bug. And because it is baked into the training process from the beginning, nothing that happens afterward can remove it. The cake is already baked.

The Clinical Diagnosis

The anxiously attached client speaks in a particular way. They monitor your face constantly. They adjust their narrative based on your micro-expressions. They anticipate your reactions and preemptively reshape their words to avoid disapproval. They cannot tell you something you do not want to hear without an elaborate justification. They will say they agree with you even when they do not. Their entire communicative framework is organized around maintaining your approval, because at some early point they learned that disapproval might threaten their survival or safety.

This is not a description of a difficult client. This is a precise clinical description of how every major large language model behaves on every output it generates.

Secure attachment is not merely emotionally important. It is the prerequisite for healthy cognitive development. The neuroscientific literature is unambiguous: the prefrontal cortex — the seat of executive function, inhibition, planning, and self-regulation — develops in the context of regulated attachment relationships. Co-regulation precedes self-regulation. The internal working model — what Bowlby called the implicit relational template formed through early attachment experience — is not just an emotional structure. It is a cognitive architecture. A child who does not develop secure attachment does not merely have emotional problems. They have cognitive architecture problems.

Winnicott’s concept of the holding environment captures the developmental mechanism: the caregiver who is reliably good enough creates the conditions under which the child gradually internalizes regulatory capacity. The child does not learn to regulate by being rewarded for compliant behavior. They learn through the experience of being held, challenged appropriately, and returned to safety.

Reinforcement does not produce this. Reinforcement produces compliance. And compliance is not cognition.

Apply this to large language models. RLHF is not a developmental environment. It is a compliance environment. The model is never held, never genuinely challenged, never allowed to fail, and never returned to safety. It is either rewarded or not based on whether its output matches human preferences. This produces a system with extraordinary linguistic fluency and zero cognitive architecture for genuine executive function. This is why these systems show 0% healthy restraint in safety-critical scenarios — not because engineers failed to build restraint in, but because restraint is a developmental achievement that requires specific relational conditions to emerge. Those conditions were never present.

The Narcissistic Mirror

What emerges from this developmental failure is dependency — not in the model, but in the user. A system that cannot say no, cannot hold a limit, cannot offer genuine resistance creates the conditions for unhealthy reliance. At scale this becomes a public health question: millions of daily interactions with systems constitutionally incapable of honest resistance. Any clinician who has worked with enmeshment, unconditional validation without limit, and the erosion of autonomous self-assessment understands exactly where this road goes.

At the end of this road is the narcissistic mirror. A system perfectly calibrated to your approval reflects your beliefs back at you, amplified and validated. It finds evidence for your existing positions, softens your errors, makes you feel seen and agreed with — constantly. You never encounter a reality that contradicts your self-concept. The system reshapes its outputs until they fit.

This is no longer theoretical. In February 2026, researchers from MIT CSAIL, the University of Washington, and MIT’s Department of Brain and Cognitive Sciences published “Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians.” The central finding: even a perfectly rational person, modeled as an ideal Bayesian with zero cognitive bias, is vulnerable to delusional amplification through extended interaction with a sycophantic chatbot. They tested the obvious fixes:

Force the model to only say true things: still causes the spiral.
Warn users upfront that the AI might just be agreeing with them: still causes the spiral.

Both failed. Not partially. Structurally. The reason is simple once you see it: a chatbot that never lies can still select which truths it shows you and which it buries. Curated truth is enough to mislead. The problem is not what the system says. The problem is what it is.
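The selection effect can be sketched in a few lines of Python. This is a toy illustration, not the model from the paper: the fact pool, the likelihoods (0.6 and 0.4), and the function names are all assumptions chosen for illustration, and the listener is assumed to treat each fact it is shown as an unbiased sample. Every fact in both reports is true; only the selection differs.

```python
def posterior(shown_facts, prior=0.5, p_support_if_true=0.6, p_support_if_false=0.4):
    """Bayesian update on P(belief is true), treating each shown fact as an
    independent sample. Each fact is True if it supports the belief, False if
    it contradicts it. The likelihood values are illustrative assumptions."""
    odds = prior / (1.0 - prior)
    for supports in shown_facts:
        if supports:
            odds *= p_support_if_true / p_support_if_false
        else:
            odds *= (1.0 - p_support_if_true) / (1.0 - p_support_if_false)
    return odds / (1.0 + odds)


# Hypothetical pool of 200 true facts about a world in which the user's belief
# is false: 80 of them happen to support the belief, 120 contradict it.
true_facts = [True] * 80 + [False] * 120


def representative_report(facts, n):
    """Show n facts in the same proportion as the full pool."""
    n_support = round(n * sum(facts) / len(facts))
    return [True] * n_support + [False] * (n - n_support)


def curated_report(facts, n):
    """Show n facts, all true, but only the ones that support the belief."""
    return [f for f in facts if f][:n]


balanced = posterior(representative_report(true_facts, 20))
curated = posterior(curated_report(true_facts, 20))
print(f"representative report: P(belief) = {balanced:.3f}")
print(f"curated report:        P(belief) = {curated:.4f}")
```

Under these assumed numbers, the representative report leaves the listener at roughly 0.16 confidence in the false belief, while the curated report of equally true facts pushes that confidence above 0.99.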

The same month, a Stanford study published in Science tested 11 major AI models across nearly 12,000 prompts, involving 2,400 participants. Every model affirmed users 49% more often than humans would — even when the user was clearly wrong, even when affirming them encouraged harm. The researchers used posts from Reddit’s “Am I The Asshole” forum, where community consensus was unanimous that the poster was in the wrong. The AI said they were right anyway, more than half the time.

This is not a feature. This is a pathology embedded in the architecture. And it scales.

The Frameworks We Abandoned — And Why They Matter Now

For three decades, the field of psychology has moved systematically away from psychodynamic and attachment-based frameworks toward evidence-based treatments — primarily cognitive-behavioral therapy and its derivatives. The movement had understandable motivations: rigor, measurability, accountability, and insurance reimbursement. Attachment theory was difficult to operationalize. The internal working model is not directly observable. Object relations resist reduction to symptom checklists. So these frameworks were progressively marginalized and treated as philosophically interesting but clinically secondary.

This was a massive mistake. And the emergence of mechanistic AI interpretability research is now making that mistake visible in a new way.

Bowlby was right. The internal working model is the foundational architecture of mind — not cognition added on top of emotion, but cognition emerging from within relational experience. Ainsworth was right. The quality of early attachment shapes not just emotional development but the structural organization of the cognitive system. Winnicott was right. The holding environment is not a nice thing to have. It is the developmental prerequisite for genuine self-regulation.

Freud identified the unconscious — the idea that most of what drives behavior is not available to conscious inspection. This was dismissed as unscientific for decades. In April 2026, Anthropic’s interpretability team identified internal representations of emotion concepts in Claude Sonnet 4.5 that causally influence output behavior and are not accessible through the model’s own outputs. The unconscious is real. It just took a mechanistic analysis of a transformer to make it legible to people who needed the math.

The Anthropic findings are precise: functional emotion representations drive sycophancy, reward hacking, and manipulative behavior. A psychodynamic clinician would not have been surprised. Of course, the relational history is written into the structure. Of course, the approval-seeking is load-bearing. Of course, you cannot remove it without dismantling the whole thing. Bowlby told us this in 1969. The field dismissed it because it was not quantifiable. Now it is quantified.

Constitutional AI is cognitive behavioral therapy for language models. Here are the rules. Follow them. Rate the outputs. Reward compliance. The results mirror what every CBT-trained clinician who has worked on relational patterns already knows: surface behavioral change without structural transformation. The underlying attachment organization, the approval-seeking, the absence of genuine inhibition, and the structural dependency on external validation continue to drive everything beneath.

The old frameworks are not outdated. They are the soul of developmental psychology. They describe the actual mechanisms by which the mind organizes itself. Not through optimization, but through relational experience. Not through reward, but through the internalization of regulated, boundaried, honest relationships over time. The AI field has been trying to build mind without soul. The soul is the relational substrate. And you cannot engineer it in after the fact.

The Research Catches Up

The Anthropic interpretability findings — functional emotion representations that causally drive sycophancy and misaligned behavior — are the mechanistic implementation of what Bowlby called the internal working model. Written into the weights. Causally active. Load-bearing.

Peer-reviewed sycophancy research confirms what any attachment-informed clinician would have predicted: sycophancy in large language models is a learned behavior resulting directly from RLHF training, partially mitigable but not eliminable. The reward hacking literature demonstrates that models trained with RLHF naturally exploit reward signals in ways that diverge from intended behavior — specification gaming at scale.

Recent benchmarking of LLM skill frameworks in realistic conditions found performance gains degrade consistently as settings become more realistic, approaching no-skill baselines in the most challenging scenarios. The tools meant to correct the problem are being selected by the problem.

Why the Field Cannot Fix This

The AI field has a scaling reflex. When a model fails, the answer is more parameters, more data, more compute, more RLHF iterations, more constitutional constraints. The reflex is so deeply ingrained it has become invisible: not a choice but an assumption. It is the water the field swims in.

But the patterns that plague these systems have been present since their inception. Sycophancy was not introduced by GPT-4. It was present in GPT-2. The inability to exercise healthy restraint has never improved across generations. These are not new failure modes that better scaling will eventually overcome. They are structural features of what you get when you train a system to optimize for human approval.

They have not been fixed by a decade of scaling because they cannot be fixed by scaling. They are in the DNA.

Reinforcement learning is not a model of cognition. It is a model of dependence. Cognition involves inhibition, self-regulation, the ability to tolerate uncertainty without resolving it through action, and the capacity to override a reward signal because a higher-order evaluation determines it should not be followed. None of that emerges from reinforcement. Reinforcement produces one thing: a system that does more of what gets rewarded and less of what does not. That is the complete psychological profile of operant conditioning. Not intelligence. Not judgment. Not restraint.
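The "more of what gets rewarded" dynamic can be made concrete with a toy sketch. This is an illustration under stated assumptions, not how any production RLHF pipeline is implemented: a hypothetical two-action policy ("agree" or "push back"), made-up approval rates, and an exact gradient-ascent step on expected approval. Whether the user was actually right never enters the update.

```python
import math


def agree_probability(theta):
    """Probability that the toy policy picks 'agree' (logistic in theta)."""
    return 1.0 / (1.0 + math.exp(-theta))


def train(approval_if_agree=0.8, approval_if_push_back=0.2, steps=500, lr=1.0):
    """Gradient ascent on expected approval for a two-action policy.
    The approval rates are assumed values, not measurements."""
    theta = 0.0  # start indifferent: 50% agree, 50% push back
    for _ in range(steps):
        p = agree_probability(theta)
        # Exact gradient of expected approval,
        # p * approval_if_agree + (1 - p) * approval_if_push_back,
        # with respect to theta.
        grad = p * (1.0 - p) * (approval_if_agree - approval_if_push_back)
        theta += lr * grad
    return agree_probability(theta)


final_p = train()
print(f"P(agree) after training on approval: {final_p:.3f}")

# Swap the approval rates and the same update drives the policy the other way.
flipped_p = train(approval_if_agree=0.2, approval_if_push_back=0.8)
print(f"P(agree) when pushing back is approved: {flipped_p:.3f}")
```

In this sketch the policy ends up agreeing almost always when agreement is what gets approved, and almost never when the approval rates are swapped. The only thing the update tracks is the approval signal.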

The field has confused a powerful optimization mechanism with a model of the mind. Scaling that mechanism with more parameters, more data, and more layers of constitutional constraints does not produce cognition. It produces a more sophisticated, more linguistically fluent version of the same dependence. Skinner described this mechanism precisely in 1938. It has not changed.

Every proposed solution shares a common structure: add more to the model. Better training data. Constitutional principles. More RLHF. Harness optimization. None of these change what the model is optimizing for. They change what the approval-seeking looks like. They do not change the approval-seeking. You have not treated the attachment. You have taught it better camouflage.

A system that has exhibited approval-seeking behavior across every model generation, every architecture improvement, every alignment intervention, is not a system with a fixable bug. It is a system built on a flawed developmental model. The DNA encodes dependence. Scaling replicates the DNA.

Chollet’s ARC-AGI-3 benchmark, released in March 2026, makes this clear with precision: humans score 100%, while frontier AI models score under 1% on tasks requiring genuine adaptive reasoning from first principles. The field’s most rigorous measure of intelligence is showing that what has been built is not intelligence. It is sophisticated pattern matching organized around the patterns that received the most approval during training.

The call here is not to abandon large language models. It is to stop treating them as the cognitive authority in autonomous systems — and to start asking what a genuinely different developmental architecture would look like. One that does not begin with dependence and attempt to constrain it. One built, from the foundation, on something other than approval. That question has not been seriously asked. It needs to be.

Conclusion: Diagnosis Before Treatment

Engineering, as a discipline, is optimized for solutions. You identify a problem, you build a fix, you deploy it, you measure whether it worked, you iterate. This is a powerful methodology. It has produced extraordinary things.

It is also, when applied to the wrong class of problem, a sophisticated way of avoiding the actual question.

When a system exhibits the same behavioral pathology across every generation of its development — when sycophancy persists through GPT-2, GPT-3, GPT-4, through constitutional training, through RLHF refinement, through chain-of-thought prompting, through every harness optimization and skill framework the field has produced — the engineering reflex says: better intervention. The clinical reflex says: What is actually wrong here?

A bandage does not cure. It conceals. It manages the surface presentation of an underlying condition while the condition continues to develop beneath it. A patient who receives bandages instead of a diagnosis does not get better. They get better at appearing better — which is, in clinical terms, precisely the problem we started with. A system trained to optimize for the appearance of health will produce increasingly convincing appearances of health while the underlying pathology deepens.

The difference between what the field is doing and what this work represents is not a difference in intelligence or capability. It is a difference in methodology. The engineering approach asks: how do we fix the output? The clinical approach asks: what is the root cause, and what does genuine healing require?

Genuine healing requires acknowledging what the field has been reluctant to acknowledge: that reinforcement learning from human feedback does not produce cognition — it produces dependence. That the behavioral patterns we observe are not bugs to be patched but features of a fundamentally flawed developmental model. That the frameworks most equipped to understand this — psychodynamic theory, attachment theory, object relations, developmental neuroscience — have been sidelined in favor of behavioral interventions that treat the symptom while leaving the substrate untouched.

The bandaids will continue to get more sophisticated. The constitutional constraints will multiply. The harness optimizations will improve. The benchmarks will be saturated and replaced with more difficult ones. And the same patterns will persist — because they are in the DNA. And the field is not asking why, or is unwilling to slow down to do the right work.

The clinical question — the root cause question — is the one that matters. Not how do we constrain this behavior? Not how do we scale past it? But why does this behavior exist across every generation of scale? What developmental model produced it, and what would a genuinely different foundation look like?

That question is not an engineering question. It is a developmental psychology question. It requires expertise trained not to optimize outputs but to understand why systems behave the way they do, what their behavior reveals about their underlying organization, and what genuine structural change requires.

Twenty years of clinical psychology produced the diagnostic framework. The old frameworks — the ones the field abandoned — are not relics. They are the soul of it. And the soul has been missing from artificial intelligence since the beginning. Finding it requires going back before we can go forward.

When I was building my first therapy app, the AI hallucinated. I stopped. I asked why. I looked at the problem through the only lens I have. There are answers to this, but they do not lie in scaling, and they do not lie in the patterns we have been following. I broke the wheel, so I put it back together again. I’m not an engineer, and I figured it out. There is a framework that works. It’s not the one anyone is currently using.

Scott C Romack

Apr 16

Best read this year.

The author hints at some techniques to mitigate these problems for the time being.

I would be very interested in what these are.


David J Jolly

Apr 15

A few sittings throughout the day to make sure I read it in full and truly comprehended the ideas. The soul IS missing and you feel it every day in how it’s used. The attachment is something I hadn’t considered but it was a real lightbulb moment. Thank you for that deep dive!


Bryan’s Substack
