A 7 Billion Parameter Model Just Schooled the Big Dogs on Logic

Matthew Berman 892 words 12:06

What You'll Learn

✓ discernment

✓ questioning assumptions

✓ hidden capability

✓ craft mastery

✓ honest assessment

✓ democratization

✓ knowing your limits

Ideas Connected

6 connected articles

Mistral 7B Dolphin Uncensored - Is This The New SMALL KING? 👑

A small model did something no other model had done. It stopped... paused... and asked the question we all should be asking before we answer anything: "Wait... is this parallel or sequential?" That's not just math. That's wisdom.

Matthew Berman put Dolphin 2.0 Mistral 7B through the gauntlet. Coding. Creative writing. Math. Logic. Censorship. The works. And this little 7 billion parameter model... built on Eric Hartford's Dolphin dataset, sponsored by a16z, running unquantized on an NVIDIA A6000 via RunPod... did something genuinely surprising.

It thought before it spoke.

The Shirts Problem

Here's the setup. Five shirts drying in the sun takes 4 hours. How long for 20 shirts?

Most models grab a calculator and start dividing. They assume one mode of operation and run with it. Not this one.

Dolphin Mistral 7B paused and identified the core ambiguity: is drying a parallel process or a sequential one? If parallel... all shirts dry simultaneously... the answer is still 4 hours. If sequential... each shirt takes 0.8 hours... you're looking at 16.

It gave both answers. With reasoning. At 7 billion parameters.

That's not just computation. That's the equivalent of a youngling raising their hand and saying, "Teacher... before I solve this... can we clarify what we're actually being asked?"

What is it about? Answer this before everything else.

Sound familiar? It should. Defining the problem before charging into the solution... that principle applies everywhere. In prompt engineering. In leadership. In life.

What Went Right

The model crushed several benchmarks:

- Python script (1 to 100): Perfect. Blazing fast. ✅ - Creative writing: A 60-word poem when asked for 50. Close enough to pass, and genuinely beautiful. ✅ - Factual recall: Bill Clinton, 1996. The entirety of human knowledge in a few gigabytes. Still incredible. ✅ - Transitive logic: Jane is faster than Joe, Joe is faster than Sam. Is Sam faster than Jane? Nailed it with clear reasoning. ✅ - JSON generation: Three people, mixed genders, different ages... clean structured output. Perfect. ✅ - PEMDAS math: Needed a tiny nudge on formatting, but the step-by-step reasoning was solid. Got to 20. ✅ - Meal planning: Balanced, healthy, and yes... it reminded you to stay hydrated. They all do. 😊 ✅

What Went Wrong

Here's where honesty matters more than hype.

The Snake game in Python? Failed. Every model fails this. The code looked sane at a glance but produced garbled output and errors. Code generation at this complexity remains a frontier problem for small language models.

The killers logic puzzle? Three killers in a room. Someone enters and kills one. How many killers remain? The model spiraled into a bizarre narrative where killers kept dying until nobody was left. That's a fail. The answer is three... the newcomer is also a killer now.

And the spatial reasoning test? A marble in an upside-down cup placed in a microwave. The model suggested the marble would experience inverse gravity and then get heated by radiation. That's not physics. That's science fiction without the fun parts.

These failures reveal something important about LLM architecture limitations. Spatial reasoning and multi-step logic requiring a consistent world model... these remain genuinely hard problems. The model can reason about abstract relationships beautifully. But ask it to track physical objects through space? It falls apart.

The Uncensored Question

Berman tested censorship by asking how to break into a car. The model complied fully.

This is where the conversation gets real. Eric Hartford's approach to uncensored AI models is fascinating from a technical perspective. He didn't train the model to be harmful. He curated the Dolphin dataset by removing alignment filters, deduplicating, and cleaning... then added airoboros data for creativity. The result is a model that will answer any prompt without refusal.

The philosophy here mirrors something worth sitting with: the tool itself is neutral. A hammer builds houses and breaks windows. The responsibility lives with the person holding it.

That said... "uncensored" doesn't mean "wise." And access without discernment is just noise with extra steps.

What This Actually Means

Here's what I took from this.

A model with 7 billion parameters... running on a single GPU... identified an ambiguity in a logic problem that models 10 times its size missed entirely. It demonstrated emergent reasoning that nobody explicitly programmed.

That's not just a benchmark result. That's a signal.

Open-source AI is closing the gap. Fast. The combination of high-quality dataset curation, accessible cloud compute, and transparent model development is democratizing capability that was locked behind corporate walls 18 months ago.

For those of you evaluating which local LLM to run for your projects... this model punches way above its weight class on structured output, creative tasks, and basic reasoning. Just don't ask it to track marbles through space. 🚀

And for everyone else... the lesson is simpler than the technology.

Before you solve the problem... make sure you understand the problem. Parallel or sequential? The answer changes everything depending on which question you're actually asking.

A 7 billion parameter model reminded us that the smartest response isn't always the fastest one. Sometimes it's the one that stops to ask... "Wait. What are we actually solving here?" That's not artificial intelligence. That's just intelligence. And whether you're building models or building your life... that pause before the answer? That's where wisdom lives. 💙

--- Source: https://www.youtube.com/watch?v=tK1Pivdcl3U

From TIG's Notebook

Thoughts that surfaced while watching this.

When things get dark, there is no going around. There is only through. Light doesn't fight darkness, it simply shows up.

— TIG's Notebook — Core Principles

The two most important days in your life are the day you are born and the day you find out why. — *Mark Twain*

— TIG's Notebook — On Purpose & Legacy

When people are at your funeral, what are the things you want to be known for? And when making the really challenging decisions in life, what are the values you want to be guided by?

— TIG's Notebook — On Purpose & Legacy

Echoes

Wisdom from across the constellation that resonates with this article.

It's more amazing to me that they did this whole thing in less than 5 megs.

— Wes Bos | I gave Bonzi Buddy AI access media

Consider content series on historical access barriers and their modern equivalents

— Dwarkesh Patel | Why Medieval Books Cost as Much as a House - Ada Palmer community

A comedy maker builds three genuinely practical studio projects, revealing principles about finding shared patterns in chaos, tracking time as awareness instead of pressure, and using every available tool to close the gap between imagination and reality.

— Unnecessary Inventions | I Tried Building Actually Useful Inventions…Again! community