Read time: 10 minutes

This month comes with a few surprises! 🙋‍♀️

Hi everyone! May’s been an interesting one…

Claude 4 launched with big claims about being the "best model" on some measures, so I've gotten the same question from newsletter readers, course students, and clients I'm working with:

Is Claude 4 actually better than 3.7 in any meaningful ways? I put together some quick tests to find out and I’ll share one today.

I also looked into Anthropic's newly published system prompts - the core instructions that shape how Claude behaves. Understanding these helps explain why Claude responds the way it does in our research workflows (and one instruction actually made me trust it more).

Plus, there's a term that's suddenly everywhere in AI. It's still very new (most teams I talk to aren't using it yet), but it represents AI that can actually provide the kind of value we’ve been hoping for (no, not AI that does your laundry - yet).

⭐️ And now comes the really big news before we get into this ⭐️

Your paid subscription will be cancelled as of June 1st.

But don't worry! You'll still get the newsletter - I'm making it FREE!

Here’s why (skip to the newsletter content if you don’t care)

When I launched this newsletter, I was freshly back from parental leave without childcare and at capacity (or beyond it) with client work. I had an idea for a newsletter I felt needed to exist, but I had zero time for it. So I gave myself one rule: if I was going to do this, it had to prove it deserved my time.

That's why I charged €5 for this newsletter from day one. It was a demand test. A way to make sure this newsletter would matter to people like you.

The test worked. The conversations and feedback I've gotten over the last year have been some of my most energizing moments. There are also quite a lot of you here 🤝.

But I’ve been debating this choice with myself for months.

On one side, I feel strongly that it’s important to set industry standards for creators to get paid for their work. On the other, my strategy has changed. My core revenue now comes from other parts of my business. That made me rethink this newsletter’s role in the bigger picture.

Making this free lets me focus on reach and impact, not revenue. There are so many people out there who still feel underserved and unsupported around AI, and who need it in bite-size pieces they can commit to monthly. If a €5 barrier is blocking them from starting out and making progress, then right now that price is at odds with my mission.

What's not changing: I’m sticking to the same promise I started with - workflow upgrades, faster AI decisions, and the AI news you actually need to know. All focused on customer research. In <20 minutes per month.

Thanks for all your support. If you’re enjoying this content, now you can actually tell your friends about it (because it’s free from June!) 😄

Here comes the May edition -

In this edition:

  1. 👩‍🔬 Claude 4 vs Claude 3.7 — is there a difference? A quick test to see how the latest model reasons (and whether it does it better).

  2. 🔬 Claude 4’s system prompts and what they mean for users. Anthropic’s latest published core model prompts and what their instructions tell us about Claude’s behaviors.

  3. 🔁 What’s an “MCP”? The term that’s suddenly everywhere, and how it applies to people like us.

WORKFLOW UPGRADES

👩‍🔬 Claude 4 vs Claude 3.7 — is there a difference?

Claude 4 launched about a week ago, so I ran a few tests to show you something ASAP. I asked both Claude 4 and 3.7 to write a 15-minute interview guide for a customer interview. I chose this task because it’s something I’ve done often, but I hear from others that it isn’t always the easiest task for AI to get right.

The experiment:

  • 🖼️ I gave a bunch of background context

  • ⚠️ But the task was a 1-line instruction

Why? I wanted to see how the two models handled reasoning on their own - deciding without my input how to approach the task, which information to prioritize and which steps to follow.

The prompt dropped the model into the role of a senior researcher at a fictional European home-buying platform, with analytics, hypotheses, and clear business goals.
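To picture the shape of it (this is not the actual prompt - that's linked below - just an illustration of the structure I'm describing, with placeholder details):

    [Role] You're a senior researcher at <fictional home-buying platform>.
    [Context] Analytics snapshot, current hypotheses about drop-off, business goals.
    [Task - one line] Write a 15-minute interview guide for an upcoming customer interview.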

Here’s what I saw in the interview guides written by 3.7 and 4:

(This is my personal assessment. I’ve included the prompt + outputs below for you to judge for yourself).

Context integration
  • Claude 4: Better breakdown, deeper integration of background into the question flow
  • Claude 3.7: 🚫 Treated details a bit more like filler

Hypotheses addressed
  • Claude 4: Direct questions for all 3 hypotheses
  • Claude 3.7: 🚫 Not addressed as directly

Question wording bias
  • Claude 4: Consistently used neutral language
  • Claude 3.7: 🚫 A few leading questions

Structure & transitions
  • Claude 4: Logical, natural, and fit to the information I wanted to collect (warm-up → walkthrough → probe)
  • Claude 3.7: 🚫 Question sets + overall method weren't as cleverly tailored to getting the right info; jumped from intro to specifics

Time realism
  • Claude 4: 🚨 Risk of running a bit long, based on my experience/guesstimate
  • Claude 3.7: Likely fit better within the 15-minute window

Why it matters:
Claude 4 offered deeper reasoning and stronger hypothesis alignment without needing to be told to "think step by step" or what those steps should be.

It also designed a better-flowing interview, including screen share-based walkthroughs that would probably surface blockers in real time, instead of forcing participants to recall the last time they looked at the site and what they were thinking then.

When I dug into the nitty-gritty details of the outputs, Claude 4 felt like it thought more critically and strategically, going beyond someone else’s “interview best practices” to determine which live-session approach could yield the most accurate customer input. Claude 3.7 honestly felt like a more junior researcher - the output isn’t “bad”, it’s just not as considered.

This isn’t a be-all-end-all test; it’s really just a start. (Two other lightweight tests like this yielded closer results, where the differences were harder for me to pin down.)

Interested in my test protocol - to replicate it or judge the outputs yourself?

🔗 See the prompt I used plus side-by-side Claude 3.7 vs. 4 results

PROMPTING PLUS

🔬 Claude 4’s system prompts and what they say about its behaviors

Claude doesn’t just take your word for it - and that’s a good thing. Buried in Claude 4’s internal instructions is this little gem:

The person's message may contain a false statement or presupposition and Claude should check this if uncertain. [...]

If the user corrects Claude or tells Claude it's made a mistake, then Claude first thinks through the issue carefully before acknowledging the user, since users sometimes make errors themselves.

Claude’s core system prompts published by Anthropic, first seen here

What that means:

Claude is explicitly told not to blindly defer to you, even when you sound confident. Instead, it’s programmed to check your claims, assess whether you might be wrong, and then respond based on what’s most likely to be true.

Why this matters:

This should make Claude a much better thinking partner for research and product work — where your inputs might be early hypotheses, ambiguous observations based on limited recall, or rough data summaries. You want the model to test what you’re saying, not just be a yes-person.

AI FUNDAMENTALS

🔁 “MCP”: What to know about the term that’s suddenly everywhere

If you've seen MCP popping up everywhere, you might have ignored it because lately it's mostly been CTOs and founders talking about it. But it’s a term worth knowing in our work, too.

What is it?
MCP = Model Context Protocol

Think of it like a universal adapter for AI tools. Just like how USB-C lets you plug any device into any port, MCP lets AI assistants connect to any app or data source you use.

Why this matters:

Right now, when you want AI to help with your work, you have to copy-paste and prompt everything manually. Customer messages exported from Intercom. Research notes from Notion. Transcripts from Grain. Surveys from Google Sheets. For nearly all of us, our processes are split across many separate apps.

MCP changes that. Instead of you doing the busy work of making connections, exporting here and importing there, an AI assistant can:

  • Pull customer quotes directly from your research database

  • Check your calendar and suggest meeting times for participants

  • Grab the latest design files from your shared folders for test planning

  • Review recent survey responses and run a trend-spotting workflow
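For the curious, here’s roughly what one of those connections looks like from the tool-builder’s side. This is a minimal sketch based on my reading of the official MCP Python SDK (the `mcp` package); the "research-notes" server name, the `search_research_notes` tool, and the notes it returns are made up for illustration. An assistant that speaks MCP can then call this tool itself instead of waiting for you to export and paste.

    # A tiny MCP server exposing one made-up tool.
    # Sketch only - assumes the official MCP Python SDK (pip install mcp).
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("research-notes")  # the name a connected assistant sees

    # Hypothetical stand-in for a real research repository.
    FAKE_NOTES = [
        "P3 couldn't find mortgage pre-approval info on the listing page.",
        "P7 abandoned the saved-search flow after email verification.",
    ]

    @mcp.tool()
    def search_research_notes(query: str) -> list[str]:
        """Return research notes matching a keyword (placeholder search)."""
        return [note for note in FAKE_NOTES if query.lower() in note.lower()]

    if __name__ == "__main__":
        mcp.run()  # serves the tool over MCP so an assistant can call it

The point isn’t the code itself - it’s that once a tool speaks this protocol, any MCP-aware assistant can use it without a custom integration for every app.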

You might be using MCP without knowing:

This may sound like the kind of agentic workflow your team is far from approving, but MCP is becoming more and more common, especially in the backend of tools you’re already using or considering.

In my chats with five AI tool founders last week, four said their platforms were already using MCP in the background to handle your data and the AI’s decisions about what to do with it.

The main point: Where we can use MCP, it’ll mean far less time copying data between apps and more time on the creative, strategic work that matters (and is more fun). If you’re anything like me, that’s the sort of AI-assisted time-saver you’ve actually been waiting for.

WHAT’S COMING NEXT?

Here’s what’s planned for the next few editions (assuming no model releases get in the way) -

  • Hopefully a few more tests of Claude 4 and what it’s capable of

  • Are there any decent options for user testing with AI? Let’s see.

  • Cleaning transcripts is a pain - can we use AI (safely)?

Have a great start to June!

-Caitlin