AI x Customer Research - June '25
Preventing AI's basic math errors, fast data cleaning, a viral study, a meta-prompting addition worth thinking about, and more...

Read time: 16 minutes
Hey all!
It’s the first month that this newsletter is FREE. 🥳
But that doesn’t mean this edition is suddenly in “lite” mode - there’s a lot to talk about in June.
In the past month, alongside running my AI Analysis course again, I’ve run workshops with teams like Canva and Vinted 🙇♀️ And a few common problems keep showing up...
Despite all the diversity in my client teams - from locations and languages used, to tools and democratization levels - there was a common thread:
LLMs make silly math mistakes and no one knows why or how to fix them.
And cleaning data is a giant blocker to AI adoption in research.
I spent more time than expected this month helping teams handle those small-but-hugely-impactful topics, and wanted to share a few quick solutions with you, too.
Plus, a new instruction kept showing up in ChatGPT-generated prompts, and a study went viral. The study asked: Is using AI making us stupid? It seems a little like it is, but I’ll share what else the study says, and how I’ve always thought about maintaining my independent intelligence while being a heavy AI user.
Let’s jump in:
In this edition:
🔢 Why Do LLMs Keep Failing at Simple Math? Here’s why math errors in quantitative tasks happen and how to fix them.
🧹 Using AI to Clean Your Data: One easy way to use AI to speed up a time-consuming prep step for analysis.
🧾 “Do Not Print Your Reasoning” - A new instruction popping up everywhere in my meta prompting. Should we leave it in the prompts?
💔 Is AI Actually Making Us Dumber? What we can take from a frequently mentioned study, and how I try to protect my brainpower.
📰 AI News: 🇪🇺 Nvidia & Perplexity are building “Sovereign AI” in Europe.
WORKFLOW UPGRADES
🔢 Why do LLMs keep failing at simple math? Here’s what’s happening and how to fix it
The issue:
It’s alarmingly common: you ask ChatGPT (or Claude, or Gemini) to do some basic quantitative analysis, and it spits out wildly different math results for the same problem.
Here’s an example: One of my June course participants was using my synthetic survey data and kept getting weird calculations -
The reality: 4 out of 10 users (40%) who selected “Sleep issues” as their primary reason for downloading/using the Flow app also upgraded from Free to the Premium tier.
Claude’s first calculation: 33% of “Sleep issues”-selectors upgraded
Claude’s second calculation: 62% of “Sleep issues”-selectors upgraded
What is going on here?
Why this happens:
The AI makes hidden assumptions about how to count responses.
There’s no “show your work” step, so you can’t check its math.
Calculations are often done in plain text (not code), so there’s no reliable logic.
How to get accurate results:
Force it to show its work
(Still not foolproof - AI may “show” you steps but still fudge the math.)
Prompt:
“Before calculating percentages, first list out each respondent ID who mentioned this goal, then list which of those respondent IDs upgraded to premium. Show your counting step-by-step.”
Force the AI to use code/Python:
Best solution: Calculations done via code are verifiable and reliable - way better than fuzzy text math.
Prompt:
“Use Python or code for all calculations, groupings, and comparisons. Output the code and the final result.”
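To make that concrete, here’s roughly what a code-based calculation for the Flow example above looks like. It’s a minimal pandas sketch with made-up column names (`primary_reason`, `upgraded_to_premium`) and fake data standing in for the real survey file - the point is that every count is explicit and re-runnable, not fuzzy text math.
```python
import pandas as pd

# Hypothetical mini-dataset standing in for the real survey export -
# swap in your own file (e.g. pd.read_csv) and column names.
df = pd.DataFrame({
    "respondent_id": range(1, 11),
    "primary_reason": ["Sleep issues"] * 10,
    "upgraded_to_premium": [True, False, False, True, False,
                            True, False, False, True, False],
})

# Step 1: filter to respondents who selected "Sleep issues".
sleep_issues = df[df["primary_reason"] == "Sleep issues"]

# Step 2: count how many of those respondents upgraded.
upgraded = sleep_issues["upgraded_to_premium"].sum()

# Step 3: compute the percentage explicitly - no hidden assumptions.
rate = upgraded / len(sleep_issues) * 100
print(f"{upgraded}/{len(sleep_issues)} 'Sleep issues' respondents upgraded ({rate:.0f}%)")
# -> 4/10 'Sleep issues' respondents upgraded (40%)
```
When the AI returns code like this alongside its answer, you can re-run it in a notebook and check each step, instead of trusting a number that appeared in prose.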
〰️
🧹 Using AI to Clean Your Data
You can use any LLM to “pre-inspect” transcripts or survey exports before analysis - saving a ton of time and reducing errors. While this isn’t the best solution for everyone, most people I know and have worked with aren’t using AI for any part of cleaning. If you’re open to it, here’s a simple place to start without completely handing over this critical task -
Prompts to get started:
Assess data usability:
“You are a critical research assistant. Review this transcript/survey file. Analyze and list: (a) all data quality issues, (b) barriers to use (e.g., broken language, missed statements, unclear or missing speaker labels, likely inaccuracies, etc.), and (c) anything that would confuse an AI or human analysts in an analysis process.”
Find the highest-impact cleaning steps:
“From your analysis, which three cleaning steps would save the most time and make this file easier to analyze accurately?
Explain why you chose those steps. Be specific and brief.”
Using prompts like these on your raw data can help you identify where the biggest problems are - formatting, broken content, and more - so you can clean whatever has the most impact. You don’t always have to clean data perfectly to get much better results with AI.
Tip:
Always ask the AI to explain its reasoning behind each cleaning step it suggests—so you can spot anything it misses (or overthinks).
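And if you’d rather run that pre-inspection outside the chat window (say, over a folder of transcripts), here’s a minimal sketch using the OpenAI Python SDK. The file path, model choice, and exact prompt wording are placeholders to adapt - treat it as a starting point, not a finished cleaning pipeline.
```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # expects OPENAI_API_KEY in your environment

ASSESSMENT_PROMPT = (
    "You are a critical research assistant. Review this transcript. "
    "Analyze and list: (a) all data quality issues, (b) barriers to use "
    "(e.g., broken language, missed statements, unclear or missing speaker "
    "labels, likely inaccuracies, etc.), and (c) anything that would confuse "
    "an AI or human analysts in an analysis process."
)

# Load one raw transcript - replace with your own file path.
with open("interview_01.txt", encoding="utf-8") as f:
    transcript = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": ASSESSMENT_PROMPT},
        {"role": "user", "content": transcript},
    ],
)

# The model's list of quality issues and barriers - review it before deciding
# which cleaning steps are actually worth doing.
print(response.choices[0].message.content)
```
Looping this over every raw file and saving each assessment next to it makes it easy to see which files need the most cleaning before analysis.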
PROMPTING PLUS
🧾 “Do Not Print Your Reasoning”: Why Is ChatGPT Hiding Its Thought Process?
If you’ve been writing prompts or using “meta prompting” (asking your LLM to improve or generate prompts for you), you may have noticed a new instruction cropping up:
“Do not print your reasoning.”

Lately, this statement has been showing up in every revised prompt I ask ChatGPT to write for me (in GPT-4o, 4.1, and o3). But why is this suddenly the default? And when should you use it (or not)?
What’s actually happening?
By default, LLMs “think aloud.”
If you ask for a summary, a plan, or an answer, you’ll often get pages of step-by-step reasoning.
This burns tokens (costs more) and makes results longer/slower.
The new trend is to “hide” the internal thought process, but still have the model do it internally - just not share it with you.
It seems like ChatGPT is now nudging users toward prompts that deliver results without the reasoning, possibly because the reasoning models are designed to show you (some of) their reasoning by default in expandable sections instead, like this -

An example of the reasoning - which often isn’t enough for me to know if the LLM did the task correctly and thoroughly or not.
Should you tell your LLM to skip the reasoning in outputs?
The problem with prompts like the ones ChatGPT generated is this: often, the reasoning laid out in those expandable sections (see above) isn’t enough. Here’s when I believe you still need to ask your LLM for its reasoning (and remove “do not print reasoning” from any revised prompts; there’s an example prompt after this list) -
❌ Debugging or quality control:
You WANT to see the logic if you’re double-checking AI decisions, or if you’re testing a prompt for consistency.
❌ Training new prompt patterns:
If you want to teach yourself or your team how the AI “thinks,” seeing step-by-step logic is essential.
❌ Sensitive/ambiguous cases:
For research, UX, or customer verbatim analysis, it’s helpful to understand why the AI labeled a statement a certain way, or how it arrived at certain groupings.
❌ Higher stakes work:
You need all the transparency you can get for tricky or high-stakes research work where you’ll need to document the full process and arguments for where you landed.
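If you’re in one of those situations, one option (beyond just deleting the instruction) is to flip it into an explicit request for the logic - something along these lines, adapted to your task:
Prompt:
“Print your reasoning before the final answer: list the steps you took, the assumptions you made, and any statements or groupings you were unsure how to label.”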
AI STUDIES
💔 Is AI Actually Making Us Dumber?
Why it matters:
If you use LLMs for research/analysis, you might be worried about outsourcing your brain. I certainly am.
This is something I’ve been thinking about since the earliest days after ChatGPT’s launch and my first experiments. “Will this make my brain incapable of doing customer research without AI in the future?” was top of mind, and still is.
〰️
Here’s what this paper found, and how I think about the topic at hand -
What the experiment did:
54 participants wrote essays
Participants were split into 3 groups, using (a) their brain, or (b) a search engine, or (c) an LLM to write the essays.
18 participants completed a final task, switching from the method they used in the original tasks (e.g. their brain) to another method (e.g. an LLM)
EEG measured brain engagement during essay writing.
Humans and AI both scored the essays the participants produced.
What happened?
LLMs seem to push “single answer” thinking - they synthesize everything for the user to the point that the users don’t have to understand, sift and make sense of things themselves. Low cognitive load, but also low engagement.
Search engines give options that the human brain has to work through and synthesize. More cognitive load, and engagement.
People using LLMs recalled less from their own writing just minutes later.
Brain activity dropped if they started with AI, but spiked if they started solo.
“The LLM group also fell behind in their ability to quote from the essays they wrote just minutes prior”
⚠️ What I’m most afraid of: The finding in that quote above signals that we don’t use our working memory as much when using LLMs to understand our data. To me, this says we must stay as close to the data as possible. Our work is not just about sifting through data faster; it’s about being close to the customer, and our brains don’t absorb as much from the data - or remember it later - if we hand everything over to AI.
〰️
How I think about using AI (and this study encourages me to continue that way):
Take notes in customer sessions, on your own thoughts during meetings, and so on. Start with something that is your perspective, not just AI’s, to bring into later AI use.
Ask AI to help as a second step, after forming your own thoughts and perspective - not the other way around.
Use your notes and perspective to challenge AI’s in any process (planning experiments, writing interview guides, running analysis, etc.).
Since day one, I’ve continued to take one step - even if small - to use my own brain before using AI. This study makes me think that’s worth continuing.
〰️
🔍 One final note about this study’s size:
54 participants is relatively small for a neuroscience study - but not unusual for EEG research, which is more intensive and expensive than a survey or click test. (And only 18 completed the final task, though the researchers considered this a bonus.)
I see this study as a signal, not a final verdict. The findings are directionally important—especially about “single answer” thinking and memory.
AI NEWS
🇪🇺 Nvidia & Perplexity: Building “Sovereign AI” in Europe
What happened?
Perplexity (the AI-powered search startup) and Nvidia are launching a big push to build “sovereign AI infrastructure” for European countries.
Instead of relying on U.S. or Chinese LLMs and cloud providers, EU governments and companies could run Perplexity’s AI models on local Nvidia hardware, inside the EU.
Why is this generally a big deal?
European governments want control over their AI: where it runs, where data lives, and how models are updated.
“Sovereign AI” means that your data (including customer research and interviews) can stay inside the EU—no U.S. or non-EU company access.
Why this matters to us:
Better coverage of European languages:
The new Perplexity models will handle all 24 official EU languages natively - not just “translate” but actually understand and respond in context, with local nuance.
Fine-tuning for national, cultural context:
Teams in places like Slovakia and Slovenia are already training models on their national languages and cultures—meaning answers, summaries, and insights are less North American by default, more relevant for local customers and users.
Less American bias? Hopefully:
I imagine that if these models are trained on European data and not majority-U.S. data, there would be a reduction in the kind of U.S.-centric responses we often get from the major models - sometimes as basic as assuming all our financial calculations should be in $, but often more significant than that.
Source: Wall Street Journal
WHAT’S COMING NEXT?
Here’s what’s coming in the next few editions -
AI tools for prototyping - does one generate the best wireframes and views for user testing?
I’m still planning a big test of ChatGPT Pro to finally try out some interesting functionality, like video mode and recording (is ChatGPT a valid note-taker?)
and more 🤓
See you in July!
-Caitlin