AI x Customer Research - February '25
My framework for evaluating tools systematically, checking AI privacy policies, and a shiny new model...

Read time: 10 minutes
Hi, nice to see you again!
In this edition, I’m bringing back a little bit of data privacy help. Plus, I’m sharing my tool evaluation framework - a snippet from my new course on AI Analysis - and an intro to the latest exciting LLM on the market.
During AI trainings, I often hear: "We're on the enterprise plan for {tool}, so privacy is covered." While enterprise plans do offer stronger protections, my sense was that data privacy still wasn’t automatically bulletproof. So I investigated this further.
Enterprise plans typically include enhanced security and opt-out options, but AI tools vary considerably in how they handle customer data. Some retain inputs longer than necessary, others share certain data with partners, some policies barely tell us anything. And most require careful reading to understand the full picture.
This week, I'm outlining what to verify before entrusting sensitive customer data to any AI tool - and my assessment of a few common tools.
Next up: we’re all experimenting with AI tools these days, but without structure, comparing them and making final toolkit decisions can feel impossible. That’s what I was facing when I set out to structure my own testing process, so I developed the COMPASS framework—a series of checks for evaluating AI tools more systematically.
In industry news: A significant model was just released, featuring improved reasoning and a more transparent process. The flexibility of this one is potentially a great upgrade for how we run different types of AI-supported analysis.
Let's dig in. 👇
In this edition:
📝 The COMPASS Framework: A system to cut through hype and evaluate whether AI tools actually work for you.
🔒 How Do Your Favorite AI Tools Handle Your Data? – A breakdown of privacy policies from three common tools used by research and design teams (and what the risks are).
📰 AI News: A new LLM is a “hybrid”: it can think more deeply and show you its work. What the benefits could be for customer research work.
WORKFLOW UPGRADES
📝 My COMPASS Framework: How to Quickly Evaluate AI Research Tools
If you’ve ever spent hours testing AI tools, only to wonder, “Is this actually better than doing it manually?”—I feel you.
AI tools for research are multiplying faster than we can keep up with, and the last thing anyone wants is to waste time (or budget) on something that looks cool but fails in real-world use.
Yet if we just run some data through a few tools on a whim, it’s hard to feel we’ve compared them thoroughly enough to make a final decision about which one to use.
As part of my own testing process, I created my COMPASS framework to evaluate AI tools as consistently as I can manage across tests.
Instead of vague “does this work?” testing, it helps me systematically assess whether a tool is actually reliable, insightful, and scalable—or just an overhyped Beta that fails under real data.
While my framework is best for evaluating AI analysis tools and processes, many of the checks involved are helpful for evaluating other research tasks with AI, too.

C – Consistency (Does it give reliable results?)
A good tool produces similar outputs when analyzing the same dataset multiple times. If the summaries keep changing, it’s not reliable. If you test with multiple datasets, the output quality should stay at the same (hopefully high) level, too.
✅ Test this: Run the same data through it three times—are the key insights and level of quality the same? (For one way to automate this check, see the sketch after the framework.)
O – Omissions (Is it missing key insights?)
AI tends to overlook nuance. A great tool captures nearly everything a human would—not just the obvious themes. It needs to make your work process easier.
✅ Test this: Compare AI’s summary to your notes—does it miss anything important? And what kind of details does it not quite get?
M – Meaningful (Are the insights actually useful?)
Some tools pick up patterns that don’t matter. Others surface strategic insights you can act on, making good use of the background details you’ve fed them.
✅ Test this: If you handed the AI output to a stakeholder, would they instantly get the key takeaways and find them helpful?
P – Precision (Does it stick to the data?)
AI should not overgeneralize or add assumptions. If it starts making things up or misrepresenting what the data shows, it’s likely to do more harm than good.
✅ Test this: Check if AI is accurately summarizing what’s in the data—without injecting extra meaning or mislabeling evidence.
A – Actionable (How usable are the outputs?)
If you spend more time rewriting than using the outputs, AI isn’t actually supporting you.
✅ Test this: Could you mostly copy-paste the AI’s results into a report with minimal edits?
S – Scalability (Can it handle bigger datasets?)
A tool that works for five interviews but crashes with fifty isn’t valuable enough for many of us. Tools that only handle interview recordings are also too narrowly focused to help those of us triangulating multiple sources.
✅ Test this: Try larger datasets or multiple file formats—can your tool handle them? Or does it lose track of your intended focus?
S – Sustainability (Is this tool built to last?)
AI evolves fast—some tools won’t be around in six months.
Secondly, many of us care about AI’s enormous environmental footprint. If you want to factor this in as well, you’ll need to do a bit more research into whether your chosen tools are trying to minimize their footprint in any way. (Unfortunately, most aren’t.)
✅ Check this: Who built the tool? Is it dependent on another AI company’s API? If so, what happens when that API changes?
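To make the Consistency check concrete, here’s a minimal sketch of how you might automate it—run the same transcript through a tool three times and score how similar the summaries are. The `summarize()` helper is a hypothetical placeholder for whichever tool or API you’re testing, and the similarity score is only a rough proxy; you’ll still want to read the outputs yourself.

```python
# Minimal sketch of the COMPASS "Consistency" check.
# summarize() is a hypothetical placeholder for the tool or API you're evaluating.
from difflib import SequenceMatcher


def summarize(transcript: str) -> str:
    """Placeholder: call the AI tool you're testing and return its summary."""
    raise NotImplementedError("Wire this up to the tool you're evaluating.")


def consistency_score(transcript: str, runs: int = 3) -> float:
    """Run the same data through the tool several times and return the average
    pairwise similarity of the summaries (0.0 = very different, 1.0 = identical)."""
    summaries = [summarize(transcript) for _ in range(runs)]
    scores = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(summaries)
        for b in summaries[i + 1:]
    ]
    return sum(scores) / len(scores)
```

A score near 1.0 means the summaries are nearly identical; a low score means the key takeaways are shifting between runs and deserve a closer manual look.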
This was a snippet from my course content.
I know many of you find my AI tool-testing guidance especially useful, so I wanted to share something practical from my new course. Next time you’re evaluating an AI tool for research, run it through the COMPASS checks—you’ll spot the gaps (or the gold) much faster.
DATA PRIVACY
🫣 How Do AI Tools Store Your Customer Data? A Privacy Breakdown + Issues
AI tools are obviously great for speeding up many of our customer research steps. At this point, I can’t really live without them. And yet they come with a cost—the privacy of our data.
Some tools train their models on your inputs, others share data with third-party providers, and some let you opt out (if you dig through their settings).
Most of them aren’t as clear about all the important data privacy factors as they should be.
But how bad is it? Let me break down the privacy policies of Notion AI, Miro AI, and Otter.ai - three tools commonly used for UX research and design work.
The Privacy Breakdown
1️⃣ Notion AI
😐 Data Retention: Stores data as long as your account is active. Backups are still kept for 30–90 days after deletion.
👌 Model Training: By default, Notion AI doesn’t train on your data.
😐 Third-Party Sharing: …but Notion AI sends all AI-related data to OpenAI and Anthropic. They claim these providers don’t train on it—but that might only be contractually enforced for Enterprise users.
👌 Security & Encryption: AES-256 encryption at rest, TLS 1.2+ in transit.
🇺🇸 Data is stored in the U.S. with no regional storage options.
⚠️ Key Risks: Free/Pro users’ data could be used to improve OpenAI models. Notion says it won’t be, but that really depends on how the contracts are enforced.
—
2️⃣ Miro AI
😐 Data Retention: Unclear how long AI-generated content is stored. Data in Miro can generally be retained as long as Miro sees fit, even after account deletion in the case of audits.
👌 Model Training: Miro doesn’t train models on your data.
👌 Third-Party Sharing: Miro uses some self-hosted proprietary models, and models via Microsoft Azure AI. Their contracts prevent training on user data.
👌 Security & Encryption: Industry-standard levels of encryption.
🌏 Data is stored in the U.S., EU, or Australia, depending on your location.
⚠️ Key Risks: Data retention policies aren’t entirely clear.
—
3️⃣ Otter.ai
😐 Data Retention: Stores data up to a year after account deletion. Backups exist for an unspecified period.
👎 Model Training: By default, Otter uses your transcripts to train its AI. You must opt out manually via a form.
👎 Third-Party Sharing: Otter Chat sends data to external providers, but they don’t disclose which ones.
👌 Security & Encryption: Industry-standard encryption.
🇺🇸 Data is stored in the U.S. with no regional storage options.
⚠️ Key Risks: Default opt-in to AI training. Unclear whether data is anonymized before use in model training. Unnamed third-party providers process chat data.
〰️
🧐 Figure Out Privacy Policies (or Get AI’s Help)
Most AI tools bury their data policies in walls of text. Whether you want to do this yourself, or use AI to help you dig through them, here are the key factors I check in privacy policies and other help center documents.
Yep, the list below is long. I suggest you pull out the pieces that are most important for you and your team, based on your context.
Key Privacy Factors to Investigate -
1️⃣ Data Storage & Retention
How long is customer data stored?
Can users delete their data permanently?
Is data stored on company servers, third-party cloud services, or user-controlled storage?
2️⃣ Data Usage & Training
Does the AI train on user inputs? If so, are inputs anonymized first?
Are transcripts, documents, or other inputs used to improve the AI model?
Can users opt out of their data being used for training? On all plans, or just Enterprise?
3️⃣ Third-Party Sharing
Does the tool share customer data with partners, vendors, or third-party AI providers?
If it integrates with an LLM (e.g., ChatGPT, Claude), is user data passed to that external model?
Can those companies train models on our data?
4️⃣ Location of Data Storage & Compliance
Where is the data physically stored (e.g., US, EU, AWS, private servers)?
Is the tool GDPR- and CCPA-compliant?
Does it offer regional storage options for compliance-sensitive industries?
5️⃣ Security & Encryption
Is data encrypted in transit and at rest?
Which security measures does the company have in place to protect stored data?
6️⃣ User Control & Transparency
Does the tool provide a clear, user-friendly way to access, delete, or export data?
Does it have a transparent, easy-to-read privacy policy (or is it buried in legal jargon)?
7️⃣ PII & Health-Related Information Handling
Does the tool offer automated redaction or anonymization of PII and health details?
Does the tool automatically anonymize the data for its storage and transfer?
〰️
Prompt Series to Get AI’s Help -
If you choose to use AI for this, keep in mind: ChatGPT and other LLMs are just as likely to hallucinate here as in any other process.
Make sure you do a little bit of your own verification work, too. Don’t just take what the LLM gives you as the truth—a quick Ctrl+F search for key terms like “storage” and “training” in the privacy policies goes a long way toward catching hallucinated misinformation before big toolkit decisions.
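If you’d like a script to back up that verification step, here’s a minimal sketch that downloads a public privacy policy page and pulls out the sentences mentioning your key terms—essentially Ctrl+F in code form. The URL, term list, and crude HTML handling are all assumptions to adapt; some policy pages load their content with JavaScript and won’t work with a plain download.

```python
# Minimal sketch: pull the sentences of a privacy policy that mention key privacy terms,
# so you can check an LLM's summary against the actual wording. URL and terms are placeholders.
import re
import urllib.request

KEY_TERMS = ["retention", "training", "third party", "third-party", "encrypt", "delete", "anonymi"]


def scan_policy(url: str) -> dict:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", html)          # crude tag stripping, not a real HTML parser
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return {
        term: [s.strip() for s in sentences if term in s.lower()][:5]  # first 5 matches per term
        for term in KEY_TERMS
    }


# Usage with a hypothetical URL:
# for term, hits in scan_policy("https://example.com/privacy").items():
#     print(f"\n== {term} ==")
#     for sentence in hits:
#         print("-", sentence[:200])
```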
Prompt 1 -
“Review the Privacy Policies of common AI tools for research.
Here are the steps we’re going to take together, looking at one tool at a time:
1. Investigate the privacy policies of each tool on our list, focusing on the key privacy factors.
2. Once you have exhaustively reviewed the policies, present a summary of findings for each tool (concise but detailed enough to be useful).
3. Note any differences in privacy handling between free, pro, and enterprise tiers (if applicable).
4. We’ll review the findings together to decide whether we need to dig deeper in some places.
Let me know if you have any questions about these steps.
Then, I'll send you the types of information you'll be looking for.
Finally, I'll send you the link to the first tool’s privacy policy.”
Prompt 2 -
“Review this link {paste privacy policy link of Tool #1}.
Search in the privacy policy for these Key Privacy Factors:
{ Paste in all your selected privacy factors, or copy-paste my full list from above }”
AI NEWS
📰 Claude 3.7 Sonnet: A Model With Better Options for Customer Research?
Anthropic just rolled out Claude 3.7 Sonnet, and this one’s worth a look. It’s “designed to ‘think’ about questions for as long as users want it to.” What that means: It’s the first hybrid reasoning model that allows users to choose whether Claude should process requests quickly or spend more time thinking.
While the team is currently prioritizing math and coding requests in Claude’s extended thinking options, Claude’s new way of working has potentially big impacts on qualitative work, too.
Why it matters:
Most AI models give you a superficial answer to research analysis prompts and other complex scenarios, unless you spend extra time laying out how to think through the steps in bite-size pieces.
Most of the time, you also have no clue how your AI got to its conclusions. Claude 3.7 Sonnet is trying to change both of those things.
What this update brings for research work:
✅ A visible "scratch pad", so you can see its thought process in real time. That’s critical for validating research steps as we go through complex processes with AI.
✅ Deeper multi-step reasoning – Great for complex analysis where you want your AI partner to dive deeper and take time to answer more thoroughly.
✅ Flexible processing speeds. We can toggle between two modes, like asking for quick summaries of less complicated content (ex: prep fast for a stakeholder meeting by having AI synthesize a document) vs. running deeper insight development for risky projects. (A rough sketch of what this toggle looks like via the API follows below.)
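For anyone working against the API rather than the Claude app, here’s a minimal sketch of what that toggle might look like with Anthropic’s Python SDK. Treat the model name and token budgets as assumptions to verify against Anthropic’s current documentation; in the app, the same choice appears as an extended thinking toggle.

```python
# Minimal sketch: switching Claude 3.7 Sonnet between quick answers and extended thinking
# via Anthropic's Python SDK. Model name and token budgets are assumptions - check current docs.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in your environment


def analyze(prompt: str, think_hard: bool = False):
    kwargs = {
        "model": "claude-3-7-sonnet-20250219",
        "messages": [{"role": "user", "content": prompt}],
    }
    if think_hard:
        # Extended thinking: give Claude a budget of tokens to reason with before answering.
        # The response then includes "thinking" blocks - the visible scratch pad.
        kwargs["max_tokens"] = 16000
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 8000}
    else:
        kwargs["max_tokens"] = 2000  # quick mode, e.g. a fast synthesis before a stakeholder meeting
    return client.messages.create(**kwargs)
```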
WHAT’S COMING NEXT?
Here’s what’s coming in the next few editions -
Deep testing of an expensive newer LLM 💸
Can “video mode” in LLMs help us run in-person testing better?
AI tools for wireframing and prototyping - is there a winner?
and more 🤓
See you next time!
-Caitlin