Why One Test Isn’t Enough: Building an Incrementality Testing Program


Level it Up

Carly London

Founder at Sometimes Curly

With hundreds of millions of dollars in media spend managed and 14+ years of growth marketing experience, Carly acts as CMO, Head of Growth, media buyer, or anything in between.

With over 14 years of growth marketing experience and hundreds of millions in media spend under her belt, Carly has seen every attribution model come and go—from last-click to lift tests.

She’s led growth at brands like Dollar Shave Club and LIVELY, and now runs her own consultancy, Sometimes Curly, partnering with brands like DRMTLGY, Pet Honesty, and Kitsch to help them scale smarter through incrementality testing programs.

In this article, Carly shares her approach to incrementality testing—breaking down how she distinguishes between tactical and strategic use cases, how she uncovers the true impact of ad spend, and why one test alone is never enough.

The Single Test Trap

I’ve run dozens of media tests over the years — some in-platform, others using third-party tools like WorkMagic — and I can confidently say: one test doesn’t tell you much.

In fact, that belief is the foundation of my entire approach. When I begin working with a brand, the first step is to establish an incrementality baseline for each channel.

Here’s an example:

We ran an incrementality test for a brand with millions in monthly ad spend. Incrementality for Channel A came in at 1.1x. At first glance, that’s not great. For some leadership teams, it’s enough to hit pause and reallocate budget. I’ve been in rooms where execs were ready to pull the plug on a channel after seeing results like that, especially when they’re wildly different from in-platform numbers.

But here’s the thing: you can’t judge a channel in isolation. You need a full view across your entire mix. So we kept testing. Channel B came in at 0.9x, and Channel C was the weakest at 0.3x. Suddenly, Channel A was looking pretty good.
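To make that comparison concrete, here is a minimal Python sketch that ranks channels by their measured incrementality and expresses each one as a share of the top performer. It assumes each figure is an incremental return per ad dollar (for example, incremental ROAS) from tests run over comparable windows; the channel labels and values are the ones from the example above, and the function names are hypothetical.

```python
# Baseline incrementality results from the example above.
# Assumption: each value is incremental return per ad dollar from comparable tests.
baseline = {"Channel A": 1.1, "Channel B": 0.9, "Channel C": 0.3}


def rank_channels(results: dict[str, float]) -> list[tuple[str, float]]:
    """Return (channel, result) pairs sorted from most to least incremental."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def relative_to_best(results: dict[str, float]) -> dict[str, float]:
    """Express each channel's result as a share of the top channel's result."""
    best = max(results.values())
    return {channel: value / best for channel, value in results.items()}


if __name__ == "__main__":
    shares = relative_to_best(baseline)
    for channel, value in rank_channels(baseline):
        print(f"{channel}: {value:.1f}x ({shares[channel]:.0%} of the top channel)")
```

Read in isolation, 1.1x looks weak. Ranked against the rest of the mix, it is the strongest candidate for the next dollar.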

That’s the power of testing multiple channels for incrementality. It’s not about whether one channel works. It’s about which channel works better, and what to do with that next ad dollar. If you only test TikTok, you’ll never know whether Meta is outperforming it.

If you only test Meta, you might miss that Google Search is burning cash.

Context is everything.

Tactical vs. Strategic Testing — When to Use What

Measurement has come a long way—from the early days of Facebook’s Test & Learn to the open-sourcing of tools like Google Meridian and Meta Robyn. It feels like we’re finally in a coming-of-age moment for marketing measurement.

But with more tools comes more confusion. Here’s how I think about it:

In-platform tests are great for tactical decisions.

I still use Meta's in-platform lift tests all the time — they’re great for fast, tactical questions like:

Is my retention campaign driving incremental DTC conversions, or just showing over-inflated platform numbers?
Is Value Optimization better than Conversion Optimization?
Does Tactic A outperform Tactic B?

You’ll get answers quickly, but there’s a trade-off: you can’t compare two channels using one platform's measurement tools. That's where third-party incrementality testing comes in.

Another drawback: in-platform tools rely heavily on pixels and clicks and only count purchases on your website, so purchases on Amazon or in offline retail aren’t easily captured. That’s a massive blind spot, especially for mature brands or brands that don’t rely solely on their DTC site.

Geo-lift tests are for bigger, multi-channel questions.

Tools like WorkMagic come in handy for larger, more strategic questions, such as:

Should I be spending more on Google or Meta?
Which channel has the highest true ROAS?
Are my awareness channels actually driving incrementality?

Platforms aren't able to answer those questions for you.

With WorkMagic, I can test multiple channels, establish that multi-channel incrementality baseline, then confidently compare Channels A, B, and C (including Amazon and retail data) and decide their place in my marketing mix.
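Third-party tools use their own, often proprietary, models, so what follows is not WorkMagic's method. It is a minimal Python sketch of the basic idea behind a geo-lift read, assuming a simple matched-market design. Ads run in test regions and are held back in control regions. Control-region sales are scaled by the pre-test ratio between the two groups to estimate what the test regions would have sold without ads, and the gap is divided by spend. Because the inputs are total sales per region, they can include Amazon and retail revenue, not just DTC. The class, function, and every number are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeoLiftInputs:
    """Hypothetical inputs for a simple matched-market geo-lift read."""
    test_pre: float      # total sales (DTC + Amazon + retail) in test regions, pre-test
    control_pre: float   # total sales in control regions, pre-test
    test_post: float     # total sales in test regions during the test
    control_post: float  # total sales in control regions during the test
    spend: float         # ad spend in test regions during the test


def incremental_roas(d: GeoLiftInputs) -> float:
    """Estimate incremental revenue per ad dollar via a ratio-scaled counterfactual."""
    # How much larger the test regions normally are than the control regions.
    scale = d.test_pre / d.control_pre
    # What the test regions would likely have sold with no ads running.
    expected_without_ads = d.control_post * scale
    incremental_revenue = d.test_post - expected_without_ads
    return incremental_revenue / d.spend


# Made-up inputs for two channels, one geo-lift test per channel.
tests = {
    "Channel A": GeoLiftInputs(500_000, 400_000, 580_000, 420_000, 50_000),
    "Channel B": GeoLiftInputs(300_000, 300_000, 346_000, 310_000, 40_000),
}

if __name__ == "__main__":
    results = {name: incremental_roas(d) for name, d in tests.items()}
    for name, value in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {value:.2f}x incremental ROAS")
```

Real geo-lift tools add market matching, longer pre-periods, and uncertainty estimates. The point here is only that running the same read for every channel puts them on one scale.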

From Baseline to Action — What Happens Next?

Once you've gotten your baseline measurements in, now what? With those results in hand, you have the context to compare channels and identify areas of improvement. For each channel, I then prepare a list of hypotheses — things that I want to test to improve that channel's performance.

Here are some examples:

If Meta has solid incrementality, can we scale further by targeting mid-funnel actions, like "Add to Cart"?
Should we test optimizing toward click + view-through conversions instead of click-only? How might that affect the way Meta serves my ads?
Would my Meta ads lead to a stronger halo effect on Amazon?

This is also a great time to get other stakeholders like finance and the C-suite involved. With a baseline, you're finally comparing apples to apples. Their input is also helpful when deciding the next tests to run.

Ask:

What are the high-level questions we're trying to answer as a company?
What are our goals?
How can we use an incrementality test to see if they're feasible?

The brilliance of these tools is that you can actually test those hypotheses. You’ll know whether a change made a meaningful impact, or whether it’s back to the drawing board.
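One common way to check whether a change made a meaningful impact in a user-level holdout test, such as a platform lift test, is a two-proportion z-test on conversion rates between the exposed and holdout groups. The Python sketch below assumes you have conversion counts and group sizes for each arm. The function name and counts are hypothetical, and platform or third-party tools may use different methods (often Bayesian).

```python
import math


def conversion_lift_test(conv_test: int, n_test: int,
                         conv_control: int, n_control: int) -> tuple[float, float, float]:
    """Two-proportion z-test comparing test vs. holdout conversion rates.

    Returns (relative_lift, z_score, two_sided_p_value).
    """
    p_test = conv_test / n_test
    p_control = conv_control / n_control
    # Pooled conversion rate under the null hypothesis that the ads had no effect.
    p_pool = (conv_test + conv_control) / (n_test + n_control)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (p_test - p_control) / p_control
    return relative_lift, z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    lift, z, p = conversion_lift_test(1_150, 100_000, 1_000, 100_000)
    print(f"lift={lift:.1%}, z={z:.2f}, p={p:.4f}")
```

A two-sided p-value below 0.05 (roughly |z| > 1.96) is the usual convention for calling a lift real. It is a convention, not a guarantee, so the "keep testing" mindset still applies.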

Covering Your Blind Spots with Incrementality Testing

I still shiver a bit when I hear someone say they’re using last-click attribution. It's especially problematic for awareness channels like YouTube, or for products with longer consideration windows.

The other blind spot we mentioned, conversions that happen outside your website, can also be addressed with incrementality testing.

Awareness channels like TikTok and YouTube might look like they're only driving "vanity" metrics such as views and impressions, but when you layer in lift testing, you’ll often find those channels are driving real conversions. Those conversions just aren't showing up on your DTC site, or they can't be measured with click-only attribution.

Conclusion

Today's marketers can’t rely on a single test or method to tell the full story. That's why I advocate for broad, full-mix testing, a clear baseline, and a focus on uncovering blind spots before making decisions about scaling — or cutting — spend.

Ready to measure the halo effect and true impact of your ads? Book a demo with WorkMagic here.

Need expert help? Sometimes Curly manages $150m+ in annual media spend, helping brands with their media execution and incrementality tests. Let's chat — email us here.
