Gabrielle de La Forest - Research & Service Design

2026 - Synthetic User Testing for an AI-Driven Proactive Banking System

Context & Problem Statement

Starting point

A major French banking group wanted to design a personalised proactive contact system, but had no existing customer data, no prior research, and a very low UX maturity level.

The challenge

How do you design relevant and legitimate AI-driven experiences when mobilising real users is not yet possible?

The methodological answer

Design a research protocol based on synthetic users: behavioural personas capable of simulating differentiated reactions to design stimuli in a controlled way. Synthetic testing allows rapid iteration at an early stage of the design process, without the cost and lead time of traditional recruitment.

Production Time

One of the most striking aspects of this protocol is its efficiency. A comparable traditional protocol would conservatively require 6 to 10 times the investment of this one, not counting recruitment lead time. This does not make synthetic testing a replacement for real user research; it positions it as a powerful tool for rapid hypothesis generation, early-stage validation, and design de-risking before committing to heavier research phases.

The Foundations: 3 Dimensions Tested in Combination Across 5 Synthetic Users

The protocol was structured around three transversal dimensions:

  • 6 need families: Life moments with financial implications (e.g. "I have a project but I don't know if it's feasible")

  • Signal types: Factual, behavioural, and declarative signals indicating a latent need

  • Entry channels: Push notifications, secure in-app messages, email, video calls with advisors, simulators, checklists…

Five behavioural profiles designed to cover a broad spectrum of attitudes towards banking and personal data:

  • Claire M.: learning stage, low involvement, receptive to educational content

  • Thomas L.: low-involvement client, pragmatic, low tolerance for friction

  • Sophie B.: highly organised, strict control over finances, opposed to automated actions

  • Philippe M.: loyal client, values trust and human relationship above all

  • Marc D.: autonomous and demanding investor, high financial literacy
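To make the scale of these three dimensions concrete, here is a minimal Python sketch of the stimulus space. All identifiers are hypothetical: the case study names only one of the six need families, so they are labelled `family_1` to `family_6`, and the channel list keeps only the channels named above (the original list is open-ended). The `Persona` type is illustrative, not the protocol's actual persona format.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholders: only one of the six need families is named in the case study.
NEED_FAMILIES = [f"family_{i}" for i in range(1, 7)]

# The three signal types named in the case study.
SIGNAL_TYPES = ["factual", "behavioural", "declarative"]

# Channels named in the case study; the original list is open-ended, so this is a subset.
CHANNELS = ["push", "secure_in_app", "email", "advisor_video_call", "simulator", "checklist"]


@dataclass(frozen=True)
class Persona:
    name: str
    traits: tuple[str, ...]


PERSONAS = [
    Persona("Claire M.", ("learning stage", "low involvement", "receptive to educational content")),
    Persona("Thomas L.", ("low involvement", "pragmatic", "low tolerance for friction")),
    Persona("Sophie B.", ("highly organised", "strict financial control", "opposed to automated actions")),
    Persona("Philippe M.", ("loyal", "values trust and human relationship")),
    Persona("Marc D.", ("autonomous investor", "demanding", "high financial literacy")),
]


def stimulus_grid() -> list[tuple[str, str, str]]:
    """Every family x signal x channel combination that could be shown to a persona."""
    return list(product(NEED_FAMILIES, SIGNAL_TYPES, CHANNELS))


grid = stimulus_grid()
print(len(grid))                  # 108 combinations (6 x 3 x 6)
print(len(grid) * len(PERSONAS))  # 540 persona-level evaluations
```

Even this partial grid yields 540 persona-level evaluations, which suggests why a progressive protocol that isolates one dimension at a time is more tractable than testing every combination exhaustively.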

A Progressive 5-Test Protocol

The synthetic interviews were structured around 4 progressive tests run as collective sessions, the equivalent of synthetic focus groups. A fifth and final test then brought all dimensions together into full conversational scenarios.

Test 1 → Test 2 → Test 3 → Test 4 → Test 5

  • Test 1 · Perception of need families. Open qualitative reactions: do the 6 families resonate as real-life moments?

  • Test 2 · Relevance of families per profile. 1-to-5 scoring: which families are most relevant for each persona?

  • Test 3 · Channel preferences per family. 1-to-5 scoring: which contact channel is legitimate and acceptable for each family and each profile?

  • Test 4 · Trigger signals. 1-to-5 scoring: which signal type is acceptable depending on the profile and the situation?

  • Test 5 · Full conversational scenarios. Validation in realistic conditions: 5 scenarios built directly from the preferences and scores expressed across tests 1 to 4 (family × signal × channel), evaluated on relevance, perceived value, tone, and actionability. Each scenario is therefore grounded in prior synthetic evidence rather than hypothetical. The test closed with an open question on the golden rule and red lines.
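Test 5 assembles its scenarios from the scores of the earlier tests. A minimal sketch of one way to do that, assuming scores are stored per persona, per family, and per option on the 1-to-5 scale, and that the best option is the one with the highest mean across personas (an assumed rule, not necessarily the one used in the study). Persona, family, signal, and channel names are hypothetical placeholders and the values are invented.

```python
from statistics import mean

# Keys are (persona, family, option); values are 1-to-5 scores.
Scores = dict[tuple[str, str, str], int]


def best_option(scores: Scores, family: str) -> str:
    """Return the option with the highest mean score across personas for one family."""
    by_option: dict[str, list[int]] = {}
    for (_persona, fam, option), score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score outside the 1-to-5 scale: {score}")
        if fam == family:
            by_option.setdefault(option, []).append(score)
    if not by_option:
        raise KeyError(f"no scores recorded for {family!r}")
    # Iterating in sorted order makes ties resolve to the alphabetically first option.
    return max(sorted(by_option), key=lambda opt: mean(by_option[opt]))


def compose_scenario(signal_scores: Scores, channel_scores: Scores, family: str) -> tuple[str, str, str]:
    """Combine Test 4 (signal) and Test 3 (channel) results into a family x signal x channel scenario."""
    return (family, best_option(signal_scores, family), best_option(channel_scores, family))


# Hypothetical placeholder data with invented values.
signal_scores = {
    ("persona_a", "family_1", "signal_a"): 5,
    ("persona_b", "family_1", "signal_a"): 4,
    ("persona_a", "family_1", "signal_b"): 3,
    ("persona_b", "family_1", "signal_b"): 2,
}
channel_scores = {
    ("persona_a", "family_1", "channel_a"): 4,
    ("persona_b", "family_1", "channel_a"): 2,
    ("persona_a", "family_1", "channel_b"): 5,
    ("persona_b", "family_1", "channel_b"): 4,
}

print(compose_scenario(signal_scores, channel_scores, "family_1"))
# ('family_1', 'signal_a', 'channel_b')
```

Averaging is the simplest aggregation, but it can hide one persona's strong rejection, so keeping the per-persona scores alongside the mean is worthwhile.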

Iteration process

Whenever a persona assigned a low score, an iteration was initiated to identify friction points and produce an improved version. A before/after comparative score validated or invalidated the change, with no need to recruit new users.
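The iteration loop can be sketched as follows. The threshold that counts as "low" and the decision rule (any increase from the same persona validates the revision) are assumptions, since the case study does not specify them; the names and scores are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the case study does not define a "low" score; 3 or below is used here.
LOW_SCORE_THRESHOLD = 3


@dataclass(frozen=True)
class Evaluation:
    persona: str
    stimulus_version: str
    score: int  # 1-to-5 scale


def needs_iteration(ev: Evaluation, threshold: int = LOW_SCORE_THRESHOLD) -> bool:
    """A low score triggers a revision of the stimulus for that persona."""
    return ev.score <= threshold


def compare_versions(before: Evaluation, after: Evaluation) -> str:
    """Validate the revision if the same persona scores it higher; otherwise invalidate it."""
    if before.persona != after.persona:
        raise ValueError("before/after scores must come from the same persona")
    return "validated" if after.score > before.score else "invalidated"


# Hypothetical placeholder evaluations.
v1 = Evaluation("persona_a", "scenario_v1", 2)
v2 = Evaluation("persona_a", "scenario_v2", 4)
if needs_iteration(v1):
    print(compare_versions(v1, v2))  # validated
```

A stricter rule could also require the revised score to clear the threshold, so that a move from 1 to 2 does not count as a fix.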

Research Findings

Tensions & Adjustments: What did not work straight away

Three friction points emerged and forced real adjustments to the protocol.

  1. Accepted in theory, rejected in practice. One signal type consistently scored high in isolation but triggered strong resistance when embedded in a concrete scenario. The most demanding persona considered it intrusive without explicit opt-in, and its score dropped significantly. This forced a reformulation: that signal type is only acceptable when it is transparent, explainable, and paired with user consent.

  2. A family that hid two distinct user needs. One of the six need families turned out to be too heterogeneous to treat as a single unit. Testing revealed an internal split between two sub-profiles with fundamentally different expectations. This gap was not resolved within this cycle and stands as an acknowledged limitation of the protocol.

  3. Scores can lie. A 4/5 rating can mask a deep-seated reservation. Several personas gave acceptable scores while simultaneously stating non-negotiable conditions. The numeric rating alone did not surface that tension, which is why an open question on red lines was added at the end of the protocol: to capture what scoring cannot.
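The "scores can lie" problem can be made operational by storing each rating together with any conditions the persona stated in its open answer, then flagging ratings that look acceptable but carry conditions. A minimal sketch; the `Response` type, the acceptability threshold of 4, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    persona: str
    score: int  # 1-to-5 rating
    # Non-negotiable conditions the persona stated in its open answer.
    conditions: list[str] = field(default_factory=list)


def hidden_reservations(responses: list[Response], acceptable: int = 4) -> list[Response]:
    """Return responses whose score looks acceptable but which carry explicit conditions."""
    return [r for r in responses if r.score >= acceptable and r.conditions]


# Hypothetical placeholder responses.
responses = [
    Response("persona_a", 4, ["explicit opt-in required"]),
    Response("persona_b", 5),
    Response("persona_c", 2, ["no automated action"]),
]

for r in hidden_reservations(responses):
    print(r.persona, r.conditions)  # persona_a ['explicit opt-in required']
```

The low-scoring `persona_c` is not flagged here because a low score already triggers an iteration; this check targets only the reservations that the numbers conceal.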

3 Concrete Design Decisions Produced
Without disclosing client-specific findings, the protocol produced directly actionable design directions:

  1. One combination achieved full consensus. A single family × signal × channel combination reached a perfect unanimous score across all profiles on relevance, tone, and perceived value. It was identified as the natural entry point for any first deployment.

  2. Algorithmic transparency is non-negotiable. Every scenario scoring 4/5 rather than 5/5 shared the same missing element: no visible explanation of what triggered the contact. This is not a comfort feature; it is an acceptability condition. The resulting design decision: every proactive communication must expose its trigger.

  3. One channel was unanimously disqualified. A widely used contact channel was rejected across all profiles for any proactive use case outside of fraud alerts. No future scenario should rely on it as a standard outreach method.
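Decisions 1 and 3 both reduce to unanimity checks over the score table. A minimal sketch, assuming per-criterion scores for each persona and combination, a "perfect" target of 5, and a rejection ceiling of 2 (an assumption: the case study gives no threshold for "disqualified"). The placeholder names deliberately do not correspond to the undisclosed client findings, and the values are invented.

```python
Combo = tuple[str, str, str]  # (family, signal, channel)


def unanimous_combinations(
    scores: dict[tuple[str, Combo], dict[str, int]],
    personas: list[str],
    criteria: list[str],
    target: int = 5,
) -> list[Combo]:
    """Combinations that every persona scored at `target` on every criterion."""
    combos = {combo for (_persona, combo) in scores}
    return sorted(
        combo
        for combo in combos
        if all(
            (p, combo) in scores
            and all(scores[(p, combo)].get(c) == target for c in criteria)
            for p in personas
        )
    )


def rejected_channels(
    channel_scores: dict[tuple[str, str], int],
    personas: list[str],
    max_score: int = 2,
) -> list[str]:
    """Channels that every persona scored at or below `max_score` for proactive (non-fraud) use."""
    channels = {ch for (_persona, ch) in channel_scores}
    return sorted(
        ch
        for ch in channels
        if all((p, ch) in channel_scores and channel_scores[(p, ch)] <= max_score for p in personas)
    )


# Hypothetical placeholder data; names and values are invented.
personas = ["persona_a", "persona_b"]
criteria = ["relevance", "tone", "perceived_value"]
perfect = {"relevance": 5, "tone": 5, "perceived_value": 5}
scores = {
    ("persona_a", ("family_1", "signal_a", "channel_a")): perfect,
    ("persona_b", ("family_1", "signal_a", "channel_a")): perfect,
    ("persona_a", ("family_2", "signal_b", "channel_b")): {"relevance": 5, "tone": 4, "perceived_value": 5},
    ("persona_b", ("family_2", "signal_b", "channel_b")): perfect,
}
channel_scores = {
    ("persona_a", "channel_a"): 5,
    ("persona_b", "channel_a"): 4,
    ("persona_a", "channel_c"): 1,
    ("persona_b", "channel_c"): 2,
}

print(unanimous_combinations(scores, personas, criteria))  # [('family_1', 'signal_a', 'channel_a')]
print(rejected_channels(channel_scores, personas))  # ['channel_c']
```

Requiring every persona to be present in both checks means a missing evaluation never counts as consensus or as rejection.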

UX AI Competencies Demonstrated

  • Synthetic user design: architecture of differentiated behavioural panels

  • AI-assisted research protocol: 5-stage protocol with mixed quantitative and qualitative metrics

  • Research-oriented prompt engineering: evaluable stimuli with scoring grids

  • AI trust and legitimacy design: assessment of the acceptability conditions of a proactive system

  • Systems thinking: triangulation of family × signal × channel across 5 profiles