Creating Realistic Synthetic Users with Health Conditions for Coaching Agent Interactions

This paper presents a framework for generating synthetic user personas that accurately reflect real-world health conditions, demographics, and behaviors. These synthetic users can then interact with AI coaching agents, allowing for more realistic evaluation and improvement of the agents.

The framework is end-to-end and grounds each persona in real data on health conditions, demographics, lifestyle factors, and psychological traits. The key steps are:

1. Obtain real data on human characteristics such as demographics, health conditions, behaviors, and personality traits from datasets such as LifeSnaps and the Project Baseline Health Study.

2. Sample from this data to create a cohort of synthetic users with specified distributions of attributes.

3. Optionally use large language models (LLMs) to add further health conditions or rich backstories to the synthetic users, conditioned on their existing attributes.

4. Generate natural language "vignettes" describing each synthetic user's profile, including details like primary health concerns, goals, barriers, and backstories.

5. Use these vignettes in a generative agent-based model like Concordia to simulate interactions between the synthetic users and a coaching agent.
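Steps 2 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `RECORDS` rows, the `Persona` fields, and the vignette wording are all hypothetical stand-ins for what a real dataset such as LifeSnaps would provide, and the LLM enrichment (step 3) and Concordia simulation (step 5) are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Persona:
    age: int
    sex: str
    avg_sleep_hours: float
    primary_concern: str
    goal: str


# Hypothetical rows standing in for records drawn from a real dataset.
RECORDS = [
    {"age": 34, "sex": "female", "avg_sleep_hours": 5.8,
     "primary_concern": "difficulty falling asleep",
     "goal": "fall asleep within 20 minutes"},
    {"age": 52, "sex": "male", "avg_sleep_hours": 6.9,
     "primary_concern": "waking up during the night",
     "goal": "sleep through the night"},
    {"age": 27, "sex": "female", "avg_sleep_hours": 7.6,
     "primary_concern": "irregular bedtime",
     "goal": "keep a consistent schedule"},
]


def sample_cohort(records, n, weights=None, seed=0):
    """Draw n personas with replacement; weights skew the attribute mix."""
    rng = random.Random(seed)  # seeded so the cohort is reproducible
    rows = rng.choices(records, weights=weights, k=n)
    return [Persona(**row) for row in rows]


def render_vignette(p):
    """Turn structured attributes into a short natural-language profile."""
    return (
        f"You are a {p.age}-year-old {p.sex} who sleeps about "
        f"{p.avg_sleep_hours:.1f} hours per night. "
        f"Your primary concern is {p.primary_concern}, "
        f"and your goal is to {p.goal}."
    )


if __name__ == "__main__":
    for persona in sample_cohort(RECORDS, n=3, seed=1):
        print(render_vignette(persona))
```

Keeping the structured `Persona` separate from its rendered vignette means the same ground-truth attributes can later be compared against what the coaching agent inferred during the conversation.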

The authors evaluate their framework in two use cases: sleep coaching and diabetes coaching. For sleep coaching, they use the LifeSnaps dataset to generate 68 synthetic users with varying sleep patterns, demographics, and personality traits. **Figure 2** shows that the sleep coaching agent could accurately identify the primary sleep concerns, barriers, and goals communicated by these synthetic users during their interactions.

Figure 2

For diabetes coaching, the authors leverage the Project Baseline Health Study to create 200 synthetic users, each facing a different barrier to diabetes management drawn from the COM-B model (capability, opportunity, and motivation as drivers of behavior). **Figure 4** shows that human experts agreed the synthetic users consistently demonstrated their assigned barriers during conversations with the diabetes coaching agent. **Figure 5** further illustrates that these "grounded" synthetic users portrayed barriers more accurately than a baseline using only demographic data.
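To make the barrier evaluation concrete, the sketch below assigns one COM-B component to each synthetic user and computes how often a rater's label matches the assigned barrier. It rests on assumptions: the round-robin assignment, the function names, and the single-label rating format are illustrative only, since the summary does not describe the authors' actual assignment scheme or rating protocol.

```python
from collections import Counter
from itertools import cycle

# The three behavioral drivers in the COM-B model.
COM_B_COMPONENTS = ("capability", "opportunity", "motivation")


def assign_barriers(user_ids):
    """Assign barriers round-robin so each component is about equally common."""
    return dict(zip(user_ids, cycle(COM_B_COMPONENTS)))


def portrayal_rate(assigned, rated):
    """Fraction of users whose rated barrier matches the assigned one.

    `rated` maps user id to the barrier a human rater perceived in the
    conversation (a hypothetical format); missing users count as misses.
    """
    if not assigned:
        raise ValueError("no users to score")
    matches = sum(1 for uid, barrier in assigned.items()
                  if rated.get(uid) == barrier)
    return matches / len(assigned)


if __name__ == "__main__":
    assigned = assign_barriers(range(200))
    print(Counter(assigned.values()))
```

The same `portrayal_rate` could be computed once for grounded users and once for a demographics-only baseline to reproduce the kind of comparison the paper reports.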

Figure 5

The authors argue that grounding synthetic users in real data makes them more representative of the target population, enabling better evaluation and iterative improvement of coaching agents through simulated interactions.