CUI '25: Proceedings of the 7th ACM Conference on Conversational User Interfaces
SESSION: Short Papers and Works-in-Progress
Augmentation by Segmenting Audio in Speech Emotion Recognition
Speech Emotion Recognition (SER) suffers from limited training data, hampering model robustness. To address this challenge, we propose a novel data augmentation technique that enhances training diversity by segmenting input audio files into smaller, fixed-length units. Each segment is evaluated to determine its representativeness of the overall emotion label assigned to the full audio sample. Segments deemed representative retain the original emotion label, effectively expanding the dataset while preserving emotional consistency and contextual integrity. We evaluate the method by fine-tuning Wav2vec 2.0 and training TIM-Net. Experimental results demonstrate that models trained on the augmented datasets achieve notable performance improvements, with accuracy gains ranging from 1% to 3% on benchmark datasets, including IEMOCAP, RAVDESS, and SAVEE, compared to models trained exclusively on the original data. These findings underscore the potential of our augmentation method to advance SER performance and bridge the gap caused by limited training data.
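The segmentation step the abstract describes can be sketched in a few lines. This is a minimal, hypothetical illustration of fixed-length, non-overlapping windowing with label inheritance; the paper's representativeness scoring, which decides whether a segment keeps the clip-level label, is omitted here.

```python
import numpy as np

def segment_audio(waveform: np.ndarray, sr: int, segment_seconds: float, label: str):
    """Split a clip into fixed-length segments that inherit the clip-level label.

    Hypothetical sketch: the paper additionally scores each segment for how
    well it represents the clip's emotion and keeps only representative
    segments; that filtering step is omitted here.
    """
    seg_len = int(sr * segment_seconds)
    return [
        (waveform[start:start + seg_len], label)
        for start in range(0, len(waveform) - seg_len + 1, seg_len)
    ]

sr = 16000
clip = np.zeros(sr * 5)  # stand-in for a 5-second utterance
augmented = segment_audio(clip, sr, 2.0, "angry")  # two 2-second segments
```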
Exploring Multi-LLM Collaboration to Power Conversational Recommender System: A Case Study of Dietary Recommendation
Conversational recommender systems (CRS) are promising in delivering personalized recommendations by engaging users to share rich information about themselves, particularly in dietary recommendation, where various factors (e.g., food preferences, eating habits) need to be considered. However, maintaining a coherent conversation for information collection and recommending healthy dishes tailored to different users remains challenging, even with the emerging large language models (LLMs). In this study, we explore multi-LLM collaboration—where multiple LLMs specialize in subtasks of a complex problem—to enhance a dietary CRS. Through an online experiment (N = 161), we compared multi-LLM collaboration with its single-LLM counterpart during the conversation and recommendation phases, evaluating system performance and participants’ experiences. We found multi-LLM collaboration equipped the conversation manager with greater adaptability to the conversation contexts, while powering the recommendation engine to deliver more nutritionally balanced and wide-ranging recommendations. Our discussion then focuses on the implications for designing user-centered CRS with LLMs.
Multi-Tool Analysis of User Interface & Accessibility in Deployed Web-Based Chatbots
In this work, we present a multi-tool evaluation of 106 deployed web-based chatbots, across domains like healthcare, education, and customer service, comprising both standalone applications and embedded widgets, using automated tools (Google Lighthouse, PageSpeed Insights, SiteImprove Accessibility Checker) and manual audits (Microsoft Accessibility Insights). Our analysis reveals that over 80% of chatbots exhibit at least one critical accessibility issue, and 45% suffer from missing semantic structures or ARIA role misuse. Furthermore, we found that accessibility scores correlate strongly across tools (e.g., Lighthouse vs PageSpeed Insights, r = 0.861), but performance scores do not (r = 0.436), underscoring the value of a multi-tool approach. We offer replicable evaluation insights and actionable recommendations to support the development of user-friendly conversational interfaces.
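The cross-tool agreement the authors report is a Pearson correlation over per-chatbot scores. A minimal sketch of that comparison, using illustrative numbers rather than the paper's data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative accessibility scores for five chatbots from two tools
# (hypothetical values, not the paper's measurements)
lighthouse = [92, 85, 78, 60, 95]
pagespeed = [90, 88, 75, 62, 97]
r = pearson_r(lighthouse, pagespeed)  # close to 1.0 when tools agree
```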
“Hey Google, How Do I File My Taxes?”: Evaluating Conversational Interfaces for Newcomers Navigating Government Services
We sought to understand the effectiveness of conversational user interfaces (CUIs) in answering questions relevant to immigrants, refugees, and visible minority users navigating key government services. To do this, we conducted a comparative analysis on the responses of ChatGPT, Gemini, and Microsoft Copilot to understand the characteristics, similarities, and differences in responses. We found that these revolved around clarity, tone, depth of information, use of official sources, and accessibility and inclusivity. Furthermore, we observed that ChatGPT provided detailed, user-friendly responses, Microsoft Copilot provided concise information but lacked contextual depth, and Gemini offered users a structured, step-by-step response to navigate the different services. We conclude with design implications for developing more inclusive and effective government-focused CUIs. Future work will involve conducting usability testing to collect additional data, investigating the impact of these variations, and validating the accuracy and effectiveness of the CUIs’ responses.
Motivations, Means, and Meanings: Understanding AI Tool Usage Among Students in the Global South
Integrating Ethical AI Tools into Educational Practices for Enhancing Academic Integrity
As large language models (LLMs) become increasingly integrated into educational settings, concerns about academic integrity, ethical usage, and student engagement are becoming more prominent. While these AI tools can effectively provide personalized learning experiences and support diverse student needs, they also risk overreliance and promote unethical academic practices if used without appropriate safeguards. This paper presents a novel approach that integrates an LLM-based assistant directly into a learning management system (LMS) with carefully designed constraints to encourage active learning, reduce misuse, and preserve academic integrity. We establish core design principles to address the challenges associated with LLMs in education and provide a detailed description of our system’s architecture. Additionally, we conduct a pilot study to assess the tool’s impact on student learning and gather feedback for further improvements. A prototype of the tool is publicly available on GitHub.
Elmo: An Embodied Conversational Assistant For Community Repair Cafés
Repair cafés provide a community service that supports and empowers item ‘Bringers’ to reinstate traditional values of repair in order to reduce climate impacts of waste. In this paper, we report on a pilot study exploring the use of Elmo (an embodied conversational assistant) by volunteer Repairers within repair café settings. Our findings show the different ways Repairers incorporated this technology probe into their work and the challenges they faced in doing so. Through this, we contribute three areas of consideration for designers looking to implement conversational assistants into community repair settings: contextual awareness of the time needed to undertake suggestions and of related repair café processes; the motivation and customer-relations requirements of volunteer Repairers; and enhancing empowerment work by supporting current practices and demonstrations of navigating trouble in repair-resource interactions.
Fitting the Message to the Moment: Designing Calendar-Aware Stress Messaging with Large Language Models
Existing stress-management tools fail to account for the timing and contextual specificity of students’ daily lives, often providing static or misaligned support. Digital calendars contain rich, personal indicators of upcoming responsibilities, yet this data is rarely leveraged for adaptive wellbeing interventions. In this short paper, we explore how large language models (LLMs) might use digital calendar data to deliver timely and personalized stress support. We conducted a one-week study with eight university students using a functional technology probe that generated daily stress-management messages based on participants’ calendar events. Through semi-structured interviews and thematic analysis, we found that participants valued interventions that prioritized stressful events and adopted a concise, but colloquial tone. These findings reveal key design implications for LLM-based stress-management tools, including the need for structured questioning and tone calibration to foster relevance and trust.
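The probe's core step, turning a day's calendar into a stress-support prompt, can be sketched as below. This is a hypothetical illustration, not the authors' implementation: the event schema, ordering rule, and tone instruction are assumptions, with the tone wording reflecting the participants' reported preference for concise, colloquial messages.

```python
def build_stress_prompt(events):
    """Assemble an LLM prompt for a daily stress-management message.

    Hypothetical sketch: events flagged as stressful are listed first so
    the generated message prioritizes them, matching the finding that
    participants valued prioritization of stressful events.
    """
    ordered = sorted(events, key=lambda e: not e["stressful"])
    lines = [f"- {e['time']} {e['title']}" for e in ordered]
    return (
        "Today's calendar (most stressful first):\n"
        + "\n".join(lines)
        + "\nWrite one short, friendly stress-management message "
          "focused on the most stressful event."
    )

events = [
    {"time": "09:00", "title": "Gym", "stressful": False},
    {"time": "14:00", "title": "Final exam", "stressful": True},
]
prompt = build_stress_prompt(events)
```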
Lowering the Barrier: Conversational Interfaces as a Novice-Friendly Path to Data Visualisation
Data visualisation software can often be unintuitive, which frequently leads to reduced efficiency and low user satisfaction. To address this, we have developed a conversational agent, VisuaLingua, which is able to generate bar, line and pie charts based on textual descriptions. To evaluate VisuaLingua, we carried out a user study comparing it to Google Sheets, using speed, number of mistakes, and user satisfaction as performance metrics. Results show that users of VisuaLingua generally had higher speed and fewer user errors than users of Google Sheets, with inconclusive results for user satisfaction. When we consider only beginner Google Sheets users, the difference from the experimental group is even greater, suggesting that a conversational interface would be a particularly suitable beginner-friendly alternative to spreadsheet-based solutions for generating data visualisations.
Towards Enhancing Industrial Training Through Conversational AI
Conversational AI (CAI) has proven effective in educational settings; however, its potential in industrial training, where higher precision and reliability are required, remains under-explored. This work-in-progress paper proposes a study to examine how AI persona design (Machine vs. Expert Operator) and voice embodiment (Diegetic vs. Disembodied) influence cognitive load, task efficiency, and usability in industrial training. By training a large language model (LLM) on Standard Operating Procedure (SOP) data, this project aims to develop a CAI assistant that provides real-time, easy-to-access information during task execution, in an attempt to enhance training efficiency and reduce reliance on text-heavy manuals through a user-centered approach.
How Large Language Models Classify and Semantically Explain Facial Expressions from Valence-Arousal Values
Large Language Models (LLMs) primarily operate through text-based inputs and outputs, yet emotion is communicated through both verbal and non-verbal cues, including facial expressions. While Vision-Language Models analyse facial expressions from images, they are resource-intensive and may depend more on linguistic priors than visual understanding. To address this, the present study investigates whether LLMs can infer affective meaning from Valence-Arousal (VA) values, structured numerical representations of facial expressions. VA values were extracted using FaceChannel from images of facial expressions (from the IIMI and EMOTIC datasets) and provided to three LLMs in two tasks: (1) classifying facial expressions into basic and complex emotions and (2) generating semantic descriptions of facial expressions. The results indicate that LLMs struggle to classify VA values into discrete emotion categories, particularly for emotions beyond basic polarities. However, LLMs produced semantic descriptions that align closely with human-generated interpretations, demonstrating a stronger capacity for free-text affective inference of facial expressions.
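For readers unfamiliar with the VA space, the discrete-classification task amounts to mapping a point in the valence-arousal plane to an emotion category. A simple rule-based quadrant baseline (illustrative only; the paper's method prompts LLMs with the VA values rather than using fixed rules) looks like this:

```python
def quadrant_emotion(valence: float, arousal: float) -> str:
    """Map a Valence-Arousal pair to a coarse basic-emotion quadrant.

    Illustrative baseline, not the paper's method: VA values are assumed
    to lie in [-1, 1], with the four quadrants of the circumplex model
    collapsed to representative basic-emotion labels.
    """
    if valence >= 0:
        return "happy/excited" if arousal >= 0 else "calm/content"
    return "angry/afraid" if arousal >= 0 else "sad/bored"

quadrant_emotion(0.8, 0.6)  # positive valence, high arousal
```

Complex emotions (e.g., awe, embarrassment) do not fall cleanly into these quadrants, which is consistent with the reported difficulty of classifying beyond basic polarities.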
Un-trusting the Chat: Designing for Calibrated Trust in Retrieval-Augmented Conversations
Retrieval-Augmented Generation (RAG) systems are increasingly used in conversational AI to support workplace decision-making. Yet in contexts marked by time pressure or ambiguous information, users risk over-relying on system outputs they cannot fully assess. This paper explores how interface design can support calibrated trust—an alignment between user confidence and system reliability—in RAG-based chat interfaces. Through a co-design workshop with HCI and AI experts, we investigated interaction strategies that promote transparency without overwhelming users. Participants proposed design features such as color-coded source references, non-numeric relevance indicators, and adaptive language for expressing uncertainty. These insights reveal how conversational interfaces can communicate limitations in context-sensitive and user-centered ways. We contribute preliminary design directions for trustworthy generative systems and outline next steps for implementation and evaluation.
AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data
In decision-making conversations, experts must navigate complex choices and make on-the-spot decisions while engaged in conversation. Although extensive historical data often exists, the real-time nature of these scenarios makes it infeasible for decision-makers to review and leverage relevant information. This raises an interesting question: What if experts could utilize relevant past data in real-time decision-making through insights derived from past data? To explore this, we implemented a conversational user interface, taking doctor-patient interactions as an example use case. Our system continuously listens to the conversation, identifies patient problems and doctor-suggested solutions, and retrieves related data from an embedded dataset, generating concise insights using a pipeline built around a retrieval-based Large Language Model (LLM) agent. We evaluated the prototype by embedding Health Canada datasets into a vector database and conducting simulated studies using sample doctor-patient dialogues, showing effectiveness but also challenges, setting directions for the next steps of our work.
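The retrieval step in such a pipeline reduces to finding the dataset entry most similar to the current conversation snippet. A toy sketch using bag-of-words cosine similarity (the actual system embeds Health Canada datasets in a vector database and uses an LLM agent to turn retrieved records into insights; the corpus and helper names here are hypothetical):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the corpus entry most similar to the conversation snippet;
    the retrieved text would then be passed to an LLM to draft an insight."""
    return max(corpus, key=lambda doc: cosine(embed(query), embed(doc)))

corpus = [
    "ibuprofen dosage guidance for adults with knee pain",
    "seasonal influenza vaccination schedule",
]
best = retrieve("patient reports knee pain after running", corpus)
```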
Impact of a Disfluent Speech Agent on Partner Models and Perspective Taking
Speech disfluencies play a role in perspective-taking and audience design in human-human communication (HHC), but little is known about their impact in human-machine dialogue (HMD). In an online Namer-Matcher task, sixty-one participants interacted with a speech agent using either fluent or disfluent speech. Participants completed a partner-modelling questionnaire (PMQ) both before and after the task. Post-interaction evaluations indicated that participants perceived the disfluent agent as more competent, despite no significant differences in pre-task ratings. However, no notable differences were observed in assessments of conversational flexibility or human-likeness. Our findings also reveal evidence of egocentric and allocentric language production when participants interact with speech agents. Interaction with disfluent speech agents appears to increase egocentric communication in comparison to fluent agents, although the wide credibility intervals mean this effect is not clear-cut. We discuss potential interpretations of this finding, focusing on how disfluencies may impact partner models and language production in HMD.
Designing Dialogic Disclaimers: Principles for Ethical Conversational Design in LLM Interfaces
Current disclaimer approaches in large language model (LLM) systems often disrupt conversational flow with static, one-shot warnings that fail to engage users in deeper ethical reflection and, as a growing body of evidence reveals, are largely ineffective at achieving their intended goals of calibrated trust and informed use. This paper examines how disclaimers can be reimagined as integrated dialogic elements that maintain conversational coherence while effectively communicating necessary ethical information. Through a theoretical synthesis drawing on research in conversational repair, dialogue structure, AI transparency, and empirical studies of user interaction with LLMs, we identify four design principles for dialogic disclaimers: Conversational Continuity, Contextual Adaptation, Layered Communication, and User Agency. We provide illustrative conversation patterns demonstrating how these principles can transform disclaimers from disruptive and often ignored statements into constructive components of ethical dialogue. This work establishes a conceptual framework intended as a theoretical foundation to guide future empirical evaluation and the design of more ethically responsible and conversationally fluent LLM interfaces.
Leveraging Generative and Rule-Based Models for Persuasive STI Education: A Multi-Chatbot Mobile Application
Sexual health is a critical global issue, with African young adults being especially vulnerable. This study presents the design, development and pilot evaluation of a multi-chatbot mobile app combining generative and rule-based solutions, based on the Health Belief Model and Persuasive System Design principles, to educate and motivate African young adults to avoid risky sexual behaviors. Early results showed high usability, positive user experience, strong persuasiveness, and high educational value. Users particularly appreciated the app’s cultural elements, gamified modules, and LLM-based generative chatbot. Areas for improvement included UI enhancements and removing barriers to user engagement. This work contributes to advancing knowledge on healthcare chatbots and provides insights into designing mobile health apps for sexual health education and behavior change.
Towards Voice-enabled Postnatal Development Tracking: Opportunities for Conversational Voice Assistants to Aid Child Development Monitoring
The postnatal period is a time of critical change for mothers, marked by increased parental responsibilities and heavier cognitive loads. However, voice assistants (VAs) show promising potential for assisting mothers with postnatal care and managing child-related growth and developmental progress. To obtain an impression of where VA applications currently stand in the postnatal childcare sector and to establish the groundwork for forthcoming studies, we performed a thematic analysis of pertinent VA applications, primarily following an inductive coding approach [6]. We identified the features available and feature gaps, including recording medical information, statistical and graph analyses of child data, and childcare recommendations. Utilizing these discoveries, we suggest a VA tool that can track child-related health information, child-health reminders and alerts, and personalized recommendations for child development and growth. This paper provides an overview of the present condition of this area of study and highlights the need for additional research and advancements for better child development tracking.
Motion-mimicking Avatar Communication: A Pilot Study on Dyadic Creativity
With the growing popularity of metaverse applications, avatar communication is becoming increasingly prevalent. In avatar communication, it is not necessary to reflect a user’s real status. In addition, factors that facilitate or inhibit actual communication can be emphasized or eliminated, respectively. In this pilot study, we explored the effects of behavioral mimicry on dyadic creativity in avatar communication. The results show that behavioral mimicry enhances creativity in mixed-gender pairs (male–female), but not in same-gender pairs (male–male and female–female). These findings suggest that behavioral mimicry in avatar communication enhances creativity, which is an important factor in productivity and innovation, in cooperative situations, particularly in mixed-gender groups. Although this study opens a new avenue for using avatar communication systems to improve performance in workplace environments, it also indicates that various factors such as gender composition should be considered to maximize benefits.
PaperPal: An AI-Powered Platform for Collaborative Reading and Discussion
Collaborative learning with research papers is crucial for academic growth, yet the applications available today rarely provide the engaging and effective experience needed for collaborative learning. We propose a solution that simplifies working with academic papers while making the experience interactive and helpful. This paper introduces a platform designed to transform how groups collaborate on research papers, making the process more dynamic, intuitive, and productive. The system integrates real-time collaborative annotation, AI-generated tools, layered explanations, adaptive quizzes, and structured discussion workflows, all powered by large language models (LLMs). It tracks engagement and highlights key focus areas based on collective user behaviour. Our early implementation demonstrates the potential of generative AI to meaningfully enhance academic group learning by making it more inclusive, intelligent, and participatory. Through this work, we aim to explore how large language models and AI integration can transform collaborative academic tools, fostering deeper collaboration and understanding.
CitePeek: Contextual Citation Exploration Within Research Papers
Academic reading requires frequent citation engagement, but traditional PDF readers do not provide contextual support to understand cited works. Current tools require readers to navigate away from the main text to access citation details. This disrupts the reading flow, highlighting a significant gap in seamless citation exploration. We present CitePeek, an interactive PDF assistant that transforms citations in a research paper into interactive elements. Clicking a citation reveals a pop-up with metadata and an abstract for quick critical insights. A comprehensive sidebar offers two information levels, Surface and Deep Dive, plus a Results view for the cited paper: Surface provides key points, Deep Dive gives narrative explanations with visuals, and Results highlights significant experimental outcomes with data visualizations. CitePeek calculates contextual relevance through similarity scoring, presenting relevant paragraphs and relevance-based highlighting in original PDFs. CitePeek identifies related papers within the reference network, delivering essential citation details directly within the reading environment.
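The contextual-relevance step, scoring paragraphs of a cited paper against the citing sentence, can be sketched with simple token-overlap (Jaccard) scoring. This is an illustrative stand-in; the abstract says only "similarity scoring," so the actual metric may differ.

```python
def relevance(citation_context: str, paragraph: str) -> float:
    """Score a cited paper's paragraph against the citing sentence.

    Illustrative Jaccard (token-overlap) scoring; CitePeek's actual
    similarity measure is not specified in the abstract.
    """
    a = set(citation_context.lower().split())
    b = set(paragraph.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

context = "prior work reports accuracy gains from data augmentation"
paragraphs = [
    "we evaluate augmentation and report accuracy gains on three datasets",
    "the related work section surveys speech interfaces",
]
# Rank paragraphs by relevance to the citation context
ranked = sorted(paragraphs, key=lambda p: relevance(context, p), reverse=True)
```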
ResumeGenAI: Supporting Job Seekers with LLM-Driven Resume Feedback
Rising unemployment and rapid advances in AI have disrupted the job market, making it more important than ever for job seekers to present themselves effectively—starting with their resumes. While prior research has explored resume evaluation, little attention has been given to leveraging AI for delivering personalized, detailed, instant, and actionable feedback. We present ResumeGenAI, an online platform designed to help job seekers create resumes, personalize them to specific job roles, compare them, and seek help through a conversational interface. Through a three-week study (N = 34), we evaluated its effectiveness in helping job seekers build resumes. We found that the platform significantly improved resume quality in areas such as skills, grammar, and overall impact, as assessed by expert reviewers. Participants also highlighted features like resume personalization, export analysis, and resume templates as particularly valuable.
Exploring the role of metacognitive abilities and trust in interaction with Generative AI
Generative AI enables users to explore limitless possibilities, but its open-ended nature introduces ambiguity that differs from traditional GUIs. As general users integrate GenAI into their personal and professional workflows, challenges around prompting and usability have emerged. This study examines these challenges through the lens of metacognition, specifically the metacognitive abilities of monitoring and control (self-awareness and task decomposition) during intent-based interactions with a GenAI tool, exploring how these abilities influence control over three types of tasks. Our findings reveal that self-awareness is more critical in simpler tasks, while task decomposition becomes crucial as task complexity and output novelty increase. Additionally, we investigate underlying trust in AI, finding contradictions between users’ metacognitive awareness and their faith across tasks, revealing the role of output evaluation with domain knowledge. Based on these insights, we offer recommendations for enhancing metacognitive support in GenAI tools and suggest directions for future research.
Remembering Things Makes Chatbots Sound Smarter, but Less Trustworthy: A Pilot Study
Memory is a key aspect of human interaction, allowing us to remember facts about our interlocutors and personalise the way we interact. This pilot study investigates how a semantic long-term memory affects user assessments of chatbot likeability, perceived intelligence, and perceived safety, addressing limitations in Large Language Model (LLM) memory. We introduce MemoryGraph, a knowledge graph-based memory system for LLMs that enables visual inspection of memories by users. A user study compared interactions with a chatbot under three conditions: no memory, memory alone, and memory with visualisation. User ratings for likeability, intelligence, and safety were recorded. Preliminary findings show that adding a memory without visualisation reduced positive assessments compared to the baseline, whereas combining memory with visualisations improved these ratings. This tentatively suggests that the observability of memory functions influences key user perceptions of memory-augmented conversational AI, although more research is needed to confirm these initial results.
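At its core, a knowledge-graph memory of the kind described stores facts about the user as subject-relation-object triples that can be listed back for inspection, the "visualisation" condition in the study. A minimal sketch (class and method names are hypothetical stand-ins for the paper's MemoryGraph):

```python
class MemoryGraphSketch:
    """Minimal triple store standing in for the paper's MemoryGraph.

    Hypothetical sketch: facts about the user are kept as
    (subject, relation, object) triples; recall() lists a subject's
    facts, which is what a visual inspection view would render.
    """
    def __init__(self):
        self.triples = set()

    def remember(self, subject: str, relation: str, obj: str):
        self.triples.add((subject, relation, obj))

    def recall(self, subject: str):
        return sorted(t for t in self.triples if t[0] == subject)

memory = MemoryGraphSketch()
memory.remember("user", "likes", "hiking")
memory.remember("user", "lives_in", "Dublin")
facts = memory.recall("user")  # facts the chatbot can surface to the user
```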
Feedstack: Layering Structured Representations Over Unstructured Feedback to Scaffold Human–AI Conversation
Many conversational user interfaces facilitate linear conversations with turn-based dialogue, similar to face-to-face conversations between people. However, digital conversations can afford more than simple back-and-forth; they can be layered with interaction techniques and structured representations that scaffold exploration, reflection, and shared understanding between users and AI systems. We introduce Feedstack, a speculative interface that augments feedback conversations with layered affordances for organizing, navigating, and externalizing feedback. These layered structures serve as a shared representation of the conversation that can surface user intent and reveal underlying design principles. This work represents an early exploration of this vision using a research-through-design approach. We describe system features and design rationale, and present insights from two formative (n=8, n=8) studies to examine how novice designers engage with these layered supports. Rather than presenting a conclusive evaluation, we reflect on Feedstack as a design probe that opens up new directions for conversational feedback systems.
Designing Child-Safe Conversational AI: Three Dilemmas for Responsible Design
As AI-driven conversational user interfaces become increasingly integrated into children's daily lives, there is an urgent need for responsible design that protects children's safety, developmental needs, and well-being. While recent guidelines have outlined high-level principles for ethical AI, practical design dilemmas around conversational AI interfaces for children remain underexplored. This paper addresses that gap by identifying and analyzing three recurring tensions in the design of child-safe conversational AI interfaces: (1) safety versus engagement, (2) personalization versus privacy, and (3) autonomy versus protection. The paper argues that these are not binary choices but enduring trade-offs that require situated judgement, developmental sensitivity, and participatory design. It then proposes a framework to help address these dilemmas. By framing enduring design tensions as “dilemmas” rather than quick fixes, this paper seeks to help foster realistic and nuanced dialogue around the ethics of child-centered conversational AI interfaces – and responsible CUI design for children more broadly.
Beyond Face Value: Visual and Auditory Signals in Human and Machine Trust Judgments
As conversational agents become increasingly multimodal, they invite human-like evaluations—especially in trust-sensitive contexts. Building on the human tendency to form rapid judgments from subtle visual and auditory cues, we explore how trust is constructed from faces and voices. In a behavioral experiment, 150 participants rated bimodal stimuli across four trust congruence conditions. We then trained a multimodal model using HuBERT and ResNet-50 with late fusion to predict trust scores. To examine alignment between human and model judgments, we applied Permutation Feature Importance (PFI) to compare the most influential features. Our results highlight the dominance of auditory cues in both human and model trust evaluations, while revealing subtle but meaningful differences in feature weighting across modalities and conditions.
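Late fusion, as used in the model above, means each modality is reduced to its own prediction before the two are combined. A toy numeric sketch (the per-modality "heads" and fusion weights here are illustrative stand-ins, not the trained HuBERT/ResNet-50 model; the higher audio weight merely mirrors the reported dominance of auditory cues):

```python
import numpy as np

def late_fusion_trust(audio_feat: np.ndarray, visual_feat: np.ndarray,
                      w_audio: float = 0.7, w_visual: float = 0.3) -> float:
    """Combine per-modality trust predictions with weighted late fusion.

    Hypothetical sketch: real heads would be trained networks over HuBERT
    audio and ResNet-50 face embeddings; tanh-of-mean is a placeholder
    that maps each feature vector to a scalar score in (-1, 1).
    """
    audio_score = float(np.tanh(audio_feat.mean()))    # stand-in audio head
    visual_score = float(np.tanh(visual_feat.mean()))  # stand-in visual head
    return w_audio * audio_score + w_visual * visual_score

# Trustworthy-sounding voice, untrustworthy-looking face: an incongruent pair
score = late_fusion_trust(np.full(768, 0.5), np.full(2048, -0.5))
```

With the audio modality weighted higher, the incongruent example above resolves to a positive trust score, the qualitative pattern the paper reports for both humans and the model.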
Voice Profiles vs. Text Transcripts: Do Data Modality Layers in Voice Recordings Matter to Alexa Users? The Role of Modality in Voice Data Privacy
This study explores user responses to two distinctive data units/components (i.e., text transcripts and voice profiles) identified within a single data source (i.e., voice recordings). Participants were randomly assigned to one of three scenarios in which Amazon Alexa presented data used for personalization differently (i.e., voice recordings vs. voice profiles + text transcripts vs. text transcripts only). Users’ perceived information sensitivity across different data types was also surveyed. The main findings include that users distinguish voice interaction data from other types of personal information, and those who report higher data sensitivity over voice interaction data assess its privacy risks higher. Furthermore, only using text transcripts (without voice profiles) for personalization alleviated users’ perceived privacy risk. This study informs future privacy implications for data transparency and design to incorporate modality differences in user data collected, stored, and processed towards personalization of voice-enabled CUIs (like OpenAI ChatGPT and Google Gemini) that extend traditional voice assistants (like Amazon Alexa and Apple Siri).
SESSION: Interactivity and Creativity
DeTAILS: Deep Thematic Analysis with Iterative LLM Support
Qualitative thematic analysis (TA) is valuable but challenging to scale because of the time, cost, and complexity involved. Existing tools offer limited automation and often lack flexibility, trustworthiness, and usability with unstructured data. To address this gap, we developed DeTAILS, an interactive toolkit that integrates large language models into reflexive TA workflows. Built on a modern architecture, DeTAILS supports qualitative data analysis through iterative, LLM-in-the-loop interactions. Next, we plan to conduct a formal evaluation with domain experts to assess its usability, trust, and analytic value. By embedding interactive LLM-in-the-loop processes, DeTAILS strives to extend TA to far larger datasets while preserving analytic depth and researcher reflexivity, going beyond the throughput limits of manual coding in a given time.
Eternagram: Inspiring Climate Action Through LLM-based Conversational Exploration of a Post-Devastation Climate Future
Persuading people to take climate action is difficult because we tend to perceive climate change as remote and disconnected from daily life. Instead of traditional informational engagements, game-based interventions can create narratives that immerse visitors in situations where their actions have tangible consequences. To make these narratives engaging, we used a speculative scenario of an alien stumbling upon social media to obliquely address climate change through a text-based adventure game installation. Mimicking visitors’ natural dialogue in social media apps, we designed an LLM-based chatbot with knowledge of a post-devastation climate future in a world that mirrors our own planet Earth. In discovering the world’s downfall through interactive chatting and posted images, players begin to realize that their own actions can make a difference to the impacts of climate change in this distant world, fostering pro-environmental attitudes. Previously published at CHI, this game installation demonstrates the potential of LLM-based creative narratives for exploring speculative worlds and driving social change.
Designing a Simulated Patient System for Medical Education
The foundation of successful patient-doctor relationships lies in effective communication, which promotes patient autonomy, trust, and comprehension. As a result, medical education has placed growing emphasis on improving clinical communication skills. Medical students gain experience through interactions with standardized patients (SPs) - trained individuals who simulate specific patient scenarios in controlled educational settings. However, recruiting SP actors can be difficult, and students tend to have limited interactions with them (i.e., during weekly classes or final examinations).
In this work, we develop an LLM-driven simulated patient system in collaboration with an R1 research university’s SP training program. The system consists of the patient agent, which simulates an SP case, alongside an educational agent, which can provide helpful suggestions and feedback to the user. We hope that this system can augment pre-existing standardized patient programs to improve clinical competency among medical students and professionals.
Values in the Loop: Designing Interactive Optimization with Conversational Feedback
We present an interactive system for value-guided decision-making in constrained optimization problems, enabling users to discover, articulate, and refine their personal values through conversational reflection and visual feedback. Rooted in a generative theory of values (as evolving principles that guide action), our system empowers users to explore trade-offs in resource allocation tasks, such as coordinating carpools for extracurricular activities. The core contribution is a human-in-the-loop architecture that combines a natural language interface with dynamic value visualizations and constraint-based filtering. Users express their priorities through conversation, which the system interprets to infer value importance scores and translate them into structured optimization constraints. These constraints filter a live dataset of potential matches, visualized on an interactive map and calendar. As users revise their choices, the system updates its internal model and refreshes the solution space in real time, closing the loop between expression, computation, and reflection. This approach supports adaptive, user-centered solutions where logistical feasibility intersects with personal concerns.
SESSION: Workshops and Tutorials
Personas Evolved: Designing Ethical LLM-Based Conversational Agent Personalities
The emergence of Large Language Models (LLMs) has revolutionized Conversational User Interfaces (CUIs), enabling more dynamic, context-aware, and human-like interactions across diverse domains, from social sciences to healthcare. However, the rapid adoption of LLM-based personas raises critical ethical and practical concerns, including bias, manipulation, and unforeseen social consequences. Unlike traditional CUIs, where personas are carefully designed with clear intent, LLM-based personas generate responses dynamically from vast datasets, making their behavior less predictable and harder to govern. This workshop aims to bridge the gap between CUI and broader AI communities by fostering a cross-disciplinary dialogue on the responsible design and evaluation of LLM-based personas. Bringing together researchers, designers, and practitioners, we will explore best practices, develop ethical guidelines, and promote frameworks that ensure transparency, inclusivity, and user-centered interactions. By addressing these challenges collaboratively, we seek to shape the future of LLM-driven CUIs in ways that align with societal values and expectations.
ToMinHAI at CUI’2025: Theory of Mind in Human-CUI Interaction
New AI developments are enabling CUIs to take on diverse social roles to facilitate interactions with humans. To support such increasingly complex and social interactions, researchers draw from Theory of Mind (ToM)—our ability to attribute mental states like intentions, goals, and emotions to ourselves and others for seamless communication. Given ToM’s importance in human interaction, AI and HCI researchers explore both building ToM-like capabilities in CUIs and understanding how humans attribute mental states to CUIs. These perspectives form the emerging paradigm of Mutual Theory of Mind (MToM) in human-CUI interaction, where both parties iteratively interpret each other’s internal states. Building on the success of the 1st ToMinHAI workshop at CHI 2024, this installment invites researchers from AI, ML, HCI, and related fields to discuss ToM in human-CUI interactions to inform the future design of conversational AI.
Designing conversational agent(ic) systems
While researchers have been studying and building AI agents and autonomous systems for decades, generative AI and novel architectures have enabled a new generation of agentic systems. New frameworks are making it easier to create agentic systems in which agents powered by generative AI support more complex and ambiguous tasks with less human intervention. As agentic systems become easier to build and deploy, it is important to understand the capabilities, benefits, risks, and design considerations around these systems. In this tutorial, we introduce concepts and considerations behind designing and building agentic systems, in particular in the context of conversational systems. We address not only the basics of modern agentic systems, but also how these new systems relate to existing knowledge around the design of conversational interfaces.
Bias and Fairness in Conversational User Interfaces
Conversational User Interfaces (CUIs), including chatbots, virtual agents and social robots, are increasingly shaping how we communicate, seek support and access services. Yet, as these systems grow more sophisticated, concerns about bias and fairness in their design and deployment have become increasingly urgent. We propose a multidimensional approach to bias and fairness in CUIs that spans four interconnected themes: conceptual grounding, verbal communication, multimodal expression and interactional dynamics. Rather than framing bias merely as a technical flaw, we argue that it should be understood as a relational, interactional and design-based phenomenon. Accordingly, in this workshop, we aim to foster critical discussion around how CUIs encode social norms, perpetuate or mitigate exclusion, and shape perceptions of fairness through their language, embodiment and behaviour. By bringing together researchers, designers and policymakers, the workshop will explore pathways towards more equitable and transparent CUIs. The goal is to promote a relational understanding of fairness, one that centres user experience and social context, to guide future work in conversational AI.
Building Conversational User Interfaces: An Architectural Exploration with Meta Glasses for Developers and Researchers
Meta Glasses, developed by Meta and Ray-Ban, are a breakthrough in wearable technology. They create hands-free Conversational User Interfaces (CUIs) through audio input, cameras, and audio output. These capabilities make them well-suited for professional applications in fields like veterinary medicine and quality assurance, where users must efficiently capture images, organize workflows, and access critical information without interrupting their tasks. This tutorial addresses the gap between academic research and practical applications of CUIs for wearable devices. It explains the Meta Glasses architecture and its potential for other systems. We focus on connecting the glasses to third-party applications, leveraging community-driven tools in the absence of an official SDK. Outcomes include practical strategies for connecting one’s own systems, along with API limitations and privacy concerns. Participants will learn to design and implement CUI solutions for Meta Glasses in industrial and academic contexts. By bridging theoretical concepts to real-world applications, the session empowers developers to innovate new solutions for wearable conversational technology.
Designing Age-Inclusive Interfaces: Emerging Conversational and Generative AI to Support Interactions across the Life Span
We are concurrently witnessing two significant shifts: voice and chat-based conversational user interfaces (CUIs) are becoming ubiquitous (especially more recently due to advances in generative AI and LLMs - large language models), and older people are becoming a very large demographic group (and increasingly adopting mobile technology on which such interfaces are present). However, despite the recent increase in research activity, age-relevant and inter/cross-generational aspects continue to be underrepresented in both research and commercial product design. Therefore, the overarching aim of this workshop is to increase the momentum for research within the space of hands-free, mobile, and conversational interfaces that centers on age-relevant and inter- and cross-generational interaction. For this, we plan to create an interdisciplinary space that brings together researchers, designers, practitioners, and users, to discuss and share challenges, principles, and strategies for designing such interfaces across the life span. We thus welcome contributions of empirical studies, theories, design, and evaluation of hands-free, mobile, and conversational interfaces designed with aging in mind (e.g. older adults or inter/cross-generational). We particularly encourage contributions focused on leveraging recent advances in generative AI or LLMs. Through this, we aim to grow the community of CUI researchers across disciplinary boundaries (human-computer interaction, voice and language technologies, geronto-technologies, information studies, etc.) that are engaged in the shared goal of ensuring that the aging dimension is appropriately incorporated in mobile/conversational interaction design research.
Conversational Health Interfaces in the Era of LLMs: Designing for Engagement, Privacy, and Wellbeing
As Large Language Models (LLMs) revolutionize Conversational User Interfaces (CUIs) in health and wellbeing, these technologies offer unprecedented potential to enhance user wellbeing by improving physical health, psychological resilience, and social connectivity. However, the integration of such advanced AI into everyday CUI health applications brings substantial challenges, including privacy, user agency, and the psychological impacts of AI interactions. This workshop will provide a platform for collaborative dialogue to explore leveraging these advancements to improve health outcomes while addressing the ethical challenges and risks. Through presentations, breakout sessions, and collaborative discussions, participants will delve into themes such as designing multimodal CUI interventions, structuring conversational interventions for privacy and engagement, personalizing user experiences, and developing proactive and context-adaptive CUI strategies. These discussions aim to develop effective, user-centered CUI strategies that ensure the benefits of LLM-driven innovations are realized without compromising user wellbeing.
SESSION: Trust, Human Behaviour, and CUIs
Navigating the Fog: How University Students Recalibrate Sensemaking Practices to Address Plausible Falsehoods in LLM Outputs
LLM interfaces, such as ChatGPT, are widely used by students in higher education. However, their reliability is compromised by the tendency to generate plausible yet factually inaccurate content. This issue is particularly critical as the HCI community shows growing interest in designing LLM-based educational technology. Despite this interest, we have yet to learn how plausible falsehoods disrupt students’ real-time sensemaking of outputs from imperfectly reliable LLMs, and how students currently attempt to mitigate these negative effects. Thus, we conducted a case study of 15 university students using ChatGPT through think-aloud tasks and semi-structured interviews. We identified recurring patterns of sensemaking, with students facing challenges such as relying on intuitive guesses and feeling overwhelmed by the LLM’s lengthy, sycophantic, and overconfident responses. They adapted by inducing inconsistencies from the LLM’s responses and strategically dividing tasks between themselves and the LLM. Lastly, our study highlights several design implications for future reliable LLM interfaces.
Exploring Artists’ and Art Viewers’ Perspectives for Art Chatbots: Implications for a Design Framework
Recent advances in large language models (LLMs) and conversational user interfaces (CUIs) unlock new ways to help art viewers get answers about artworks. To clarify the roles that artists and viewers envision for art chatbots, we conducted two empirical studies in the domain of traditional Chinese painting, given its cultural depth. First, we interviewed five artists about how they currently respond to viewer inquiries and their attitudes toward chatbots. Second, we asked art viewers (N=102) to pose questions to either an artist or a chatbot. Results show that artists see chatbots as useful for factual or repetitive queries but hesitate to entrust emotive or personal discussions to them. Viewers also favor chatbots for efficiency but desire human input for deeper or personal topics. Based on these insights, we propose a design framework that balances the perspectives of both artists and viewers, contributing to the CUI community's understanding of domain-specific chatbot design.
When AI Joins the Negotiation Table: Evaluating AI as a Moderator
Negotiation is a crucial decision-making process where parties seek to resolve differences and optimize outcomes. While prior research has focused on maximizing negotiation outcomes, fostering a collaborative atmosphere is essential for long-term relationship-building. This study explores the role of AI-assisted moderation in negotiations that emulate high-stress environments. We developed a text-based AI moderator and evaluated its usability and effectiveness in a two-phase study: a pilot study with 14 participants followed by a final user study with 16 participants. To provide an initial point of comparison, we assessed trust, respect, and equitability in AI-moderated versus non-moderated negotiations. Quantitative findings indicate a negative effect of AI-assisted moderation on relationship-building, while qualitative insights suggest that AI moderation fosters collaboration. However, the cognitive load of text-based facilitation hinders its effectiveness. These results highlight the importance of seamless AI integration and contribute to the broader discourse on AI’s role in behavior change and mediated communication.
How Managers Perceive AI-Assisted Conversational Training for Workplace Communication
Effective workplace communication is essential for managerial success, yet many managers lack access to tailored and sustained training. Although AI-assisted communication systems may offer scalable training solutions, little is known about how managers envision the role of AI in helping them improve their communication skills. To investigate this, we designed a conversational role-play system, CommCoach, as a functional probe to understand how managers anticipate using AI to practice their communication skills. Through semi-structured interviews, participants emphasized the value of adaptive, low-risk simulations for practicing difficult workplace conversations. They also highlighted opportunities, including human-AI teaming, transparent and context-aware feedback, and greater control over AI-generated personas. AI-assisted communication training should balance personalization, structured learning objectives, and adaptability to different user styles and contexts. However, achieving this requires carefully navigating tensions between adaptive and consistent AI feedback, realism and potential bias, and the open-ended nature of AI conversations versus structured workplace discourse.
Towards Teams being Led by a Conversational Agent
Rapid advances in conversational agents are making it possible for them to lead human teams. However, considerably less is known about how human teams perform under their guidance. To investigate this, we devised a Wizard-of-Oz study where teams of 3 participants (ground players) were tasked with locating and corralling targets in a virtual desert environment. A fourth player (operator) had access to a top-down map view of the task environment. Importantly, the operator could provide only oral guidance to the ground players on the task. In half of the trials, participants were led by a human operator, whereas in the other half of the trials they were led by a conversational agent whose responses were controlled by the same human operator (Wizard). Although teams led by the human operator communicated significantly more and ground players preferred the human operator, no differences in team performance were observed.
Hearing Ambiguity: Exploring Beyond-Gender Impressions of Artificial Ambiguous Voices
Voice perception plays a fundamental role in all types of interactions, from human-to-human communication to human-technology interaction. When it comes to technology, we sometimes have the option to choose the type of voice we want to hear. But why is the default (almost) always a feminine or masculine voice? In this research, we evaluated user perceptions of gender-ambiguous voices, a relatively unexplored option. In our novel comparative study, we evaluated six gender-ambiguous voices with participants of diverse gender identities (men, women, and non-binary individuals), with 74 participants in each group. Additionally, half of the participants were told in advance that the voices had been designed to be gender-ambiguous, and half were not. We aimed to move beyond subjective perceptions of voice gender by exploring how such voices are perceived across different dimensions: trustworthiness, appeal, comfort, anthropomorphism, and aversion. Our findings reveal that while men and women had similar perceptions, non-binary participants rated the voices more negatively, with lower trust and higher aversion. Interestingly, priming participants about the voices’ ambiguity did not significantly affect overall perceptions, though it increased critical evaluations from non-binary individuals. These findings contribute to growing research on gender-ambiguous voices by providing perceptual comparisons of multiple voices and highlighting the need for more inclusive voice designs that appeal to non-binary users.
The Art of Talking Machines: A Comprehensive Literature Review of Conversational User Interfaces
Conversational User Interfaces (CUIs) enable human-like interactions via voice, text, and multimodal communication, driven by natural language processing and machine learning. Prior literature reviews have primarily focused on specific application domains or design aspects, lacking an integrated, multi-dimensional analysis. This study addresses this gap by providing a structured framework synthesizing CUI research into interface design, system development, and ethical considerations. Our analysis highlights advancements in CUI design, such as dialogue structure, multimodal interactions, and adaptability. It also reveals persistent challenges, including bias in persona design, trust calibration, and data privacy. System development benefits from improvements in NLP, conversation memory, and multilingual capabilities. Ethical considerations, including social bias, user autonomy, and transparency, remain central to discussions on responsible CUI design. By analyzing existing research, we identify key gaps and suggest future directions, including multilingual and culturally adaptive CUIs, privacy-preserving AI techniques, and enhanced reasoning mechanisms for context-aware interactions.
SESSION: Designing CUIs
Design Activity Simulation: Opportunities and Challenges in Using Multiple Communicative AI Agents to Tackle Design Problems
Large Language Models (LLMs) can enhance structured design thinking, yet existing copilot approaches integrate them into human workflows rather than exploring their autonomous potential. This paper investigates how LLM-based communicative AI agents can independently tackle open-ended design problems and how their strengths and limitations inform human-AI collaboration. We iteratively design a system where AI agents play different roles and simulate human design activity through conversational turns. The agents investigate user needs, identify design constraints, and explore the design space, with useful insights emerging from their interactions. To assess reasoning quality, we conducted a human jury evaluation with five HCI researchers and explored potential applications through a contextual inquiry with seven professionals. Our findings demonstrate that integrating human design thinking techniques enhances AI reasoning. AI agents effectively tackle design problems, generating low-novelty yet well-grounded and practical solutions that meet key design requirements.
Towards Age-Inclusive Conversational Interfaces: Understanding Requirements Across Age Groups
As Conversational User Interfaces (CUI) become integrated into daily life, users’ diversity, particularly in age, is increasing. However, older adults often encounter challenges interacting with CUI. Although some of these challenges can be mitigated through age-specific design, many mass-market CUI systems (e.g., smart speakers) are intended for a broad range of consumers. Designing such interfaces that support users of all ages is predicated on a clear understanding of age-based similarities and differences when interacting with CUI. Prior research primarily focused on differences in interaction behaviours. However, we still lack a formal understanding of similarities and differences not only in behaviours, but also in expectations and needs for interacting with CUI. In this paper, we first present an age-based comparison of CUI-related user behaviours, expectations, and needs, synthesized around seven major themes based on a systematic literature review. We then reflect on the implications these have for age-inclusive adaptive CUI design.
Transparent Conversational Agents: The Impact of Capability Communication on User Behavior and Mental Model Alignment
When a user interacts with a conversational agent for the first time, they may not be aware of the agent’s capabilities, leading to suboptimal use or interaction breakdowns. To avoid a mismatch with the actual capabilities, the agent’s capabilities have to be made transparent to the user. To investigate whether communication of an agent’s capabilities during interactions enhances transparency and improves the user’s mental model, we conducted a user study with 56 participants. Each participant had three speech-based interactions with an agent that communicated its capabilities or an agent that did not. Our results suggest that the communication led to a change in user behavior with significantly longer utterances. However, the users’ mental models of the agent’s capabilities were not significantly different between the conditions. Participants were able to significantly improve their knowledge of the agent’s capabilities by aligning their mental model over time in both conditions.
Mitigating Response Delays in Free-Form Conversations with LLM-powered Intelligent Virtual Agents
We investigated the challenges of mitigating response delays in free-form conversations with virtual agents powered by Large Language Models (LLMs) within Virtual Reality (VR). For this, we used conversational fillers, such as gestures and verbal cues, to bridge delays between user input and system responses, and evaluated their effectiveness across various latency levels and interaction scenarios. We found that latency above 4 seconds degrades quality of experience, while natural conversational fillers improve perceived response time, especially in high-delay conditions. Our findings provide insights for practitioners and researchers to optimize user engagement whenever conversational systems’ responses are delayed by network limitations or slow hardware. We also contribute an open-source pipeline that streamlines deploying conversational agents in virtual environments.
TacTalk: Personalizing Haptics Through Conversation
Haptic experiences are highly personal, but despite prior work exploring interfaces enabling personalization, we don’t know what process drives the personalization of haptics. To enable a study of this process, including users’ mental models and vocabularies, we introduce TacTalk, a conversational system enabling real-time tuning of virtual haptic experiences. We present an application using TacTalk in a popular racing video game, Forza Horizon 5. Through an empirical study, we find that tracking user preference profiles may improve TacTalk’s ability to cater to individual differences, and that TacTalk is more usable than an existing slider-based personalization tool. A thematic analysis of participant interviews reveals an archetypal process of conversational personalization - starting with real-world experiences and domain-specific metaphors, then subsequently inspecting specific aspects of the experience including in-game events and the game controller.
Understanding the Challenges and Design Opportunities of Using Voice Assistants to Support Postpartum Mothers in Brazil
The postpartum period is a crucial time of physical and mental adjustment for a mother, one that can be strained by increased demands and mental load. In Brazil, as in many Latin American countries, the unequal division of childcare responsibilities increases mothers’ risks of postpartum depression and anxiety while preventing mothers from focusing on their recovery. While today’s Voice Assistants (VAs) promise to offer hands-free, eyes-free, and on-demand support, it remains unclear how VAs can be designed to effectively support mothers and their associated tasks during the postpartum period. To address this challenge, we conducted an online survey study with 55 Brazilian mothers to investigate how VAs support postpartum mothers and their current usage in childcare-related tasks. We identified key challenges preventing VAs from effectively supporting Brazilian mothers, including language barriers, lack of personalized information retrieval, and missing features tailored to postpartum care and early childhood needs. We then proposed a set of design considerations for how VAs could meet mothers’ needs for greater adoption in Brazil.
Visual-Conversational Interface for Evidence-Based Explanation of Diabetes Risk Prediction
Healthcare professionals need effective ways to use, understand, and validate AI-driven clinical decision support systems. Existing systems face two key limitations: complex visualizations and lack of grounding in scientific evidence. We present an integrated Decision Support System that combines interactive visualizations with a conversational agent for explaining diabetes risk assessments. We propose a hybrid prompt handling approach combining fine-tuned language models for analytical queries with general Large Language Models (LLMs) for broader medical questions, a methodology for grounding AI explanations in scientific evidence and a feature range analysis technique to support deeper understanding of feature contributions. We conducted a mixed-methods study with 30 healthcare professionals and found that the conversational interactions helped healthcare professionals build a clear understanding of model assessments, while the integration of scientific evidence calibrated trust in the system’s decisions. Most participants reported that the system supported both patient risk evaluation and recommendation.
SESSION: In-context and Context-aware CUIs
ActionaBot: Structuring Metacognitive Conversations towards In-Situ Awareness in How-To Instruction Following
People often rely on shared procedures and tips to handle unfamiliar tasks, but following tutorials can be challenging. Individuals may skip steps, alter actions, or miss information, leading to mistakes or task failure. Tutorials are often based on personal experiences and may omit important details, which vary with context. Furthermore, when others attempt to follow these tutorials, differing situations can make it hard to follow the steps or track progress. Inspired by how coworkers discuss work status and work approach in-situ through metacognitive conversations, we propose Action-a-bot, a chatbot framework that transforms static tutorials into interactive, structured, step-by-step guidance. Action-a-bot prompts users to focus on each step, review what they’ve completed, and anticipate the next steps, while adapting actions and solving problems. Our study explores how human-chatbot interaction can improve task completion and make tutorials more actionable by increasing user engagement and awareness of the work situation. We discuss the potential of chatbots in supporting instructional communication and task execution.
Chat with the 'For You' Algorithm: An LLM-Enhanced Chatbot for Controlling Video Recommendation Flow
The rise of short-form video platforms like TikTok, driven by algorithmic recommendations, fosters immersive flow experiences. While users value personalization and engagement, they also seek greater agency over their For You recommendations. This paper designs, prototypes, and evaluates TKGPT, an LLM-enhanced conversational interface that helps users articulate their interests and understand recommendations. Through qualitative interviews and a user study, we examine how TKGPT influences algorithmic folk theories and the sense of agency. Findings show that users primarily use TKGPT to seek relevant videos, explain preferences, and exert control over the algorithm. The resulting For You videos better reflect user interests, enhance the understanding of the algorithm, improve content relevance, and reduce feelings of exploitation. Notably, users’ sense of agency is significantly associated with their improved understanding of how the algorithm works. We discuss the opportunities and challenges of using conversational user interfaces to enhance user control over video recommendations.
DesignMinds: Enhancing Video-Based Design Ideation with a Vision-Language Model and a Context-Injected Large Language Model
Ideation is a critical component of video-based design (VBD), where videos serve as the primary medium for design exploration and inspiration. The emergence of generative AI offers considerable potential to enhance this process by streamlining video analysis and facilitating idea generation. In this paper, we present DesignMinds, a prototype that integrates a state-of-the-art Vision-Language Model (VLM) with a context-enhanced Large Language Model (LLM) to support ideation in VBD. To evaluate DesignMinds, we conducted a between-subject study with 35 design practitioners, comparing its performance to a baseline condition. Our results demonstrate that DesignMinds significantly enhances the flexibility and originality of ideation, while also increasing task engagement. Importantly, the introduction of this technology did not negatively impact user experience, technology acceptance, or usability.
From Goals to Actions: Designing Context-aware LLM Chatbots for New Year's Resolutions
When pursuing new goals, people often struggle to determine what actions to take. Large-language-model (LLM) chatbots can provide information and interactivity, and combining them with context awareness could enhance the relevance and proactivity of action recommendations. However, there is a gap in understanding the role that such technologies can play in taking a holistic view of the user’s multiple goals, complex contexts, and constraints over time. We developed a technology probe of a personalized context-aware LLM chatbot and deployed it with 14 participants for 2-4 weeks for their 2024 New Year’s resolutions. We observed users achieve a high adoption rate of actions and greater success in the pursuit of goals in the first week, as well as the rapidly evolving user needs over time. We discuss how to best leverage context-awareness for AI agent design, and the novel roles that AI could adopt for an ecosystem of services and agents.
NeuroChat: A Neuroadaptive AI Chatbot for Customizing Learning Experiences
Generative AI is reshaping education by enabling personalized, on-demand learning experiences. However, current AI systems lack awareness of the learner’s cognitive state, limiting their adaptability. In parallel, electroencephalography (EEG)-based neuroadaptive systems have shown promise in enhancing engagement through real-time physiological feedback. This paper introduces NeuroChat, a neuroadaptive AI tutor that integrates real-time EEG-based engagement tracking with a large language model to adapt its conversational responses. By continuously monitoring learners’ cognitive engagement, NeuroChat dynamically adjusts content complexity, tone, and response style in a closed-loop interaction. In a within-subjects study (n = 24), NeuroChat significantly increased both EEG-measured and self-reported engagement compared to a non-adaptive chatbot. However, no significant differences in short-term learning outcomes were observed. These findings demonstrate the feasibility of real-time brain–AI interaction for education and highlight opportunities for deeper personalization, longer-term adaptation, and richer learning assessment in future neuroadaptive systems.
Learn, Explore and Reflect by Chatting: Understanding the Value of an LLM-Based Voting Advice Application Chatbot
Voting advice applications (VAAs), which have become increasingly prominent in European elections, are seen as a successful tool for boosting electorates’ political knowledge and engagement. However, VAAs’ complex language and rigid presentation constrain their utility to less-sophisticated voters. While previous work enhanced VAAs’ click-based interaction with scripted explanations, a conversational chatbot’s potential for tailored discussion and deliberate political decision-making remains untapped. Our exploratory mixed-method study investigates how LLM-based chatbots can support voting preparation. We deployed a VAA chatbot to 331 users before Germany’s 2024 European Parliament election, gathering insights from surveys, conversation logs, and 10 follow-up interviews. Participants found the VAA chatbot intuitive and informative, citing its simple language and flexible interaction. We further uncovered VAA chatbots’ role as a catalyst for reflection and rationalization. Expanding on participants’ desire for transparency, we provide design recommendations for building interactive and trustworthy VAA chatbots.
WatchWithMe: LLM-Based Interactive Guided Watching of Review Videos
Videos are a popular way for viewers to follow topics of interest. In areas such as product and technology reviews, videos often present in-depth perspectives in a compact fashion, driving viewers to look for additional explanations. We propose WatchWithMe, an automatic approach that provides viewers with in-context guided watching during video playback. Powered by large language models, WatchWithMe generates guided materials from the video transcript as if creating a reading guide, including summaries, highlights, and question prompts. WatchWithMe reveals relevant information responsive to the spoken content in a review video. Viewers skim and prompt in our text-based conversational UI, in which we automatically supply the video viewing context to the model to produce contextual responses. We evaluated WatchWithMe with public videos and collected feedback from 20 participants. Findings showed that our method encouraged viewers to seek out viewpoints or confirmations related to the video topics.
SESSION: CUIs for Health and Wellbeing
The Impact of a Chatbot's Ephemerality-Framing on Self-Disclosure Perceptions
Self-disclosure, the sharing of one’s thoughts and feelings, is affected by the perceived relationship between individuals. While chatbots are increasingly used for self-disclosure, the impact of a chatbot’s framing on users’ self-disclosure remains under-explored. We investigated how a chatbot’s description of its relationship with users, particularly in terms of ephemerality, affects self-disclosure. Specifically, we compared a Familiar chatbot, presenting itself as a companion remembering past interactions, with a Stranger chatbot, presenting itself as a new, unacquainted entity in each conversation. In a mixed factorial design, participants engaged with either the Familiar or Stranger chatbot in two sessions across two days, with one conversation focusing on Emotional- and another Factual-disclosure. When Emotional-disclosure was sought in the first chatting session, Stranger-condition participants felt more comfortable self-disclosing. However, when Factual-disclosure was sought first, these differences were replaced by more enjoyment among Familiar-condition participants. Qualitative findings showed Stranger afforded anonymity and reduced judgement, whereas Familiar sometimes felt intrusive unless rapport was built via low-risk Factual-disclosure.
Understanding the Multimodal Voice Assistant as an Informal and Social Care Support Tool in the UK
Telecare devices help deliver health and care outside of clinical settings. However, digital infrastructure modernisation in the UK could render previously relied-on telecare devices for social care obsolete. With local councils and informal carers struggling to contain social care costs and provide quality health and care, there is a need to provide technology-enabled care using updated digital infrastructure, and there is promise in using cheap and widely available voice assistants as care devices. We examine the feasibility of Amazon Echo Show as a care device for recipients of social care in the UK. Differences between the ten households in receipt of care meant that the functionality and experience of using Echo Show varied over the three months of the qualitative study; however, we captured promising use cases, such as direct access to carers in an emergency, despite some negative experiences, such as the exacerbation of cognitive limitations.
PITCH: Designing Agentic Conversational Support for Planning and Self-reflection
Effective planning and reflection are essential for knowledge workers’ productivity and well-being, yet many struggle with them. While conversational agents (CAs) have shown promise, existing approaches rely on repetitive check-ins without variation. We designed PITCH, a CA that checks in twice daily for morning planning and evening reflection while considering the morning conversation. A two-week field study with 12 graduate students demonstrated that engagement with PITCH increased their perceived well-being over time. We also evaluated a rotation strategy, which cycles through diverse topics every day, hypothesizing that rotation would mitigate wear-out effects and offer new perspectives. The results revealed that the specificity of a randomly chosen goal was perceived as out of context and authoritarian, with most participants preferring the non-rotation version for its consistency and flexibility. These findings highlight the potential of CAs to support knowledge workers and offer design considerations for varying conversations to provide topical diversity.
SmartEats: Investigating the Effects of Customizable Conversational Agent in Dietary Recommendations
In conversational recommender systems (CRS), the communication characteristics exhibited by the conversational agent (CA) can greatly shape user experience and their perceptions of the recommendation quality. Yet, prior work often adopts a one-size-fits-all approach, leaving the potential benefits of CA customizability—allowing users to tailor agent traits to their preferences—largely unexplored. We examine this gap in the context of dietary recommendations by introducing SmartEats, a CRS featuring a CA that can be customized by users. Through a between-subjects experiment (N = 214), we compared SmartEats to a non-customizable baseline, and followed up with participants after one week to understand whether and how the recommendations affect their food choices. We found that CA customizability directly improved participants’ immediate experience and indirectly enhanced their ability to later recall the recommendations. Reflecting on the findings, we discuss opportunities for CRS to enhance health and well-being by leveraging the customizability of emerging AI technologies.
Beyond Functionality: Co-Designing Voice User Interfaces for Older Adults' Well-being
The global population is rapidly aging, necessitating technologies that promote healthy aging. Voice User Interfaces (VUIs), leveraging natural language interaction, offer a promising solution for older adults due to their ease of use. However, current design practices often overemphasize functionality, neglecting older adults’ complex aspirations, psychological well-being, and social connectedness. To address this gap, we conducted co-design sessions with 20 older adults employing an empathic design approach. Half of the participants interacted with a probe involving health information learning, while the others focused on a probe related to exercise. This method engaged participants in collaborative activities to uncover non-functional requirements early in the design process. Results indicate that when encouraged to share their needs within a social context, older adults revealed a range of sensory, aesthetic, hedonic, and social preferences and, more importantly, the specific personas they envisioned for VUIs. These insights inform the relative importance of these factors in VUI design.
Linguistic Diversity and Mental Well-Being: Co-Designing Custom AI Chatbots with Multilingual Mothers
Language plays a crucial role in mental health, from expressing emotions to challenging mental health stigma. While large language models provide significant potential to help address the global mental health crises, it remains unclear how conversational AI chatbots could support linguistic diversity. We investigate this problem space by focusing on women’s mental well-being and presenting the findings of co-design sessions with multilingual mothers. While participants described mental well-being related lexical gaps in non-English languages and cognitive effort in expressing emotions in English, they reported frequent code-switching and preferences to regulate emotions using their first language. When evaluating co-created AI chatbots in non-English languages, participants criticised inappropriate language styles, translation errors, and a lack of dialect-specific nuances. We discuss the importance of co-designing multilingual conversational user interfaces with linguistically diverse groups to mitigate health inequalities and support people’s unique and changing language preferences in self-managing their health and well-being.
PlanFitting: Personalized Exercise Planning with Large Language Model-driven Conversational Agent
Creating personalized and actionable exercise plans often requires iteration with experts, which can be costly and inaccessible to many individuals. This work explores the capabilities of Large Language Models (LLMs) in addressing these challenges. We present PlanFitting, an LLM-driven conversational agent that assists users in creating and refining personalized weekly exercise plans. By engaging users in free-form conversations, PlanFitting helps elicit users’ goals, availabilities, and potential obstacles, and enables individuals to generate personalized exercise plans aligned with established exercise guidelines. Our study—involving a user study, intrinsic evaluation, and expert evaluation—demonstrated PlanFitting’s ability to guide users to create tailored, actionable, and evidence-based plans. We discuss future design opportunities for LLM-driven conversational agents to create plans that better comply with exercise principles and accommodate personal constraints.
SESSION: Usability, UX, and Evaluation of CUIs
User Preferences in Conversational AI for Healthcare: Insights from an Interview Study
Chatbot-based symptom diagnosis apps are becoming increasingly popular, yet concerns remain around usability and user trust. This study explores user preferences regarding chatbot characteristics using a rhetorical structure in symptom diagnosis chatbots. We conducted 16 semi-structured interviews across two use-case groups (varying in symptom severity) and analyzed 69 user reviews from four chatbot applications. Findings show that users consistently valued logos (clear explanations, structured dialogue) and ethos (consistency, next steps), while pathos (emotional support) became more important in high-severity scenarios. Similarly, logos-based characteristics were pivotal in all phases, but ethos became prominent in the third phase – diagnosis delivery. Interviews uncovered various themes around dialogue management, interaction design, and personalization needs. App reviews supported these findings, highlighting gaps in transparency, empathy, and usability. Based on these insights, we propose design guidelines and visualize interaction concepts that align with rhetorical strategies to improve trust and effectiveness in health-focused conversational agents.
Multimodal Silent Speech-based Text Entry with Word-initials Conditioned LLM
Although silent speech interfaces exhibit great potential for enabling seamless communication between humans and conversational agents, large-vocabulary recognition remains challenging for them. In this research, we propose a novel interaction technique that combines silent speech and typing to enable more efficient text entry while preserving privacy. This technique allows users to use abbreviated phrase input while still ensuring high accuracy by leveraging visual information. By fine-tuning a large language model with a visual speech encoder, we condition the models to decode the speech content with word initials as hints. Evaluations on existing datasets show that our model can reduce the Word Error Rate from 20.3% to 9.19%, compared to state-of-the-art visual speech recognition models. Results from a user study demonstrated significant improvements in input speed and keystroke saving. Participants reported that our prototype, LipType, leads to an overall lower perceived workload, particularly in the effort and physical-demand dimensions.
A Pragmatics-based Approach to Proactive Digital Assistants for Data Exploration
Recent advances in Natural Language Interfaces (NLIs) and Large Language Models (LLMs) have transformed the way we tackle NLP tasks, shifting the focus towards a more Pragmatics-based perspective. This shift enables more natural interactions between humans and voice assistants, which have historically been difficult to achieve. Pragmatics involves understanding how users often speak out of turn, interrupt one another, or provide relevant information without being explicitly asked (maxim of quantity). To explore this, we developed a digital assistant that continuously listens to conversations and proactively generates relevant visualizations during data exploration tasks. In a within-subject study, participants interacted with both proactive and non-proactive versions of a voice assistant while exploring the Hawaii Climate Data Portal (HCDP). Results suggest that interaction with the proactive assistant increased the total number of utterances and discoveries, facilitated quicker and more reliable insights, and led to greater usage of the system’s chart capabilities. Our study highlights the potential of proactive AI in NLIs and identifies key challenges in its implementation, offering insights for future research.
Exploring LLMs for Automated Generation and Adaptation of Questionnaires
Effective questionnaire design improves the validity of the results, but creating and adapting questionnaires across contexts is challenging due to resource constraints and limited expert access. Recently, the emergence of LLMs has led researchers to explore their potential in survey research. In this work, we focus on the suitability of LLMs in assisting the generation and adaptation of questionnaires. We introduce a novel pipeline that leverages LLMs to create new questionnaires, pretest them with a target audience to identify potential issues, and adapt existing standardized questionnaires for different contexts. We evaluated our pipeline for creation and adaptation through two studies on Prolific, involving 238 participants from the US and 118 participants from South Africa. Our findings show that participants found LLM-generated text clearer, LLM-pretested text more specific, and LLM-adapted questions slightly clearer and less biased than traditional ones. Our work opens new opportunities for LLM-driven questionnaire support in survey research.
From Synthetic to Human: The Gap Between AI-Predicted and Actual Pro-Environmental Behavior Change After Chatbot Persuasion
Pro-environmental behavior (PEB) is vital to combat climate change, yet turning awareness into intention and action remains elusive. We explore large language models (LLMs) as tools to promote PEB, comparing their impact across 3,600 participants: real humans (n=1,200), simulated humans based on actual participant data (n=1,200), and fully synthetic personas (n=1,200). All three participant groups faced either personalized chatbots, standard chatbots, or static statements, employing four persuasion strategies (moral foundations, future self-continuity, action orientation, or “freestyle” chosen by the LLM). Results reveal a “synthetic persuasion paradox”: synthetic and simulated participants significantly change their post-intervention PEB stance, while human attitudes barely shift. Simulated participants better approximate human behavior but still overestimate effects. This disconnect underscores LLMs’ potential for pre-evaluating PEB interventions but warns of their limits in predicting human responses. We call for refined synthetic modeling and sustained and extended human trials to align conversational AI’s promise with tangible sustainability outcomes.
Writing with AI Lowers Psychological Ownership, but Longer Prompts Can Help
The feeling of something belonging to someone is called “psychological ownership.” A common assumption is that writing with generative AI lowers psychological ownership, but the extent to which this occurs and the role of prompt length are unclear. We report on two experiments to examine the relationship between psychological ownership and prompt length. Participants wrote short stories either completely by themselves or wrote prompts of varying lengths. Results show that when participants wrote longer prompts, they had higher levels of psychological ownership. Their comments suggest they thought more about their prompts, often adding more details about the plot. However, benefits plateaued when prompt length was 75-100% of the target story length. To encourage users to write longer prompts, we propose augmenting the prompt submission button so it must be held down a long time if the prompt is short. Results show that this technique is effective at increasing prompt length.
Outcomes, Perceptions, and Interaction Strategies of Novice Programmers Studying with ChatGPT
Large Language Model (LLM) conversational agents are increasingly used in programming education, yet we still lack insight into how novices engage with them for conceptual learning compared with human tutoring. This mixed‑methods study compared learning outcomes and interaction strategies of novices using ChatGPT or human tutors. A controlled lab study with 20 students enrolled in introductory programming courses revealed that students employ markedly different interaction strategies with AI versus human tutors: ChatGPT users relied on brief, zero‑shot prompts and received lengthy, context‑rich responses but showed minimal prompt refinement, while those working with human tutors provided more contextual information and received targeted explanations. Although students distrusted ChatGPT’s accuracy, they paradoxically preferred it for basic conceptual questions due to reduced social anxiety. We offer empirically grounded recommendations for developing AI literacy in computer science education and designing learning‑focused conversational agents that balance trust‑building with maintaining the social safety that facilitates uninhibited inquiry.
SESSION: Provocations
Beyond the Illusion: LLMs and the Case for Pragmatic Cues in Conversation
Conversational agents are becoming increasingly adept at interacting with humans in a very natural manner. They incorporate subtle linguistic and paralinguistic cues: changes in tone and style, emotional expressions, or fillers like ‘mm-hm’. In human communication, such cues serve pragmatic functions that support mutual understanding and communicative success. This raises the question: do we want conversational agents to blindly mimic these cues, or can we use them more purposefully to serve a communicative function? We argue that the role of pragmatic cues in interaction with conversational user interfaces remains underexplored. A deeper understanding of how to strategically use them in appropriate contexts and their impact on human-machine interactions is crucial to enhance mutual understanding in conversations with artificial agents. Through this provocation, we propose a research agenda to spark discussion on how future research can address this.
Fake Friends and Sponsored Ads: The Risks of Advertising in Conversational Search
Digital commerce thrives on advertising, with many of the largest technology companies relying on it as a significant source of revenue. However, in the context of information-seeking behavior, such as search, advertising may degrade the user experience by lowering search quality, misusing user data for inappropriate personalization, potentially misleading individuals, or even leading them toward harm. These challenges remain significant as conversational search technologies, such as ChatGPT, become widespread. This paper critically examines the future of advertising in conversational search, utilizing several speculative examples to illustrate the potential risks posed to users who seek guidance on sensitive topics. Additionally, it provides an overview of the forms that advertising might take in this space and introduces the “fake friend dilemma,” the idea that a conversational agent may exploit unaligned user trust to achieve other objectives. This study presents a provocative discussion on the future of online advertising in the space of conversational search and ends with a call to action.
Crossing the Line? The Paradox of Human-Like Design in Conversational Agents
Since the early development of conversational agents (CAs), human-likeness has been a central design focus. Numerous studies have highlighted the benefits of more human-like CAs, including user experience, engagement, and trust improvements. As a result, researchers have proposed guidelines for designing CAs that closely resemble human communication styles. However, a growing body of research argues against excessive human-likeness, citing concerns about setting unrealistic expectations, facilitating overtrust, and enabling manipulation. To mitigate these risks, some researchers advocate for design choices that clearly differentiate CAs from humans, such as using synthetic voices or robotic visual representations to signal their artificial nature. This provocation paper explores the paradox between these two perspectives. Does the very act of making CAs interact in human-like ways inherently contradict efforts to maintain transparency about their artificial nature? We invite discussion on the implications this contradiction holds for the future of CA design.
Aye, Robot: What Happens When Robots Speak Like Real People?
In daily life, we interact with each other using the social, regional, and ethnic communication styles typical of our local communities. Successful communication further rests on our ability to seamlessly adjust to our interlocutors following the norms and expectations of our local social setting as well as conversational context and goals. However, despite significant advances in speech technology, most artificial speech systems—particularly, most social robots—still use a single, “standard”, non-local communication style for all users, social settings, and interaction goals. Recent research has shown that when they interact with digital agents, humans transfer and adapt their sociolinguistic behaviours, including communication bias. Despite this, the barriers set up by this inherent communication bias have never been systematically studied in HRI, and the potential benefits to user engagement from socially inclusive, diverse communication styles have not been explored. We argue that social robotics researchers should also consider sociolinguistic factors constraining human interaction. To explore the implications, we describe two hypothetical robots designed to support the local communication style of two regions of the United Kingdom, and we consider the potential sociolinguistic impact each robot might have on its conversational partners and the wider society.
The Ethics of Psychological Manipulation in Adversarial Conversational AI: Confronting the Recognition-Behaviour Gap
Conversational AI systems, powered by advanced Large Language Models, have rapidly developed human-like persuasion capabilities that raise concerns about psychological manipulation. This provocation examines the ethical problems that arise when these systems exploit cognitive biases and social compliance mechanisms during interactions with users. Building on established theoretical work and recent empirical research, we identify a particularly concerning pattern: the recognition-behaviour gap, where users consciously identify manipulative strategies yet fail to protect themselves accordingly. Current ethical frameworks fall short in addressing these sophisticated risks in conversational contexts. Rather than proposing yet another comprehensive framework, we identify five essential dimensions that extend existing approaches to address this recognition-behaviour gap: preserving user autonomy through structural design, implementing safeguards beyond awareness, developing context-sensitive ethics, ensuring persona consistency and transparency, and establishing continuous vulnerability monitoring. This paper confronts these ethical challenges directly and calls for practical protective measures to safeguard user autonomy as conversational AI becomes increasingly prevalent in everyday life.