CUI '24: ACM Conversational User Interfaces 2024

SESSION: Session 1: CUIs in human behaviour interventions

Exploring VUI-Supported Mindfulness Techniques for Smoking Cessation

  • Simon Bak Kjaerulff
  • Stefan Bjerregaard Pedersen
  • Tummas Johan Sigvardsen
  • Niels Van Berkel
  • Eleftherios Papachristos

This study investigates the effectiveness of Voice User Interfaces (VUIs) in supporting mindfulness techniques for smoking cessation. We conducted a month-long between-subject study involving nine participants, comparing a VUI on smart speakers against an augmented VUI (a blend of VUI and Graphical User Interface) on mobile devices. Specifically, we evaluated how these interfaces support individuals in quitting smoking through mindfulness practices. Our results include qualitative insights on participants’ experiences with mindfulness, their smoking cessation motivation, and engagement with the VUI prototypes, alongside quantitative data on their usage patterns. Our findings offer insights into the potential application of VUIs in smoking cessation and suggest design guidelines for future health-oriented applications. The study underscores the importance of device context in designing effective health interventions and sets the direction for future work in HCI and mindfulness applications.

Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth

  • Riku Arakawa
  • Hiromu Yakura

Chatbots’ role in fostering self-reflection is now widely recognized, especially in inducing users’ behavior change. While the benefits of 24/7 availability, scalability, and consistent responses have been demonstrated in contexts such as healthcare and tutoring to help one form a new habit, their use in coaching, which requires deeper introspective dialogue to induce leadership growth, remains unexplored. This paper explores the potential of such a chatbot powered by recent Large Language Models (LLMs) in collaboration with professional coaches in the field of executive coaching. Through a design workshop with these coaches and a two-week user study involving ten coach-client pairs, we explored the feasibility and nuances of integrating chatbots to complement human coaches. Our findings highlight the benefits of chatbots’ ubiquity and reasoning capabilities enabled by LLMs while identifying their limitations and design necessities for effective collaboration between human coaches and chatbots. By doing so, this work contributes to the foundation for augmenting one’s self-reflective process with prevalent conversational agents through the human-in-the-loop approach.

Can a Funny Chatbot Make a Difference? Infusing Humor into Conversational Agent for Behavioral Intervention

  • Xin Sun
  • Isabelle Teljeur
  • Zhuying Li
  • Jos A. Bosch

Regular physical activity is crucial for reducing the risk of non-communicable diseases (NCDs). With NCDs on the rise globally, there is an urgent need for effective health interventions, with chatbots emerging as a viable and cost-effective option because of limited healthcare accessibility. Although health professionals often utilize behavior change techniques (BCTs) to boost physical activity levels and enhance client engagement and motivation through affiliative humor, the efficacy of humor in chatbot-delivered interventions is not well understood. This study conducted a randomized controlled trial to examine the impact of the generative humorous communication style in a 10-day chatbot-delivered intervention for physical activity. It further investigated whether user engagement and motivation act as mediators between the communication style and changes in physical activity levels. Sixty-six participants engaged with the chatbots across three groups (humorous, non-humorous, and no-intervention) and responded to daily ecological momentary assessment questionnaires assessing engagement, motivation, and physical activity levels. Multilevel time series analyses revealed that an affiliative humorous communication style positively impacted physical activity levels over time, with user engagement acting as a mediator in this relationship, whereas motivation did not. These findings clarify the role of humorous communication style in chatbot-delivered interventions for physical activity, offering valuable insights for future development of intelligent conversational agents incorporating humor.

Designing a Couples-Based Conversational Agent to Promote Safe Sex in New, Young Couples: A User-Centred Design Approach

  • Divyaa Balaji
  • Gert-Jan De Bruijn
  • Tibor Bosse
  • Carolin Ischen
  • Margot Van Der Goot
  • Reinout Wiers

The uptake of conversational agents (CAs) to deliver digital sexual health interventions is growing. While current CAs only address one user at a time, research suggests that couples-based interventions may be more effective at promoting safe sex in non-casual relationships by improving relationship functioning. In this paper, we describe user-centred design activities undertaken towards the design of a couples-based chatbot to address safe sex in new, young couples. A two-step approach was undertaken, in which young people were interviewed about their preferences and ideas, and sexual health professionals took part in a design thinking workshop. The design activities yielded a rich set of design guidelines from both groups, as well as a paper-and-pen prototype of the proposed CA from the workshop. As expected, trust was raised by both stakeholders as an important determinant of use and therefore heavily informs the design guidelines.

WorkFit: Designing Proactive Voice Assistance for the Health and Well-Being of Knowledge Workers

  • Shashank Ahire
  • Benjamin Simon
  • Michael Rohs

Prior research has designed and evaluated Voice Assistance (VA) for different settings such as the home, school, and public spaces. Office environments have been relatively understudied, leaving a gap in understanding the essential factors for designing a VA specifically for work settings. In this study, we developed the WorkFit VA specifically for the office environment, focusing on the health and well-being of knowledge workers. WorkFit was designed to monitor knowledge workers for sedentary behavior, inconsistent hydration, and stress, and to deliver proactive voice interventions followed by a health recommendation to mitigate those issues. We evaluated WorkFit in a field study with 15 knowledge workers for 5 working days. In the study, we determined challenges and opportunities for voice interactions in work settings. We identified contextual factors for identifying inopportune moments for voice interactions in an office setting. We found that 92% of knowledge workers accepted WorkFit’s hydration interventions while 79% of them engaged in walking breaks. Moreover, breathing exercises recommended by WorkFit significantly stabilized the heart rate of knowledge workers during stress. Based on our findings, we propose five design recommendations for the development of VA customized to office settings.

Seeking Truth, Comfort, and Connection: How Conversational User Interfaces can help Couples with Dementia Manage Reality Disjunction

  • Yvon Ruitenburg
  • Minha Lee
  • Wijnand IJsselsteijn
  • Panos Markopoulos

Reality disjunction, when people present contradicting worldviews as true, frequently occurs in conversations between people with dementia and their partners due to memory gaps, distortions, fabrications, or unsuccessful communication. These couples can feel isolated and frustrated when trying to reconcile their memories or diverging perceptions of the world. Conversational User Interfaces (CUIs) can be used to help people with dementia with their cognitive needs, but their role in helping with reality disjunction is unexplored. Through semi-structured interviews, we examined how six couples (composed of people with dementia and their partners) experience reality disjunction and explored CUIs’ potential roles in mediating these situations through speculative designs. We found that addressing reality disjunction depended on what couples valued in their interaction. Each person may value the truth of what happened, but may let go of this when aiming for comfort, and empathise with their partner when pursuing connection. Potential roles for CUIs to help couples explore these values include helping them check and explain their memories, reducing their distress during reality disjunction, and helping people practice different responses to reality disjunction.

SESSION: Session 2: Interaction Design

Examining Humanness as a Metaphor to Design Voice User Interfaces

  • Smit Desai
  • Mateusz Dubiel
  • Luis A. Leiva

Voice User Interfaces (VUIs) increasingly leverage ‘humanness’ as a foundational design metaphor, adopting roles like ‘assistants,’ ‘teachers,’ and ‘secretaries’ to foster natural interactions. Yet, this approach can sometimes misalign user trust and reinforce societal stereotypes, leading to socio-technical challenges that might impede long-term engagement. This paper explores an alternative approach to navigate these challenges—incorporating non-human metaphors in VUI design. We report on a study with 240 participants examining the effects of human versus non-human metaphors on user perceptions within health and finance domains. Results indicate a preference for the human metaphor (doctor) over the non-human (health encyclopedia) in health contexts for its perceived enjoyability and likeability. In finance, however, user perceptions do not significantly differ between human (financial advisor) and non-human (calculator) metaphors. Importantly, our research reveals that the explicit awareness of a metaphor’s use influences adoption intentions, with a marked preference for non-human metaphors when their metaphorical nature is not disclosed. These findings highlight context-specific conversation design strategies required in integrating non-human metaphors into VUI design, suggesting tradeoffs and design considerations that could enhance user engagement and adoption.

Participatory Design with Domain Experts: A Delphi Study for a Career Support Chatbot

  • Marianne Wilson
  • David Brazier
  • Dimitra Gkatzia
  • Peter Robertson

We present a study of collaboration with expert participants for the purpose of the responsible design of a conversational agent. The Delphi study was used to identify and develop design and evaluation criteria for an automated career support intervention. Career support tasks present complex design problems as they are highly personalized and the definition of success for a single intervention is ambiguous. The study engaged domain experts in a structured communication process to explore the opportunities and risks of introducing a conversational agent to complement existing services provided to young people. Three rounds of questionnaires were used to build consensus across the expert panel. The questionnaire design incorporated design fictions, qualitative data from the panel, and requirement statements. The study produced a validated set of criteria that can be used for the design and evaluation of a conversational agent, that aligns with professional ethics and intended outcomes for a career support intervention. Our approach demonstrates the value of mixed method Delphi studies to facilitate participatory design of conversational user experiences by bridging knowledge gaps between technical and domain experts. The resulting evaluation criteria establish a meaningful foundation for future human-centered conversation design for career support.

Exploring User Engagement Through an Interaction Lens: What Textual Cues Can Tell Us about Human-Chatbot Interactions

  • Linwei He
  • Anouck Braggaar
  • Erkan Basar
  • Emiel Krahmer
  • Marjolijn Antheunis
  • Reinout Wiers

Monitoring and maintaining user engagement in human-chatbot interactions is challenging. Researchers often use cues observed in the interactions as indicators to infer engagement. However, evaluation of these cues is lacking. In this study, we collected an inventory of potential textual engagement cues from the literature, including linguistic features, utterance features, and interaction features. These cues were subsequently used to annotate a dataset of 291 user-chatbot interactions, and we examined which of these cues predicted self-reported user engagement. Our results show that engagement can indeed be recognized at the level of individual utterances. Notably, words indicating cognitive thinking processes and motivational utterances were strong indicators of engagement. An overall negative tone could also predict engagement, highlighting the importance of nuanced interpretation and contextual awareness of user utterances. Our findings demonstrated the initial feasibility of recognizing utterance-level cues and using them to infer user engagement, although further validation is needed across different content domains.

Using Speech Agents for Mood Logging within Blended Mental Healthcare: Mental Healthcare Practitioners' Perspectives

  • Orla Cooney
  • Kevin Doherty
  • Marguerite Barry
  • David Coyle
  • Gavin Doherty
  • Benjamin R. Cowan

Mood logging, where people track mood-related data, is commonly used to support mental healthcare. Speech agents could prove beneficial in supporting mood logging for clients. Yet we know little about how Mental Healthcare Practitioners (MHPs) view speech as a tool to support current care practices. Through a thematic analysis of semi-structured interviews with 15 MHPs, we show that MHPs see opportunities in the convenience, and the data richness that speech agents could afford. However, MHPs also saw this richness as noisy, with using speech potentially diminishing a client’s focus on mood logging as an activity. MHPs were wary of overusing AI-based tools, expressing concerns around data ownership, access and privacy. We discuss the role of speech agents within blended care, outlining key considerations when using speech for mood logging in a blended mental healthcare context.

SESSION: Session 3: AI and automation

Automating the Development of Task-oriented LLM-based Chatbots

  • Jesús Sánchez Cuadrado
  • Sara Pérez-Soler
  • Esther Guerra
  • Juan De Lara

Task-oriented chatbots are increasingly used to access all sorts of services – like booking a flight, or setting a medical appointment – through natural language conversation. There are many technologies for implementing task-oriented chatbots, including Dialogflow, Watson, and Rasa. They rely on an explicit definition of the user intents, conversation flows, and chatbot outputs, which is costly to specify, and sometimes results in suboptimal user experiences and artificial conversations with limited diversity of chatbot responses.

Recently, the advances in generative artificial intelligence fostered by Large Language Models (LLMs) have enabled a new range of open-domain chatbots, like ChatGPT, able to converse fluently on any topic. However, they are general-purpose, and therefore not directly usable to solve specialised tasks reliably.

In this paper, we study the power of LLMs to build task-oriented chatbots, resulting in lighter specifications – no intent definition required – and more natural conversations than in intent-based approaches. To this end, we propose a lightweight domain-specific language based on YAML to specify chatbots using modules of different types (e.g., menus, question-answering, data gathering). These specifications are compiled into structured LLM prompts that use the ReAct framework to inform our runtime how to interpret the user input and coordinate the tasks that the chatbot must perform. The paper presents the design and realisation of our framework, and an assessment that encodes a set of existing intent-based chatbots using our approach, showing its benefits in terms of specification size, conversation flexibility and output diversity.
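To make the idea of compiling a declarative specification into a structured prompt more concrete, the following is a minimal sketch. It is not the authors' DSL or compiler: a Python dict stands in for a YAML document, and every field name (`name`, `modules`, `type`, `items`, `fields`) as well as the prompt wording is a hypothetical choice made for illustration only.

```python
# Hypothetical chatbot spec; a Python dict stands in for a YAML document.
SPEC = {
    "name": "clinic-bot",
    "modules": [
        {"type": "menu", "name": "top", "items": ["book appointment", "opening hours"]},
        {"type": "data_gathering", "name": "booking", "fields": ["date", "time", "doctor"]},
    ],
}


def describe_module(module):
    """Render one module as a one-line description for the prompt."""
    if module["type"] == "menu":
        return f"- {module['name']}: offer the options {', '.join(module['items'])}"
    if module["type"] == "data_gathering":
        return f"- {module['name']}: collect the values {', '.join(module['fields'])}"
    raise ValueError(f"unknown module type: {module['type']}")


def compile_prompt(spec):
    """Build a ReAct-style prompt (Thought / Action / Observation loop)."""
    lines = [
        f"You are the task-oriented chatbot '{spec['name']}'.",
        "You can use these modules:",
        *[describe_module(m) for m in spec["modules"]],
        "Answer in the following format:",
        "Thought: reason about what the user needs",
        "Action: the name of one module, or 'reply'",
        "Observation: the result of the action",
    ]
    return "\n".join(lines)


prompt = compile_prompt(SPEC)
print(prompt)
```

Note that the specification lists modules but no intents; in this sketch the mapping from user input to a module is left to the model via the Thought/Action steps.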

Multimodal Dialog Act Classification for Digital Character Conversations

  • Philine Witzig
  • Rares Constantin
  • Nikola Kovacevic
  • Rafael Wampfler

Dialog act classification is essential for enabling digital characters to understand and respond effectively to user intents, leading to more engaging and seamless interactions. Previous research has focused on classifying dialog acts from transcriptions alone due to missing multimodal data. We close this gap by collecting a new multimodal (i.e., text, audio, video) dyadic dialog dataset from 60 participants. Based on our dataset, we developed a novel multimodal Transformer-based dialog act classification model. We show that our model can predict dialog acts in real-time on four classes with a Macro F1 score up to 80.81, outperforming the unimodal baseline by 1.24%. Our analysis shows that the segments of a sentence associated with the highest acoustic energy are most predictive. By harnessing our new multimodal dataset, we pave the way for dynamic, real-time, and contextually rich conversations that enhance the experience of interactions with digital characters.
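Macro F1, the metric reported above, is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how often it occurs. The sketch below computes it from scratch; the four class labels and the toy predictions are invented for illustration and are not taken from the paper's dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for a four-class dialog act task.
y_true = ["question", "statement", "statement", "greeting", "backchannel", "question"]
y_pred = ["question", "statement", "question", "greeting", "statement", "question"]

score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because the mean is unweighted, the single missed "backchannel" example pulls the score down as much as an error on a frequent class would.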

Generative AI-Enabled Conversational Interaction to Support Self-Directed Learning Experiences in Transversal Computational Thinking

  • Abdessalam Ouaazki
  • Kristoffer Bergram
  • Juan Carlos Farah
  • Denis Gillet
  • Adrian Holzer

As computational thinking (CT) becomes increasingly acknowledged as an important skill in education, self-directed learning (SDL) emerges as a key strategy for developing this capability. The advent of generative AI (GenAI) conversational agents has disrupted the landscape of SDL. However, many questions still arise about several user experience aspects of these agents. This paper focuses on two of these questions: personalization and long-term support. As such, the first part of this study explores the effectiveness of personalizing GenAI through prompt-tuning using a CT-based prompt for solving programming challenges. The second part focuses on identifying the strengths and weaknesses of a GenAI model in a semester-long programming project. Our findings indicate that while prompt-tuning could hinder ease of use and perceived learning assistance, it might lead to higher learning outcomes. Results from a thematic analysis also indicate that GenAI is useful for programming and debugging, but it presents challenges such as over-reliance and diminishing utility over time.

Towards Interactive Guidance for Writing Training Utterances for Conversational Agents

  • David Piorkowski
  • Rachel Ostrand
  • Kristina Brimijoin
  • Jessica He
  • Erica Albert
  • Stephanie Houde

Improving conversational agents that are trained with supervised learning requires iteratively refining example intent training utterances based on chat log data. The difficulty of this process hinges on the quality of the initial example utterances used to train the intent before it was first deployed. Creating new intents from scratch, when conversation logs are not yet available, has many challenges. We interviewed experienced conversational agent intent trainers to better understand challenges they face when creating new intents, and their best practices for writing high quality training utterances. Using these findings and related literature, we developed an intent training tool that provided interactive guidance via either language feedback or sample utterances. Language feedback notified the user when training utterances could be linguistically improved, while sample utterances were crowdsourced and provided examples of end user language prior to deploying an intent. We compared these two types of guidance in a 187-participant between-subject study. We found that participants in the language feedback condition reported limited creativity and higher mental load and spent more time on the task, but were more thoughtful in crafting utterances that adhered to best practices. In contrast, sample utterance participants leveraged the samples to either quickly select examples or use them as a springboard to develop new utterance ideas. We report on differences in user experience in the strategies that participants took and preferences for or against the different types of guidance.

SESSION: Session 4: UX

Say What? Real-time Linguistic Guidance Supports Novices in Writing Utterances for Conversational Agent Training

  • Rachel Ostrand
  • Kristina Brimijoin
  • David Piorkowski
  • Jessica He
  • Erica Albert
  • Stephanie Houde

Writing utterances to train conversational agents can be a challenging and time-consuming task, and usually requires substantial expertise, meaning that novices face a steep learning curve. We investigated whether novices could be guided to produce utterances that adhere to best practices via an intervention of real-time linguistic feedback. We conducted a user study in which participants were tasked with writing training utterances for a particular topic (intent) for a conversational agent. Participants received one of two types of linguistic guidance in real-time to shape their utterance-writing: (1) feedback on the lexical and syntactic properties and the variety of each utterance, or (2) sample utterances written by other users, to select or inspire the writing of new utterances. All participants also completed a control condition, in which they wrote utterances for a different intent without receiving any guidance. We investigated whether linguistic properties of the utterances differed as a function of whether the participant had received guidance, and if so, which type. Results showed that participants wrote longer and better quality utterances, with greater lexical and syntactic diversity, in both guidance conditions compared to when they received no guidance. These results demonstrate that giving novices explicit linguistic guidance can improve the quality of the training utterances they write, suggesting that this could be an effective way of getting new utterance writers started with much less training than most current practices require.

Improving Conversational User Interfaces for Citizen Complaint Management through enhanced Contextual Feedback

  • Kai Karren
  • Michael Schmitz
  • Stefan Schaffer

As cities transform, disrupting citizens’ lives, their participation in urban development is often undervalued despite its importance. Citizen complaint systems exist but are often limited in fostering meaningful dialogue with municipalities. Meanwhile, smart cities aim to improve living standards, efficiency, and sustainability by integrating digital twins with physical infrastructures, potentially enhancing transparency and enriching communication between cities and their inhabitants with real-time data. Complementing these developments, technologies realizing Conversational User Interfaces (CUIs) are becoming more capable of providing a conversational, feedback-oriented approach to processes such as complaint management.

The improvement of CUIs for citizen complaint management through enhanced contextual feedback is explored in this work. The term contextual feedback has been developed and defined as all information (for example, background, conditions, explanations, timelines, and the existence of similar complaints) related to a complaint and/or the underlying problem that could potentially be relevant for the user. The solution proposed in this paper gathers data from users about their issues via a CUI, which subsequently queries various data sources to obtain relevant contextual information. Following this, a Large Language Model processes the collected data to produce the corresponding feedback. In the study, a static CUI without contextual data served as the baseline and was compared with a CUI that includes contextual data, analyzing their impact on pragmatic and hedonic quality, reuse intention, and potential influence on the citizens’ trust in their municipality. The study has been conducted in cooperation with the German municipality of Wadgassen. The good performance of the baseline system shows the general potential of LLMs in the citizen complaint domain even without data sources. The results show that contextual feedback performed better overall, with significant improvements in the pragmatic and hedonic quality, attractiveness, reuse intention, feeling that the complaint is taken seriously, and the citizens’ trust in their municipality.
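The three steps described above (collect the complaint, query data sources for context, let a language model write the feedback) can be sketched as a small pipeline. This is an illustrative outline only, not the system built in the paper: the two data sources, the complaint fields, the prompt wording, and the `echo_llm` stand-in are all hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical data sources: each maps a complaint to a list of context facts.
def similar_complaints(complaint: Dict[str, str]) -> List[str]:
    return [f"2 similar complaints exist for {complaint['location']}"]


def planned_works(complaint: Dict[str, str]) -> List[str]:
    return [f"Road works are scheduled near {complaint['location']}"]


def gather_context(complaint, sources):
    """Query every data source and collect the facts it returns."""
    facts = []
    for source in sources:
        facts.extend(source(complaint))
    return facts


def build_prompt(complaint, facts):
    """Combine the complaint and the contextual facts into one LLM prompt."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"A citizen reported: {complaint['text']} (location: {complaint['location']}).\n"
        f"Relevant context:\n{context}\n"
        "Write a short, factual reply that uses this context."
    )


def contextual_feedback(complaint, sources, llm: Callable[[str], str]) -> str:
    """Run the gather -> query -> generate pipeline with an injected LLM callable."""
    return llm(build_prompt(complaint, gather_context(complaint, sources)))


# Stand-in for a real model so the sketch runs offline: it echoes the prompt.
def echo_llm(prompt: str) -> str:
    return prompt


complaint = {"text": "The street light is broken", "location": "Main Street"}
reply = contextual_feedback(complaint, [similar_complaints, planned_works], echo_llm)
print(reply)
```

Passing an empty list of sources approximates a baseline without contextual data, which makes the two conditions easy to compare in a sketch like this.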

Understanding User Preferences of Voice Assistant Answer Structures for Personal Health Data Queries

  • Bradley Rey
  • Yumiko Sakamoto
  • Jaisie Sin
  • Pourang Irani

Voice assistants (VAs) are becoming ubiquitous within daily life, residing in homes, personal smart-devices, vehicles, and many other technologies. Designed for seamless natural language interaction, VAs empower users to ask questions and execute tasks without relying on graphical or tactile interfaces. A promising avenue for VAs is to allow people to ask personal health data questions. However, this functionality is currently not widely available and answer preferences to such questions have not been studied. We implemented a pseudo-VA that handles personal health data questions, answering in three unique styles: minimal, keyword, and full sentence. In two online user studies, 82 unique participants interacted with our VA, asking varying personal health data questions and ranking the answer structures given. Our results show a strong preference for full sentence responses throughout. We find that even though full sentence answers have the longest mean response time, they are still found to provide high quality and optimal behaviour, while also being comprehensible and efficient. Furthermore, participants reported that for personal health question answering, VAs should provide technical and efficient interactions rather than being social.

Beyond Functionality: Unveiling Dimensions of User Experience in Embodied Conversational Agents for Customer Service

  • Li Lin
  • Xuan Du
  • Mengdi Tang
  • Jian Gao
  • Shouyu Wang

Embodied Conversational Agents (ECAs) are increasingly being deployed in the customer service field. However, their user experience post-adoption remains under-researched. Through interviews with customer service practitioners and a review of existing ECA research, we identified eighteen key items of ECA experience. Based on these items, we conducted a survey among users who have interacted with ECAs. Utilizing exploratory factor analysis and confirmatory factor analysis, we developed a five-dimension model: trustworthy, approachable, humanized, engaging, and supportive experiences. We found that the evaluation of experience importance varied according to users’ educational backgrounds and annual incomes. Users with higher education and income levels exhibited higher expectations for a trustworthy experience. Additionally, our research suggests that the design features of ECAs contribute distinctively to these five user experience dimensions. These insights provide novel directions for adopting a user-centered design approach to improve ECA interactions.

Cross-Cultural Validation of Partner Models for Voice User Interfaces

  • Katie Seaborn
  • Iona Gessinger
  • Suzuka Yoshida
  • Benjamin R. Cowan
  • Philip R. Doyle

Recent research has begun to assess people’s perceptions of voice user interfaces (VUIs) as dialogue partners, termed partner models. Current self-report measures are only available in English, limiting research to English-speaking users. To improve the diversity of user samples and contexts that inform partner modelling research, we translated, localized, and evaluated the Partner Modelling Questionnaire (PMQ) for non-English speaking Western (German, n=185) and East Asian (Japanese, n=198) cohorts where VUI use is popular. Through confirmatory factor analysis (CFA), we find that the scale produces equivalent levels of “goodness-to-fit” for both our German and Japanese translations, confirming its cross-cultural validity. Still, the structure of the communicative flexibility factor did not replicate directly across Western and East Asian cohorts. We discuss how our translations can open up critical research on cultural similarities and differences in partner model use and design, whilst highlighting the challenges for ensuring accurate translation across cultural contexts.

SESSION: Session 5: Beyond Semantics

The Impact of Perceived Tone, Age, and Gender on Voice Assistant Persuasiveness in the Context of Product Recommendations

  • Sabid Bin Habib Pias
  • Ran Huang
  • Donald S. Williamson
  • Minjeong Kim
  • Apu Kapadia

Voice Assistants (VAs) can assist users in various everyday tasks, but many users are reluctant to rely on VAs for intricate tasks like online shopping. This study aims to examine whether the vocal characteristics of VAs can serve as an effective tool to persuade users and increase user engagement with VAs in online shopping. Prior studies have demonstrated that the perceived tone, age, and gender of a voice influence the perceived persuasiveness of the speaker in interpersonal interactions. Furthermore, persuasion in product communication has been shown to affect purchase decisions in online shopping. We investigate whether variations in a VA voice’s perceived tone, age, and gender characteristics can persuade users and ultimately affect their purchase decisions. Our experimental study showed that participants were more persuaded to make purchase decisions by VA voices having positive or neutral tones as well as middle-aged male or younger female voices. Our results suggest that VA designers should offer users the ability to easily customize VA voices with a range of tones, ages, and genders. This customization can enhance user comfort and enjoyment, potentially leading to higher engagement with VAs. Additionally, we discuss the boundaries of ethical persuasion, emphasizing the importance of safeguarding users’ interests against unwarranted manipulation.

Do Your Expectations Match? A Mixed-Methods Study on the Association Between a Robot's Voice and Appearance

  • Martina De Cet
  • Martina Cvajner
  • Ilaria Torre
  • Mohammad Obaid

Both physical appearance and voice can elicit mental images of what someone and/or something should sound and look like. This is particularly relevant for human-robot interaction design and research since any voice can be added to a robot. Therefore, it is important to give robots voices that match users’ expectations. In this paper, we examined the voice-appearance association by asking participants to match a robot image with a voice (Experiment 1, N = 24), and vice versa, a voice with a robot image (Experiment 2, N = 24), in two mixed-methods studies. We looked at participants’ differences that could influence the voice-robot association (gender and nationality) and at voice and robot features that could influence participants’ voice preferences (voice gender, pitch and robot’s appearance). Results show that nationality influenced participants’ association with a robot image after hearing its voice. Furthermore, a content analysis identified that when creating a voice mental image, participants looked at robots’ gendered characteristics and height and they paid special attention to human-like and gender-specific cues in a voice when forming a mental image of a robot. Sociological differences also emerged, with Swedish participants suggesting the use of gender-neutral voices to avoid strengthening existing stereotypes, and Italians saying the opposite. Our work highlights the importance of individual differences in the robot voice-appearance association and the importance of involving the end user in designing the voice.

Language Cues for Expressing Artificial Personality: A Systematic Literature Review for Conversational Agents

  • Alexander Dregger
  • Maximilian Seifermann
  • Andreas Oberweis

Users attribute artificial personality (AP) to conversational agents (CAs) based on perceived language cues, or more precisely verbal cues. This review synthesizes studies on this topic, encompassing research not only on chat- and voicebots but also on social robots, drawing from interdisciplinary databases. This approach led to the identification of 200 verbal signals, nearly four times as many as in previous reviews. The signals were classified according to the personality dimensions of the BFM as well as its facets. In addition, the relevance of theories of personality other than the BFM is discussed. Furthermore, six methodological challenges in the empirical study of verbal cues expressing AP are identified. Practical implications include providing practitioners with an overview of verbal signals, while offering opportunities for research improvement based on identified challenges. Enhanced understanding of verbal signals related to AP aids in evaluating implementation quality, not only in rule-based CAs but also in LLM-based systems.

Chatbots With Attitude: Enhancing Chatbot Interactions Through Dynamic Personality Infusion

  • Nikola Kovacevic
  • Tobias Boschung
  • Christian Holz
  • Markus Gross
  • Rafael Wampfler

Equipping chatbots with personality has the potential of transforming user interactions from mere transactions to engaging conversations, enhancing user satisfaction and experience. In this work, we introduce dynamic personality infusion, a novel intermediate stage between the chatbot and the user that adjusts the chatbot’s response using a dedicated chatbot personality model and GPT-4 without altering the chatbot’s semantic capabilities. To test the effectiveness of our method, we first collected human-chatbot conversations from 33 participants while they interacted with three LLM-based chatbots (GPT-3.5, Llama-2 13B, and Mistral 7B). Then, we conducted an online rating survey with 725 participants on the collected conversations. We analyze the impact of the personality infusion on the perceived trustworthiness of the chatbots and the suitability of different personality profiles for real-world chatbot use cases. Our work paves the way for dynamic, personalized chatbots, enhancing user trust and real-world applicability.

Beyond Words: Infusing Conversational Agents with Human-like Typing Behaviors

  • Jijie Zhou
  • Yuhan Hu

Recently, large language models have facilitated the emergence of highly intelligent conversational AI capable of engaging in human-like dialogues. However, a notable distinction lies in the fact that these AI models predominantly generate responses rapidly, often producing extensive content without emulating the thoughtful process characteristic of human cognition and typing. This paper presents a design aimed at simulating human-like typing behaviors, including patterns such as hesitation and self-editing, as well as a preliminary user experiment to understand whether and to what extent the agent with human-like typing behaviors could potentially affect conversational engagement and its trustworthiness. We’ve constructed an interactive platform featuring user-adjustable parameters, allowing users to personalize the AI’s communication style and thus cultivate a more enriching and immersive conversational experience. Our user experiment, involving interactions with three types of agents—a baseline agent, one simulating hesitation, and another integrating both hesitation and self-editing behaviors—reveals a preference for the agent that incorporates both behaviors, suggesting an improvement in perceived naturalness and trustworthiness. Through the insights from our design process and both quantitative and qualitative feedback from user experiments, this paper contributes to the multimodal interaction design and user experience for conversational AI, advocating for a more human-like, engaging, and trustworthy communication paradigm.

Comparing Perceptions of Static and Adaptive Proactive Speech Agents

  • Justin Edwards
  • Philip R. Doyle
  • Holly P. Branigan
  • Benjamin R. Cowan

A growing literature on speech interruptions describes how people interrupt one another with speech, but these behaviours have not yet been implemented in the design of artificial agents which interrupt. Perceptions of a prototype proactive speech agent which adapts its speech both to urgency and to the difficulty of the ongoing task it interrupts are compared against perceptions of a static proactive agent which does not. The study hypothesises that adaptive proactive speech modelled on human speech interruptions will lead to partner models which consider the proactive agent as a stronger conversational partner than a static agent, and that interruptions initiated by an adaptive agent will be judged as better timed and more appropriately asked. These hypotheses are all rejected, however, as quantitative analysis reveals that participants view the adaptive agent as a poorer dialogue partner than the static agent and as less appropriate in the style in which it interrupts. Qualitative analysis sheds light on the source of this surprising finding, as participants see the adaptive agent as less socially appropriate and as less consistent in its interactions than the static agent.

SESSION: Session 6: When things go wrong

Identifying Breakdowns in Conversational Recommender Systems using User Simulation

  • Nolwenn Bernard
  • Krisztian Balog

We present a methodology to systematically test conversational recommender systems with regard to conversational breakdowns. It involves examining conversations generated between the system and simulated users for a set of pre-defined breakdown types, extracting responsible conversational paths, and characterizing them in terms of the underlying dialogue intents. User simulation offers the advantages of simplicity, cost-effectiveness, and time efficiency for obtaining conversations where potential breakdowns can be identified. The proposed methodology can be used as a diagnostic tool as well as a development tool to improve conversational recommender systems. We apply our methodology in a case study with an existing conversational recommender system and user simulator, demonstrating that with just a few iterations, we can make the system more robust to conversational breakdowns.

Explaining the Wait: How Justifying Chatbot Response Delays Impact User Trust

  • Zhengquan Zhang
  • Konstantinos Tsiakas
  • Christina Schneegass

In human communication, responding to a question very slowly or quickly influences our trust in the answer. As chatbots evolve to increasingly mimic human speech, response speed can be artificially varied to create certain impressions on users. However, studies remain inconclusive, potentially due to the absence of contextual cues that allow for interpretation of the delay. Thus, this study explores textual explanations that justify the instant and dynamic – dependent on answer length – response delays. We derive five design variations based on prior work and evaluate their impact on the chatbot’s perceived social presence and transparency (N = 10). In a between-subject online study (N = 194), we then evaluate the influence of the highest-rated justification on users’ perceptions of chatbot transparency, social presence, and trust for the two delay conditions. Results demonstrate that while such justifications enhance perceived transparency and trust in the immediate response scenario, they show no effect in the dynamic delay context.

System and User Strategies to Repair Conversational Breakdowns of Spoken Dialogue Systems: A Scoping Review

  • Essam Alghamdi
  • Martin Halvey
  • Emma Nicol

Spoken Dialogue Systems (SDSs) are critical in facilitating natural and efficient human-machine interaction through speech. SDSs frequently encounter challenges in managing complex dialogues, resulting in communication breakdowns, which include misunderstandings (where the system misinterprets user input) and non-understandings (where the system fails to interpret the input at all). Strategies to repair these breakdowns have been investigated across multiple disciplines; despite this interest, the findings from these studies are inconsistent and hinder comparative analysis due to the use of diverse methodologies and terminologies. To address this gap, this scoping review systematically examines SDS and user repair strategies within a broad spectrum of literature. Based on 36 papers out of 818 found, we provide two comprehensive frameworks: one categorising SDS system-repair strategies into six distinct categories and the other categorising user-repair strategies into five categories. Our analysis reveals a disparity in the literature’s focus on repair strategies, highlighting, in particular, the lack of research on less explored strategies, such as Information and Disclosure repair strategies, providing potential avenues for future research in this area.

Voice Assistants' Accountability through Explanatory Dialogues

  • Fatemeh Alizadeh
  • Peter Tolmie
  • Minha Lee
  • Philipp Wintersberger
  • Dominik Pins
  • Gunnar Stevens

As voice assistants (VAs) become more advanced, leveraging Large Language Models (LLMs) and natural language processing, their potential for accountable behavior expands. Yet, the long-term situational effectiveness of VAs’ accounts when errors occur remains unclear. In our 19-month exploratory study with 19 households, we investigated the impact of an Alexa feature that allows users to inquire about the reasons behind its actions. Our findings indicate that Alexa's accounts are often single, decontextualized responses that led users to adopt alternative repair strategies over the long term, such as turning off the device, rather than initiating a dialogue about what went wrong. Through role-playing workshops, we demonstrate that VA interactions should facilitate explanatory dialogues as dynamic exchanges that consider a range of speech acts, recognizing users’ emotional states and the context of interaction. We conclude by discussing the implications of our findings for the design of accountable VAs.

SESSION: Provocation, Posters and Demos

A Not So Chatty “Chatbot”: Co-designing to support First-Time Parents in South Africa and Portugal

  • Leina Meoli
  • Francisco Nunes
  • Beatriz Félix
  • Joana Couto da Silva
  • Xolani Ntinga
  • Melissa Densmore

Innovations in chatbot technology have greatly accelerated in recent years, yet challenges persist in implementing them effectively for healthcare in diverse socio-economic contexts, especially in the global south. We engaged in two sets of co-design workshops that gave insight into the preferences of parents regarding chatbot design modalities but also uncovered constraints for our design including working with low-resource languages and limited internet connectivity. Though we set out for a chatbot to support first-time parents, our co-design in Portugal and South Africa resulted in the development of a "not-so-chatty" chatbot. Our intervention, ParentCoach App, is a question-and-answer informational resource presented in a chat-like user interface with search and menus for content exploration. We discuss the challenges of designing and implementing chatbots across different geographic and socio-economic contexts, presenting our resulting intervention and the preliminary findings from a two-week feasibility pilot with first-time parents in the two countries.

A Pilot Study on Multi-Party Conversation Strategies for Group Recommendations

  • Matthias Kraus
  • Stina Klein
  • Nicolas Wagner
  • Wolfgang Minker
  • Elisabeth André

Current research on conversational recommender systems (CRS) focuses mainly on single-user interactions, neglecting situations where multiple users need to receive suggestions on a shared goal. This requires a CRS to conduct a multi-party dialogue and participate in the negotiation process, which presents various research challenges. In this paper, we investigate how actively a CRS needs to participate in a multi-party negotiation dialogue by conducting a WoZ study. The study compares conversational strategies of a CRS – conversation-leading and -following – regarding task effectiveness and efficiency, user experience, and group dynamics. The user study highlighted both the advantages and disadvantages of employing a conversation-leading strategy in a CRS. The strategy sped up decisions and lessened chatbot moderation but had drawbacks: bias towards dominant individuals and lower group performance. Our findings also highlighted how users’ openness to experience affected their views on the strategy’s perspicuity and dependability.

Advancing Faithfulness of Large Language Models in Goal-Oriented Dialogue Question Answering

  • Abigail Sticha
  • Norbert Braunschweiler
  • Rama Sanand Doddipatla
  • Kate M Knill

Goal-oriented dialogue systems, such as assistant chatbots and conversational AI systems, have gained prominence for their question-answering capabilities, often utilizing large language models (LLMs) as knowledge bases. However, these systems face limitations when knowledge outside their intrinsic scope is required. In this paper we address these limitations by designing more faithful and useful systems that can accurately respond to users based on external information. Guided by reference-free evaluation metrics instead of traditional word-overlap metrics, we present two novel methods to prompt LLMs which surpass the baselines in accuracy, linguistic quality, and faithfulness. The first method employs a reranking technique using LLMs to rank document relevance without the need for fine-tuning. The second system builds upon the ReAct framework by incorporating a self-reflection mechanism, ensuring answers are grounded in retrieved content. Overall, our methods advance few-shot prompting as a way to learn to condition on external evidence, and significantly reduce hallucinations.

Automatic Generation of Conversational Interfaces for Tabular Data Analysis

  • Marcos Gomez-Vazquez
  • Jordi Cabot
  • Robert Clarisó

Tabular data is the most common format to publish and exchange structured data online. A clear example is the growing number of open data portals published by public administrations. However, exploitation of these data sources is currently limited to technical people able to programmatically manipulate and digest such data. As an alternative, we propose the use of chatbots to offer a conversational interface to facilitate the exploration of tabular data sources, including support for data analytics questions that are answered via charts rendered by the chatbot. Moreover, our chatbots are automatically generated from the data source itself thanks to the instantiation of a configurable collection of conversation patterns matched to the chatbot intents and entities.

Beyond Individual Concerns: Multi-user Privacy in Large Language Models

  • Xiao Zhan
  • William Seymour
  • Jose Such

In this paper, we explore the nuanced and increasingly relevant issue of Multi-user Privacy (MP) in the context of Large Language Models (LLMs). Addressing the gap in current research, we examine how LLMs can inadvertently compromise the privacy of multiple users, particularly in scenarios involving advanced multimodal capabilities. We highlight the challenges in mitigating these privacy concerns, stemming from the complexities of shared data permissions, varying user perceptions of privacy, and the dynamic nature of LLM interactions. The paper advocates for a collaborative approach, encompassing targeted research, ethical AI development, informed policy-making, and enhanced user awareness, to address these emerging privacy challenges in the realm of LLMs.

Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents

  • Guangzhi Sun
  • Xiao Zhan
  • Jose Such

The incorporation of Large Language Models (LLMs) such as the GPT series into diverse sectors including healthcare, education, and finance marks a significant evolution in the field of artificial intelligence (AI). The increasing demand for personalised applications has motivated the design of conversational agents (CAs) that possess distinct personas. This paper commences by examining the rationale and implications of imbuing CAs with unique personas, smoothly transitioning into a broader discussion of the personalisation and anthropomorphism of LLM-based CAs.

We delve into the specific applications where the implementation of a persona is not just beneficial but critical for LLM-based CAs. The paper underscores the necessity of a nuanced approach to persona integration, highlighting the potential challenges and ethical dilemmas that may arise. Attention is directed towards the importance of maintaining persona consistency, establishing robust evaluation mechanisms, and ensuring that the persona attributes are effectively complemented by domain-specific knowledge.

ChatFive: Enhancing User Experience in Likert Scale Personality Test through Interactive Conversation with LLM Agents

  • Jungjae Lee
  • Yubin Choi
  • Minhyuk Song
  • Sanghyun Park

Personality assessments provide insights into understanding individual differences. In HCI, personality assessments are used to model user behavior or tailor user interfaces. However, conventional Likert-scale personality tests face issues in user engagement and capturing comprehensive personality nuances. Building upon prior work using conversational user interfaces for personality prediction, we delve deeper into personalized personality tests. Through a formative study (n=4), we identified three design goals for user engagement. Informed by these goals, we propose a novel architecture integrating multiple large language model agents to support free-form conversation-based personality assessment. Our system, ChatFive, predicts users’ Big Five traits through real-time personalized dialogue. Evaluations from our user study (n=20) revealed that, with ChatFive, participants were significantly better able to convey their true responses and felt more engaged, although the system required longer response times and different validation. We discuss the limitations on the validity of ChatFive and its implications.

Dual-Mode Interventions: Giving Agency to Knowledge Workers in Proactive Health Interventions

  • Shashank Ahire
  • Saeid Othman
  • Michael Rohs

In the domain of health and well-being, proactive voice interventions have demonstrated their efficacy. However, users often encounter privacy concerns and social embarrassment due to the lack of control over these proactive interventions, especially in formal and social settings. This study introduces a novel approach called “dual-mode intervention.” It begins with primary interventions using different modalities (like graphical, tactile, or auditory). If users do not respond to these primary interventions, the system delivers voice interventions after a short interval. We conducted a study employing a within-subjects design, which involved 15 participants. The study compared dual-mode interventions with direct voice interventions in office settings, focusing on addressing health and well-being issues. Our findings indicate that knowledge workers preferred dual-mode interventions over direct voice interventions. Moreover, direct voice interventions received significantly lower ratings compared to dual-mode interventions. Also, we identify user preferences for different dual-intervention modalities. Our findings reveal that the user preferences depend on the type of health intervention. Vibration emerged as the preferred modality, followed by graphical output, auditory icons, and ringing interventions.

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy

  • Efe Bozkir
  • Süleyman Özdel
  • Ka Hei Carrie Lau
  • Mengdi Wang
  • Hong Gao
  • Enkelejda Kasneci

Advances in artificial intelligence and human-computer interaction will likely lead to extended reality (XR) becoming pervasive. While XR can provide users with interactive, engaging, and immersive experiences, non-player characters are often utilized in pre-scripted and conventional ways. This paper argues for using large language models (LLMs) in XR by embedding them in avatars or as narratives to facilitate inclusion through prompt engineering and fine-tuning the LLMs. We argue that this inclusion will promote diversity for XR use. Furthermore, the versatile conversational capabilities of LLMs will likely increase engagement in XR, helping XR become ubiquitous. Lastly, we speculate that combining the information provided to LLM-powered spaces by users and the biometric data obtained might lead to novel privacy invasions. While exploring potential privacy breaches, examining user privacy concerns and preferences is also essential. Therefore, despite challenges, LLM-powered XR is a promising area with several opportunities.

Exploratory Study: How the Usage of a WhatsApp-based Chatbot Influences Data Collection in Sub-Saharan Africa

  • Lukas Mueller
  • Jackson Mughuma
  • Ulrich von Zadow

In the past, web-based surveys in Sub-Saharan Africa have suffered from connectivity and device ownership issues, and consequently, most surveys in this area are still administered using paper-based methodology. In recent years, however, smartphones and instant messengers have become more prevalent, and therefore, chatbot-based surveys have become an option - one that has, up to now, seldom been used despite large potential benefits.

We are interested in exploring this space: How does the workflow change? How do participants view this? What additional opportunities appear? In this publication, we present an open source tool for WhatsApp-based surveys and an initial qualitative study (N=6) that evaluates practical data collection in Sub-Saharan Africa using this tool. Results indicate a strong increase in convenience for the study participants, while researchers see opportunities for scaling, significant time savings, and the potential for richer media-based data collection (e.g., using audio, images, and GPS coordinates) as benefits.

Exploring Lexical Alignment in a Price Bargain Chatbot

  • Zhenqi Zhao
  • Mariët Theune
  • Sumit Srivastava
  • Daniel Braun

This study investigates the integration of lexical alignment into text-based negotiation chatbots, including its impact on user satisfaction, perceived trustworthiness, and potential influences on negotiation results. Lexical alignment is the phenomenon where participants in a conversation adopt similar words. This study introduces a chatbot architecture for price negotiation, consisting of components such as intent and price/product extractors, dialogue management, and response generation using OpenAI’s API, with a lexical alignment feature. To evaluate the effects of lexical alignment on negotiation outcomes and the user’s perception of the chatbot, a between-subject user experiment was conducted online. A total of 52 individuals participated. While the results do not show statistical significance, they suggest that lexical alignment might positively influence user satisfaction. This finding indicates a potential direction for enhancing user interaction with chatbots in the future.

Findings from Studies on English-Based Conversational AI Agents (including ChatGPT) Are Not Universal

  • Casey C. Bennett

A common, but largely untested, assumption in artificial intelligence (AI) and speech systems is that results from experiments using only the English language as the communication medium will hold true across any other language or cultural context. We argue here, based on emerging recent scientific evidence, that such an assumption appears to be invalid. In fact, there appear to be stark differences across languages and cultures when experiments are conducted using the same artificial speech system, set up to communicate in more than one language. Moreover, using those AI systems with bilingual human speakers shows that their behavior, social cues, and communication patterns change when language "code-switching" occurs within the same experiment session. To illustrate our point further, in the second half of the paper we give the specific example of ChatGPT (as the backbone speech content for artificial speech systems) being used for older adults with dementia and Alzheimer’s, who often have altered speech patterns (e.g. slurred pronunciation). There are emerging reports from such research of severe limitations of ChatGPT in such contexts, which highlights the dangers of assuming findings from a narrow range of linguistic and/or cultural contexts can fully capture some universal truths about human communication with artificial agents. Finally, we point out that the reluctance of scientific journals and conferences to publish negative results means many of those emerging reports are only being reported anecdotally, which is problematic for the field of conversational user interfaces (CUI).

Futuring Machines: An Interactive Framework for Participative Futuring Through Human-AI Collaborative Speculative Fiction Writing

  • Jordi Tost
  • Marcel Gohsen
  • Britta Schulte
  • Fidel Thomet
  • Mattis Kuhn
  • Johannes Kiesel
  • Benno Stein
  • Eva Hornecker

Imagining future scenarios arising from events and (in)actions is crucial for democratic participation, but is often left to experts who have in-depth knowledge of, for example, social, political, environmental or technological trends. A widely accepted method for non-experts to think about future scenarios is to write fictional short stories set in speculative futures. To support the writing process and thus further lower the barrier for this form of participation, we introduce Futuring Machines, a framework for collaborative writing of speculative fiction through instruction-based conversation between humans and AI. Futuring Machines is specifically designed to stimulate reflection on future scenarios in both participatory workshops and individual use.

Generating Proactive Suggestions based on the Context: User Evaluation of Large Language Model Outputs for In-Vehicle Voice Assistants

  • Lesley-Ann Mathis
  • Can Günes
  • Kathleen Entz
  • David Lerch
  • Frederik Diederichs
  • Harald Widlroither

Large Language Models (LLMs) have recently been explored for a variety of tasks, most prominently for dialogue-based interactions with users. The future in-car voice assistant (VA) is envisioned as a proactive companion making suggestions to the user during the ride. We investigate the use of selected LLMs to generate proactive suggestions for a VA given different context situations by using a basic prompt design. An online study with users was conducted to evaluate the generated suggestions. We demonstrate the feasibility of generating context-based proactive suggestions with different off-the-shelf LLMs. Results of the user survey show that suggestions generated by the LLMs GPT4.0 and Bison received an overall positive evaluation regarding the user experience for response quality and response behavior over different context situations. This work can serve as a starting point to implement proactive interaction for VA with LLMs based on the recognized context situation in the car.

HASI: A Model for Human-Agent Speech Interaction

  • Nima Zargham
  • Vino Avanesi
  • Thomas Mildner
  • Kamyar Javanmardi
  • Robert Porzel
  • Rainer Malaka

In recent years, the widespread adoption of voice user interfaces (VUIs) highlighted the growing significance of speech interaction in our everyday activities. However, researchers, designers, and developers lack a dedicated interaction model tailored to the intricacies of communication with speech agents. This paper proposes a novel interaction model specifically crafted for speech interaction to better align with the evolving landscape of these systems. Incorporating traditional elements such as sender, message, and receiver, our model also integrates dynamic factors like context, user preferences, and evolving agent capabilities. Drawing from communication models and human-computer interaction (HCI) frameworks, this model aims to deepen our understanding of the process of human-agent speech interaction in real-world scenarios. By initiating discourse on a dedicated speech interaction model, this work serves as a basis for future exploration and refinement, adaptable to evolving technologies and user needs.

Large Language Models and Video Games: A Preliminary Scoping Review

  • Penny Sweetser

Large language models (LLMs) hold interesting potential for the design, development, and research of video games. Building on the decades of prior research on generative AI in games, many researchers have sped to investigate the power and potential of LLMs for games. Given the recent spike in LLM-related research in games, there is already a wealth of relevant research to survey. In order to capture a snapshot of the state of LLM research in games, and to help lay the foundation for future work, we carried out an initial scoping review of relevant papers published so far. In this paper, we review 76 papers published between 2022 and early 2024 on LLMs and video games, with key focus areas in game AI, game development, narrative, and game research and reviews. Our paper provides an early state of the field and lays the groundwork for future research and reviews on this topic.

MAI - A Proactive Speech Agent for Metacognitive Mediation in Collaborative Learning

  • Justin Edwards
  • Andy Nguyen
  • Marta Sobocinski
  • Joni Lämsä
  • Adelson de Araujo
  • Belle Dang
  • Ridwan Whitehead
  • Anni-Sofia Roberts
  • Matti Kaarlela
  • Sanna Jarvela

We introduce MAI - a proactive speech agent aimed at enhancing metacognitive awareness among learners in collaborative learning settings. Background is presented on Socially Shared Regulation of Learning and the role of metacognition in learning. Next, the design of the rules that MAI uses to prompt learners and mediate metacognition is introduced. We describe the conditions in which MAI has been piloted thus far, including as a Wizard of Oz prototype and as a fully functional prototype using natural language processing. We discuss the ethical considerations that went into the prototyping and testing of MAI. Finally, we describe our next steps for understanding the interactions learners have already had with MAI, planned design changes, and the future of testing the agent.

MagiChat: An AI-based Card-Guessing Chatbot

  • Dimitra Anastasiou
  • Benjamin Gâteau
  • Luc Vandenabeele
  • Patrick Gratz
  • Johannes Hermen

The integration of CUIs in games is a relatively new trend and not yet widely reported in the literature. In this paper we position a CUI within a magic show format, which can be fully customized for educational, informational, and entertainment formats. Infotainment, a combination of information and entertainment, is very important for broad appeal and engagement, i.e. to attract the attention of the wider public, including younger audiences who are more interested in game formats. This paper presents MagiChat, a card guessing game based on a chatbot created with the RASA framework. The interactive dialogue system, which is available in two languages (English, German), combined with the dynamic visualization of cards, constitutes a creative and attractive game which can be used for infotainment purposes at conferences and exhibitions.

Our Dialogue System Sucks - but Luckily we are at the Top of the Leaderboard!: A Discussion on Current Practices in NLP Evaluation

  • Anouck Braggaar
  • Linwei He
  • Jan De Wit

Currently, leaderboards are often used to evaluate natural language processing (NLP) systems and in particular large language models. In this paper we argue why we should step away from leaderboards and follow a more inclusive approach both in developing as well as in evaluating models. The focus of evaluation should be on the complete context in which the system operates. To accomplish this, researchers should take an inclusive approach and take note of developments in multiple scientific fields (from NLP to communication science).

Prompt Engineering an LLM into Roleplaying a Management Coach: a Short Guide by and for Non-NLP Experts

  • Melissa Guyre
  • Liz Holland
  • Nirva Shah
  • Rahul R. Divekar

Large Language Models (LLMs) combined with prompt engineering have democratized the creation of chatbots, thereby making it possible for domain experts to be directly part of the chatbot-building process. This paper describes the process followed by management domain experts who built a chatbot to coach new managers. We describe the information sources we used as context, the prompts, and the edge cases that led to a viable management coach chatbot. We describe our process as we recognize that while many role-play chatbots exist, their creators rarely share knowledge about the process or the prompts, thereby hindering replicability. In our paper, we share this knowledge so any management expert or researcher can create a chatbot customized to reflect their management values rather than relying on an opaque product built by a third party. Further, we share results from our pilot tests, where external management experts and to-be managers reviewed our chatbot.

SUStory: Usability Evaluation of Conversational Interfaces for Children with a Narrative on a Game Board

  • Shanshan Chen
  • Panos Markopoulos
  • Jun Hu

The increasing application of conversational agents as assistants and playmates for children brings about the need for evaluation methods tailored for children. The System Usability Scale (SUS) for kids is a well-established instrument for measuring subjective aspects of usability for children. However, using the scale for children presents difficulties relating to children’s concentration and the biases observed when they respond to questionnaires. To make the questionnaire completion enjoyable for children and improve their engagement with the questionnaire, we rendered the adaptation of SUS for children in a game-board format. This paper motivates the presentation of the questionnaire in this way and discusses the use of this instrument in a case study with 35 children aged eight to twelve. We argue that this presentation helped engage children and there was little evidence of extreme response bias. We conclude that this board format is a more appropriate way to present the SUS questionnaires to children in usability tests, which may also apply to different surveys including rating scales.

Situated Conversational Agents for Task Guidance: A Preliminary User Study

  • Alexandra W.D. Bremers
  • Manaswi Saha
  • Adolfo G. Ramirez-Aristizabal

Multimodal large language models have enabled a new generation of Conversational Agents (CA), leveraging language structure in human discourse to encode-decode multimedia formats (e.g., video-to-audio). These next-generation CAs can be useful in task guidance scenarios, where the user’s attention space is limited and verbal instructions can be overwhelming. In this paper, we explore the role of non-verbal conversational cues in identifying and recovering from errors while performing various assembly tasks. Findings from an exploratory Wizard-of-Oz study (N=8) indicate individual differences and preferences for auditory guidance. Combining these initial findings with our early exploration of the task monitoring system, we discuss implications for the emerging area of situated multimodal CAs for physical task guidance, where conversational interactions are based on inputting visual task actions and generating auditory feedback.

(Social) Trouble on the Road: Understanding and Addressing Social Discomfort in Shared Car Trips

  • Alexandra W.D. Bremers
  • Natalie Friedman
  • Sam Lee
  • Tong Wu
  • Eric Laurier
  • Malte F Jung
  • Jorge Ortiz
  • Wendy Ju

Unpleasant social interactions on the road can negatively affect driving safety. At the same time, researchers have attempted to address social discomfort by exploring Conversational User Interfaces (CUIs) as social mediators. Before knowing whether CUIs could reduce social discomfort in a car, it is necessary to understand the nature of social discomfort in shared rides. To this end, we recorded nine families going on drives and performed interaction analysis on this data. We define three strategies to address social discomfort: contextual mediation, social mediation, and social support. We discuss considerations for engineering and design, and explore the limitations of current large language models in addressing social discomfort on the road.

Speculating About Multi-user Conversational Interfaces and LLMs: What If Chatting Wasn't So Lonely?

  • William Seymour
  • Emilee Rader

The advent of LLMs means that CUIs are cool again, but what isn’t so cool is that we’re doomed to use them alone. The one user, one account, one device paradigm has dominated the design of CUIs and is not going away as new conversational technologies emerge. In this provocation we explore some of the technical, legal, and design difficulties that seem to make multi-user CUIs so difficult to implement. Drawing inspiration from the ways that people manage messy group discussions, such as parliamentary and consensus-based paradigms, we show how LLM-based CUIs might be well suited to bridging the gap. With any luck, this might even result in everyone having to sit through fewer poorly run meetings and agonising group discussions—truly a laudable goal!

The Voice: Lessons on Trustworthy Conversational Agents from 'Dune'

  • Philip Gregory Feldman

The potential for untrustworthy conversational agents presents a significant threat for covert social manipulation. Taking inspiration from Frank Herbert’s Dune [12], where the Bene Gesserit Sisterhood uses the Voice for influence, manipulation, and control of people, we explore how generative AI provides a way to implement individualized influence at industrial scales. Already, these models can manipulate communication across text, image, speech, and most recently video. They are rapidly becoming affordable enough for any organization of even moderate means to train and deploy. If employed by malicious actors, they risk becoming powerful tools for shaping public opinion, sowing discord, and undermining organizations from companies to governments. As researchers and developers, it is crucial to recognize the potential for such weaponization and to explore strategies for prevention, detection, and defense against these emerging forms of sociotechnical manipulation.

Toward a Third-Kind Voice for Conversational Agents in an Era of Blurring Boundaries Between Machine and Human Sounds

  • Jeesun Oh
  • Hyeonjeong Im
  • Sangsu Lee

The voice of widely used conversational agents (CAs) is standardized to be highly intelligible, yet it still sounds machine-generated due to its artificial qualities. With advancements in deep neural networks, synthesized voices have become nearly indistinguishable from those of real people. The voice enables users to discern the speakers’ identities and significantly impacts user perception, particularly in voice-only interactions. While more natural, human-sounding voices are generally preferred, their use in CAs raises potential ethical dilemmas, such as eliciting unwanted social responses or confusing the nature of the speaker. In this evolving landscape, it is necessary to understand the voice characteristics from multiple facets of voice design for CAs. Therefore, our study examines the voice characteristics of both artificial-sounding and human-sounding voices. Then, we propose a ‘third-kind’ of voice that considers the characteristics of each voice type. This discussion contributes to the debate on the future direction of voice design in the field of Conversational User Interface research.

UnMute Toolkit: Speech Interactions Designed With Minoritised Language Speakers

  • Thomas Reitmaier
  • Electra Wallington
  • Ondrej Klejch
  • Dani Kalarikalayil Raju
  • Nina Markl
  • Emily Esther Nielsen
  • Gavin Bailey
  • Jennifer Pearson
  • Matt Jones
  • Peter Bell
  • Simon Robinson

In this paper and interactive exhibit we demonstrate a portfolio of systems and approaches that progressively vary on the theme of co-creating and situating spoken-language technologies to suit the needs, functions, and ways of speaking of diverse, resource-constrained, and under-heard language communities in South Africa and India. These systems demonstrate the benefits of human-centred machine learning methodologies and showcase how language technologies and conversational systems can broaden digital participation of minoritised language communities.

Understanding Linguistic and Visual Factors that Affect Human Trust Perception of Virtual Agents

  • Natalia Tyulina
  • Tatiana Aloi Emmanouil
  • Sarah Ita Levitan

This work investigates how visual and spoken cues of virtual agents interact to affect user perception of agent trustworthiness. It is directly motivated by practical applications, such as an assistive robot companion for the elderly or homebound, or a virtual agent that can provide psychological assessment and treatment for individuals with mental health challenges. Such technologies have the capacity to assist human users in impactful ways, but without human trust in these systems, adoption and usage will remain severely limited. Our findings reveal strong correlations between both visual and auditory features and perceived trustworthiness. This underscores the importance of incorporating a comprehensive range of nonverbal cues and auditory signals into interface design.

Unravelling the Paradigm: Prioritizing Process over Results—Sports Training as a Catalyst for Age-Friendly CUI Design

  • Rezvan Boostani
  • Cosmin Munteanu
  • Anastasia Kuzminykh

Existing accessibility research may unintentionally marginalize older adults and exacerbate the digital divide. Emerging Conversational User Interfaces (CUI), particularly in essential sectors like finance, lack age-friendly design. Through a review of CUI literature, we illustrate how existing evidence underscores the prevalent exclusion of older adults in research and design. The exclusion of this demographic is stark and more pronounced in services in many critical domains, e.g., finance. To rectify this, we must fundamentally alter our approach to CUI design to actively involve older adults from its inception. We argue that we need fundamental shifts in our methods, prioritizing processes over specific outcomes in design methodologies. This entails close collaboration with older adults throughout the research and design to gain deep insights into their needs, mental models, and expectations. It is time to confront these biases and ensure equitable access for all.

Unveiling Information Through Narrative In Conversational Information Seeking

  • Vahid Sadiri Javadi
  • Johanne R Trippas
  • Lucie Flek

Searching through conversational interactions has been emphasized as the next frontier. Nowadays, conversational agents can generate natural language responses, transforming how we search for information. A key challenge in conversational information-seeking is how these agents present information: should they only reflect facts, cater to human cognitive preferences, or strike a balance between them? These challenges raise questions about aligning conversational agents with human cognitive processes. Our position paper emphasizes the role of narrative in addressing these questions. We explore how narratives influence human comprehension and propose a framework for optimal conversational narratives. These narratives aim to enhance interaction between humans and conversational agents in explanatory information-seeking scenarios.

Using Large Language Models for Robot-Assisted Therapeutic Role-Play: Factuality is not enough!

  • Sviatlana Höhn
  • Jauwairia Nasir
  • Ali Paikan
  • Pouyan Ziafati
  • Elisabeth André

Robot-assisted social role-play can help neurodivergent individuals practice social skills in a safe environment. Large language models (LLM) facilitate the implementation of such agents. However, high quality standards must be ensured in this sensitive setting. This article argues that current evaluation methods of generated language are not sufficient because they are grounded in beliefs about language as an external code to describe the world (referential functions of language). We argue that non-referential functions of language must be part of the evaluation of LLM-generated language when LLMs engage in social interactions with users. We test the feasibility of our approach in a pilot implementation of a platform for robot-assisted social role-play. Our proposed evaluation framework helps to systematically assess referential and non-referential functions of LLM-generated language. We argue that the evaluation framework can also be applied to multimodal interaction.

You Today, Better Tomorrow: Envisioning the Role of Conversation in Recommender Systems of the Future

  • Manveer Kalirai
  • Anastasia Kuzminykh

Recommender systems could evolve from traditional models of recommendation that largely harness data on past interactions to predict what a user might want in a given moment, towards systems that also support and nurture user self-actualization. This shift could guide users in exploring and fulfilling the needs of their future potential selves, untethered from their past and current identities. In this provocation, we suggest that interactive conversational recommendation is a suitable means to rouse this vision. Conversational recommendation is capable of eliciting real-time and layered preferences, and can enable systems to take on a more proactive role in dialoguing with users about their aspirational needs—particularly in helping users navigate the intricacies that often surround these needs. We also examine the potential challenges associated with the realization of such recommender systems—for instance, the complexities in transitioning from past-based patterns of personalization to those that accommodate present-oriented and future-oriented personalization, and the preservation of user agency whilst broadening the scope of roles recommender systems can play. Overall, this paper advocates for a necessary progression in recommender systems, one propelled by conversational recommendation, towards designs that not only avail present-day user needs, but also actively stimulate pathways toward the actualization of their potential and aspirational future selves.

"¿Te vienes? Sure!" Joint Fine-tuning of Language Detection and Transcription Improves Automatic Recognition of Code-Switching Speech

  • Leopold Hillah
  • Mateusz Dubiel
  • Luis A. Leiva

Human communication in multilingual communities often leads to code-switching, where individuals seamlessly alternate between two or more languages in their daily interactions. While this phenomenon has become increasingly prevalent thanks to linguistic globalization, it presents challenges for Automatic Speech Recognition (ASR) systems since they are designed with the assumption of transcribing a single language at a time. In this work, we propose a simple yet unexplored approach to tackle this challenge by fine-tuning the Whisper pre-trained model jointly on language identification (LID) and transcription tasks through the introduction of an auxiliary LID loss term. Our results show significant reductions in transcription errors, ranging between 14 and 36 percentage points. Ultimately, our work opens a new direction for research on code-switching speech, offering an opportunity to enhance current capabilities of conversational agents.
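
As a rough illustration of what an auxiliary loss term of this kind can look like, the sketch below adds a weighted language-identification cross-entropy to a transcription cross-entropy. It is not the authors' implementation: the function names, the `lid_weight` parameter, and the simple weighted-sum form are assumptions made for illustration, and the abstract does not say how the two terms are combined or weighted.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def joint_loss(token_probs, token_targets, lid_probs, lid_target, lid_weight=0.5):
    """Hypothetical joint objective: transcription loss plus a weighted LID loss.

    token_probs   -- one probability vector per output token (assumed input)
    token_targets -- index of the correct token at each position
    lid_probs     -- probability vector over candidate languages
    lid_target    -- index of the correct language
    lid_weight    -- assumed scalar weight on the auxiliary LID term
    """
    # Transcription term: mean cross-entropy over the target tokens.
    transcription = sum(
        cross_entropy(p, t) for p, t in zip(token_probs, token_targets)
    ) / len(token_targets)
    # Auxiliary term: cross-entropy of the language-identification prediction.
    lid = cross_entropy(lid_probs, lid_target)
    return transcription + lid_weight * lid


# Toy example: two tokens over a three-word vocabulary, two candidate languages.
example = joint_loss(
    token_probs=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    token_targets=[0, 1],
    lid_probs=[0.6, 0.4],
    lid_target=0,
)
```

With a weighted sum of this form, a larger `lid_weight` penalizes wrong language predictions more heavily relative to transcription errors. In an actual fine-tuning setup the probabilities would come from the model's output layers, not hand-written lists.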

SESSION: Session: Workshops

EduCUI: The First International Workshop for Conversational User Interfaces in Education

  • Justin Edwards
  • Adelson de Araujo
  • Smit Desai
  • Jan de Wit
  • Heloisa Candello
  • Daniel J. Rough
  • Leigh Clark
  • Anni-Sofia Roberts
  • Benjamin R Cowan

This workshop aims to connect the Conversational User Interfaces (CUI) and the learning sciences and educational technologies (EduTech) communities through discussion of their shared view of the future of conversational user interfaces in educational contexts. The workshop aims to encourage creative consideration of the trade-offs in design and cross-disciplinary expertise needed for designing CUIs for education. Considerations of these design opportunities and limitations will be explored through a design activity and performance activity, by which attendees will get to know each other’s areas of expertise and networks of resources. The half-day workshop will facilitate greater communication and collaboration between these communities in a popular and socially important research area.

Designing Age-Inclusive Interfaces: Emerging Mobile, Conversational, and Generative AI to Support Interactions across the Life Span

  • Sayan Sacar
  • Cosmin Munteanu
  • Jaisie Sin
  • Christina Wei
  • Sergio Sayago
  • Wei Zhao
  • Jenny Waycott

We are concurrently witnessing two significant shifts: voice and chat-based conversational user interfaces (CUIs) are becoming ubiquitous (especially more recently due to advances in generative AI and LLMs - large language models), and older people are becoming a very large demographic group (and increasingly adopting mobile technology on which such interfaces are present). However, despite the recent increase in research activity, age-relevant and inter/cross-generational aspects continue to be underrepresented in both research and commercial product design. Therefore, the overarching aim of this workshop is to increase the momentum for research within the space of hands-free, mobile, and conversational interfaces that centers on age-relevant and inter- and cross-generational interaction. For this, we plan to create an interdisciplinary space that brings together researchers, designers, practitioners, and users, to discuss and share challenges, principles, and strategies for designing such interfaces across the life span. We thus welcome contributions of empirical studies, theories, design, and evaluation of hands-free, mobile, and conversational interfaces designed with aging in mind (e.g. older adults or inter/cross-generational). We particularly encourage contributions focused on leveraging recent advances in generative AI or LLMs. Through this, we aim to grow the community of CUI researchers across disciplinary boundaries (human-computer interaction, voice and language technologies, geronto-technologies, information studies, etc.) that are engaged in the shared goal of ensuring that the aging dimension is appropriately incorporated in mobile / conversational interaction design research.

Between Trust and Identity: Form, Function, and Presentation

  • Minha Lee
  • Donald McMillan
  • Illaria Torre
  • Joel E. Fischer
  • Yvon Ruitenburg

As conversational user interfaces (CUIs) evolve, trust and identity challenges become increasingly pronounced. The identities of CUIs are multifaceted, incorporating individual names like Siri or Alexa, the companies behind them like Apple or Amazon, and various attributes, e.g., race, gender, and class, as perceived by people and/or as designed into these CUIs. Identity is also encoded in the embodiment, be it as an abstract animation on a watch, an avatar in virtual reality, or a humanoid robot, as well as in the backstory designers give these agents. But, if identity is fragmented, e.g., across multiple physical forms, if and how users can establish trust becomes difficult to address. Drawing from diverse fields including ethics, design, and engineering, we explore the hurdles posed by ambiguous identities. A dynamic embodiment of a CUI across multiple devices presents technical complexities, and importantly, it raises ethical dilemmas surrounding trust. In this workshop, we aim to synthesize research goals and methods to further probe the intricacies of identity fragmentation and its implications for user trust in CUIs. To pursue a collaborative debate, we formulate that trust and identity suffer from the chicken or egg dilemma; should issues surrounding identity be resolved first before trust can even be conceived to be possible between humans and CUIs? Can users truly trust a CUI that lacks a consistent and transparent identity, and would that trust be different for different embodiments and platforms? We consider that trust itself perhaps should be questioned given that the issues surrounding identity are not resolved. We additionally discuss whether a uniform identity across all interfaces is conducive to user trust, or whether the adoption of distinct personas on disparate platforms is more effective in engendering user trust.

Voicecraft: Designing Task-specific Voice Assistant Personas

  • Mateusz Dubiel
  • Smit Desai
  • Nima Zargham
  • Anuschka Schmitt

The Voicecraft workshop aims to establish a research community focused on the design and evaluation of Voice Assistant (VA) personas for both task-oriented functions (e.g., information search, online shopping) and personal growth applications (e.g., coaching, mindful reflection, tutoring). Through discussion and collaborative efforts, we will seek to propose a set of practices and standards that will help to improve the ecological validity of VA personas. In particular, we will explore topics such as the interaction design of voice-based interfaces, the impact of agent personas on the user experience, and the approaches for designing such VA agents. This workshop will serve as a platform to build a better-equipped community to explore VA personas that provide a better fit to a range of everyday interaction scenarios.