A preventative approach is crucial for adolescents’ mental well-being, as problems often arise at a young age. Acceptance and Commitment Therapy (ACT) is an evidence-based intervention approach used to enhance psychological flexibility, a central factor in adolescents’ mental well-being. Conversational interfaces have recently been explored for mental health promotion, and their conversational style plays a significant role in creating meaningful experiences and achieving positive intervention outcomes. In this study, our objective was to understand adolescents’ expectations of the conversational style of a text-based virtual coach being developed as part of an ACT-based online program to support intervention engagement. We evaluated eight conversation scripts by collecting qualitative and quantitative data through an online survey of over 200 adolescents. Our findings provide insights into preferred conversational style features, including language use, artificiality, and empathy, in the domain of adolescent mental well-being.
The launch of ChatGPT has attracted significant attention and showcased the potentially game-changing capabilities of conversational AI. These capabilities, and the lack of user research, highlight the need to investigate how users experience interactions with conversational AIs like ChatGPT. Therefore, we conducted a questionnaire study with ChatGPT users (N=194), inquiring about their good and poor experiences with ChatGPT. The user reports were analyzed through thematic analysis and systematized through a pragmatic-hedonic framework. Our results demonstrate how user experience is influenced by pragmatic attributes such as ChatGPT providing useful and detailed information and easing work- or school-related tasks. Additionally, user experience is impacted by hedonic attributes, such as entertainment and creative interactions, and interactions leaving the user impressed or surprised. Our study underscores that user experience with conversational AI like ChatGPT is assessed in terms of useful and productive interactions even in the early phase of uptake, suggesting the importance of pragmatic attributes.
ChatGPT, an AI chatbot, has gained popularity for its capability to generate human-like responses. However, this feature carries several risks, most notably due to its deceptive behaviour, such as offering users misleading or fabricated information, which can in turn cause ethical issues. To better understand the impact of ChatGPT on our social, cultural, economic, and political interactions, it is crucial to investigate how ChatGPT operates in the real world, where various societal pressures influence its development and deployment. This paper emphasizes the need to study ChatGPT "in the wild", as part of the ecosystem it is embedded in, with a strong focus on user involvement. We examine the ethical challenges stemming from ChatGPT’s deceptive human-like interactions and propose a roadmap for developing more transparent and trustworthy chatbots. Central to our approach is the importance of proactive risk assessment and user participation in shaping the future of chatbot technology.
Policymakers are increasingly interested in using virtual assistants to augment social care services in the context of a demographic ageing crisis. At the same time, technology companies are marketing conversational user interfaces (CUIs) and smart home systems as assistive technologies for elderly and disabled people. However, we know relatively little about how today’s commercially available CUIs are used to assist in everyday homecare activities, or how care service users and human care assistants interpret and adapt these technologies in practice. Here we report on a longitudinal conversation analytic case study to identify, describe, and share how CUIs can be used as assistive conversational agents in practice. The analysis reveals that, while CUIs can augment and support new capabilities in a homecare environment, they cannot replace the delicate interactional work of human care assistants. We argue that CUI design is best inspired and underpinned by a better understanding of the joint coordination of homecare activities.
We present “Mystery Agent,” an interactive storytelling voice user interface (VUI) equipped with self-regulated learning strategies to deliver informal health-related learning to older adults through a murder mystery story. We conducted a mixed methods user study with 10 older adults to evaluate Mystery Agent, using usability and perception-based questionnaires, followed by a semi-structured interview and co-design activity to generate design insights and identify design priorities. We found that older adults had a positive experience interacting with Mystery Agent and considered storytelling to be an appropriate and engaging way to learn health information. However, older adults identified credibility, compassion, and control as crucial factors influencing long-term use. In response, we present design guidelines, using Mystery Agent as an example, to help practitioners and researchers devise novel solutions that address the informal health information learning needs of older adults.
(A sonnet)
A new force awakened this year ’23.
A friend, or a foe, or a weapon - all three?
Too early to say, but too plainly to see;
that this is the Year Of The ChatGPT.
Does this spell the end of CUI as we know it?
If so, can we stop it, or kill it, or slow it?
What’s the point of me trying to write like a poet?
If I’ve got an idea, then I’d better well show it.
I promise this sonnet of pure provocation,
is tied in my mind to AI conversation
A CUI imbued with ideas for creation -
suppose I propose such a bold application?
Can CUIs inspire us with poetic verse?
A silly idea, but I’m sure you’ve heard worse.
The recent months have seen an explosion of interest, hype, and concern about generative AI, driven by the release of ChatGPT. In this article I seek to explicate some potential and actual harms of the engineering and use of generative AI such as ChatGPT, and I suggest a reframing for researchers with an interest in interaction. With this reframing I seek to provoke researchers to consider studying the settings of ChatGPT development and use as active sites of production. Research should focus on the organisational, technological and interactional practices and contexts in and through which generative AI and its outputs—harmful and otherwise—are produced, by whom, to what end, and with what consequences for societies.
This paper explores the lived experience of using ChatGPT in HCI research through a month-long trioethnography. Our approach combines the expertise of three HCI researchers with diverse research interests to reflect on our daily experience of living and working with ChatGPT. Our findings are presented as three provocations grounded in our collective experiences and HCI theories. Specifically, we examine (1) the emotional impact of using ChatGPT, with a focus on frustration and embarrassment, (2) the absence of accountability and consideration of future implications in design, and (3) questions around bias from a Global South perspective. Our work aims to inspire critical discussions about utilizing ChatGPT in HCI research and advance equitable and inclusive technological development.
Human-machine dialogue (HMD) research debates the degree to which language production in this context is egocentric or allocentric. That is, the degree to which a person might take a machine’s perspective into account. Our study aims to identify whether users produce allocentric or egocentric language within speech-based HMD when there is asymmetry in the information available to both partners. Through an adapted referential communication task, we manipulated the presence or absence of visual distractors and occlusions, similarly to previous referential tasks used in psycholinguistic research. Results show that people are sensitive to the presence of distractors and occlusions and tend to produce more informative expressions to help machine partners account for the visual asymmetries. We discuss how allocentric production in HMD can be explained by the way the division of labour manifests in spoken HMD. The findings further our understanding of the language production mechanisms in HMD.
Voice assistants interrupt people when they pause mid-question, a frustrating interaction that requires repeating the entire question. This impacts all users, but particularly people with cognitive impairments. In human-human conversation, these situations are recovered naturally because people understand the words that were uttered. In this paper we build answer pipelines which parse incomplete questions and repair them following human recovery strategies. We evaluated these pipelines on our new corpus, SLUICE. It contains 21,000 interrupted questions, from LC-QuAD 2.0 and QALD-9-plus, paired with their underspecified SPARQL queries. Compared to a system that is given the full question, our best partial understanding pipeline answered only 0.77% fewer questions. Results show that our pipeline correctly identifies what information is required to provide an answer but is not yet provided by the incomplete question. It also accurately identifies where that missing information belongs in the semantic structure of the question.
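To make the notion of an underspecified query concrete, the following is a minimal illustrative sketch, not the SLUICE authors’ pipeline: a hypothetical interrupted question paired with a toy SPARQL fragment, plus a stand-in helper that flags which slot the incomplete question never filled. All names, identifiers, and the heuristic are assumptions made for illustration.

```python
# Illustrative sketch only: a toy representation of an interrupted question
# paired with an underspecified SPARQL query, in the spirit of the corpus
# described above. The question, query, and helper are hypothetical.

from typing import Optional
import re

INTERRUPTED_EXAMPLE = {
    # The user paused after "born in", so the object of the relation is missing.
    "partial_question": "Which authors were born in",
    "underspecified_sparql": (
        "SELECT ?author WHERE { "
        "?author wdt:P106 wd:Q36180 . "      # occupation: writer
        "?author wdt:P19 ?birthplace . }"    # birthplace left unbound
    ),
}

def find_missing_slot(sparql: str) -> Optional[str]:
    """Return a variable that the incomplete question never supplied a value for.

    Toy heuristic, not a real SPARQL analysis: any variable that is not the
    projected answer variable counts as an unresolved slot.
    """
    variables = set(re.findall(r"\?\w+", sparql))
    projected = set(re.findall(r"SELECT\s+(\?\w+)", sparql))
    unresolved = sorted(variables - projected)
    return unresolved[0] if unresolved else None

if __name__ == "__main__":
    slot = find_missing_slot(INTERRUPTED_EXAMPLE["underspecified_sparql"])
    print(f"Information still needed from the user: {slot}")
```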
The focus on one-to-one speak/wait conversational interaction with artificial systems is partly misguided and partly cynical. Misguided because it presupposes that our relationship with such a system should be one-to-one and that human-like turn-taking is never required. Cynical because we avoid the difficult challenge of building complex systems with a problematic route to publication. Whereas vision systems are regularly used in social robots and virtual agents to detect multiple dialogue partners and aid diarization, speech analysis and human-like turn-taking have lagged far behind. In this position paper we make the case for focusing on human-like turn-taking and multi-party interaction, discuss why real-time speech analysis and conversational management have been neglected, and put forward a program to correct this.
From straightforward interactions to full-fledged open-ended dialogues, Conversational User Interfaces (CUIs) are designed to support end-user goals and follow their requests. As CUIs become more capable, investigating how to restrict or limit their ability to carry out user requests becomes increasingly critical. Currently, such intentionally constrained user interactions are accompanied by a generic explanation (e.g., “I’m sorry, but as an AI language model, I cannot say...”). We describe the role of moral bias in such user restrictions as a potential source of conflict between CUI users’ autonomy and system characterisation as generated by CUI designers. Just as the users of CUIs have diverging moral viewpoints, so do CUI designers—which either intentionally or unintentionally affects how CUIs communicate. Mitigating user moral biases and making the moral viewpoints of CUI designers apparent is a critical path forward in CUI design. We describe how moral transparency in CUIs can support this goal, as exemplified through intelligent disobedience. Finally, we discuss the risks and rewards of moral transparency in CUIs and outline research opportunities to inform the design of future CUIs.
The aim of this study is to explore the challenges and experiences of conversational agent (CA) practitioners in order to highlight their practical needs and bring them into consideration within the scholarly sphere. A range of data scientists, conversational designers, executive managers and researchers shared their opinions and experiences through semi-structured interviews. They were asked about emerging trends, the challenges they face, and the design processes they follow when creating CAs. In terms of trends, findings included mixed feelings regarding no-code solutions and a desire for a separation of roles. The challenges mentioned included a lack of socio-technical tools and conversational archetypes. Finally, practitioners followed different design processes and did not use the design processes described in the academic literature. These findings were analyzed to establish links between practitioners’ insights and discussions in related literature. The goal of this analysis is to highlight research-practice gaps by synthesising five practitioner needs that are not currently being met. By highlighting these research-practice gaps and foregrounding the challenges and experiences of CA practitioners, we can begin to understand the extent to which emerging literature is influencing industrial settings and where more research is needed to better support CA practitioners in their work.
As chatbots gain popularity across a variety of applications, from investment to health, they employ an increasing number of features that can influence the perception of the system. Since chatbots often provide advice or guidance, we ask: do these aspects affect the user’s decision to follow their advice? We focus on two chatbot features that can influence user perception: 1) response variability in answers and delays and 2) reply suggestion buttons. We report on a between-subject study where participants made investment decisions on a simulated social trading platform by interacting with a chatbot providing advice. Performance-based study incentives made the consequences of following the advice tangible to participants. We measured how often and to what extent participants followed the chatbot’s advice compared to an alternative source of information. Results indicate that both response variability and reply suggestion buttons significantly increased the inclination to follow the advice of the chatbot.
Recent interest in speech-to-text applications has found speech to be an efficient modality for text input. However, the spontaneity of speech makes direct transcriptions of spoken compositions effortful to edit. While previous work in the Human-Computer Interaction (HCI) domain focuses on improving error correction, there is a lack of theoretical grounding for the understanding of speech as an input modality. This work explores literature from Cognitive Science to synthesize relevant theories and findings for the HCI audience to reference. Motivated by the literature indicating a fast memory decay of speech production and a preference towards gist abstraction in memory traces, an experiment was conducted to observe users’ immediate recall of their verbal composition. Based on the theories and findings, we introduce new interaction concepts and workflows that adapt to the characteristics of speech input.
Voice assistants are becoming increasingly useful and support realistic conversations, yet communication breakdowns occur. We investigate the use of humor as a repair strategy in an experiment where the voice assistant makes a mistake and then utilizes one of four humorous personalities to repair the breakdown in the conversation. We conducted a study with 30 participants, each of whom took the Humor Style Questionnaire (HSQ) to understand their predisposition to humor type, and then engaged in conversation with each of the four humorous personalities and one that was designed to give neutral repair responses (non-humorous). Aggressive personalities were rated as the funniest, yet there was no clear connection between the participant’s humor style and their preferred voice assistant personality. While humorous responses were successful in repairing communication breakdowns, participants overall preferred non-humorous responses. This research provides insight into the role of humor in communication breakdown repair with voice assistants.
Valuable insights into an individual’s current thoughts and stance regarding behaviour change can be obtained by analysing the language they use, which can be conceptualized using Motivational Interviewing (MI) concepts. Training conversational agents (CAs) to detect and employ these concepts could help them provide more personalized and effective assistance. This study investigates the similarity of written language around behaviour change spanning diverse conversational and social contexts and change objectives. Drawing on previous research that applied MI concepts to texts about health behaviour change, we evaluate the performance of existing classifiers on six newly constructed datasets from diverse contexts. To gain insight into the factors that determine the identification of change language, we explore the impact of lexical features on classification. The results suggest that patterns of change language remain stable across contexts and domains, leading us to conclude that peer-to-peer online data may be sufficient to train CAs to understand user utterances related to behaviour change.
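As a rough illustration of the transfer setting described above, here is a minimal sketch, assuming a bag-of-words stand-in rather than the study’s actual classifiers or datasets: a model trained on change language from one context is tested on a toy corpus from another, and its strongest lexical cues are inspected. All texts, labels, and the model choice are hypothetical.

```python
# Minimal sketch (not the study's actual classifiers or data): train a
# bag-of-words model on one behaviour-change corpus, test how well it
# transfers to a corpus from a different context, then inspect which
# lexical features carry the "change language" signal.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Hypothetical toy data: 1 = change talk, 0 = neutral/sustain talk.
source_texts = ["I really want to quit smoking", "I might cut down next month",
                "Smoking relaxes me", "I have no plans to stop"]
source_labels = [1, 1, 0, 0]
target_texts = ["I want to start exercising", "Running is too exhausting for me"]
target_labels = [1, 0]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_source = vectorizer.fit_transform(source_texts)
X_target = vectorizer.transform(target_texts)

clf = LogisticRegression().fit(X_source, source_labels)
print("Cross-context F1:", f1_score(target_labels, clf.predict(X_target)))

# Lexical features most associated with change language in this toy model.
weights = sorted(zip(clf.coef_[0], vectorizer.get_feature_names_out()), reverse=True)
print("Top change-language cues:", [term for _, term in weights[:5]])
```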
Conversational agents mimic natural conversation to interact with users. Since the effectiveness of interactions strongly depends on users’ perception of agents, it is crucial to design agents’ behaviors to provide the intended user perceptions. Research on human-agent and human-human communication suggests that speech specifics are associated with perceptions of communicating parties, but there is a lack of systematic understanding of how speech specifics of agents affect users’ perceptions. To address this gap, we present a framework outlining the relationships between elements of agents’ conversation architecture (dialog strategy, content affectiveness, content style and speech format) and aspects of users’ perception (interaction, ability, sociability and humanness). Synthesized based on literature reviewed from the domains of HCI, NLP and linguistics (n=57), this framework demonstrates both the identified relationships and the areas lacking empirical evidence. We discuss the implications of the framework for conversation design and highlight the inconsistencies with terminology and measurements.
Over the past decade, voice user interface (VUI) design has been steadily growing, along with a growing VUI presence in consumer markets. However, there is currently a lack of widely-established guidelines for VUI design. While many sets of VUI guidelines have been proposed, they tend to be developed independently of each other, leading to a lack of consensus on appropriate guidelines for VUI design. This can hinder the wider adoption of practical VUI guidelines. To address this gap, we performed a large-scale meta-analysis of 336 VUI design guidelines that have been proposed in academic literature. Using thematic analysis, we present a unified and synthesized set of 14 guidelines, representing the most universally proposed principles of VUI design as captured by the 336 VUI guidelines identified in academic literature. We hope that this synthesized set can address several of the challenges to the adoption of VUI guidelines in design practice.
Programming by voice is a potentially useful method for individuals with motor impairments. Spoken programs can be challenging for a standard speech recognizer with a language model trained on written text mined from sources such as web pages. Having an effective language model that captures the variability in spoken programs may be necessary for accurate recognition. In this work, we explore how novice and expert programmers speak code without requiring them to adhere to strict grammar rules. We investigate two approaches to collect data by having programmers speak either highlighted or missing lines of code. We observed that expert programmers spoke more naturally, while novice programmers spoke more syntactically. A commercial speech recognizer had a high error rate on our spoken programs. However, by adapting the recognizer’s language model with our spoken code transcripts, we were able to substantially reduce the error rate by 27% relative to the baseline on unseen spoken code.
Unclear speech, like mumbling, is difficult for people to understand, and even harder for conversational user interfaces (CUIs) to process. Yet there are multiple reasons why unintelligible speech is meaningful between humans: it plays a critical role in social dynamics and status signaling, which evolved in humans over time and allowed us to form cohesive social groups for survival. For example, in modern times humans often use such changed speech to make themselves understandable only to in-group members, e.g. “mumble rap”, while subtly excluding out-group members. As such, we argue here that future CUIs must be attentive to how people use various forms of non-standard changed speech (e.g. mumbling, dialect, slang, inflection) to express themselves, lest CUIs be socially inept. Based on psychological, linguistic, and cross-cultural research, we point out several major challenges for researchers: 1) current CUIs typically omit non-standard speech like mumbling, which is critical to human social communication, and 2) in the future humans may innately form in-groups with their personal CUIs, resulting in speech behaviors meant to exclude out-group members (both humans and machines). Both of these challenges require more research to address. Moreover, the use of changed speech for status signaling and in-group/out-group (IG/OG) signaling appears to be a phenomenon that varies across diverse cultures, languages, and situations, which CUI designers and engineers need to be mindful of going forward.
Conversation design at least partly aspires to create Voice User Interfaces which emulate human speech production. And yet, there is no established approach for the development of naturalistic conversational infrastructure for VUIs; conversation designers are advised to work from their common sense understanding of conversation, producing written scripts, based on memory and imagination, which are later converted into speech. This is a shortcoming in conversation design which needs to be addressed. In this provocation paper, we argue that the starting point in the development of any VUI should be the examination of natural spoken conversation, preferably from the same interactional context in which the VUI will be deployed. We provide a short example to illustrate how the current process of conversation scriptwriting can be a barrier to this, and demonstrate how this can be overcome using the social scientific approach of Conversation Analysis (CA).
During their information seeking, people tend to filter out all the parts of the available information that do not fit their existing beliefs or opinions. In this paper we present a model for this “Self-imposed Filter Bubble” (SFB) consisting of four dimensions. With this model, we aim to 1) estimate the probability of the user being caught in an SFB and, consequently, 2) identify suitable clues to reduce this probability in the further course of a dialogue. Using an exemplary implementation in an argumentative dialogue system, we demonstrate the validity and applicability of this model in an online user study with 102 participants. These findings serve as a basis for developing a system strategy to break the user’s SFB and contribute to a sustainable and profound reflection on a topic from all viewpoints.
Research on voice assistants has primarily studied how people use them for informational needs and music requests, and to control electronic devices (e.g., IoT). Recent research suggests that people such as older adults want to use them to address social and relational needs, but lacks empirical evidence to show how older adults are currently engaging in these behaviors. In this paper, we use a machine learning approach to analyze more than 600,000 queries that 456 older adults in assisted living communities made to Amazon Alexa devices over two years, classifying how older adults use voice assistants for social well-being purposes. We present empirical evidence showing how older adults engage in three primary relational behaviors with Alexa: 1) asking personal questions to "get to know" the assistant, 2) asking for advice, and 3) engaging with the voice assistant to alleviate stress. We use these findings to discuss ethical implications of voice assistant use in long-term care settings.
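For illustration only, the toy rule-based categoriser below shows the three relational behaviour categories reported in the paper; the study itself used a machine learning approach over hundreds of thousands of real queries, and the cue phrases and example queries here are hypothetical.

```python
# Illustrative sketch only: a toy categoriser for the three relational
# behaviour categories named above. Cue phrases and examples are invented;
# the actual study used machine learning over real Alexa query logs.

RELATIONAL_CATEGORIES = {
    "getting_to_know": ["what is your favorite", "how old are you", "do you like"],
    "seeking_advice": ["should i", "what do you think about", "can you advise"],
    "alleviating_stress": ["i feel lonely", "i am stressed", "tell me something nice"],
}

def categorise(query: str) -> str:
    """Return the first relational category whose cue phrase appears in the query."""
    q = query.lower()
    for category, cues in RELATIONAL_CATEGORIES.items():
        if any(cue in q for cue in cues):
            return category
    return "non_relational"

for example in ["Alexa, what is your favorite color?",
                "Should I call my daughter today?",
                "Play some jazz music"]:
    print(example, "->", categorise(example))
```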
This paper explores the potential implications of embodied conversational agents (ECAs) in healthcare, focusing on the impact of appearance and conversation style on trustworthiness. We conducted a Research through Design investigation of ECAs for supporting women during the periconception period and in pregnancy. The paper presents the results of a Wizard of Oz study in which two alternative prototypes, a chatbot and an ECA, were tested in a tertiary hospital by 25 participants. Reflecting on the results, we suggest that patients’ limited trust in ECAs may be beneficial for achieving trustworthy use of these agents in the healthcare context.
The theme for CUI 2023 is ‘designing for inclusive conversation’, but who are CUIs really designed for? The field has its roots in computer science, which has a long acknowledged diversity problem. Inspired by studies mapping out the diversity of the CHI and voice assistant literature, we set out to investigate how these issues have (or have not) shaped the CUI literature. To do this we reviewed the 46 full-length research papers that have been published at CUI since its inception in 2019. After detailing the eight papers that engage with accessibility, social interaction, and performance of gender, we show that 90% of papers published at CUI with user studies recruit participants from Europe and North America (or do not specify). To complement existing work in the community towards diversity we discuss the factors that have contributed to the current status quo, and offer some initial suggestions as to how we as a CUI community can continue to improve. We hope that this will form the beginning of a wider discussion at the conference.
Conversational agents are rapidly advancing in terms of their capabilities and human likeness - both of which are intended to enhance the user experience and engagement. One human quality that can potentially increase trust and likeability is humor. However, what is and is not considered humorous depends on many contextual and personal factors that are not only difficult for machines to detect but that even humans still struggle to understand. This makes training AI to be humorous highly challenging. But is this due only to technical limitations? In this provocation paper, we discuss the hindrances to utilizing humor in commercial conversational agents and propose addressing this topic from a social and political perspective.
We examine the ideological differences in the debate surrounding large language models (LLMs) and AI regulation, focusing on the contrasting positions of the Future of Life Institute (FLI) and the Distributed AI Research (DAIR) institute. The study employs a humanistic HCI methodology, applying narrative theory to HCI-related topics and analyzing the political differences between FLI and DAIR, as they are brought to bear on research on LLMs. Two conceptual lenses, “existential risk” and “ongoing harm,” are applied to reveal differing perspectives on AI's societal and cultural significance. Adopting a longtermist perspective, FLI prioritizes preventing existential risks, whereas DAIR emphasizes addressing ongoing harm and human rights violations. The analysis further discusses these organizations’ stances on risk priorities, AI regulation, and attribution of responsibility, ultimately revealing the diverse ideological underpinnings of the AI and LLMs debate. Our analysis highlights the need for more studies of longtermism's impact on vulnerable populations, and we urge HCI researchers to consider the subtle yet significant differences in the discourse on LLMs.
Small business owners (SBOs) face several challenges when asking for microcredit loans from financial institutions. Usual difficulties include low credit scores, being unbanked, outstanding debts, informal employment, inability to demonstrate their payment capability, and lack of a financial guarantor. Moreover, SBOs often find it hard to apply for microcredit loans due to bureaucracy, required proof documents, and lack of information on how to proceed. For those reasons, banks and non-profit organizations have credit agents and advisors to give SBOs directions and help them. In addition, there are plenty of NGOs focused on financial education that teach the basics of business management and planning. Asking for a loan is a complex practice, and asymmetric power relationships might emerge from it, which does not benefit micro-entrepreneurs. In this provocation paper, we aim to investigate credibility as a value and describe how a conversational system based on artificial intelligence might be employed to mitigate perceptions of mistrust, and how, in trying to achieve that, it might occasionally and inadvertently amplify those perceptions.
Current conversational and hierarchical structures between Personal Assistants (PAs) and drivers are clearly and explicitly defined, in that users are either hierarchically superordinate to or on par with their PAs. Simultaneously, technological advances around intelligence, personalization and proactivity have gained momentum in the recent past. The exponential development of intelligence-based technologies will soon enable PAs to take over large parts of the driving process. PAs will convince drivers to trust in their capabilities by explaining themselves and their decision-making processes. They will consider drivers’ knowledge, experience, and needs to transform interactions into deeply personal experiences. Furthermore, proactive PAs will provide context-sensitive content in suitable situations. These developments may just tip the scale and challenge currently valid PA-driver relations. We envision that roles will undergo a reversal: where conversations are now driven by users and assisted by assistants, intelligent, personalized, and proactive PAs will take the lead and, figuratively, the driver’s seat.
Artificial Intelligence (AI)-based Conversational Agents (CAs) have great potential to include marginalized and vulnerable populations. However, some issues still make these interfaces exclusionary for some users. This paper discusses how increasing CAs’ transparency can contribute to these systems’ inclusiveness and indicates open issues that must be addressed to make AI-based CAs more transparent and inclusive. We argue that giving users more guidance on how CAs work, what they can do, and how they may be operated might alleviate older adults’ misperceptions about functioning and privacy that hamper CAs’ adoption, facilitate their usage for people with impairments, and help identify possible prejudicial biases. As challenges, researchers and practitioners should investigate how to determine appropriate levels of transparency through personalization, produce human-centered knowledge on transparency, and study new methods, tools, and processes to support CA development that considers inclusiveness.
Conversational agents have limited conversational capabilities and there is a debate as to whether interactions with conversational user interfaces (CUIs) are truly conversational. Currently, most news and journalistic content is presented in a monologic form. Simultaneously, there is an expectation that CUIs can change how we interact with news content. To explore what conversational interactions with the news could look like, two co-speculation workshops were arranged. The design-led inquiries focus on how conversations can be used as a resource for designing interactions with CUIs for news. Three different prototyping techniques were used in the design explorations: storyboarding, scripting and role-playing. Our work offers two main contributions: 1) the identification of three dimensions relevant to the design space of CUIs for news (the CUI’s role, conversational capabilities, and locus of control), and 2) a critical reflection on the potential of different techniques for prototyping CUIs.
Spoken dialogue systems (SDSs) have been developed separately under two different categories, task-oriented and chit-chat. The former focuses on achieving functional goals, while the latter aims at creating engaging social conversations without special goals. Creating a unified conversational model that can engage in both chit-chat and task-oriented dialogue has become a promising research topic in recent years. However, the potential “initiative” that occurs when there is a change between dialogue modes within one dialogue has rarely been explored. In this work, we investigate two kinds of dialogue scenarios: one starts from chit-chat that implicitly involves task-related topics and finally switches to task-oriented requests; the other starts from task-oriented interaction and eventually changes to casual chat after all requested information has been provided. We contribute two efficient prompt models which can proactively generate a transition sentence to trigger system-initiated transitions in a unified dialogue model. One is a discrete prompt model trained with two discrete tokens; the other is a continuous prompt model using continuous prompt embeddings automatically generated by a classifier. We furthermore show that the continuous prompt model can also be used to guide proactive transitions between particular domains in a multi-domain task-oriented setting.
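A minimal sketch of the discrete-prompt idea, under the assumption of an off-the-shelf GPT-2 base model and made-up special tokens (the paper’s own models, tokens, and training are not reproduced here): a mode-switch token is prepended to the dialogue history and the language model is asked to generate a transition sentence.

```python
# Minimal sketch, not the paper's models: prepend a discrete "mode switch"
# prompt token to the dialogue history and let a language model generate a
# transition sentence. The special tokens and base model are placeholders;
# without fine-tuning the output will not be a meaningful transition.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical discrete prompt tokens marking the direction of the transition.
tokenizer.add_special_tokens({"additional_special_tokens": ["[CHAT2TASK]", "[TASK2CHAT]"]})
model.resize_token_embeddings(len(tokenizer))

history = "User: I've been craving sushi all week. System: Sushi is a great choice!"
prompt = f"[CHAT2TASK] {history} System:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=25, do_sample=True, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```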
The importance of data is increasing, and so is the use of voice-interface chatbots. While the tasks that chatbots can handle are still relatively simple, analysing data can be a difficult task for non-experts. The aim of this research was to show how interaction with a voice-interface chatbot for exploring data can work. In a preliminary study, we investigated what questions people ask when looking at tabular data. We then conducted a Wizard of Oz study with a voice-interface chatbot that can answer questions about the National Council elections with text or graphs. In 15 interviews, 1,235 messages were exchanged, of which 159 were categorized as information requests. Our study shows that conversational interfaces should be able to include past conversation history in the evaluation of questions in order to encourage user engagement. We present a typology of questions and provide in-depth insights to inform the development of chatbots for human data interaction.
Building effective voice interfaces for the instruction of service robots in specialised environments is difficult due to the local knowledge of workers, such as specific terminology for objects and space, leading to limited data to train language models (known as ‘low-resource’ domains) and challenges in language grounding. We present a language grounding study in which we a) elicit spoken natural language from context experts in situ through a Wizard of Oz study and compile a dataset, and b) qualitatively examine linguistic properties of the resulting instructions to reveal the referential categories and parameters employed to construct instructions in context. We discuss how our language grounding protocol may be applied to bootstrap a language model in its targeted use context. Our work contributes a linguistic understanding of robot instructions that can be applied by designers and researchers to develop spoken language understanding for human-robot interactions in specialised, low-resource environments.
Users of task-oriented dialogue systems are often limited to ‘in-schema queries’, i.e., questions constrained by a predefined database structure. Providing access to additional semi- or unstructured knowledge could enable users to enter a wider range of queries answerable by the system. To this end, we have integrated a Question-Answering (QA) module into an interactive restaurant search system and evaluated its impact using a crowd-sourced user evaluation. The QA-module includes knowledge selection and response generation components, both driven by fine-tuned GPT-2 language models, and a method to prevent responses unrelated to a user question (‘off-topic responses’). The results show that systems with the QA-module are significantly preferred over the baseline without it. Moreover, while the off-topic response prevention method was correctly triggered for 98.1% of questions not covered in the knowledge base, users preferred the system that retrieves information irrespective of whether it is relevant.
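The off-topic prevention idea can be sketched as a confidence threshold on knowledge selection. The snippet below is only an illustration: the evaluated system uses fine-tuned GPT-2 components, whereas here a TF-IDF cosine similarity, a hypothetical threshold, and toy knowledge snippets stand in for the real knowledge-selection score.

```python
# Sketch of the off-topic prevention idea only: if no knowledge snippet is
# sufficiently relevant to the user question, fall back to a safe response
# instead of generating an answer. TF-IDF similarity stands in here for the
# confidence of the real (GPT-2 based) knowledge selection component.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

KNOWLEDGE_BASE = [
    "The restaurant has a vegetarian tasting menu on weekends.",
    "Dogs are welcome on the outdoor terrace.",
    "Parking is available behind the building after 6pm.",
]
OFF_TOPIC_THRESHOLD = 0.2  # hypothetical cut-off

def answer(question: str) -> str:
    vec = TfidfVectorizer().fit(KNOWLEDGE_BASE + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(KNOWLEDGE_BASE))[0]
    if sims.max() < OFF_TOPIC_THRESHOLD:
        return "Sorry, I don't have information about that."
    return KNOWLEDGE_BASE[int(sims.argmax())]

print(answer("Is parking available nearby?"))       # grounded answer
print(answer("What is the capital of France?"))      # off-topic fallback
```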
Conversational agents (CAs) have become ubiquitous in our daily lives. Recognizing the potential of CAs as persuasive agents, we are interested in leveraging CAs to reduce compulsive smartphone use, a widespread behavior among young adults that can lead to negative consequences. This work presents the design and development of StayFocused, a mobile app incorporating a chatbot to assist people in setting focus goals and reflecting on their phone-checking behaviors in situ. In particular, we highlight the iterative process of curating prompts for GPT-3 and the lessons learned from our trial and error. With StayFocused, we propose a three-week between-subjects study with college students. We envision that the design of StayFocused and the proposed study will deepen our understanding of how CAs support immediate actions as well as sustained behavior change, and will inform the design of persuasive technologies for reducing unintended behaviors such as compulsive smartphone use.
A small number of role-playing games can be played on speech agents like Amazon Alexa and Google Assistant. These offer a new, accessible way for people to engage with a game genre that is most commonly played on video game consoles or PCs. A sample of eight participants was asked to play 15-20 minutes of The Bard’s Tale: Warlocks of Largefearn using Amazon Alexa, after which they took part in a semi-structured interview. The interviews and play sessions were transcribed and analysed using inductive thematic analysis. Six themes were generated: Technical Issues, Frustration, Immersion, Memory and Cognitive Load, Simplicity vs. Detail, and Speech Agents as a Medium for Games. Participants found the experience interesting and novel but were frustrated by a number of user experience issues and, among other problems, experienced notable strain on their memory and cognitive load. A list of 10 design guidelines for speech agent-based role-playing games was created.
Evaluating and understanding the inappropriateness of chatbot behaviors can be challenging, particularly for chatbot designers without technical backgrounds. To democratize the debugging process of chatbot misbehaviors for non-technical designers, we propose a framework that leverages dialogue act (DA) modeling to automate the evaluation and explanation of chatbot response inappropriateness. The framework first produces characterizations of context-aware DAs based on discourse analysis theory and real-world human-chatbot transcripts. It then automatically extracts features to identify the appropriateness level of a response and can explain the causes of the inappropriate response by examining the DA mismatch between the response and its conversational context. Using interview chatbots as a testbed, our framework achieves comparable classification accuracy with higher explainability and fewer computational resources than the deep learning baseline, making it the first step in utilizing DAs for chatbot response appropriateness evaluation and explanation.
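At its core, the framework’s appropriateness check can be pictured as asking whether the dialogue act (DA) of a response is licensed by the DA of its conversational context. The sketch below is a toy simplification with hypothetical DA labels, not the framework’s actual feature extraction or classifier.

```python
# Toy sketch of the core idea (not the paper's framework): characterise which
# dialogue acts (DAs) a given context DA licenses, and flag a response whose
# DA does not fit as potentially inappropriate. Labels are hypothetical.

EXPECTED_RESPONSE_DAS = {
    "open_question": {"statement", "opinion"},
    "yes_no_question": {"yes_answer", "no_answer", "statement"},
    "greeting": {"greeting"},
}

def is_appropriate(context_da: str, response_da: str) -> bool:
    """Return True if the response DA matches what the context DA licenses."""
    return response_da in EXPECTED_RESPONSE_DAS.get(context_da, set())

# Example: the user answers an interview chatbot's open question, but the
# chatbot replies with a greeting -- a DA mismatch that would be flagged
# and explained in terms of the mismatched acts.
print(is_appropriate("open_question", "opinion"))   # True: appropriate
print(is_appropriate("open_question", "greeting"))  # False: inappropriate
```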
Current spoken conversational user interfaces (CUIs) are predominantly implemented using a sequential, utterance-based, two-party, speak-wait/speak-wait approach. Human-human conversation, in contrast, 1) is not sequential, featuring overlap, interruption and back channels; 2) processes utterances before they are complete; and 3) is often multi-party. As part of Honda Research Institute’s Haru project, a lightweight word-spotting speech recognition system - a conversational listener - was implemented to allow very fast turn-taking in simple voice interaction conditions. In this paper, we present a pilot evaluation of the conversational listener in a script follower context (which allows a robot to act out a dialog with a user). We compare a disembodied version of the system with expressive synthesis to Alexa with and without fast turn-taking. Qualitative results indicate that users were sensitive to turn-taking delay and characterful speech synthesis.
Platforms such as Google DialogFlow and Amazon Lex have enabled easier development of conversational agents. The standard approach to training these agents involves collecting and annotating in-domain data in the form of labelled utterances. However, obtaining in-domain data for training machine learning models remains a bottleneck. Schema-based dialogue, which involves laying out a structured representation of the flow of a “typical” dialogue, and prompt-based methods, which involve writing instructions in natural language to large language models such as GPT-3, are promising ways to tackle this problem. However, usability issues when translating these methods into practice are less explored. Our study takes a first step towards addressing this gap by having 23 students who had finished a graduate-level course on spoken dialogue systems report their experiences as they defined structured schemas and composed instruction-based prompts for two task-oriented dialogue scenarios. Through inductive coding and subsequent thematic analysis of the survey data, we explored users’ authoring experiences with schema and prompt-based methods. The findings provide insights for future data collection and authoring tool design for dialogue systems.
In our busy lives, meals are being squeezed into shorter times, or fit into multitasking routines of watching videos and scrolling through social media. This can lead to eating quickly, without considering food choices, and to overeating. Mindful eating has been promoted as an effective method to combat mindless eating, with various tools available to facilitate mindfulness practices. However, the use of voice assistants to facilitate mindful eating is rather limited. We developed a voice assistant to facilitate mindful eating activities and conducted a field study with four participants over six days. We examined how the social role of the voice assistant, acting as a friend or a counselor, affects mindful eating experiences. The results suggest that voice assistants can assist users in mindful eating; however, participants preferred the friend version of the voice assistant. Implications for the design of voice assistants for mindfulness activities are provided.
Conversational agents (CAs) that deliver proactive interventions can benefit users by reducing their cognitive workload and improving performance. However, little is known regarding how such interventions would impact perceptions of a CA’s appropriateness in voice-only, decision-making tasks. We conducted a within-subjects experiment (N=30) to evaluate the effect of the CA’s feedback delivery strategy at three levels (no feedback, unsolicited feedback, and solicited feedback) in an interactive food ordering scenario. We discovered that unsolicited feedback was perceived to be more appropriate than solicited feedback. Our results provide preliminary insights regarding the impact of proactive feedback on CA perception in decision-making tasks.
As agile manufacturing expands and workforce mobility increases, the importance of efficient knowledge transfer among factory workers grows. Cognitive Assistants (CAs) with Large Language Models (LLMs), like GPT-3.5, can bridge knowledge gaps and improve worker performance in manufacturing settings. This study investigates the opportunities, risks, and user acceptance of LLM-powered CAs in two factory contexts: textile and detergent production. Several opportunities and risks are identified through a literature review, proof-of-concept implementation, and focus group sessions. Factory representatives raise concerns regarding data security, privacy, and the reliability of LLMs in high-stakes environments. By following design guidelines regarding persistent memory, real-time data integration, security, privacy, and ethical concerns, LLM-powered CAs can become valuable assets in manufacturing settings and other industries.
User Experience in Human-Computer Interaction is composed of a multitude of building blocks, one of which is how Voice Assistants (VAs) talk to their users. Linguistic considerations around syntax, grammar, and lexis have proven to influence users’ perception of VAs. Users have nuanced preferences regarding how they want their VAs to talk to them. Previous studies have found these preferences to differ between domains, but an exhaustive and methodical overview is still outstanding. By means of an A/B study spanning domains as well as dialog types, this paper methodically closes this gap and explores the degree of domain-sensitivity across different types of dialogs in German. The results paint a mixed picture regarding the importance of domain-sensitivity. While some degree of domain-sensitivity was found for in-car prompts, it generally seems to play a rather minor role in users’ experience of VAs in the vehicle.
This study examined business communication practices with chatbots among various Small and Medium Enterprise (SME) stakeholders in Singapore, including business owners/employees, customers, and developers. Through qualitative interviews and chatbot transcript analysis, we investigated two research questions: (1) How do the expectations of SME stakeholders compare to the conversational design of SME chatbots? and (2) What are the business reasons for SMEs to add human-like features to their chatbots? Our findings revealed that functionality is more crucial than anthropomorphic characteristics, such as personality and name. Stakeholders preferred chatbots that explicitly identified themselves as machines to set appropriate expectations. Customers prioritized efficiency, favoring fixed responses over free text input. Future research should consider the evolving expectations of consumers, business owners, and developers as chatbot technology advances and becomes more widely adopted.
Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI’s ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as ‘open source’, many inherit undocumented data of dubious legality, few share the all-important instruction-tuning data (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.
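One way to picture “differentiated openness” is as a per-dimension rubric rather than a single open/closed label. The sketch below is purely illustrative, assuming hypothetical dimension scores and an unweighted average; it does not reproduce the survey’s actual rating scheme.

```python
# Illustrative sketch of a differentiated openness check, loosely following
# the dimensions named above. Project scores and the averaging scheme are
# hypothetical placeholders, not the survey's actual ratings.

OPENNESS_DIMENSIONS = [
    "code", "training_data", "model_weights", "rlhf_data",
    "licensing", "scientific_documentation", "access_methods",
]

def openness_profile(project: dict) -> float:
    """Average per-dimension scores (0 = closed, 0.5 = partial, 1 = open)."""
    scores = [project.get(dim, 0.0) for dim in OPENNESS_DIMENSIONS]
    return sum(scores) / len(OPENNESS_DIMENSIONS)

example_project = {  # hypothetical "open" LLM+RLHF project
    "code": 1.0, "training_data": 0.5, "model_weights": 1.0,
    "rlhf_data": 0.0, "licensing": 1.0,
    "scientific_documentation": 0.5, "access_methods": 1.0,
}
print(f"Openness score: {openness_profile(example_project):.2f}")
```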
While voice-controllable Intelligent Personal Assistants (IPAs) have become widespread in recent years, they remain primarily reactive with rather constrained calendaring capabilities. Anticipating more adaptive and complex assistants in the future, we organised a multidisciplinary expert discussion investigating potential use cases, interaction principles, and user modelling challenges within proactive IPAs for time management. This paper presents the identified themes and deliberations on enticing self-reflection, longitudinal task assistance, interaction modality, dialogue design, perception of the system, usage willingness, onboarding, and explainability. These findings outline a framework of advanced IPAs for time management.
When talking about products, people often express their needs in vague terms with vocabulary that does not necessarily overlap with product descriptions written by retailers. This poses a problem for chatbots in online shops, as the vagueness and vocabulary mismatch can lead to misunderstandings. In human-human communication, people intuitively build a common understanding throughout a conversation, e.g., via feedback loops. To inform the design of conversational product search systems, we investigated the effect of different feedback behaviors on users’ perception of a chatbot’s competence and conversational engagement. Our results show that rephrasing the user’s input to express what was understood increases conversational engagement and gives the impression of a competent chatbot. Using a generic feedback acknowledgment (e.g., “right” or “okay”), however, does not increase engagement or perceived competence. Auto-feedback for conversational product search systems therefore needs to be designed with care.
In this study, we developed a voicebot that asks users questions about their daily activities and social participation to gain insights into their happiness and well-being. We hypothesize that showing disclosure when asking questions can elicit reciprocal self-disclosure from users. We define two types of disclosure: self-disclosure and other-disclosure. Self-disclosure is sharing thoughts, feelings and information about oneself, whereas other-disclosure is sharing information about others and opinions of others. We analyzed 122 answers to the voicebot’s disclosure and control questions by annotating the number of self-disclosure statements in the answers. We found no significant effect of asking disclosure questions on the number of self-disclosure statements. However, we did find a positive effect of asking disclosure questions on common markers of reciprocity such as the number of words, topic phrases, and first-person pronouns. Replication of this study with more participants would strengthen the validity of the findings.
Voice interaction is an increasingly popular technology, allowing users to control devices and applications without the need for physical interaction or ocular attention. Augmented voice playback control features, such as audio icons, have the potential to significantly improve voice navigation for instructional videos. This study evaluates audio icons for improving how-to video navigation in a Wizard-of-Oz-controlled setup with 24 participants assembling a wooden robot using a voice-controlled laptop. Results showed that audio icons helped participants complete the task faster, with fewer voice commands, and higher satisfaction. However, some usability challenges were observed. Significant differences in perceived usability were found between audio icons placed with visual points-of-action and the baseline, but not between the baseline and audio icons at 30-second intervals. These findings provide valuable insights for VUI system researchers and designers to advance the use of audio icons for improving voice interface navigation.
Much has been written about privacy in the context of conversational and voice assistants. Yet, there have been remarkably few developments in terms of the actual privacy offered by these devices. But how much of this is due to the technical and design limitations of speech as an interaction modality? In this paper, we set out to reframe the discussion on why commercial conversational assistants do not offer meaningful privacy and transparency by demonstrating how they could. By instrumenting the open-source voice assistant Mycroft to capture audit trails for data access, we demonstrate how such functionality could be integrated into big players in the sector like Alexa and Google Assistant. We show that this problem can be solved with existing technology and open standards and is thus fundamentally a business decision rather than a technical limitation.
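As a sketch of what an audit trail for data access could look like, independent of Mycroft’s actual APIs, the snippet below wraps a hypothetical skill handler so that every access to a data category is appended to a hash-chained log; all names and the logging scheme are assumptions for illustration.

```python
# Minimal sketch of the audit-trail idea, assuming nothing about Mycroft's
# internal APIs: wrap any handler that accesses user data so that each access
# is appended to a tamper-evident log (each entry hashes the previous one).

import hashlib
import json
import time

AUDIT_LOG = []  # in practice this would be persisted, not kept in memory

def audited(data_category: str):
    """Decorator that records every call touching the given data category."""
    def wrap(handler):
        def inner(*args, **kwargs):
            prev_hash = AUDIT_LOG[-1]["hash"] if AUDIT_LOG else ""
            entry = {"ts": time.time(), "category": data_category,
                     "handler": handler.__name__, "prev": prev_hash}
            entry["hash"] = hashlib.sha256(
                (prev_hash + json.dumps(entry, sort_keys=True)).encode()).hexdigest()
            AUDIT_LOG.append(entry)
            return handler(*args, **kwargs)
        return inner
    return wrap

@audited("calendar")
def read_next_appointment(user_id: str) -> str:
    return "Dentist at 3pm"  # stand-in for a real skill handler

read_next_appointment("user-42")
print(json.dumps(AUDIT_LOG, indent=2))
```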
This paper investigates the potential for spreading misinformation via third-party voice applications in voice assistant ecosystems such as Amazon Alexa and Google Assistant. Our work fills a gap in prior work on privacy issues associated with third-party voice applications, looking at security issues related to outputs from such applications rather than compromises to privacy from user inputs. We define misinformation in the context of third-party voice applications and implement an infrastructure for testing third-party voice applications using automated natural language interaction. Using our infrastructure, we identify — for the first time — several instances of misinformation in third-party voice applications currently available on the Google Assistant and Amazon Alexa platforms. We then discuss the implications of our work for developing measures to pre-empt the threat of misinformation and other types of harmful content in third-party voice assistants becoming more significant in the future.
Early literacy acquisition has been shown to be life-changing. In under-resourced multilingual, multicultural societies, where appropriately skilled human resources are scarce and the acquisition of early literacy is a huge challenge, the innovative and creative use of technology offers new ways of tackling this strategic educational problem. Speech technology is particularly suited to this problem, since children can speak long before they can read and write. In this work we present the Ngiyaqonda! project, in which a multimodal learning environment is being developed to facilitate both literacy development and language learning, with the ultimate aim of solving a unique and pressing challenge facing South African foundation phase learners.
This paper introduces Tilbot, an open-source, easy-to-use platform for the design of conversational user interfaces. Although many platforms for the creation of conversational agents already exist, they are often closed source and focus mainly on commercial purposes. Tilbot is created as an open-source solution that can be hosted on-premises. By focusing primarily on students and researchers, Tilbot stimulates open science research, collaborations, and the building of a scientific community to create and share best practices and experiences. At the CUI conference, visitors will be able to try out Tilbot and engage with its original creators to highlight their wishes and brainstorm directions for future development.
We present HyLECA, an open-source framework designed for the development of long-term engaging controlled conversational agents. HyLECA’s dialogue manager employs a hybrid architecture, combining rule-based methods for controlled dialogue flows with retrieval-based and generation-based approaches to enhance the utterance variability and flexibility. The motivation behind HyLECA lies in enhancing user engagement and enjoyment in task-oriented chatbots by leveraging the natural language generation capabilities of open-domain large language models within the confines of predetermined dialogue flows. Moreover, we discuss the technical capabilities, potential applications, relevance, and adaptability of the system. Lastly, we report preliminary findings from integrating state-of-the-art large language models in simulating a conversation centred on smoking cessation.
We present a modular dialogue experiments and demonstration toolkit (MoDEsT) that assists researchers in planning tailored conversational AI-related studies. The platform can: 1) assist users in picking multiple templates based on specific task needs; 2) allow users to create their experiment/demo interface by selecting different components; and 3) support playback from dialogue history (i.e. logs). MoDEsT is an open-source platform, hosted on a university-managed publicly accessible server.
In order to carry out human-robot collaborative tasks efficiently, robots have to be able to communicate with their human counterparts. In many applications, speech interfaces are deployed as a way to empower robots with the ability to communicate. Despite the progress made in speech recognition and (multi-modal) dialogue systems, such interfaces continue to be brittle in a number of ways and the experience of the failure of such interfaces is commonplace amongst roboticists. Surprisingly, a rigorous and complete analysis of communicative failures is still missing, and the technical literature is positively skewed towards the success and good performance of speech interfaces. In order to address this blind spot and investigate failures in conversations between humans and robots, an interdisciplinary effort is necessary. This workshop aims to raise awareness of said blind spot and provide a platform for discussing communicative troubles and failures in human-robot interactions and potentially related failures in non-robotic speech interfaces. We aim to bring together researchers studying communication in different fields, to start a scrupulous investigation into communicative failures, to begin working on a taxonomy of such failures, and enable a preliminary discussion on possible mitigating strategies. This workshop intends to be a venue where participants can freely discuss the failures they have encountered, to positively and constructively learn from them.
We are concurrently witnessing two significant shifts: voice and chat-based conversational user interfaces (CUIs) are becoming ubiquitous, and older people are becoming a very large demographic group. However, despite the recent increase in research activity within fields such as CUI, older adults continue to be underrepresented as CUI users both in research and in the design of commercial products. Therefore, the overarching aim of this workshop is to increase the momentum for research that centers on older adults as CUI users. For this, we plan to create an interdisciplinary space that brings together researchers, designers, practitioners, and users, to discuss and share challenges, principles, and strategies for designing CUIs for the ageing population. We thus welcome contributions of empirical studies, theories, design, and evaluation of CUIs for older adults. Through this, we aim to grow the community of CUI researchers across disciplinary boundaries (human-computer interaction, voice and language technologies, geronto-technologies, information studies, etc.) that are engaged in the shared goal of ensuring that older adults are not marginalized or excluded from the design of CUIs.
Conversational User Interfaces (CUIs) are becoming increasingly applied in a broad range of sensitive settings to address the needs and struggles of vulnerable or marginalized users. Sensitive settings include, for instance, CUIs mediating the communication difficulties of people with dementia or supporting refugees to cope with new cultural practices as a chatbot on a government website. While researchers are increasingly designing CUIs for such sensitive settings, methods and participatory design approaches to address vulnerable user groups’ highly sensitive needs and struggles are sparse in research thus far. This workshop aims to explore how we can design CUIs for and in sensitive settings with vulnerable users in mind through the participatory design process. We aim to establish a working definition of vulnerability, sensitive settings, and how practice-oriented design of CUIs can be inclusive of diverse users.
As CUIs become more prevalent in both academic research and the commercial market, it becomes more essential to design usable and adoptable CUIs. While research has been growing on the methods for designing CUIs for commercial use, there has been little discussion on overall community practice of developing design resources to aid in practical CUI design. The aim of this workshop therefore is to bring the CUI community together to discuss the current practices for developing tools and resources for practical CUI design, the adoption (or non-adoption) of these tools and resources, and how these resources are utilized in the training and education of new CUI designers entering the field. This workshop will bring together all parts of the CUI community to have meaningful discussions on how CUI design resources are currently developed, and how we can improve these resources and tools to aid in their adoption in practical CUI design, and CUI academic & industry design training.
The increasing prevalence of communicative agents raises questions about human-agent communication and the impact of such interaction on people’s behavior in society and human-human communication. This workshop aims to address three of those questions: (i) How can we identify malicious design strategies – known as dark patterns – in social agents?; (ii) What is the necessity for and the effects of present and future design features, across different modalities and social contexts, in social agents?; (iii) How can we incorporate the findings of the first two questions into the design of social agents? This workshop seeks to conjoin ongoing discourses of the CUI and wider HCI communities, including recent trends focusing on ethical designs. Out of the collaborative discussion, the workshop will produce a document distilling possible research lines and topics encouraging future collaborations.