Simon King: What is AI speech generation currently capable of?
There are a bewildering number of papers being published on synthetic speech generation, and every new one claims a major advance using yet another new model architecture. I will try to make sense of the landscape, concentrating on what the latest models are capable of. These models are large, computationally expensive, and trained on tens of thousands of hours of speech. Fortunately, many are also becoming more versatile: a single pre-trained model can perform several tasks, such as text-to-speech and voice conversion. This might liberate us. Instead of tediously searching for the perfect model, we can concentrate on properly defining the problems we want to solve: What exactly is "conversational speech"? What would good cross-lingual prosody transfer sound like? Why do listeners sometimes prefer synthetic speech over natural speech? Does that mean we could generate synthetic speech that users trust more than natural speech?
Prof. Simon King is Professor of Speech Processing at the University of Edinburgh, where he is director of the Centre for Speech Technology Research and of a long-running Master's programme in Speech and Language Processing. He also works for Papercup, an AI dubbing company that uses synthetic speech to make the world's videos watchable in any language. Simon has research interests in speech synthesis, spoofing of speaker verification, and signal processing, with over 250 publications. He is a Fellow of IEEE and of ISCA, has led the annual Blizzard Challenge speech synthesis evaluations since 2007, and was Technical Chair of Interspeech 2023.

Heloisa Candello: Responsible Conversational User Interfaces
Responsible AI has been a popular topic in academic and industry settings since the advent, in the last two years, of conversational AI based on generative models. Despite the growing body of scientific research in this field, what should be considered when designing for responsibility in conversational AI interactions is still an open question. In this talk, I will revisit the CUI work our community has published over the last six years, along with my own projects in education and finance, to unveil the main definitions, criteria, and methodologies considered when designing responsible AI systems. We will discuss responsible AI in public and private settings and the human values identified as essential to designing responsible CUIs. Our research projects have investigated how to foster trust, accountability, transparency, fairness, and acceptance of conversational user interfaces deployed in natural settings with diverse audiences. Furthermore, we will discuss how bias emerged as a criterion when interacting with CUIs in museum settings, how we investigated the accountability and trust embedded in financial advisors' chatbots, and our recent work on elucidating values such as creditworthiness with micro-business women in underrepresented communities using conversational systems. This talk can serve as the basis for further discussion during the conference on promoting social impact and mitigating harm when designing conversational generative systems responsibly.
Dr. Heloisa Candello is a Senior Research Scientist in the Responsible Tech group at IBM Research – Brazil, where she works with a team of talented researchers, software engineers, and designers on innovative AI solutions. She has been with IBM for over 10 years, applying her expertise in user research and user experience design to create engaging and ethical interactive AI systems, especially conversational interfaces.
With a PhD in Human-Computer Interaction, Dr. Candello is passionate about exploring the design issues and opportunities involved in human-AI collaboration. She has published multiple papers in prestigious conferences and journals and received an honorable mention award at CHI 2019. Heloisa is currently an ACM Distinguished Speaker and an active volunteer and contributor to the ACM SIGCHI community, where she served as a member of the Volunteer Development Committee for two years and now co-chairs the LATAM Committee and the CUI Steering Committee. Her goal is to advance the fields of HCI and AI and to empower people with the support of responsible AI technologies.

Jonathan Gratch: The Social Function of Machine Emotional Expressions
In this talk, I will discuss the use of synthetic expressions for shaping human-agent interactions.
Jonathan Gratch is a Research Full Professor of Computer Science and Psychology at the University of Southern California (USC) and Director for Virtual Human Research at USC’s Institute for Creative Technologies. He completed his Ph.D. in Computer Science at the University of Illinois at Urbana-Champaign in 1995. Dr. Gratch’s research focuses on computational models of human cognitive and social processes, especially emotion, and explores these models’ potential to advance psychological theory and shape human-machine interaction. He is the founding Editor-in-Chief (retired) of IEEE Transactions on Affective Computing, an Associate Editor of Affective Science and Emotion Review, and a former President of the Association for the Advancement of Affective Computing (AAAC). He is a Fellow of AAAI, AAAC, and the Cognitive Science Society.
