Google Brain
IPSoft, Amelia team
Taiwan Army
National Taiwan University, Digital Speech Processing and Speech Special Project
StorySense Computing, Inc., acquired by 电话帮 in 2014.
Ph.D. student in Engineering
University of Cambridge
Master of Science in Engineering
National Taiwan University
Bachelor of Science in Engineering
National Taiwan University
Title: "Conversational AI Platform for Customer Services" [slides]
Title: "Democratise Conversational AI: Scaling Academic Research to Industrial Applications" [slides]
Title: "Deep Learning for Natural Language Generation and End-to-End Dialogue Modeling" [slides]
Title: "Task-oriented Neural Dialogue Systems" [slides]
Title: "A Network-based End-to-End Trainable Task-oriented Dialogue System" [slides]
Title: "Scalable Neural Language Generation for Spoken Dialogue Systems" [slides]
Title: "Scalable Neural Language Generation for Open Domain Dialogue Systems" [slides]
Title: "Task-oriented Neural Dialogue Systems" [slides]
Title: "Task-oriented Neural Dialogue Systems" [slides]
Title: "Deep Learning for Natural Language Generation" [slides] [opensource]
Title: "Beyond Conditional LM: NN Language Generation for Dialogue Systems" [slides]
Title: "Neural Language Generation for Spoken Dialogue Systems" [slides]
Title: "Semantically Conditioned LSTM-based NLG for Spoken Dialogue Systems" [slides]
Title: "Scalable Neural Language Generation for Open Domain Dialogue Systems" [slides]
Natural Language Understanding (NLU) module and an initial solution based on delexicalised slot-specific classifiers. [link]
Dialogue management module of modular task-oriented dialogue systems and an initial solution based on a hybrid data-flow + control-flow dialogue method. [link]
A retrieval-based approach to dialogue using (multi-modal) response selection. [link]
Title: "Statistical Natural Language Generation" [slides]
Developing a dialogue agent that is capable of making autonomous decisions and communicating by natural language is one of the long-term goals of machine learning research. Traditional approaches either rely on hand-crafting a small state-action set for applying reinforcement learning, which is not scalable, or construct deterministic models for learning dialogue sentences that fail to capture the natural variability of conversation. In this paper, we propose a Latent Intention Dialogue Model (LIDM) that employs a discrete latent variable to learn underlying dialogue intentions in the framework of neural variational inference. In a goal-oriented dialogue scenario, these latent intentions can be interpreted as actions guiding the generation of machine responses, which can be further refined autonomously by reinforcement learning. The experimental evaluation of LIDM shows that the model outperforms published benchmarks for both corpus-based and human evaluation, demonstrating the effectiveness of discrete latent variable models for learning goal-oriented dialogues.
Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring labelled datasets and solving a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable dialogue system along with a new way of collecting task-oriented dialogue data based on a novel pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.
Recently a variety of LSTM-based conditional language models (LM) have been applied across a range of language generation tasks. In this work we study various model architectures and different ways to represent and aggregate the source information in an end-to-end neural dialogue system framework. A method called snapshot learning is also proposed to facilitate learning from supervised sequential signals by applying a companion cross-entropy objective function to the conditioning vector. The experimental and analytical results demonstrate firstly that competition occurs between the conditioning vector and the LM, and the differing architectures provide different trade-offs between the two. Secondly, the discriminative power and transparency of the conditioning vector is key to providing both model interpretability and better performance. Thirdly, snapshot learning leads to consistent performance improvements independent of which architecture is used.
Moving from limited-domain natural language generation (NLG) to open domain is difficult because the number of semantic input combinations grows exponentially with the number of domains. Therefore, it is important to leverage existing resources and exploit similarities between domains to facilitate domain adaptation. In this paper, we propose a procedure to train multi-domain, Recurrent Neural Network-based (RNN) language generators via multiple adaptation steps. In this procedure, a model is first trained on counterfeited data synthesised from an out-of-domain dataset, and then fine-tuned on a small set of in-domain utterances with a discriminative objective function. Corpus-based evaluation results show that the proposed procedure can achieve competitive performance in terms of BLEU score and slot error rate while significantly reducing the data needed to train generators in new, unseen domains. In subjective testing, human judges confirm that the procedure greatly improves generator performance when only a small amount of data is available in the domain.
In this paper we study the performance and domain scalability of two different Neural Network architectures for Natural Language Generation in Spoken Dialogue Systems. We found that by imposing a sigmoid gate on the dialogue act vector, the Semantically Conditioned Long Short-term Memory generator can prevent semantic repetitions and achieve better performance across all domains compared to an RNN Encoder-Decoder generator. However, in a domain adaptation experiment, the RNN Encoder-Decoder generator, with a separate slot and value parameterisation, is capable of learning faster by leveraging out-of-domain data. We conclude that the way to represent and integrate the semantic elements is of great importance to NN-based NLG systems. Further advances will therefore require a representation that is more scalable across domains without significantly compromising in-domain performance.
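The semantic gating idea behind the Semantically Conditioned LSTM can be sketched in a few lines. Below is a minimal NumPy illustration, not the authors' implementation: all dimensions, the random weights, and the `alpha` coefficient are made-up placeholders. The point is the sigmoid "reading gate" over the dialogue act (DA) vector, which lets each semantic slot only be consumed over time, discouraging semantic repetition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and randomly initialised weights -- illustration only,
# not the sizes or parameters from the paper.
H, D, X = 8, 5, 10  # hidden size, dialogue-act (DA) vector size, input size
W = {n: rng.normal(scale=0.1, size=s) for n, s in {
    "i": (H, X + H), "f": (H, X + H), "o": (H, X + H), "c": (H, X + H),
    "r": (D, X), "hr": (D, H), "d": (H, D)}.items()}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sc_lstm_step(x, h_prev, c_prev, d_prev, alpha=0.5):
    """One SC-LSTM step: a standard LSTM cell whose memory also receives the
    DA vector d, gated by a sigmoid 'reading gate' r so that each semantic
    slot can only shrink over time, never be re-added."""
    z = np.concatenate([x, h_prev])
    i, f, o = sigmoid(W["i"] @ z), sigmoid(W["f"] @ z), sigmoid(W["o"] @ z)
    c_hat = np.tanh(W["c"] @ z)
    r = sigmoid(W["r"] @ x + alpha * (W["hr"] @ h_prev))  # reading gate in (0, 1)
    d = r * d_prev                                        # DA vector decays monotonically
    c = f * c_prev + i * c_hat + np.tanh(W["d"] @ d)      # extra DA-driven cell term
    h = o * np.tanh(c)
    return h, c, d

# Binary DA vector marking which semantic slots still need to be expressed.
d = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
h, c = np.zeros(H), np.zeros(H)
for _ in range(3):
    h, c, d_next = sc_lstm_step(rng.normal(size=X), h, c, d)
    assert np.all(d_next <= d)  # semantic content is only consumed
    d = d_next
```

Because `r` is always in (0, 1) and the DA vector is non-negative, each slot's value can only decrease step by step, which is the property the sigmoid gate imposes.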
Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact both on usability and perceived quality. Most NLG systems in common use employ rules and heuristics and tend to generate rigid and stylised responses without the natural variation of human language. They are also not easily scaled to systems covering multiple domains and languages. This paper presents a statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure. The LSTM generator can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross entropy training criterion, and language variation can be easily achieved by sampling from output candidates. With fewer heuristics, an objective evaluation in two differing test domains showed the proposed method improved performance compared to previous methods. Human judges scored the LSTM system higher on informativeness and naturalness and overall preferred it to the other systems.
The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multi-lingual dialogue systems intractable. Moreover, human languages are context-aware. The most natural response should be directly learned from data rather than depending on predefined syntaxes or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces not only high-quality but also linguistically varied utterances which are preferred over n-gram and rule-based systems.
Speech recognition has become an important feature of smartphones in recent years. Unlike traditional automatic speech recognition, speech recognition on smartphones can take advantage of personalized language models to better capture the linguistic patterns and wording habits of a particular smartphone owner. Owing to the popularity of social networks in recent years, personal texts and messages are no longer inaccessible. However, data sparseness is still an unsolved problem. In this paper, we propose a three-step adaptation approach to personalize recurrent neural network language models (RNNLMs). We believe that their capability to model word histories of arbitrary length as distributed representations can help mitigate the data sparseness problem. Furthermore, we also propose additional user-oriented features to empower the RNNLMs with stronger personalization capabilities. Experiments on a Facebook dataset showed that the proposed method not only drastically reduced model perplexity in preliminary experiments, but also moderately reduced the word error rate in n-best rescoring tests.
Interactive retrieval is important for spoken content because, in addition to speech recognition uncertainty, the retrieved spoken items are difficult both to display on the screen and for the user to scan and select. The user cannot play back and go through all the retrieved items to find what he is looking for. A previous work used a Markov Decision Process (MDP) to let the system take different actions to interact with the user based on an estimated retrieval performance, but the MDP state was represented by a less precise quantized retrieval performance metric. In this paper, we treat the retrieval performance metric as a continuous state variable in the MDP and optimize the MDP by fitted value iteration (FVI). We also use query expansion within the language modeling retrieval framework to produce the next set of retrieval results. Improved performance was found in preliminary experiments.
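Fitted value iteration over a continuous state can be illustrated on a toy version of this problem. The sketch below is an assumption-laden stand-in, not the paper's actual design: the state is a made-up "estimated retrieval quality" in [0, 1], the two hypothetical actions are showing results (terminal, rewarded by quality) versus asking a clarifying question (quality improves by a fixed amount at a small user-burden penalty), and V(s) is fit with a simple polynomial basis.

```python
import numpy as np

# Toy stand-in (assumed, not the paper's state/action/reward design):
# continuous state s = estimated retrieval quality in [0, 1];
# action 0 = show results (terminal, reward = s),
# action 1 = ask a clarifying question (quality +0.25, small penalty).
GAMMA, PENALTY, GAIN = 0.9, -0.02, 0.25
ACTIONS = (0, 1)

def step(s, a):
    if a == 0:
        return None, s                      # terminal: reward is current quality
    return min(1.0, s + GAIN), PENALTY      # clarify: better results, small user cost

def features(s):
    return np.array([1.0, s, s * s])        # simple polynomial basis for V(s)

def backup(s, w):
    """One-step Bellman backup: q(s, a) = r + gamma * V(s') for each action."""
    q = []
    for a in ACTIONS:
        s_next, r = step(s, a)
        v_next = 0.0 if s_next is None else features(s_next) @ w
        q.append(r + GAMMA * v_next)
    return q

# Fitted value iteration: repeatedly regress V onto the backed-up targets.
states = np.linspace(0.0, 1.0, 101)
X = np.stack([features(s) for s in states])
w = np.zeros(3)
for _ in range(50):
    targets = np.array([max(backup(s, w)) for s in states])
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)

def greedy_action(s):
    return int(np.argmax(backup(s, w)))

print(greedy_action(0.05), greedy_action(0.95))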
Voice access of cloud applications via smartphones is very attractive today, particularly because a smartphone is used by a single user, so personalized acoustic/language models become feasible. Moreover, huge quantities of texts with known authors and given relationships are available on social networks over the Internet. It is possible to train personalized language models from them because it is reasonable to assume that users with such relationships may share common subject topics, wording habits and linguistic patterns. In this paper, we propose an adaptation framework for building a robust personalized language model that incorporates the texts the target user and other users have posted on social networks, taking care of the linguistic mismatch across different users. Experiments on a Facebook dataset showed encouraging improvements in terms of both model perplexity and recognition accuracy with the proposed approaches, which consider relationships among users, similarity based on latent topics, and random walk over a user graph.
Interaction with the user is especially important for spoken content retrieval, not only because of recognition uncertainty, but also because the retrieved spoken content items are difficult to display on the screen and difficult for the user to scan and select. The user cannot play back and go through all the retrieved items only to find that they are not what he is looking for. In this paper, we propose a new approach for interactive spoken content retrieval, in which the system can estimate the quality of the retrieved results and take different types of actions to clarify the user's intention based on an intrinsic policy. The policy is optimized as a Markov Decision Process (MDP) trained with Reinforcement Learning on a set of pre-defined rewards that account for the extra burden placed on the user.
This thesis considers voice access of cloud applications in two parts: (1) personalized language modeling and (2) interactive spoken document retrieval. Model mismatch has been a major problem in speech recognition. With hand-held devices widely used today, personalized models become possible. Huge quantities of posts and comments with known owners have emerged on social network websites, so personal corpora are practically available, but the data sparseness problem remains unsolved. In the first part of this thesis, we proposed personalized language modeling approaches that estimate the language similarities between different social network users and integrate the corresponding personal corpora accordingly. We studied both N-gram language models and recurrent neural network language models, and the experimental results support the concept. In the second part, we studied interactive spoken document retrieval. Interactive retrieval is helpful for spoken content retrieval because, in addition to speech recognition uncertainty, retrieved spoken items are difficult to display on screen and for the user to browse. We model the interaction process as a Markov Decision Process and train the policy with Reinforcement Learning. Experimental results demonstrate that retrieval performance can be improved with the interactions.