Datasets

Note: these datasets are strictly limited to academic use.

This emotional conversation dataset is designed for generating emotional responses. It contains about 1,110,000 post-response pairs collected from Weibo. Each post and response is tagged with an emotion category (happy, sad, like, disgust, angry, or others) by an emotion classifier. The data were used in the Emotional Conversation Generation Challenge in 2017 and the Short Text Conversation Challenge at NTCIR in 2018.
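As a rough illustration of how the emotion tags might be used, the sketch below tallies response emotions and selects pairs by response emotion. The record layout (a 4-tuple of post, post emotion, response, response emotion) and the toy records are assumptions made for this example only; the actual file format of the download is not described here.

```python
from collections import Counter

# The six emotion categories named in the dataset description.
EMOTIONS = ("happy", "sad", "like", "disgust", "angry", "others")

# Hypothetical in-memory records, NOT the real file format:
# (post_text, post_emotion, response_text, response_emotion)
toy_pairs = [
    ("post a", "sad", "response a", "like"),
    ("post b", "happy", "response b", "happy"),
    ("post c", "sad", "response c", "sad"),
    ("post d", "others", "response d", "like"),
]

def response_emotion_counts(pairs):
    """Count how many responses carry each emotion category."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for _post, _post_emotion, _response, response_emotion in pairs:
        if response_emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {response_emotion}")
        counts[response_emotion] += 1
    return counts

def select_by_response_emotion(pairs, emotion):
    """Keep only the pairs whose response is tagged with the given emotion."""
    return [pair for pair in pairs if pair[3] == emotion]

counts = response_emotion_counts(toy_pairs)
like_pairs = select_by_response_emotion(toy_pairs, "like")
```

Because the tags come from a classifier, a tally like this is a quick way to check the label distribution before training.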

More details can be found in the above challenge pages and in this paper:
Hao Zhou, Minlie Huang, Tianyang Zhang, Xiaoyan Zhu, Bing Liu.
Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory. AAAI 2018, New Orleans, Louisiana, USA.

Please kindly cite our paper if this paper and the dataset are helpful.

[dataset download] [bib]

The Commonsense Conversation Dataset contains one-turn post-response pairs with the corresponding commonsense knowledge graphs. Each pair is associated with knowledge graphs retrieved from ConceptNet. We applied filtering rules to retain high-quality and useful knowledge graphs.

Refer to our paper for more details:
Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu.
Commonsense Knowledge Aware Conversation Generation with Graph Attention. IJCAI-ECAI 2018, Stockholm, Sweden.

Please kindly cite our paper if this paper and the dataset are helpful.

[dataset download] [bib]

Sentence function is an important linguistic feature that classifies sentences by the purpose of the speaker. There are four major function types: interrogative, declarative, imperative, and exclamatory. The dataset is designed for generating responses that are consistent with a particular sentence function.

The posts and responses, collected from Weibo, have been tokenized. Each pair has been annotated with a sentence function label by a classifier, indicating which function type the response belongs to.

Refer to our paper for more details:
Pei Ke, Jian Guan, Minlie Huang, Xiaoyan Zhu.
Generating Informative Responses with Controlled Sentence Function. In ACL 2018.

Please kindly cite our paper if this paper and the dataset are helpful.

[dataset download] [bib]

The dataset contains post-response pairs collected from Weibo. Each response is a question, detected with manually crafted templates. We filtered out universal questions that contain no nouns or verbs. The posts and responses have been tokenized. The dataset also contains PMI (pointwise mutual information) values between a topic word in the post and another in the response.
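For readers unfamiliar with PMI, the sketch below shows one common way to estimate it from co-occurrence counts over tokenized post-response pairs, using PMI(x, y) = log(p(x, y) / (p(x) p(y))). The function name, the toy pairs, and the choice to count each word at most once per post or response are assumptions for illustration; the estimation actually used to build the dataset is described in the paper.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Estimate PMI between a word x in the post and a word y in the response.

    Probabilities are relative frequencies over pairs: p(x) is the share of
    pairs whose post contains x, p(y) the share whose response contains y,
    and p(x, y) the share where both hold.
    """
    n = len(pairs)
    post_counts = Counter()
    resp_counts = Counter()
    joint_counts = Counter()
    for post, response in pairs:
        post_words = set(post)      # count each word once per post
        resp_words = set(response)  # count each word once per response
        post_counts.update(post_words)
        resp_counts.update(resp_words)
        joint_counts.update((x, y) for x in post_words for y in resp_words)
    table = {}
    for (x, y), count in joint_counts.items():
        p_xy = count / n
        p_x = post_counts[x] / n
        p_y = resp_counts[y] / n
        table[(x, y)] = math.log(p_xy / (p_x * p_y))
    return table

# Toy tokenized pairs (hypothetical, in English for readability).
toy_pairs = [
    (["beach", "today"], ["swim", "?"]),
    (["beach", "sunny"], ["swim", "where", "?"]),
    (["exam", "today"], ["study", "?"]),
    (["exam", "hard"], ["study", "how", "?"]),
]
table = pmi_table(toy_pairs)
```

In this toy data, "beach" and "swim" always co-occur, so their PMI is positive, while "?" appears in every response and therefore has a PMI of zero with every post word. Word pairs that never co-occur are simply absent from the table.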

Refer to our paper for more details:
Yansen Wang, Chenyi Liu, Minlie Huang, Liqiang Nie.
Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders. ACL 2018, Melbourne, Australia.

Please kindly cite our paper if this paper and the dataset are helpful.

[dataset download] [bib]

This dataset was created for explicit personalization in open-domain dialogue systems, that is, assigning a personality/profile to a chatbot. This allows system developers to control the profile of a chatbot explicitly and specifically. The dataset consists of several subsets, and more details can be found in the following paper:


Qiao Qian, Minlie Huang, Haizhou Zhao, Jingfang Xu, Xiaoyan Zhu.
Assigning personality/identity to a chatting machine for coherent conversation generation. IJCAI-ECAI 2018, Stockholm, Sweden.

Please kindly cite our paper if this paper and the dataset are helpful.

[dataset download] [bib]


This dataset was created for multi-relation question answering over knowledge bases. The dataset contains question-answer pairs with the corresponding answer paths. The paths are collected from two subsets of Freebase, and the questions are generated from templates. Each question mentions more than one relation, requiring reasoning over multiple facts in the KB.
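To make the idea of an answer path concrete, the sketch below follows a chain of relations through a toy knowledge base. The entities, relation names, and dictionary layout are hypothetical and chosen only for illustration; they do not reflect the Freebase identifiers or the file format of the dataset.

```python
def follow_path(kb, start, relations):
    """Follow a chain of relations from a start entity.

    kb maps (subject, relation) to a set of objects. Returns the set of
    entities reached after applying every relation in order.
    """
    frontier = {start}
    for relation in relations:
        next_frontier = set()
        for entity in frontier:
            next_frontier.update(kb.get((entity, relation), ()))
        frontier = next_frontier
    return frontier

# Hypothetical toy knowledge base of (subject, relation) -> objects.
toy_kb = {
    ("Alice", "spouse"): {"Bob"},
    ("Bob", "birthplace"): {"Paris"},
    ("Paris", "country"): {"France"},
}

# "Where was Alice's spouse born?" mentions two relations: spouse, birthplace.
answer = follow_path(toy_kb, "Alice", ["spouse", "birthplace"])
```

A question such as "Where was Alice's spouse born?" cannot be answered from a single fact: the intermediate entity (Bob) has to be resolved first, which is what makes the question multi-relation.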

The dataset consists of several subsets, and more details can be found in the following paper:

Mantong Zhou, Minlie Huang, Xiaoyan Zhu.
An Interpretable Reasoning Network for Multi-Relation Question Answering. COLING 2018, Santa Fe, New Mexico, USA.

Please kindly cite our paper if this paper and the dataset are helpful.

[dataset download] [bib]