Participants will submit dialogue response generation systems. We provide [a baseline model](https://github.com/Silin159/PeaCoK-PersonaChat) trained on the [ConvAI2 PERSONA-CHAT](https://arxiv.org/abs/1902.00098v1) dataset with [PeaCoK](https://github.com/Silin159/PeaCoK) persona knowledge augmentation. The trained baseline checkpoint can be downloaded from [this repository](https://github.com/Silin159/PersonaChat-BART-PeaCoK).
Participants may train their models on any datasets and are not limited to the [training datasets](https://drive.google.com/drive/folders/1A51hZvSLvJoPAKDy2XR_eb-ooZqPRgbb?usp=sharing) we used to develop the baseline model. The provided training data include:
* Original PERSONA-CHAT (with either original or revised PERSONA-CHAT profiles):
    * Training set (original profiles): `data/persona_peacok/train_persona_original_chat_convai2.json`
    * Validation set (original profiles): `data/persona_peacok/valid_persona_original_chat_convai2.json`
    * Training set (revised profiles): `data/persona_peacok/train_persona_revised_chat_convai2.json`
    * Validation set (revised profiles): `data/persona_peacok/valid_persona_revised_chat_convai2.json`
* PERSONA-CHAT with profiles augmented with PeaCoK facts (up to 5 randomly chosen to augment each profile):
    * Training set (augmented original profiles): `data/persona_peacok/train_persona_original_chat_ext.json`
    * Validation set (augmented original profiles): `data/persona_peacok/valid_persona_original_chat_ext.json`
    * Training set (augmented revised profiles): `data/persona_peacok/train_persona_revised_chat_ext.json`
    * Validation set (augmented revised profiles): `data/persona_peacok/valid_persona_revised_chat_ext.json`
* Full set of PeaCoK facts linked to each PERSONA-CHAT profile:
    * For original profiles: `data/persona_peacok/persona_extend_full_original.json`
    * For revised profiles: `data/persona_peacok/persona_extend_full_revised.json`
* Full PeaCoK knowledge graph:
    * `data/peacok_kg.json`
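
The augmentation described above (up to 5 randomly chosen PeaCoK facts added to each profile) could be sketched roughly as follows. This is only an illustration, not the code used to build the provided files: `load_json` and `augment_profile` are hypothetical helper names, profiles and facts are assumed to be plain lists of strings, and the actual JSON schema of the files is not assumed here.

```python
import json
import random
from pathlib import Path


def load_json(path):
    """Load one of the provided JSON files (no particular schema is assumed)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def augment_profile(profile, linked_facts, max_facts=5, rng=None):
    """Return the profile plus up to `max_facts` randomly chosen linked facts.

    `profile` and `linked_facts` are assumed to be lists of sentences
    (hypothetical representation, not the format of the released files).
    """
    rng = rng or random.Random()
    # Assumption: drop repeated facts and facts already present in the profile.
    candidates = [f for f in dict.fromkeys(linked_facts) if f not in profile]
    # Sample without replacement; fewer facts are used if fewer are available.
    chosen = rng.sample(candidates, k=min(max_facts, len(candidates)))
    return list(profile) + chosen
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which may help when comparing models trained on differently augmented profiles.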
We will evaluate submitted systems on an internal persona-grounded dialogue dataset. The dialogues in our evaluation dataset have persona sentences similar to those in the PERSONA-CHAT dataset, but each person has more than five persona sentences. Most of each persona is derived from the [PeaCoK](https://github.com/Silin159/PeaCoK) knowledge graph.
We also provide a list of other resources that may be related to this task:
* Partner Personas Generation for Diverse Dialogue Generation (PPG): [Paper](https://arxiv.org/abs/2111.13833) and [Code](https://github.com/HongyuanLuke/PPG)
* On Symbolic and Neural Commonsense Knowledge Graphs (COMET-ATOMIC 2020): [Paper](https://arxiv.org/abs/2010.05953) and [Code](https://github.com/allenai/comet-atomic-2020)