diff --git a/README.md b/README.md
index 91dca1204eed1a570cbc348cb78c1059887eb60f..1e48d989f351701df0749a2126ada96d73ae1ecb 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,33 @@ The **Commonsense Persona-grounded Dialogue (CPD)** Challenge hosts one track on
 ## [Task 1: Commonsense Dialogue Response Generation](https://www.aicrowd.com/challenges/commonsense-persona-grounded-dialogue-challenge-2023/problems/task-1-commonsense-dialogue-response-generation)
-Participants will submit dialogue response generation systems. We do not provide a training dataset, and participants may use any datasets which they want to use. We provide [a baseline model](https://github.com/Silin159/PersonaChat-BART-PeaCoK), which can be tested on the [ConvAI2 PERSONA-CHAT](https://arxiv.org/abs/1902.00098v1) dataset, so that you can see what the problem of this task is. We will evaluate submitted systems on the persona-grounded dialogue dataset. The dialogues in the evaluation dataset have persona sentences similar to the PersonaChat dataset, but the number of persona sentences for a person is more than five sentences. The major part of the persona is derived from the [PeaCoK](https://github.com/Silin159/PeaCoK) knowledge graph.
+Participants will submit dialogue response generation systems. We provide [a baseline model](https://github.com/Silin159/PeaCoK-PersonaChat) trained on [ConvAI2 PERSONA-CHAT](https://arxiv.org/abs/1902.00098v1) dataset with [PeaCoK](https://github.com/Silin159/PeaCoK) persona knowledge augmentation. Our trained baseline model checkpoint could be downloaded from [this repository](https://github.com/Silin159/PersonaChat-BART-PeaCoK).
+
+Participants may use any datasets for training their models, not limited to our provided [training datasets](https://drive.google.com/drive/folders/1A51hZvSLvJoPAKDy2XR_eb-ooZqPRgbb?usp=sharing) used for developing the baseline model. Our provided training data include:
+
+* Original PERSONA-CHAT (with either original or revised PERSONA-CHAT profiles):
+  * Training set (original profiles): `data/persona_peacok/train_persona_original_chat_convai2.json`
+  * Validation set (original profiles): `data/persona_peacok/valid_persona_original_chat_convai2.json`
+  * Training set (revised profiles): `data/persona_peacok/train_persona_revised_chat_convai2.json`
+  * Validation set (revised profiles): `data/persona_peacok/valid_persona_revised_chat_convai2.json`
+* PERSONA-CHAT with profiles augmented with PeaCoK facts (up to 5 randomly chosen to augment each profile):
+  * Training set (augmented original profiles): `data/persona_peacok/train_persona_original_chat_ext.json`
+  * Validation set (augmented original profiles): `data/persona_peacok/valid_persona_original_chat_ext.json`
+  * Training set (augmented revised profiles): `data/persona_peacok/train_persona_revised_chat_ext.json`
+  * Validation set (augmented revised profiles): `data/persona_peacok/valid_persona_revised_chat_ext.json`
+* Full set of PeaCoK facts linked to each PERSONA-CHAT profile:
+  * For original profiles: `data/persona_peacok/persona_extend_full_original.json`
+  * For revised profiles: `data/persona_peacok/persona_extend_full_revised.json`
+* Full PeaCoK knowledge graph:
+  * `data/peacok_kg.json`
+
+We will evaluate submitted systems on an internal persona-grounded dialogue dataset. The dialogues in our evaluation dataset have persona sentences similar to the PERSONA-CHAT dataset, but the number of persona sentences for a person is more than five sentences. The major part of the persona is derived from the [PeaCoK](https://github.com/Silin159/PeaCoK) knowledge graph.
+
+We also provide a list of other resources that may be related to this task:
+[Original PERSONA-CHAT Paper](https://arxiv.org/abs/1801.07243)
+[PERSONA-CHAT Leaderboard](https://paperswithcode.com/sota/dialogue-generation-on-persona-chat-1)
+Partner Personas Generation for Diverse Dialogue Generation (PPG): [Paper](https://arxiv.org/abs/2111.13833) and [Code](https://github.com/HongyuanLuke/PPG)
+On Symbolic and Neural Commonsense Knowledge Graphs (COMET-ATOMIC 2020): [Paper](https://arxiv.org/abs/2010.05953) and [Code](https://github.com/allenai/comet-atomic-2020)