Commonsense Persona-grounded Dialogue Challenge - Task 1 - Starter kit
This repository is the CPD Challenge (Task 1) Submission template and Starter kit! Clone the repository to compete now!
This repository contains:
- Documentation on how to submit your models to the leaderboard
- Best practices and information on how we evaluate your model
- Starter code for you to get started!
Table of Contents
- Commonsense Persona-grounded Dialogue Challenge - Task 1 - Starter kit
- Table of Contents
- Competition Overview
- Getting Started
- How to write your own model?
- How to start participating?
- Other Concepts
- 📎 Important links
Competition Overview
This challenge is an opportunity for researchers and machine learning enthusiasts to test their skills on the challenging tasks of Commonsense Dialogue Response Generation (Task 1) and Commonsense Persona Knowledge Linking (Task 2) for persona-grounded dialogue.
Research on dialogue systems has been around for a long time, but thanks to Transformers and Large Language Models (LLMs), conversational AI has come a long way in the last five years, becoming more human-like. On the other hand, it is still challenging to collect natural dialogue data for research and to benchmark which models ultimately perform best, because there are no definitive evaluation datasets or metrics, and comparisons are often limited to a small set of models.
We contribute to the research and development of state-of-the-art dialogue systems by crafting high-quality human-human dialogues for model testing, and by providing a common benchmarking venue through this CPDC 2023 competition.
The competition aims to identify the best approach among state-of-the-art participant models on an evaluation dataset of natural conversations. The submitted systems will be evaluated on a new Commonsense Persona-grounded Dialogue dataset. To this end, we first created several persona profiles, similar to ConvAI2, each with a natural personality grounded in a commonsense persona knowledge graph (PeaCoK†) newly released at ACL 2023, which allows us to obtain naturally related persona sentences. Based on these personas, we then created natural dialogues between two people and prepared a sufficient amount of dialogue data for evaluation.
The Commonsense Persona-grounded Dialogue (CPD) Challenge hosts one track on Commonsense Dialogue Response Generation (Task 1) and one track on Commonsense Persona Knowledge Linking (Task 2). Independent leaderboards are set up for the two tracks, each with a separate prize pool. In both tracks, participants may use any training data. In Task 1, participants will submit dialogue response generation systems, which we will evaluate on the prepared persona-grounded dialogue dataset mentioned above. In Task 2, participants will submit systems that link knowledge to a dialogue. This task is designed in a similar spirit to ComFact, which was released along with a paper published at EMNLP 2022. We will evaluate these systems by checking whether they correctly judge the linking of persona-grounded knowledge on the persona-grounded dialogue dataset.
† PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives (ACL 2023 Outstanding Paper Award)
Task 1: Commonsense Dialogue Response Generation
Participants will submit dialogue response generation systems. We do not provide a training dataset, and participants may use any datasets they want. We provide a baseline model, which can be tested on the ConvAI2 PERSONA-CHAT dataset, so that you can see what this task involves. We will evaluate submitted systems on the persona-grounded dialogue dataset. The dialogues in the evaluation dataset have persona sentences similar to the PersonaChat dataset, but each person has more than five persona sentences. Most of the persona sentences are derived from the PeaCoK knowledge graph.
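For intuition, a persona-grounded conversation of this kind can be pictured roughly as in the Python sketch below. This is a purely hypothetical illustration; the field names are not the actual evaluation schema, which you can inspect in dummy_data_task1.json.

```python
# Purely illustrative example of a persona-grounded dialogue (PersonaChat-style).
# The keys below are hypothetical; see dummy_data_task1.json for the real format.
example = {
    "persona_a": [
        "I am a middle school teacher.",
        "I enjoy hiking on weekends.",
        "I am saving up to travel abroad.",
        "I have two dogs.",
        "I play the guitar.",
        "I volunteer at an animal shelter.",  # more than five sentences per person
    ],
    "persona_b": [
        "I work night shifts as a nurse.",
        "I love baking bread.",
        "I recently moved to a new city.",
        "I am learning to paint.",
        "I grew up on a farm.",
        "I collect vintage postcards.",
    ],
    "dialogue": [
        {"speaker": "A", "text": "Hi! How was your week?"},
        {"speaker": "B", "text": "Busy with night shifts, but I baked some bread today."},
    ],
}
```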
Getting Started
- Sign up to join the competition on the AIcrowd website.
- Fork this starter kit repository. You can use this link to create a fork.
- Clone your forked repo and start developing your model.
- Develop your model(s) following the template in how to write your own model section.
- Submit your trained models to AIcrowd GitLab for evaluation (full instructions below). The automated evaluation setup will evaluate the submissions on the private datasets and report the metrics on the competition leaderboard.
How to write your own model?
We recommend that you place the code for all your models in the `agents/` directory (though this is not mandatory). You should implement the following:

- `generate_responses` - This function is called to generate the response for a conversation, given the persona information of the two speakers.

Add your agent name in `agents/user_config.py`; this is what will be used for the evaluations.

An example is provided in `agents/dummy_agent.py`, and a minimal sketch is shown below.
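The sketch below illustrates the general shape of such an agent. The method signature and the structure of its input are assumptions for illustration only; `agents/dummy_agent.py` and `local_evaluation.py` define the authoritative interface.

```python
from typing import Any, Dict, List


class SimpleResponseAgent:
    """Minimal agent sketch.

    The exact call signature and the structure of `test_data` are assumptions
    for illustration; agents/dummy_agent.py defines the authoritative
    interface used by the evaluator.
    """

    def generate_responses(self, test_data: List[Dict[str, Any]]) -> List[str]:
        responses = []
        for conversation in test_data:
            # A real agent would condition on both persona profiles and the
            # dialogue history here, e.g. by prompting a language model.
            responses.append("That sounds interesting, tell me more!")
        return responses
```

Once your class is in place, register it in `agents/user_config.py` so the evaluator picks it up.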
How to start participating?
Setup
- Add your SSH key to AIcrowd GitLab

  You can add your SSH keys to your GitLab account by going to your profile settings here. If you do not have SSH keys, you will first need to generate one.

- Fork the repository. You can use this link to create a fork.

- Clone the repository:

  git clone git@gitlab.aicrowd.com:aicrowd/challenges/commonsense-persona-grounded-dialogue-challenge-2023/commonsense-persona-grounded-dialogue-challenge-task-1-starter-kit

- Install the competition-specific dependencies:

  cd commonsense-persona-grounded-dialogue-challenge-task-1-starter-kit
  pip install -r requirements.txt

- Write your own model as described in the How to write your own model section.

- Test your model locally using:

  python local_evaluation.py

- Make a submission as described in the How to make a submission section.
How do I specify my software runtime / dependencies?
We accept submissions with custom runtimes, so you don't need to worry about which libraries or frameworks to pick. The configuration files typically include `requirements.txt` (pypi packages), `apt.txt` (apt packages), or even your own `Dockerfile`.
An example Dockerfile is provided in `utilities/_Dockerfile`, which you can use as a starting point.
You can check detailed information about setting up runtime dependencies in the 👉 docs/runtime.md file.
What should my code structure be like?
Please follow the example structure of this starter kit for your code. The different files and directories have the following meaning:
.
├── aicrowd.json # Submission meta information - like your username
├── apt.txt # Linux packages to be installed inside docker image
├── requirements.txt # Python packages to be installed
├── local_evaluation.py # Use this to check your model evaluation flow locally
├── dummy_data_task1.json # A set of dummy conversations you can use for integration testing
└── agents # Place your models related code here
├── dummy_agent.py # Dummy agent for example interface
└── user_config.py # IMPORTANT: Add your agent name here
Finally, you must specify an AIcrowd submission JSON in `aicrowd.json` to be scored!

The `aicrowd.json` of each submission should contain the following content:
{
"challenge_id": "task-1-commonsense-dialogue-response-generation",
"authors": ["your-aicrowd-username"],
"gpu": true,
"description": "(optional) description about your awesome model"
}
This JSON is used to map your submission to the challenge, so please remember to use the correct `challenge_id` as specified above. You can modify the `authors` and `description` keys. Please DO NOT add any additional keys to `aicrowd.json` unless otherwise communicated during the course of the challenge.
Other Concepts
Evaluation Metrics
Time and compute constraints
You will be provided conversations with 7 turns each, in batches of up to 50 conversations. For each batch, the first turns will be provided to your model; after the responses are received, the next turns of the same conversations will be provided. Each conversation will have exactly 7 turns. Your model needs to complete all 7 responses for the 50 conversations within **1 hour**. The number of conversation batches your model will process will vary based on the challenge round.
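To make the interaction pattern concrete, the sketch below mimics the shape of this per-batch loop. All names and data structures here are hypothetical and only illustrate the batching and the 1-hour budget; this is not the actual evaluator implementation.

```python
import time


def run_batch(agent, batch, num_turns=7, time_budget_s=3600):
    """Hypothetical sketch of the per-batch evaluation flow (not the real
    evaluator): turns are revealed one at a time, and the agent must answer
    every conversation in the batch at each turn."""
    start = time.time()
    histories = [[] for _ in batch]  # model-visible dialogue history per conversation
    for turn in range(num_turns):
        # Reveal the next turn of every conversation in the batch, then ask
        # the agent for one response per conversation.
        for history, conversation in zip(histories, batch):
            history.append(conversation["turns"][turn])  # "turns" is a hypothetical key
        responses = agent.generate_responses(
            [{"dialogue": history} for history in histories]  # hypothetical input format
        )
        for history, response in zip(histories, responses):
            history.append(response)
    elapsed = time.time() - start
    assert elapsed <= time_budget_s, "the whole batch must finish within 1 hour"
```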
Before running on the challenge dataset, your model will be run on the dummy data as a sanity check. This will show up as the `convai-validation` phase on your submission pages. The dummy data contains 5 conversations of 7 turns each, and your model needs to complete the validation phase within **15 minutes**.
Your model will be run on an AWS g5.2xlarge node. This node has 8 vCPUs, 32 GB RAM, and one Nvidia A10G GPU with 24 GB VRAM.
Before your model starts processing conversations, it is given up to 5 additional minutes to load models or preprocess any data if needed.
Local Evaluation
Participants can run the evaluation protocol for their model locally, with or without the constraints posed by the challenge, to benchmark their models privately. See `local_evaluation.py` for details. You can change it as you like; your changes to `local_evaluation.py` will NOT be used for the competition.
Note about Dummy test data
The file `dummy_data_task1.json` is a dummy test dataset for testing your code before submission. All dialogues in this dataset are based on the same pair of personas (persona A and persona B); the actual test dataset used for evaluation is not like this and was created from different pairs of personas.
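A quick way to get familiar with the data format is to load the dummy file and inspect its structure before wiring up your agent. This is a minimal sketch that makes no assumptions about the field names beyond the file being JSON:

```python
import json

# Quick inspection of the dummy test data before writing your agent.
with open("dummy_data_task1.json") as f:
    dummy_data = json.load(f)

print(type(dummy_data))
if isinstance(dummy_data, dict):
    print("top-level keys:", list(dummy_data.keys())[:10])
elif isinstance(dummy_data, list):
    print("number of items:", len(dummy_data))
    print("first item:", dummy_data[0])
```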
Contributing
🙏 You can share your solutions or any other baselines by contributing directly to this repository and opening a merge request.

- Add your implementation as `agents/<your_agent>.py`.
- Import it in `agents/user_config.py` (a sketch is shown after this list).
- Test it out using `python local_evaluation.py`.
- Add any documentation for your approach at the top of your file.
- Create a merge request! 🎉🎉🎉
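As referenced in the list above, registering your agent in `agents/user_config.py` could look roughly like the following. This is a sketch only; `MyAgent` and the `UserAgent` name are hypothetical, so follow the convention already used in the existing `user_config.py`.

```python
# agents/user_config.py -- sketch only; follow the convention already in the file.
# `MyAgent` and the `UserAgent` name below are hypothetical.
from agents.my_agent import MyAgent

# The evaluator is assumed to pick up the agent exposed here.
UserAgent = MyAgent
```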
How to make a submission?
👉 Follow the instructions provided in docs/submission.md
Best of Luck 🎉 🎉