We provide two separate settings for participants to choose from: the GPU Track and the Prompt Engineering Track.
## GPU Track
In this track, we provide participants with access to a single GPU with 24 GB VRAM, allowing them to fine-tune and submit their own LLMs specialized for this task.
## Prompt Engineering Track
In the Prompt Engineering Track, we provide participants with access to the OpenAI API. This allows anyone to test their prompt engineering skills with a powerful LLM and combine it with advanced retrieval-based methods to generate context.
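As a toy illustration of retrieval-based context building (not a method from the starter kit), one could select the persona fact most similar to the latest turn and feed it into the prompt. The sketch below uses `difflib` from the standard library purely for simplicity; a real submission would swap in a stronger retriever such as embedding similarity, following the same shape:

```python
# Toy retrieval for prompt context: pick the persona fact most similar to
# the latest user turn. Purely illustrative; a real retriever (e.g. one
# based on embeddings) would replace difflib here.
from difflib import SequenceMatcher

def most_relevant_fact(persona_facts, last_turn):
    return max(
        persona_facts,
        key=lambda fact: SequenceMatcher(None, fact.lower(), last_turn.lower()).ratio(),
    )

persona = ["I love hiking in the mountains.", "I work as a chef."]
fact = most_relevant_fact(persona, "What do you usually cook at home?")
prompt = f"Persona: {fact}\nUser: What do you usually cook at home?\nAssistant:"
```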
## Can I participate in both tracks?
Yes, anyone can participate in both tracks; the prize pool is shared between them. The submission limits apply to both tracks combined. See below for details on how to specify the track for your submissions.
# Getting Started
1. **Sign up** to join the competition [on the AIcrowd website](https://www.aicrowd.com/challenges/commonsense-persona-grounded-dialogue-challenge-2023/problems/task-1-commonsense-dialogue-response-generation).
...
We also provide a list of other resources that may be related to this task:
...
We recommend that you place the code for all your models in the `agents/` directory (though it is not mandatory). You should implement the following:

- `generate_responses` - This function is called to generate the response of a conversation given persona information (see the sketch below).
**Add your agent name in** `agent/user_config.py`; this is what will be used for the evaluations.
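A minimal agent might look like the sketch below; the exact `generate_responses` signature is defined in the starter kit's `dummy_agent.py`, so the argument and return types here are assumptions:

```python
# agents/my_agent.py - a minimal sketch; check dummy_agent.py for the
# authoritative interface (argument and return types are assumed here).
from typing import Any, Dict, List


class MyAgent:
    """Toy agent that returns a canned reply for every conversation."""

    def generate_responses(self, test_data: List[Dict[str, Any]]) -> List[str]:
        # test_data is assumed to hold one entry per conversation in the
        # batch, each with persona information and the turns seen so far.
        return ["That sounds interesting, tell me more!" for _ in test_data]
```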
...
You can add your SSH Keys to your GitLab account by going to your profile settings.
"description":"(optional) description about your awesome model"
}
```
...
This JSON is used to map your submission to the challenge - so please remember to use the correct `challenge_id`.
### Evaluation Metrics
### Time, compute and API constraints
You will be provided conversations with 7 turns each in `batches of up to 50 conversations`. For each batch of conversations, the first set of turns will be provided to your model. After the responses are received, the subsequent turns of the same conversations will be provided. Each conversation will have exactly 7 turns. Your model needs to `complete all 7 responses of 50 conversations within **1 hour**` - that is, roughly 10 seconds per response on average (3,600 s ÷ 350 responses). The number of batches of conversations your model will process will vary based on the challenge round.
Before running on the challenge dataset, your model will be run on dummy data as a sanity check. This will show up as the `convai-validation` phase on your submission pages. The dummy data contains `5 conversations of 7 turns each`; your model needs to `complete the validation phase within **15 minutes**`.
Before your model starts processing conversations, it is given up to *5 minutes* of additional time to load models or preprocess any data if needed.
## GPU Track
Your model will be run on an AWS g5.2xlarge node. This node has **8 vCPUs, 32 GB RAM, and one Nvidia A10G GPU with 24 GB VRAM**.
## Prompt Engineering Track
Your model will be run on an AWS m5.xlarge node. This node has **4 vCPUs and 16 GB RAM**.
For API usage, the following constraints will apply (a budget-tracking sketch follows the list):
* A maximum of 2 API calls per utterance is allowed.
* Input token limit per dialog (the combined number of input tokens across all 7 utterances): 10,000
* Output token limit per dialog (the combined number of output tokens across all 7 utterances): 1,000
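Since these limits apply per dialog, it helps to meter usage client-side. Below is a minimal sketch of budget tracking, assuming the `openai` v1 Python client and an illustrative model id (the challenge runner may expose the API differently):

```python
# Minimal per-dialog budget tracking for the Prompt Engineering Track.
# Assumes the openai>=1.0 Python client; the model id is illustrative.
from openai import OpenAI

MAX_CALLS_PER_UTTERANCE = 2
MAX_INPUT_TOKENS_PER_DIALOG = 10_000
MAX_OUTPUT_TOKENS_PER_DIALOG = 1_000

client = OpenAI()


class DialogBudget:
    """Accumulates token usage across one 7-turn dialog."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, usage):
        self.input_tokens += usage.prompt_tokens
        self.output_tokens += usage.completion_tokens

    def ok(self):
        return (self.input_tokens <= MAX_INPUT_TOKENS_PER_DIALOG
                and self.output_tokens <= MAX_OUTPUT_TOKENS_PER_DIALOG)


def ask(messages, budget, max_new_tokens=120):
    # Capping max_tokens keeps 7 utterances within the 1,000 output-token
    # budget (7 x 120 = 840 < 1,000).
    if not budget.ok():
        raise RuntimeError("dialog token budget exhausted")
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model id
        messages=messages,
        max_tokens=max_new_tokens,
    )
    budget.record(resp.usage)
    return resp.choices[0].message.content
```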
## Local Evaluation
Participants can run the evaluation protocol for their model locally, with or without the constraints imposed by the challenge, to benchmark their models privately. See `local_evaluation.py` for details. You can change it as you like; your changes to `local_evaluation.py` will **NOT** be used for the competition.
To test your submissions for the Prompt Engineering Track, please use `local_evaluation_with_api.py`.
## Note about Dummy test data
The file `dummy_data_task1.json` is a dummy test dataset for testing your code before submission. All dialogues in this dataset are based on the same pair of personas (persona A and persona B); the actual test dataset used for evaluation is built from many different persona pairs.
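To get a feel for the dummy data before wiring it into your agent, you can inspect the file directly; the snippet below makes no assumptions about field names beyond the file being valid JSON:

```python
import json

# Peek at the dummy dataset; the schema is defined by the file itself,
# so inspect the top-level structure rather than assuming field names.
with open("dummy_data_task1.json") as f:
    data = json.load(f)

print(type(data))
if isinstance(data, dict):
    print(list(data.keys())[:5])  # first few dialogue ids/keys
elif isinstance(data, list):
    print(data[0])                # first dialogue entry
```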
We recommend that you place the code for all your agents in the `agents` directory (though it is not mandatory). All your submissions should contain an Agent class. We have added a dummy agent example in [`dummy_agent.py`](dummy_agent.py) and an API usage example in [`prompt_agent.py`](prompt_agent.py). The agent class should contain the `generate_responses` function.
## How to participate in GPU Track
Set `"gpu": true` in `aicrowd.json`. While the gpu flag is set to true, the api will not be usable.
## How to participate in Prompt Engineering Track
Set `"gpu": false` in `aicrowd.json`. API usage will be enabled only when GPU is not used.
## Submission details
**Add your agent class name in** [`user_config.py`](user_config.py) as `UserAgent`.
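For instance, if your agent class were `MyAgent` in a hypothetical `agents/my_agent.py`, the mapping would look roughly like this (match it to the starter kit's actual template):

```python
# agents/user_config.py
# Point the evaluator at your agent class; the evaluator looks up the
# class exported under the name UserAgent.
from agents.my_agent import MyAgent  # hypothetical module and class

UserAgent = MyAgent
```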