
Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Commits on Source (18)
Showing 1424 additions and 157 deletions
models/**
\ No newline at end of file
.git/
models/**
data/
\ No newline at end of file
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.jsonl filter=lfs diff=lfs merge=lfs -text
index.ivf filter=lfs diff=lfs merge=lfs -text
*.ivf filter=lfs diff=lfs merge=lfs -text
@@ -160,5 +160,5 @@ cython_debug/
#.idea/
scores.json
data/
*.ipynb
\ No newline at end of file
*.ipynb
*.out
\ No newline at end of file
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive \
LANG=en_US.UTF-8 \
@@ -24,7 +24,7 @@ RUN groupadd -g 1001 aicrowd && \
USER ${USER_NAME}
WORKDIR ${HOME_DIR}
# Install Miniconda and Python packages. You can change the python version by using another Miniconda.
# Install Miniconda and Python packages. You can change the python version by using another Miniconda.
RUN wget -nv -O miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py38_22.11.1-1-Linux-x86_64.sh \
&& bash miniconda.sh -b -p ${CONDA_DIR} \
&& . ${CONDA_DIR}/etc/profile.d/conda.sh \
@@ -33,7 +33,11 @@ RUN wget -nv -O miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-py38
&& rm -rf miniconda.sh
COPY --chown=1001:1001 requirements.txt ${HOME_DIR}/requirements.txt
# Use CFLAGS to set compiler options
RUN pip install --upgrade pip setuptools wheel numpy==1.24.4
RUN pip install -r requirements.txt --no-cache-dir
COPY --chown=1001:1001 requirements_eval.txt ${HOME_DIR}/requirements_eval.txt
RUN pip install -r requirements_eval.txt --no-cache-dir
......
@@ -155,20 +155,22 @@ This also includes instructions on [specifying your software runtime](docs/submi
## 💻 What hardware does my code run on ?
You can find more details about the hardware and system configuration in [docs/hardware-and-system-config.md](docs/hardware-and-system-config.md).
In summary, we provide you `2` x [[NVIDIA T4 GPUs](https://www.nvidia.com/en-us/data-center/tesla-t4/)] in Phase 1; and `4` x [[NVIDIA T4 GPUs](https://www.nvidia.com/en-us/data-center/tesla-t4/)] in Phase 2.
In summary, we provide you `4` x [[NVIDIA T4 GPUs](https://www.nvidia.com/en-us/data-center/tesla-t4/)] in Phase 2.
Your solution will be given a certain amount of time for inference, after which it will be immediately killed and no results will be available. The time limits are:
| Phase | Track 1 | Track 2 | Track 3 | Track 4 | Track 5 |
| ------ | ------- | ------- | ------- | ------- | ------- |
| **Phase 1**| 140 minutes | 40 minutes | 60 minutes | 60 minutes | 5 hours |
| **Phase 2**| 70 minutes | 20 minutes | 30 minutes | 20 minutes | 140 minutes |
For reference, the baseline solution with zero-shot [Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) (Find it [**here**](https://gitlab.aicrowd.com/aicrowd/challenges/amazon-kdd-cup-2024/amazon-kdd-cup-2024-starter-kit/-/blob/master/models/dummy_model.py)) consumes the following amount of time.
For reference, the baseline solution with zero-shot LLaMA3-8B-instruct consumes the following amount of time.
| Phase | Track 1 | Track 2 | Track 3 | Track 4 |
| ------ | ------- | ------- | ------- | ------- |
| **Phase 1**| ~50 minutes | ~3 minutes | ~25 minutes | ~35 minutes |
| **Phase 2**| 1490s | 397s | 576s | 359s |
We limit the prediction time of each sample to at most **15 seconds**.
We limit the prediction time of each sample to at most **10 seconds**. This limit applies at a batch level. For example, for a batch of 8 samples, you should return the prediction after at most 80 seconds. Otherwise, your submission will be killed.
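As a rough local sanity check of this budget, the hedged sketch below (not part of the official evaluator) times a single `batch_predict` call against the per-sample limit; it assumes the starter kit's `models.user_config.UserModel` entry point and uses purely illustrative prompts.

```python
import time

from models.user_config import UserModel  # assumed starter-kit entry point

# Hypothetical local check: time one batch_predict call against the evaluator's budget.
model = UserModel()
batch = {"prompt": ["What is a good gift for a 5-year-old?"] * 8}  # illustrative prompts

start = time.time()
responses = model.batch_predict(batch, is_multiple_choice=False)
elapsed = time.time() - start

# 10 seconds per sample, applied at the batch level: 8 samples -> 80 seconds.
budget_seconds = 10 * len(batch["prompt"])
print(f"Batch of {len(responses)} took {elapsed:.1f}s (budget: {budget_seconds}s)")
```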
Your maximum repo size is 200GB.
## 🧩 How are my model responses parsed by the evaluators ?
Please refer to [parsers.py](parsers.py) for more details on how we parse your model responses.
......
{
    "challenge_id": "amazon-kdd-cup-24-understanding-shopping-concepts",
    "challenge_id": "amazon-kdd-cup-24-multi-lingual-abilities",
    "authors": [
        "your-aicrowd-username"
        "der2933"
    ],
    "gpu": false,
    "description": "(optional) description about your custom model"
    "gpu": true,
    "description": "goooooooooooooooooooooooooood"
}
\ No newline at end of file
This diff is collapsed.
Source diff could not be displayed: it is stored in LFS.
@@ -35,6 +35,7 @@ docker run \
--gpus all \
-v "$(pwd)":/submission \
-w /submission \
--shm-size=10.24gb \
$IMAGE_NAME python local_evaluation.py
# Note: We assume you have nvidia-container-toolkit installed and configured
......
### Setting Up and Downloading Baseline Model Weights with Hugging Face
This guide outlines the steps to download (and check in) the model weights required for the baseline models.
We will focus on `Meta-Llama-3-8B-Instruct`,
but the steps should work equally well for any other model on Hugging Face.
#### Preliminary Steps:
1. **Install the Hugging Face Hub Package**:
Begin by installing the `huggingface_hub` package, which includes the `hf_transfer` utility, by running the following command in your terminal:
```bash
pip install huggingface_hub[hf_transfer]
```
2. **Accept the LLaMA Terms**:
You must accept the LLaMA model's terms of use by visiting: [meta-llama/Meta-Llama-3-8B-Instruct Terms](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
3. **Create a Hugging Face CLI Token**:
Generate a CLI token by navigating to: [Hugging Face Token Settings](https://huggingface.co/settings/tokens). You will need this token for authentication.
#### Hugging Face Authentication:
1. **Login via CLI**:
Authenticate yourself with the Hugging Face CLI using the token created in the previous step. Run:
```bash
huggingface-cli login
```
When prompted, enter the token.
#### Model Downloads:
1. **Download the Meta-Llama-3-8B-Instruct Model**:
Execute the following command to download the `Meta-Llama-3-8B-Instruct` model to a local subdirectory. This command excludes unnecessary files to save space:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
meta-llama/Meta-Llama-3-8B-Instruct \
--local-dir-use-symlinks False \
--local-dir models/meta-llama/Meta-Llama-3-8B-Instruct \
--exclude *.pth # These are alternates to the safetensors hence not needed
```
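As a quick, optional sanity check that the weights landed where the evaluator expects them, a minimal sketch (assuming the `transformers` library is installed and the local directory used above) loads the tokenizer and config purely from disk:

```python
from transformers import AutoConfig, AutoTokenizer

local_dir = "models/meta-llama/Meta-Llama-3-8B-Instruct"

# local_files_only=True guarantees nothing is re-downloaded from the Hub.
config = AutoConfig.from_pretrained(local_dir, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(local_dir, local_files_only=True)

print(config.model_type, config.num_hidden_layers)  # expect: llama 32
print(tokenizer("Hello, ShopBench!")["input_ids"][:5])
```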
#### Version Control with Git LFS:
1. **Track Model Weights**:
Use Git Large File Storage (LFS) to track the model directories. This ensures efficient handling of large files:
```bash
git lfs track "models/meta-llama/*"
```
2. **Commit and Push**:
Add the models to your Git repository, commit the changes, and push them to your remote repository:
```bash
git add models/
git commit -am "add weights"
git push origin master
```
If you are struggling with Git LFS, you are very much encouraged to check out [this post](https://discourse.aicrowd.com/t/how-to-upload-large-files-size-to-your-submission/2304).
@@ -52,20 +52,36 @@ def generate_model_outputs(data_df, model):
    - A list containing the model outputs for each entry in the data DataFrame.
    """
    outputs = []
    for _, row in tqdm(
        data_df.iterrows(), total=len(data_df), desc="Generating Responses"
    ):
        is_multiple_choice = row["task_type"] == "multiple-choice"
        # the 'task_type' column won't be available during evaluation, so you should use something like
        # ```is_multiple_choice = row['is_multiple_choice']```
        prompt = row["input_field"]
        model_output = model.predict(prompt, is_multiple_choice)
        outputs.append(model_output)
    return outputs
    task_grouped_df = data_df.groupby(by=["task_type"])
    for task_type, task_group_data_df in task_grouped_df:
        task_group_data_df = task_group_data_df.reset_index(drop=True)
        is_multiple_choice = task_type[0] == "multiple-choice"
        batch_size = model.get_batch_size()
        batches = [task_group_data_df[i:i + batch_size] for i in range(0, len(task_group_data_df), batch_size)]
        for batch_df in batches:
            batch = {
                "prompt": batch_df["input_field"].tolist(),
            }
            model_output = model.batch_predict(
                batch,
                is_multiple_choice
            )
            outputs.append(
                pd.DataFrame({
                    "input_field": batch["prompt"],
                    "model_output_str": model_output
                }))
    df_outputs = pd.concat(outputs)
    return df_outputs
# Function to evaluate the generated model outputs
def evaluate_outputs(data_df, outputs, log_every_n_steps=1):
def evaluate_outputs(data_df, log_every_n_steps=1):
"""
Evaluate the model outputs against ground truth values using specified metrics.
......@@ -84,17 +100,18 @@ def evaluate_outputs(data_df, outputs, log_every_n_steps=1):
for row_idx, row in tqdm(
data_df.iterrows(), total=len(data_df), desc="Evaluating"
):
task_name, task_type, metric, ground_truth = (
task_name, task_type, metric, ground_truth, model_output_str = (
row["task_name"],
row["task_type"],
row["metric"],
row["output_field"],
row["model_output_str"],
)
if metric not in eval_methods:
raise NotImplementedError(f"No metric for {metric=}")
model_output = task_parsers[task_type].parse(outputs[row_idx])
model_output = task_parsers[task_type].parse(model_output_str)
eval_fn = eval_methods[metric]
metric_score = eval_fn(model_output, ground_truth)
@@ -211,7 +228,7 @@ def main():
    # Load development data
    # Please download the development data from : https://www.aicrowd.com/challenges/amazon-kdd-cup-2024-multi-task-online-shopping-challenge-for-llms/dataset_files
    # and place it at: ./data/development.json
    DATA_FILENAME = "./data/development.json"
    DATA_FILENAME = "./data/development_0626.json"
    if not os.path.exists(DATA_FILENAME):
        raise FileNotFoundError(
@@ -230,14 +247,15 @@ def main():
    model = UserModel()
    # Generate model outputs
    outputs = generate_model_outputs(data_df, model)
    data_df["outputs"] = (
        outputs  # Optional: Add outputs back to DataFrame for inspection
    )
    print(data_df.head())
    df_outputs = generate_model_outputs(data_df, model)
    # add outputs to the data_df
    merged_data_df = pd.merge(data_df, df_outputs, on="input_field")
    print(merged_data_df.head())
    # Evaluate the generated outputs and calculate metrics
    per_task_metrics = evaluate_outputs(data_df, outputs)
    per_task_metrics = evaluate_outputs(merged_data_df)
    # Aggregate and display the evaluation scores
    overall_metrics = aggregate_scores(per_task_metrics)
......
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer
import numpy as np
import evaluate
import os
from typing import List, Tuple, Union
import evaluate
import numpy as np
import torch
from typing import List, Union, Tuple
from loguru import logger
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer
sacrebleu = None
sentence_transformer_model_cache = {}
......
@@ -4,7 +4,7 @@
For a streamlined experience, we suggest placing the code for all your models within the `models` directory. This is a recommendation for organizational purposes, but it's not a strict requirement.
## Model Base Class
Your models should inherit from the `ShopBenchBaseModel` class found in [base_model.py](base_model.py). We provide an example model, `dummy_model.py`, to illustrate how you might structure your own model. Crucially, your model class must implement the `predict` method.
Your models should inherit from the `ShopBenchBaseModel` class found in [base_model.py](base_model.py). We provide an example model, `dummy_model.py`, to illustrate how you might structure your own model. Crucially, your model class must implement the `batch_predict` method.
## Configuring Your Model
To ensure your model is recognized and utilized correctly, please specify your model class name in the [`user_config.py`](user_config.py) file, by following the instructions in the inline comments.
@@ -12,12 +12,14 @@ To ensure your model is recognized and utilized correctly, please specify your m
## Model Inputs and Outputs
### Inputs
Your model will receive two pieces of information for every task:
- `prompt` (`str`): This is the specific task's input prompt.
- `batch` (`Dict[str, Any]`): A batch of inputs as a dictionary, where the dictionary has the following key:
  - `prompt` (`List[str]`): A list of prompts representing the tasks in a batch.
- `is_multiple_choice` (`bool`): This indicates whether the task is a multiple choice question.
### Outputs
The output from your model's `predict` function should always be a string. Depending on the task, this could be:
The output from your model's `batch_predict` function should be a list of string responses for all the prompts in the input batch.
Depending on the task, each response could be:
- A single integer (in the range [0, 3]) for multiple choice tasks.
- A comma-separated list of integers for ranking tasks.
- A comma-separated list of named entities for Named Entity Recognition (NER) tasks.
......
from typing import Any, Dict, List
# class ShopBenchBaseModel:
#     def __init__(self):
#         pass
#
#     def get_batch_size(self) -> int:
#         """
#         Determines the batch size that is used by the evaluator when calling the `batch_predict` function.
#
#         Returns:
#             int: The batch size, an integer between 1 and 16. This value indicates how many
#                 queries should be processed together in a single batch. It can be dynamic
#                 across different batch_predict calls, or stay a static value.
#         """
#         raise NotImplementedError("get_batch_size method not implemented")
#
#     def batch_predict(self, batch: Dict[str, Any], is_multiple_choice: bool) -> List[str]:
#         """
#         Generates a batch of predictions based on the associated prompts and task_type.
#         For multiple choice tasks, it randomly selects a choice.
#         For other tasks, it returns a list of integers as a string,
#         representing the model's prediction in a format compatible with task-specific parsers.
#
#         Parameters:
#             - batch (Dict[str, Any]): A dictionary containing a batch of input prompts with the following keys
#                 - prompt (List[str]): a list of input prompts for the model.
#             - is_multiple_choice (bool): A boolean flag indicating if all the items in this batch belong to multiple choice tasks.
#
#         Returns:
#             List[str]: A list of predictions, one for each of the prompts received in the batch.
#                 Each prediction is
#                     a string representing a single integer in [0, 3] for multiple choice tasks,
#                     or a string representing a comma-separated list of integers for Ranking and Retrieval tasks,
#                     or a string representing a comma-separated list of named entities for Named Entity Recognition tasks,
#                     or a string representing the (unconstrained) generated response for the generation tasks.
#                 Please refer to parsers.py for more details on how these responses will be parsed by the evaluator.
#         """
#         raise NotImplementedError("batch_predict method not implemented")


class ShopBenchBaseModel:
    def __init__(self):
        pass
......
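To make the batch interface above concrete, here is a deliberately trivial, hedged sketch of a subclass (the class name and the fixed answers are illustrative only; a real solution would run an actual model). It assumes the file lives next to `base_model.py` inside the `models` package.

```python
import random
from typing import Any, Dict, List

from .base_model import ShopBenchBaseModel  # assumes this file sits in the models package


class TrivialBatchModel(ShopBenchBaseModel):
    """Illustrative model that satisfies the get_batch_size / batch_predict contract."""

    def get_batch_size(self) -> int:
        # Any integer in [1, 16]; it may change between calls.
        return 8

    def batch_predict(self, batch: Dict[str, Any], is_multiple_choice: bool) -> List[str]:
        prompts = batch["prompt"]
        if is_multiple_choice:
            # One answer per prompt: a single integer in [0, 3], returned as a string.
            return [str(random.randint(0, 3)) for _ in prompts]
        # Ranking/Retrieval tasks expect a comma-separated list of integers;
        # NER and generation tasks accept free-form strings (see parsers.py).
        return ["1,2,3,4,5" for _ in prompts]
```

Remember to point `user_config.py` at your model class, as described above, so the evaluator picks it up.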
This diff is collapsed.
{
  "_name_or_path": "/root/models/llama3",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 128256
}
{
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.42.3"
}
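The sampling settings above (`do_sample`, `temperature=0.6`, `top_p=0.9`) are what `transformers` picks up automatically from `generation_config.json` when generating from this checkpoint. The hedged sketch below (assuming the local model directory from the download step and enough GPU memory for float16 weights) shows the equivalent explicit call:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_dir = "models/meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Recommend an accessory for a new phone.", return_tensors="pt").to(model.device)
# Mirrors generation_config.json: sampled decoding with temperature 0.6 and top-p 0.9.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```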
File added
File added
File added