Commit 35b32ad8 authored by mohanty

Merge branch 'baseline-v0' into 'master'

add baseline related docs

See merge request !1
parents 44ee7a6e 01007dcc
@@ -24,6 +24,7 @@ This repository is the CRAG: Comprehensive RAG Benchmark **Submission template an
- [How to make a submission?](#-how-to-make-a-submission)
- [What hardware does my code run on?](#-what-hardware-does-my-code-run-on-)
- [How are my model responses parsed by the evaluators?](#-how-are-my-model-responses-parsed-by-the-evaluators-)
- [Baselines](#baselines)
6. [Frequently Asked Questions](#-frequently-asked-questions)
7. [Important Links](#-important-links)
@@ -101,6 +102,8 @@ This also includes instructions on [specifying your software runtime](docs/submi
You can find more details about the hardware and system configuration in [docs/hardware-and-system-config.md](docs/hardware-and-system-config.md).
In summary, we provide you `4` x [NVIDIA T4 GPUs](https://www.nvidia.com/en-us/data-center/tesla-t4/).
## 🏁 Baselines
We include two baselines for demonstration purposes, and you can read more about them in [docs/baselines.md](docs/baselines.md).
# ❓ Frequently Asked Questions
## Which track is this starter kit for?
# CRAG Baselines
For the CRAG benchmark, we provide two baseline models to help participants get started. Detailed implementations of these baselines are accessible through the links provided below, and participants are encouraged to use them as a starting point for the competition.
Please note that these baselines are **NOT** tuned for performance or efficiency, and are provided as is for demonstration.
## Available Baseline Models:
1. **Vanilla Llama 2 Model**: For an implementation guide and further details, refer to the Vanilla Llama 2 model documentation [here](../models/vanilla_llama_baseline.py).
2. **RAG Baseline Model**: For an implementation guide and further details, refer to the RAG Baseline model documentation [here](../models/rag_llm_model.py).
## Preparing Your Submission:
Before you can submit your solutions using these baselines, it is necessary to download the model weights and incorporate them into this repository. To do this, follow the step-by-step instructions outlined in the document: [download_baseline_model_weights.md](download_baseline_model_weights.md).
Additionally, ensure that your configurations in [user_config.py](../models/user_config.py) correctly reference the model class you intend to use for your submission.
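For example, to submit with the RAG baseline, `user_config.py` only needs to point `UserModel` at that class. A minimal sketch (the module path is assumed; adjust it to match where the class actually lives in your checkout):

```python
# models/user_config.py -- tell the evaluator which model class to instantiate
from models.rag_llama_baseline import RAGModel  # assumed module path

UserModel = RAGModel
```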
These steps are crucial for a successful submission. Make sure to follow them carefully. Good luck!
### Setting Up and Downloading Baseline Model Weights with Hugging Face
This guide outlines the steps to download (and check in) the model weights required for the baseline models.
We will focus on the `Llama-2-7b-chat-hf` and `all-MiniLM-L6-v2` models.
The steps should work equally well for any other model on Hugging Face.
#### Preliminary Steps:
1. **Install the Hugging Face Hub Package**:
Begin by installing the `huggingface_hub` package, which includes the `hf_transfer` utility, by running the following command in your terminal:
```bash
pip install "huggingface_hub[hf_transfer]"
```
2. **Accept the LLaMA Terms**:
You must accept the LLaMA model's terms of use by visiting: [LLaMA-2-7b-chat-hf Terms](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
3. **Create a Hugging Face CLI Token**:
Generate a CLI token by navigating to: [Hugging Face Token Settings](https://huggingface.co/settings/tokens). You will need this token for authentication.
#### Hugging Face Authentication:
1. **Login via CLI**:
Authenticate yourself with the Hugging Face CLI using the token created in the previous step. Run:
```bash
huggingface-cli login
```
When prompted, enter the token.
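If you prefer to authenticate non-interactively (for example from a setup script), the `huggingface_hub` Python API provides the same login; a minimal sketch, assuming your token is exported in a (hypothetical) `HF_TOKEN` environment variable:

```python
import os

from huggingface_hub import login

# Reads the access token from the environment instead of prompting for it.
login(token=os.environ["HF_TOKEN"])
```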
#### Model Downloads:
1. **Download LLaMA-2-7b Model**:
Execute the following command to download the `Llama-2-7b-chat-hf` model to a local subdirectory. This command excludes unnecessary files to save space:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
meta-llama/Llama-2-7b-chat-hf \
--local-dir-use-symlinks False \
--local-dir models/meta-llama/Llama-2-7b-chat-hf \
--exclude "*.bin" # These are alternates to the safetensors and hence not needed
```
2. **Download MiniLM-L6-v2 Model (for sentence embeddings)**:
Similarly, download the `sentence-transformers/all-MiniLM-L6-v2` model using the following command:
```bash
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download \
sentence-transformers/all-MiniLM-L6-v2 \
--local-dir-use-symlinks False \
--local-dir models/sentence-transformers/all-MiniLM-L6-v2 \
--exclude "*.bin" "*.h5" "*.ot" # These are alternates to the safetensors and hence not needed
```
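If you would rather script these downloads, the `huggingface_hub` Python API offers an equivalent route; a minimal sketch using `snapshot_download` with the same target directories and exclusion patterns as the commands above:

```python
from huggingface_hub import snapshot_download

# (Optionally set HF_HUB_ENABLE_HF_TRANSFER=1 in the environment to keep the faster transfer backend.)

# Llama-2 chat model: skip the *.bin files, the safetensors weights are sufficient.
snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="models/meta-llama/Llama-2-7b-chat-hf",
    ignore_patterns=["*.bin"],
)

# Sentence-embedding model: keep only the safetensors weights here as well.
snapshot_download(
    repo_id="sentence-transformers/all-MiniLM-L6-v2",
    local_dir="models/sentence-transformers/all-MiniLM-L6-v2",
    ignore_patterns=["*.bin", "*.h5", "*.ot"],
)
```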
#### Version Control with Git LFS:
1. **Track Model Weights**:
Use Git Large File Storage (LFS) to track the model directories. This ensures efficient handling of large files:
```bash
git lfs track "models/meta-llama/*"
git lfs track "models/sentence-transformers/*"
```
2. **Commit and Push**:
Add the models to your Git repository, commit the changes, and push them to your remote repository:
```bash
git add models/
git commit -am "add weights"
git push origin master
```
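Before relying on the checked-in weights at evaluation time, it can help to sanity-check that both models load from the local paths the baselines expect; a minimal sketch (requires the same dependencies as the baseline code):

```python
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

# Both of these should load purely from the local directories checked into the repo.
tokenizer = AutoTokenizer.from_pretrained("models/meta-llama/Llama-2-7b-chat-hf")
embedder = SentenceTransformer("models/sentence-transformers/all-MiniLM-L6-v2")

print("Llama tokenizer vocab size:", tokenizer.vocab_size)
print("MiniLM embedding dimension:", embedder.get_sentence_embedding_dimension())
```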
import os
from typing import List
import numpy as np
import torch
from blingfire import text_to_sentences_and_offsets
from bs4 import BeautifulSoup
from models.utils import trim_predictions_to_max_token_length
from sentence_transformers import SentenceTransformer
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
pipeline,
)
######################################################################################################
######################################################################################################
###
### IMPORTANT !!!
### Before submitting, please follow the instructions in the docs below to download
### and check in the model weights.
###
### https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/blob/master/docs/download_baseline_model_weights.md
###
###
### DISCLAIMER: This baseline has NOT been tuned for performance
### or efficiency, and is provided as is for demonstration.
######################################################################################################
# Load the environment variable that specifies the URL of the MockAPI. This URL is essential
# for accessing the correct API endpoint in Task 2 and Task 3. The value of this environment variable
# may vary across different evaluation settings, emphasizing the importance of dynamically obtaining
# the API URL to ensure accurate endpoint communication.
# Please refer to https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/crag-mock-api
# for more information on the MockAPI.
#
# **Note**: This environment variable will not be available for Task 1 evaluations.
CRAG_MOCK_API_URL = os.getenv("CRAG_MOCK_API_URL", "http://localhost:8000")
class RAGModel:
def __init__(self):
"""
Initialize your model(s) here if necessary.
        This is the constructor for your RAGModel class, where you can set up any
required initialization steps for your model(s) to function correctly.
"""
self.sentence_model = SentenceTransformer('models/sentence-transformers/all-MiniLM-L6-v2', device='cuda')
self.num_context = 10
self.max_ctx_sentence_length = 1000 # characters
        self.prompt_template = """You are given a question and references which may or may not help answer the question.
You are to respond with just the answer and no surrounding sentences.
If you are unsure about the answer, respond with "I don't know".
### Question
{query}
### References
{references}
### Answer"""
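        # Quantization config: load the 7B chat model in 4-bit (NF4) so it fits
        # comfortably in the memory of the provided NVIDIA T4 GPUs.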
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=False,
)
model_name = "models/meta-llama/Llama-2-7b-chat-hf"
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.llm = AutoModelForCausalLM.from_pretrained(
model_name,
device_map='auto',
quantization_config=bnb_config,
torch_dtype=torch.float16,
)
self.generation_pipe = pipeline(task="text-generation",
model=self.llm,
tokenizer=self.tokenizer,
max_new_tokens=10)
def generate_answer(self, query: str, search_results: List[str]) -> str:
"""
Generate an answer based on a provided query and a list of pre-cached search results.
Parameters:
- query (str): The user's question or query input.
        - search_results (List[str]): A list of pre-fetched search results for the query.
          Each element is accessed below as a dict whose 'page_result' field contains
          the HTML text of a web page retrieved for the query.
Returns:
- (str): A plain text response that answers the query. This response is limited to 75 tokens.
If the generated response exceeds 75 tokens, it will be truncated to fit within this limit.
Notes:
- If the correct answer is uncertain, it's preferable to respond with "I don't know" to avoid
the penalty for hallucination.
- Response Time: Ensure that your model processes and responds to each query within 10 seconds.
Failing to adhere to this time constraint **will** result in a timeout during evaluation.
"""
all_sentences = []
for html_text in search_results:
soup = BeautifulSoup(html_text['page_result'], features="html.parser")
text = soup.get_text().replace('\n', '')
if len(text) > 0:
offsets = text_to_sentences_and_offsets(text)[1]
for ofs in offsets:
sentence = text[ofs[0]:ofs[1]]
all_sentences.append(sentence[:self.max_ctx_sentence_length])
else:
all_sentences.append('')
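        # Embed every candidate sentence and the query; because the embeddings are
        # L2-normalized, the dot products computed below are cosine similarities.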
all_embeddings = self.sentence_model.encode(all_sentences, normalize_embeddings=True)
query_embedding = self.sentence_model.encode(query, normalize_embeddings=True)[None, :]
cosine_scores = (all_embeddings * query_embedding).sum(1)
top_sentences = np.array(all_sentences)[(-cosine_scores).argsort()[:self.num_context]]
references = ''
for snippet in top_sentences:
references += '<DOC>\n' + snippet + '\n</DOC>\n'
references = ' '.join(references.split()[:500])
final_prompt = self.prompt_template.format(query=query, references=references)
result = self.generation_pipe(final_prompt)[0]['generated_text']
        answer = result.split("### Answer")[1].strip()
# Trim prediction to a max of 75 tokens
trimmed_answer = trim_predictions_to_max_token_length(answer)
return trimmed_answer
from models.dummy_model import DummyModel

UserModel = DummyModel
# Uncomment the lines below to use the Vanilla LLAMA baseline
# from models.vanilla_llama import ChatModel
# UserModel = ChatModel
# Uncomment the lines below to use the RAG LLAMA baseline
# from models.rag_llama_baseline import RAGModel
# UserModel = RAGModel
import os
from typing import List
import numpy as np
import torch
from models.utils import trim_predictions_to_max_token_length
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
pipeline,
)
######################################################################################################
######################################################################################################
###
### IMPORTANT !!!
### Before submitting, please follow the instructions in the docs below to download
### and check in the model weights.
###
### https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/blob/master/docs/download_baseline_model_weights.md
###
###
### DISCLAIMER: This baseline has NOT been tuned for performance
### or efficiency, and is provided as is for demonstration.
######################################################################################################
# Load the environment variable that specifies the URL of the MockAPI. This URL is essential
# for accessing the correct API endpoint in Task 2 and Task 3. The value of this environment variable
# may vary across different evaluation settings, emphasizing the importance of dynamically obtaining
# the API URL to ensure accurate endpoint communication.
# Please refer to https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/crag-mock-api
# for more information on the MockAPI.
#
# **Note**: This environment variable will not be available for Task 1 evaluations.
CRAG_MOCK_API_URL = os.getenv("CRAG_MOCK_API_URL", "http://localhost:8000")
class ChatModel:
def __init__(self):
"""
Initialize your model(s) here if necessary.
        This is the constructor for your ChatModel class, where you can set up any
required initialization steps for your model(s) to function correctly.
"""
        self.prompt_template = """You are given a question and references which may or may not help answer the question. Your goal is to answer the question in as few words as possible.
### Question
{query}
### Answer"""
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=False,
)
model_name = "models/meta-llama/Llama-2-7b-chat-hf"
if not os.path.exists(model_name):
raise Exception(f"""
The evaluators expect the model weights to be checked into the repository,
but we could not find the model weights at {model_name}
Please follow the instructions in the docs below to download and check in the model weights.
https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/blob/master/docs/dataset.md
""")
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.llm = AutoModelForCausalLM.from_pretrained(
model_name,
device_map='auto',
quantization_config=bnb_config,
torch_dtype=torch.float16,
)
self.generation_pipe = pipeline(task="text-generation",
model=self.llm,
tokenizer=self.tokenizer,
max_new_tokens=75)
def generate_answer(self, query: str, search_results: List[str]) -> str:
"""
Generate an answer based on a provided query and a list of pre-cached search results.
Parameters:
- query (str): The user's question or query input.
- search_results (List[str]): A list containing the text content from web pages
retrieved as search results for the query. Each element in the list is a string
representing the HTML text of a web page.
Returns:
- (str): A plain text response that answers the query. This response is limited to 75 tokens.
If the generated response exceeds 75 tokens, it will be truncated to fit within this limit.
Notes:
- If the correct answer is uncertain, it's preferable to respond with "I don't know" to avoid
the penalty for hallucination.
- Response Time: Ensure that your model processes and responds to each query within 10 seconds.
Failing to adhere to this time constraint **will** result in a timeout during evaluation.
"""
final_prompt = self.prompt_template.format(query=query)
result = self.generation_pipe(final_prompt)[0]['generated_text']
answer = result.split("### Answer")[1].strip()
# Trim prediction to a max of 75 tokens
trimmed_answer = trim_predictions_to_max_token_length(answer)
return trimmed_answer