# Meta KDD Cup 2024 Winning Solution: Comprehensive Retrieval-Augmented Generation (CRAG) System
# Meta KDD Cup '24 [CRAG: Comprehensive RAG Benchmark](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024) Starter Kit
## Overview
Welcome to the db3 team's winning solution for the Meta KDD Cup 2024. This repository contains our implementation of the Comprehensive Retrieval-Augmented Generation (CRAG) system, which leverages web sources and knowledge graphs to answer queries. Our solution secured first place in all three tasks of the competition.
This repository is also the CRAG: Comprehensive RAG Benchmark **Submission template and Starter kit**! Clone the repository to compete now!
## Repository Structure
**This repository contains**:
- **Documentation** on how to submit your models to the leaderboard
- **The procedure** for best practices and information on how we evaluate your model, etc.
- **Starter code** for you to get started!
The base framework for our solution is derived from the [Meta Comprehensive RAG Benchmark Starter Kit](https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/tree/master). We have extended and customized this framework to develop our winning solution, particularly focusing on the `models` folder which contains the core components of our approach.
To use our solution, you need to download the following models and place them in the specified directories:
1. **Meta Llama 3 - 8B Instruct**:
Download from [Meta Llama 3 - 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and place it in `models/Llama-3-8B-instruct`.
2. **All-MiniLM-L6-v2**:
Download from [All-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and place it in `models/all-Mini-L6-v2`.
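Before running the pipeline, it can save time to verify that both model directories are in place. The snippet below is a small convenience check, not part of the official solution; the directory names simply mirror the download steps above:

```python
from pathlib import Path

# Local model directories listed in the download steps above.
EXPECTED_MODEL_DIRS = [
    "models/Llama-3-8B-instruct",
    "models/all-Mini-L6-v2",
]

def missing_model_dirs(base=".", expected=EXPECTED_MODEL_DIRS):
    """Return the expected model directories that do not exist under `base`."""
    return [d for d in expected if not (Path(base) / d).is_dir()]

if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("All model directories are in place.")
```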
# 📊 Dataset
Please find more details about the dataset in [docs/dataset.md](docs/dataset.md).
# 👨💻👩💻 Tasks
## 📏 Evaluation Metrics
Please refer to [local_evaluation.py](local_evaluation.py) for more details on how we will evaluate your submissions.
# 🏁 Getting Started
1. **Sign up** to join the competition [on the AIcrowd website](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024).
2. **Fork** this starter kit repository. You can use [this link](https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/forks/new) to create a fork.
3. **Clone** your forked repo and start developing your model.
4. **Develop** your model(s) following the template in the [how to write your own model](#how-to-write-your-own-model) section.
5. [**Submit**](#-how-to-make-a-submission) your trained models to [AIcrowd Gitlab](https://gitlab.aicrowd.com) for evaluation [(full instructions below)](#-how-to-make-a-submission). The automated evaluation will evaluate the submissions on the public test set and report the metrics on the leaderboard of the competition.
# ✍️ How to write your own model?
Please follow the instructions and examples in [models/README.md](models/README.md) to write your own models for this competition.
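The authoritative interface is the one documented in [models/README.md](models/README.md); as a rough, hypothetical sketch only (the class name, the `batch_generate_answer` method, and the batch layout below are illustrative assumptions, not the official API), a minimal do-nothing model might look like:

```python
class DummyModel:
    """Placeholder model that answers every query with "i don't know".

    NOTE: the method name and batch layout are illustrative assumptions;
    the official interface is documented in models/README.md.
    """

    def batch_generate_answer(self, batch: dict) -> list:
        # `batch` is assumed here to map "query" to a list of question strings.
        queries = batch.get("query", [])
        # Answering "i don't know" is a safe baseline while you build out
        # real retrieval and generation components.
        return ["i don't know" for _ in queries]
```

A stub like this is useful for exercising the local evaluation loop end to end before plugging in real retrieval and generation.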
# 🚴 How to start participating?
## Setup
1. **Add your SSH key** to AIcrowd GitLab
You can add your SSH Keys to your GitLab account by going to your profile settings [here](https://gitlab.aicrowd.com/-/profile/keys). If you do not have SSH Keys, you will first need to [generate one](https://docs.gitlab.com/ee/user/ssh.html).
2. **Fork the repository**. You can use [this link](https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/forks/new) to create a fork.
3. Write your own model as described in the [How to write your own model](#how-to-write-your-own-model) section.
4. Test your model locally using `python local_evaluation.py`.
5. Accept the Challenge Rules on the main [challenge page](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024) by clicking on the **Participate** button. Also accept the Challenge Rules on the task-specific page (linked from the challenge page) that you want to submit to.
6. Make a submission as described in the [How to make a submission](#-how-to-make-a-submission) section.
## 📮 How to make a submission?
Please follow the instructions in [docs/submission.md](docs/submission.md) to make your first submission.
This also includes instructions on [specifying your software runtime](docs/submission.md#specifying-software-runtime-and-dependencies), [code structure](docs/submission.md#code-structure-guidelines), and [submitting to different tracks](docs/submission.md#submitting-to-different-tracks).
**Note**: **Remember to accept the Challenge Rules** on the challenge page, **and** on the task page, before making your first submission.
## 💻 What hardware does my code run on?
You can find more details about the hardware and system configuration in [docs/hardware-and-system-config.md](docs/hardware-and-system-config.md).
In summary, we provide you with `4` x [NVIDIA T4 GPUs](https://www.nvidia.com/en-us/data-center/tesla-t4/).
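As a back-of-the-envelope check (a sketch, not an official sizing guide): each T4 has 16 GB of memory, and an 8B-parameter model in fp16 needs roughly 16 GB for the weights alone, so Llama 3 - 8B generally has to be sharded across the GPUs rather than loaded on a single T4:

```python
def fp16_weight_gb(n_params_billion: float) -> float:
    """Rough weight footprint in GB: 2 bytes per parameter in fp16."""
    return n_params_billion * 2

T4_MEMORY_GB = 16  # per-GPU memory of an NVIDIA T4
NUM_GPUS = 4

weights_gb = fp16_weight_gb(8)  # Meta Llama 3 - 8B
print(f"fp16 weights: ~{weights_gb:.0f} GB")
# Weights alone already fill one T4, leaving no room for the KV cache
# or activations, so a single GPU is not enough.
print(f"fits on one T4: {weights_gb < T4_MEMORY_GB}")
print(f"fits across {NUM_GPUS} T4s: {weights_gb < T4_MEMORY_GB * NUM_GPUS}")
```

This is why inference frameworks with tensor parallelism (or an equivalent sharding strategy) are typically needed on this hardware.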
## 🏁 Baseline
We include two baselines for demonstration purposes, and you can read more about them in [docs/baselines.md](docs/baselines.md).
# ❓ Frequently Asked Questions
## Which track is this starter kit for?
This starter kit can be used to submit to any of the tracks. You can find more information in [docs/submission.md#submitting-to-different-tracks](docs/submission.md#submitting-to-different-tracks).
## Where can I learn more about the dataset schema?
The dataset schema is described in [docs/dataset.md](docs/dataset.md).