# Meta KDD Cup 2024 Winning Solution: Comprehensive Retrieval-Augmented Generation (CRAG) System
# Meta KDD Cup '24 [CRAG: Comprehensive RAG Benchmark](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024) Starter Kit
## Overview
Welcome to the db3 team's winning solution for the Meta KDD Cup 2024. This repository contains our implementation of the Comprehensive Retrieval-Augmented Generation (CRAG) system, which leverages web sources and knowledge graphs to answer queries. Our solution secured first place in all three tasks of the competition.
This repository is also the CRAG: Comprehensive RAG Benchmark **Submission template and Starter kit**! Clone the repository to compete now!
## Repository Structure
**This repository contains**:
- **Documentation** on how to submit your models to the leaderboard
- **The procedure** for best practices and information on how we evaluate your model, etc.
- **Starter code** for you to get started!
The base framework for our solution is derived from the [Meta Comprehensive RAG Benchmark Starter Kit](https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/tree/master). We have extended and customized this framework to develop our winning solution, particularly focusing on the `models` folder which contains the core components of our approach.
To use our solution, you need to download the following models and place them in the specified directories:
1. **Meta Llama 3 - 8B Instruct**:
Download from [Meta Llama 3 - 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and place it in `models/Llama-3-8B-instruct`.
2. **All-MiniLM-L6-v2**:
Download from [All-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and place it in `models/all-Mini-L6-v2`.
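Before running the pipeline, it can save time to verify that both model directories are in place. The snippet below is a small convenience check, not part of the official solution; the directory names simply mirror the download steps above:

```python
from pathlib import Path

# Local model directories listed in the download steps above.
EXPECTED_MODEL_DIRS = [
    "models/Llama-3-8B-instruct",
    "models/all-Mini-L6-v2",
]

def missing_model_dirs(base=".", expected=EXPECTED_MODEL_DIRS):
    """Return the expected model directories that do not exist under `base`."""
    return [d for d in expected if not (Path(base) / d).is_dir()]

if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("All model directories are in place.")
```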
# 📊 Dataset
Please find more details about the dataset in [docs/dataset.md](docs/dataset.md).
# 👨💻👩💻 Tasks
## 📏 Evaluation Metrics
Please refer to [local_evaluation.py](local_evaluation.py) for more details on how we will evaluate your submissions.
# 🏁 Getting Started
1. **Sign up** to join the competition [on the AIcrowd website](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024).
2. **Fork** this starter kit repository. You can use [this link](https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/forks/new) to create a fork.
3. **Clone** your forked repo and start developing your model.
4. **Develop** your model(s) following the template in the [how to write your own model](#how-to-write-your-own-model) section.
5. [**Submit**](#-how-to-make-a-submission) your trained models to [AIcrowd Gitlab](https://gitlab.aicrowd.com) for evaluation [(full instructions below)](#-how-to-make-a-submission). The automated evaluation will evaluate the submissions on the public test set and report the metrics on the leaderboard of the competition.
# ✍️ How to write your own model?
Please follow the instructions and examples in [models/README.md](models/README.md) to write your own models for this competition.
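The authoritative interface is the one documented in [models/README.md](models/README.md); as a rough, hypothetical sketch only (the class name, the `batch_generate_answer` method, and the batch layout below are illustrative assumptions, not the official API), a minimal do-nothing model might look like:

```python
class DummyModel:
    """Placeholder model that answers every query with "i don't know".

    NOTE: the method name and batch layout are illustrative assumptions;
    the official interface is documented in models/README.md.
    """

    def batch_generate_answer(self, batch: dict) -> list:
        # `batch` is assumed here to map "query" to a list of question strings.
        queries = batch.get("query", [])
        # Answering "i don't know" is a safe baseline while you build out
        # real retrieval and generation components.
        return ["i don't know" for _ in queries]
```

A stub like this is useful for exercising the local evaluation loop end to end before plugging in real retrieval and generation.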
# 🚴 How to start participating?
## Setup
1. **Add your SSH key** to AIcrowd GitLab
You can add your SSH Keys to your GitLab account by going to your profile settings [here](https://gitlab.aicrowd.com/-/profile/keys). If you do not have SSH Keys, you will first need to [generate one](https://docs.gitlab.com/ee/user/ssh.html).
2. **Fork the repository**. You can use [this link](https://gitlab.aicrowd.com/aicrowd/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024/meta-comphrehensive-rag-benchmark-starter-kit/-/forks/new) to create a fork.
3. Write your own model as described in the [How to write your own model](#how-to-write-your-own-model) section.
4. Test your model locally using `python local_evaluation.py`.
5. Accept the Challenge Rules on the main [challenge page](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024) by clicking on the **Participate** button. Also accept the Challenge Rules on the task-specific page (linked from the challenge page) that you want to submit to.
6. Make a submission as described in the [How to make a submission](#-how-to-make-a-submission) section.
## 📮 How to make a submission?
Please follow the instructions in [docs/submission.md](docs/submission.md) to make your first submission.
This also includes instructions on [specifying your software runtime](docs/submission.md#specifying-software-runtime-and-dependencies), [code structure](docs/submission.md#code-structure-guidelines), and [submitting to different tracks](docs/submission.md#submitting-to-different-tracks).
**Note**: **Remember to accept the Challenge Rules** on the challenge page, **and** on the task page, before making your first submission.
## 💻 What hardware does my code run on?
You can find more details about the hardware and system configuration in [docs/hardware-and-system-config.md](docs/hardware-and-system-config.md).
In summary, we provide you with `4` x [NVIDIA T4 GPUs](https://www.nvidia.com/en-us/data-center/tesla-t4/).
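As a back-of-the-envelope check (a sketch, not an official sizing guide): each T4 has 16 GB of memory, and an 8B-parameter model in fp16 needs roughly 16 GB for the weights alone, so Llama 3 - 8B generally has to be sharded across the GPUs rather than loaded on a single T4:

```python
def fp16_weight_gb(n_params_billion: float) -> float:
    """Rough weight footprint in GB: 2 bytes per parameter in fp16."""
    return n_params_billion * 2

T4_MEMORY_GB = 16  # per-GPU memory of an NVIDIA T4
NUM_GPUS = 4

weights_gb = fp16_weight_gb(8)  # Meta Llama 3 - 8B
print(f"fp16 weights: ~{weights_gb:.0f} GB")
# Weights alone already fill one T4, leaving no room for the KV cache
# or activations, so a single GPU is not enough.
print(f"fits on one T4: {weights_gb < T4_MEMORY_GB}")
print(f"fits across {NUM_GPUS} T4s: {weights_gb < T4_MEMORY_GB * NUM_GPUS}")
```

This is why inference frameworks with tensor parallelism (or an equivalent sharding strategy) are typically needed on this hardware.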
## 🏁 Baseline
We include two baselines for demonstration purposes, and you can read more about them in [docs/baselines.md](docs/baselines.md).
# ❓ Frequently Asked Questions
## Which track is this starter kit for?
This starter kit can be used to submit to any of the tracks. You can find more information in [docs/submission.md#submitting-to-different-tracks](docs/submission.md#submitting-to-different-tracks).
## Where can I learn more about the dataset schema?
The dataset schema is described in [docs/dataset.md](docs/dataset.md).