diff --git a/README.md b/README.md
index f3b36f6991b44929d99f8942fca66084561244fc..2d0b3ff170b4777918486855f23da79cc9e54993 100644
--- a/README.md
+++ b/README.md
@@ -80,6 +80,9 @@ As all tasks are converted into text generation tasks, rule-based parsers will p
 
 Since all these metrics range from [0, 1], we calculate the average metric for all tasks within each track (macro-averaged) to determine the overall score for a track and identify track winners. The overall score of Track 5 will be calculated by averaging scores in Tracks 1-4.
 
+Please refer to [local_evaluation.py](local_evaluation.py) for more details on how we will evaluate your submissions.
+
+
 # 🗃️ Submission
 
 The challenge would be evaluated as a code competition. Participants must submit their code and essential resources, such as fine-tuned model weights and indices for Retrieval-Augmented Generation (RAG), which will be run on our servers to generate results and then for evaluation.
@@ -225,10 +228,6 @@ Please follow the instructions in [docs/submission.md](ocs/submission.md) to mak
 
 **Note**: **Remember to accept the Challenge Rules** on the challenge page, and task page before making your first submission.
 
-## Evaluation Metrics & Local Evaluation
-Please refer to [local_evaluation.py](local_evaluation.py) for more details on how we will evaluate your submissions.
-
-
 ## Hardware and System Configuration
 
 We apply a limit on the hardware available to each participant to run their solutions. Specifically,
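
For reference, below is a minimal Python sketch of the macro-averaged scoring rule described in the first hunk above. It is not taken from `local_evaluation.py`; the track names, task names, and metric values are hypothetical and only illustrate how per-task metrics in [0, 1] roll up into track scores and the Track 5 aggregate.

```python
# Minimal sketch of the scoring rule described above (not the official local_evaluation.py).
# Each task metric is assumed to lie in [0, 1]; a track score is the macro-average over its
# tasks, and the Track 5 score is the mean of the Track 1-4 scores.
from statistics import mean


def track_score(task_metrics: dict[str, float]) -> float:
    """Macro-average the per-task metrics of one track."""
    return mean(task_metrics.values())


def overall_scores(tracks: dict[str, dict[str, float]]) -> dict[str, float]:
    """Compute each track's score and the Track 5 aggregate (mean of Tracks 1-4)."""
    scores = {name: track_score(tasks) for name, tasks in tracks.items()}
    scores["track_5"] = mean(scores[t] for t in ("track_1", "track_2", "track_3", "track_4"))
    return scores


if __name__ == "__main__":
    # Hypothetical metric values for illustration only.
    example = {
        "track_1": {"task_a": 0.62, "task_b": 0.58},
        "track_2": {"task_a": 0.71},
        "track_3": {"task_a": 0.40, "task_b": 0.55, "task_c": 0.50},
        "track_4": {"task_a": 0.80},
    }
    print(overall_scores(example))
```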