Guide to Making Your First Submission
This document walks you through making your first submission: specifying your software runtime and dependencies, structuring your code, and submitting your project.
Table of Contents
- Specifying Software Runtime and Dependencies
- Code Structure Guidelines
- Submitting to Different Tracks
- Submission Entry Point
- Setting Up SSH Keys
- Managing Large Model Files with Git LFS
- How to Submit Your Code
Specifying Software Runtime and Dependencies
Our platform supports custom runtime environments. This means you have the flexibility to choose any libraries or frameworks necessary for your project. Here’s how you can specify your runtime and dependencies:
- `requirements.txt`: List any PyPI packages your project needs. Pin their versions, because we have observed significant differences in inference time between different `transformers` versions.
- `apt.txt`: List any apt packages your project requires.
- `Dockerfile`: The `Dockerfile` at the repository root is used by default to build your submission. You can also specify a particular Python version here if you need one.
For detailed setup instructions regarding runtime dependencies, refer to `docs/runtime.md`.
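Because inference time can vary across `transformers` versions, it helps to check that every dependency is pinned before submitting. The following is a minimal sketch, assuming a plain `requirements.txt` with one requirement per line; `find_unpinned` is a hypothetical helper, not part of the starter kit.

```python
import re

# "name==version" at the start of a requirement line (extras like pkg[cli]==1.0 allowed).
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\],]+==[^=\s]+")


def find_unpinned(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r or --extra-index-url
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, `with open("requirements.txt") as f: print(find_unpinned(f))` lists entries such as `torch>=2.0` or a bare `accelerate` that would let pip choose a version at build time.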
Code Structure Guidelines
Your project should follow the structure outlined in the starter kit. Here’s a brief overview of what each component represents:
.
├── .dockerignore # Please specify the paths to your model checkpoints so that the large files won't be built into the docker image.
├── README.md # Project documentation and setup instructions
├── aicrowd.json # Submission meta information - like your username, track name
├── data
│ └── development.json # Development dataset for local testing
├── docs
│ └── runtime.md # Documentation on the runtime environment setup, dependency configs
├── Dockerfile # The Dockerfile that will be used to build your submission and all dependencies. The default one will work fine, but you can write your own.
├── docker_run.sh # This script builds your submission locally and calls `local_evaluation.py`. Use it to debug your submission if it fails to build.
├── local_evaluation.py # Use this to check your model evaluation flow locally
├── metrics.py # Scripts to calculate evaluation metrics for your model's performance
├── models
│ ├── README.md # Documentation specific to the implementation of model interfaces
│ ├── base_model.py # Base model class
│ ├── dummy_model.py # A simple or placeholder model for demonstration or testing. We also implement a simple Vicuna-7B baseline here.
│ └── user_config.py # IMPORTANT: Configuration file to specify your model
├── parsers.py # Model output parser
├── requirements.txt # Python packages to be installed for model development
├── requirements_eval.txt # Additional Python packages to be installed for local evaluation
└── utilities
└── _Dockerfile # Example Dockerfile for specifying runtime via Docker
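A quick pre-submission check can confirm that the key files from this layout are present. This is a minimal sketch: which files the evaluator strictly requires is an assumption here (the list below is simply picked from the tree above), and `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Files this sketch treats as required; chosen from the starter-kit tree, the
# evaluator's actual requirements are an assumption.
REQUIRED_FILES = [
    "aicrowd.json",
    "Dockerfile",
    "requirements.txt",
    "models/user_config.py",
]


def missing_files(root):
    """Return the required paths that do not exist as files under the repository root."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Running `missing_files(".")` from the repository root should return an empty list before you tag a submission.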
Remember, your submission metadata JSON (`aicrowd.json`) is crucial for mapping your submission to the challenge. Ensure it contains the correct `challenge_id`, `authors`, and other necessary information. To utilize GPUs, set the `"gpu": true` flag in your `aicrowd.json`.
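A common mistake is writing `"gpu": "true"` (a string) instead of the JSON boolean `true`. The sketch below validates the fields mentioned above; it assumes `aicrowd.json` is a flat JSON object and that `authors` is a list of usernames, and `validate_aicrowd_json` is a hypothetical helper. Other fields in the file are not checked.

```python
import json

# Hypothetical example contents for illustration.
EXAMPLE = """{
  "challenge_id": "amazon-kdd-cup-24-understanding-shopping-concepts",
  "authors": ["your-aicrowd-username"],
  "gpu": true
}"""


def validate_aicrowd_json(text):
    """Return a list of problems found in aicrowd.json contents (empty if none)."""
    try:
        meta = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(meta, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not meta.get("challenge_id"):
        problems.append("missing challenge_id")
    authors = meta.get("authors")
    if not isinstance(authors, list) or not authors:
        problems.append("authors should be a non-empty list")
    # "gpu" must be a JSON boolean (true/false), not the string "true".
    if "gpu" in meta and not isinstance(meta["gpu"], bool):
        problems.append("gpu must be true or false")
    return problems
```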
Submitting to Different Tracks
Specify the track by setting the appropriate `challenge_id` in your `aicrowd.json`. Here are the challenge IDs for the various tracks:
| Track Name | Challenge ID |
|---|---|
| Understanding Shopping Concepts | `amazon-kdd-cup-24-understanding-shopping-concepts` |
| Shopping Knowledge Reasoning | `amazon-kdd-cup-24-shopping-knowledge-reasoning` |
| User Behavior Alignment | `amazon-kdd-cup-24-user-behavior-alignment` |
| Multi-Lingual Abilities | `amazon-kdd-cup-24-multi-lingual-abilities` |
| All-Around | `amazon-kdd-cup-24-all-around` |
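To avoid typos in the challenge ID, you can set it from the track name programmatically. A minimal sketch: the mapping is copied from the table above, while `set_track` is a hypothetical helper that rewrites the JSON text and leaves all other fields untouched.

```python
import json

# Track names and challenge IDs copied from the table above.
CHALLENGE_IDS = {
    "Understanding Shopping Concepts": "amazon-kdd-cup-24-understanding-shopping-concepts",
    "Shopping Knowledge Reasoning": "amazon-kdd-cup-24-shopping-knowledge-reasoning",
    "User Behavior Alignment": "amazon-kdd-cup-24-user-behavior-alignment",
    "Multi-Lingual Abilities": "amazon-kdd-cup-24-multi-lingual-abilities",
    "All-Around": "amazon-kdd-cup-24-all-around",
}


def set_track(meta_text, track):
    """Return aicrowd.json contents with challenge_id set for the given track name."""
    if track not in CHALLENGE_IDS:
        raise ValueError(f"unknown track {track!r}; expected one of {sorted(CHALLENGE_IDS)}")
    meta = json.loads(meta_text)
    meta["challenge_id"] = CHALLENGE_IDS[track]
    return json.dumps(meta, indent=2)
```

Rejecting unknown track names up front means a misspelled track fails locally instead of producing a submission mapped to the wrong challenge.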
Submission Entry Point
The evaluation process instantiates the model specified in `models/user_config.py`. Ensure this configuration points to the model you want evaluated.
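The real interface is defined in `models/base_model.py` and described in `models/README.md`. The sketch below only illustrates the general pattern of a config module exposing one model class that an evaluator then instantiates. The names `BaseModel`, `MyModel`, `UserModel`, `evaluate`, and the `predict(prompt)` signature are assumptions for illustration; check the starter kit for the actual ones.

```python
# --- stands in for models/base_model.py (hypothetical interface) ---
class BaseModel:
    def predict(self, prompt: str) -> str:
        raise NotImplementedError


# --- your own model, e.g. in a hypothetical models/my_model.py ---
class MyModel(BaseModel):
    def __init__(self):
        # Load weights and tokenizer here; this placeholder just stores a fixed answer.
        self.default_answer = "0"

    def predict(self, prompt: str) -> str:
        return self.default_answer


# --- models/user_config.py: point the config at the class to evaluate ---
UserModel = MyModel


# --- roughly what an evaluation loop could do with it (assumed, not the real harness) ---
def evaluate(prompts):
    model = UserModel()  # instantiated once, then queried for each prompt
    return [model.predict(p) for p in prompts]
```

Keeping the choice of model in a single assignment means switching models only requires editing `user_config.py`, not the evaluation code.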
Setting Up SSH Keys
You need to add your SSH key to your GitLab account through your profile settings here. If you do not have an SSH key yet, you will first need to generate one.
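To find out whether you already have a key, you can look for the default public-key files in `~/.ssh`. A minimal sketch, assuming your keys use the default file names `ssh-keygen` creates; `existing_public_keys` is a hypothetical helper. If it returns an empty list, you can generate a key with `ssh-keygen -t ed25519` and paste the contents of the resulting `.pub` file (never the private key) into your GitLab SSH key settings.

```python
from pathlib import Path

# Default file names ssh-keygen uses for common key types.
DEFAULT_PUBLIC_KEYS = ["id_ed25519.pub", "id_ecdsa.pub", "id_rsa.pub"]


def existing_public_keys(ssh_dir=None):
    """Return the default-named public keys present in ssh_dir (defaults to ~/.ssh)."""
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    return [name for name in DEFAULT_PUBLIC_KEYS if (ssh_dir / name).is_file()]
```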
Managing Large Model Files with Git LFS
When preparing your submission, it's crucial to ensure all necessary models and files required by your inference code are properly saved and included. Due to the potentially large size of model weight files, we highly recommend using Git Large File Storage (Git LFS) to manage these files efficiently.
Why Use Git LFS?
Git LFS stores the contents of large files outside the regular Git history and keeps only small pointer files in the repository. This keeps pushes and clones manageable and avoids common errors associated with large files, such as:
fatal: the remote end hung up unexpectedly
remote: fatal: pack exceeds maximum allowed size
These errors typically occur when large files are directly checked into the Git repository without Git LFS, leading to challenges in handling and transferring those files.
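Before committing, you can list which files in your working tree are large enough to belong in Git LFS. A minimal sketch: the 100 MiB threshold is a hypothetical cut-off (this guide does not state the server's actual limit), and `find_large_files` is a hypothetical helper that skips the `.git` directory.

```python
import os
from pathlib import Path

# Hypothetical cut-off for "large"; this guide does not state the server's actual limit.
THRESHOLD_BYTES = 100 * 1024 * 1024  # 100 MiB


def find_large_files(root, threshold=THRESHOLD_BYTES):
    """Return (relative_path, size_in_bytes) for files >= threshold, largest first; .git is skipped."""
    root = Path(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into Git's own storage
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip links so broken or external targets are not counted
            size = path.stat().st_size
            if size >= threshold:
                found.append((path.relative_to(root).as_posix(), size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Each file it reports, typically model checkpoints, is a candidate for `git lfs track`.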
Steps to Use Git LFS
- Install Git LFS: If you haven't already, install Git LFS on your machine. Detailed instructions can be found here. Then run `git lfs install` once to set it up for your user account.
- Track large files: Use Git LFS to track the large files in your project by running `git lfs track "*.model"` (replace `*.model` with your file type). This records the pattern in `.gitattributes`, which you should commit as well.
- Add and commit: After tracking the large files with Git LFS, add and commit them as you would any other file. Git LFS automatically stores them as LFS objects, which optimizes their storage and transfer.
- Push to the repository: When you push your changes, Git LFS uploads the large files separately, ensuring a smooth push.
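`git lfs track` works by appending a line such as `*.model filter=lfs diff=lfs merge=lfs -text` to `.gitattributes`. Running `git lfs track` is the normal route; the sketch below only reproduces that line format, for example to register several checkpoint extensions at once. `add_lfs_patterns` is a hypothetical helper, and patterns containing spaces are not handled.

```python
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def add_lfs_patterns(gitattributes_text, patterns):
    """Return .gitattributes contents with an LFS rule for each pattern not already listed."""
    lines = gitattributes_text.splitlines()
    # First token of each non-comment line is the path pattern.
    existing = {line.split()[0] for line in lines if line.strip() and not line.lstrip().startswith("#")}
    for pattern in patterns:
        if pattern not in existing:
            lines.append(f"{pattern} {LFS_ATTRS}")
            existing.add(pattern)
    return "\n".join(lines) + "\n" if lines else ""
```

Whichever way the file is written, commit `.gitattributes` before the large files themselves so they are stored through LFS.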
Handling Previously Committed Large Files
If you have already committed large files directly to your Git repository without Git LFS, you may still run into the errors above: even if these files are no longer in your current working directory, they remain in the Git history.
To resolve this, remove the large files from the Git history, then re-add and commit them using Git LFS. This cleans up the repository's history and avoids the errors described above.
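Before cleaning up, it helps to know which large files are still in the history. A minimal sketch using two standard Git plumbing commands, `git rev-list --objects --all` piped into `git cat-file --batch-check`; the helper names and threshold are hypothetical. It only reports blobs; actually removing them from history is a separate step covered by the guide referenced in this section.

```python
import subprocess

# %(rest) echoes the path that rev-list printed after each object name.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_large_blobs(batch_output, threshold):
    """Parse `git cat-file --batch-check` output; return (size, path) for blobs >= threshold, largest first."""
    blobs = []
    for line in batch_output.splitlines():
        parts = line.split(" ", 3)  # maxsplit keeps paths containing spaces intact
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, and "missing" entries
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size >= threshold:
            blobs.append((size, path))
    return sorted(blobs, reverse=True)


def large_blobs_in_history(threshold, repo="."):
    """List large blobs reachable from any ref in `repo` (requires git on PATH)."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    batch = subprocess.run(
        ["git", "cat-file", f"--batch-check={BATCH_FORMAT}"],
        cwd=repo, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_large_blobs(batch, threshold)
```

For example, `large_blobs_in_history(50 * 1024 * 1024)` run inside your clone lists every blob of at least 50 MiB, including ones deleted from the working tree long ago.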
For more information on how to upload large files to your submission and detailed guidance on using Git LFS, please refer to this detailed guide.
Note: Managing large files properly not only makes your own workflow smoother but also ensures that the evaluation process can proceed without problems.
How to Submit Your Code
To submit your code, push a tag beginning with "submission-" to your repository on GitLab. Follow these steps to make a submission:
This assumes you have already cloned the repository by following the instructions here and made your changes.
- Commit your changes with `git commit -am "Your commit message"`. The `-a` flag only stages files Git already tracks, so run `git add` first for new files such as model weights.
- Tag your submission (e.g., `git tag -am "submission-v0.1" submission-v0.1`).
- Push your changes and tags to the AIcrowd repository (e.g., `git push origin submission-v0.1`).
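Git refuses to create a tag whose name already exists, so every submission needs a fresh tag. A minimal sketch for picking the next name: it assumes you keep the `submission-vMAJOR.MINOR` numbering from the example above (only the `submission-` prefix is required), and `next_submission_tag` and `is_submission_tag` are hypothetical helpers.

```python
import re

# Only the "submission-" prefix is required; the vMAJOR.MINOR numbering is an assumed convention.
TAG_RE = re.compile(r"^submission-v(\d+)\.(\d+)$")


def is_submission_tag(tag):
    """True if the tag name would trigger a submission."""
    return tag.startswith("submission-")


def next_submission_tag(existing_tags):
    """Return the next submission-vX.Y tag, bumping the minor version of the highest existing one."""
    versions = [tuple(map(int, m.groups())) for m in map(TAG_RE.match, existing_tags) if m]
    if not versions:
        return "submission-v0.1"
    major, minor = max(versions)  # compares numerically, so v0.10 ranks above v0.9
    return f"submission-v{major}.{minor + 1}"
```

You could feed it the output of `git tag --list "submission-*"`, split into lines.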
After pushing your tag, you can view your submission details at `https://gitlab.aicrowd.com/<YOUR-AICROWD-USER-NAME>/amazon-kdd-cup-2024-starter-kit/issues`. It may take about 30 minutes for each submission to build and begin evaluation, so please be patient.
Ensure your `aicrowd.json` is correctly filled with the necessary metadata, and that you've replaced `<YOUR-AICROWD-USER-NAME>` with your GitLab username in the URL above.
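If you check submission status from a script, filling in the URL programmatically avoids accidentally leaving the placeholder in place. A minimal sketch; `issues_url` is a hypothetical helper, and the URL template is the one given in this section.

```python
PLACEHOLDER = "<YOUR-AICROWD-USER-NAME>"
TEMPLATE = f"https://gitlab.aicrowd.com/{PLACEHOLDER}/amazon-kdd-cup-2024-starter-kit/issues"


def issues_url(username):
    """Fill your username into the submission-issues URL from this guide."""
    username = username.strip()
    if not username or username == PLACEHOLDER:
        raise ValueError("replace the placeholder with your GitLab username")
    return TEMPLATE.replace(PLACEHOLDER, username)
```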