Sound Demixing Challenge 2023 - Cinematic Sound Separation - Starter Kit
This repository is the Sound Demixing Challenge 2023 - Cinematic Sound Separation Starter kit! It contains:
- Documentation on how to submit your models to the leaderboard
- The submission procedure, best practices, and information on how your submission is evaluated
- Starter code for you to get started!
Quick Links:
- Sound Demixing Challenge 2023 - Cinematic Sound Separation Track - Competition Page
- Discussion Forum
- Sound Demixing 2023 Challenge Overview
📝 Table of Contents
- About the Sound Demixing Challenge 2023
- Evaluation
- Baselines
- How to test and debug locally
- How to submit
- Setting up your codebase
- FAQs
🎶 About the Sound Demixing Challenge 2023
Have you ever sung using a karaoke machine or made a DJ music mix of your favourite song? Have you wondered how hearing aids help people listen more clearly or how video conference software reduces background noise?
They all use the magic of audio separation.
Music source separation (MSS) attracts professional music creators as it enables remixing and revising songs in a way traditional equalisers don't. Suppressed vocals in songs can improve your karaoke night and provide a richer audio experience than conventional applications.
The Sound Demixing Challenge 2023 (SDX23) is an opportunity for researchers and machine learning enthusiasts to test their skills by creating a system to perform audio source separation.
Given an audio signal as input (referred to as a "mixture"), you must decompose it into its separate sources.
🎻 Cinematic Sound Separation
Cinematic sound separation is the task of separating movie audio into the three tracks “dialogue”, “sound effects” and “music”. It has many applications ranging from language dubbing to upmixing of old movies to spatial audio and user interfaces for flexible listening.
Leaderboard A
Systems that are trained only on the training (tr) and validation (cv) parts of DnR [1] are eligible for Leaderboard A.
🚨 NOTE: To participate in Leaderboard A, you need to set "external_dataset_used": false in the aicrowd.json file.
Leaderboard B
Systems that are trained on any other data (e.g., also using the test part tt of the DnR dataset, data from the internet, etc.) are eligible for Leaderboard B.
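For reference, a Leaderboard A entry's aicrowd.json could look like the sketch below. The exact set of fields is defined by the starter kit; all values shown here other than external_dataset_used are illustrative placeholders.

```json
{
  "challenge_id": "cinematic-sound-demixing-23",
  "authors": ["your-aicrowd-username"],
  "description": "Cinematic sound separation model",
  "external_dataset_used": false,
  "gpu": true
}
```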
📁 Datasets
For the training of the system, participants can use either the training data of the “Divide-and-Remaster” (DnR) dataset [1] (Leaderboard A) or any data that they have at their disposal (Leaderboard B). The DnR dataset consists of 3,406 mixtures (∼ 57 h) for the training set, 487 mixtures (∼ 8 h) for the validation set, and 973 mixtures ( ∼16 h) for the test set, along with their isolated ground-truth stems.
For the evaluation and ranking of the submissions, we use a newly created hidden dataset of real audio from 11 Sony Pictures Entertainment movies. The data is stereo and sampled at 44.1 kHz.
You can download the DnR dataset here.
✅ Evaluation
As an evaluation metric, we are using the signal-to-distortion ratio (SDR), which is defined as,
SDR_{instr} = 10 \log_{10} \frac{\sum_n \left(s_{instr,left\ channel}(n)\right)^2 + \sum_n \left(s_{instr,right\ channel}(n)\right)^2}{\sum_n \left(s_{instr,left\ channel}(n) - \hat{s}_{instr,left\ channel}(n)\right)^2 + \sum_n \left(s_{instr,right\ channel}(n) - \hat{s}_{instr,right\ channel}(n)\right)^2}
where s_{instr}(n) is the waveform of the ground truth and \hat{s}_{instr}(n) denotes the waveform of the estimate. The higher the SDR score, the better the output of the system is.
In order to rank systems, we will use the average SDR computed as
SDR_{mixture} = \frac{1}{3}(SDR_{dialogue} + SDR_{effects} + SDR_{music})
for each mixture. Finally, the overall score is obtained by averaging SDR_{mixture} over all mixtures in the hidden test set.
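For intuition, here is a minimal NumPy sketch of the per-stem SDR above. This is not the official evaluation code; stereo_sdr is a hypothetical helper, and the inputs are assumed to be stereo float arrays of shape (num_samples, 2).

```python
import numpy as np

def stereo_sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Per-stem SDR, summing the energy of both channels (see the formula above)."""
    num = np.sum(reference ** 2)               # ground-truth energy over both channels
    den = np.sum((reference - estimate) ** 2)  # error energy over both channels
    eps = 1e-10                                # guard against division by zero
    return 10.0 * np.log10((num + eps) / (den + eps))

# Per-mixture score: average the three stem SDRs, then average over the test set.
# sdr_mixture = (stereo_sdr(ref_dialogue, est_dialogue)
#                + stereo_sdr(ref_effects, est_effects)
#                + stereo_sdr(ref_music, est_music)) / 3
```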
🤖 Baselines
Baselines will be released soon, check the forums for updates.
💻 How to Test and Debug Locally
The best way to test your models is to run your submission locally.
You can do this by simply running python evaluate_locally.py. Note that your local setup and the server evaluation runtime may differ. Make sure you set up your runtime according to the section: How do I specify my dependencies?
🚀 How to Submit
You can use the submission script source submit.sh <submission_text>
More information on submissions can be found in SUBMISSION.md.
A high-level description of the challenge procedure:
- Sign up to join the competition on the AIcrowd website.
- Clone this repo and start developing your solution.
- Train your models on the DnR dataset (Leaderboard A) or your own data (Leaderboard B), and make sure your submission runs end-to-end with python evaluate_locally.py.
- Submit your trained models to AIcrowd GitLab for evaluation (full instructions below). The automated evaluation setup will run your submission on the hidden test set and report the metrics on the leaderboard of the competition.
📑 Setting Up Your Codebase
AIcrowd provides great flexibility in the details of your submission!
Find the answers to FAQs about submission structure below, followed by
the guide for setting up this starter kit and linking it to the AIcrowd
GitLab.
FAQs
How do I submit a model?
In short, you should push your code to AIcrowd's GitLab with a specific git tag, and the evaluation will be triggered automatically. More information on submissions can be found in our submission.md.
How do I specify my dependencies?
We accept submissions with custom runtimes, so you can choose your favourite! The configuration files typically include requirements.txt (PyPI packages), apt.txt (apt packages) or even your own Dockerfile.
You can check detailed information about this in runtime.md.
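As an illustration only (the package names below are examples, not requirements of this starter kit), the two files might look like:

```
requirements.txt:
    numpy
    soundfile
    torch

apt.txt:
    libsndfile1
    ffmpeg
```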
What should my code structure look like?
Please follow the example structure in the starter kit for your code. The different files and directories have the following meanings:
.
├── aicrowd.json # Add any descriptions about your model, set `external_dataset_used`, and gpu flag
├── apt.txt # Linux packages to be installed inside docker image
├── requirements.txt # Python packages to be installed
├── evaluate_locally.py # Use this to check your model evaluation flow locally
└── my_submission # Place your models and related code here
├── <Your model files> # Add any models here for easy organization
├── aicrowd_wrapper.py # Keep this file unchanged
└── user_config.py # IMPORTANT: Add your model name here
How can I get going with a completely new model?
Train your model as you like, and when you're ready to submit, implement the inference class and import it into my_submission/user_config.py. Refer to my_submission/README.md for a detailed explanation.
Once you are ready, test your implementation with python evaluate_locally.py.
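As a rough sketch of what this can look like, see below. The file, class and method names, the stem keys, and the way user_config.py exposes your model are assumptions for illustration; follow my_submission/README.md for the interface the evaluator actually expects.

```python
# my_submission/my_separator.py  (hypothetical file and class names)
import numpy as np

class PassThroughSeparator:
    """Toy example: returns the mixture as 'dialogue' and silence for the other stems."""

    def separate(self, mixture: np.ndarray) -> dict:
        # mixture is assumed to be a stereo waveform of shape (num_samples, 2) at 44.1 kHz
        silence = np.zeros_like(mixture)
        return {"dialogue": mixture, "effects": silence, "music": silence}

# my_submission/user_config.py would then import and expose this class, e.g.:
# from my_submission.my_separator import PassThroughSeparator
```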
How do I actually make a submission?
You can use the submission script source submit.sh <submission_text>
The submission is made by adding everything including the model to git, tagging the submission with a git tag that starts with submission-, and pushing to AIcrowd's GitLab. The rest is done for you!
For large model weight files, you'll need to use git-lfs.
More details are available at docs/submission.md.
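As a sketch of that flow (submit.sh automates the same steps; the remote name, branch, and tag below are placeholders):

```bash
# Track large weight files with git-lfs before committing them
git lfs install
git lfs track "*.pth"

git add .
git commit -m "my submission"

# Any tag starting with 'submission-' triggers an evaluation when pushed to AIcrowd's GitLab
git tag submission-v0.1
git push aicrowd master
git push aicrowd submission-v0.1
```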
When you make a submission, browse to the issues page on your repository; a successful submission should look like this.
How do I use the GPU?
To use a GPU in your submission, set the gpu flag in aicrowd.json:
"gpu": true,
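Inside your model code, a common pattern for honouring this flag is to fall back to CPU when no GPU is present. A minimal sketch, assuming a PyTorch-based model (adapt to your framework):

```python
import torch

# Use the provided T4 GPU when available, otherwise run on CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = MyModel().to(device)   # MyModel is a placeholder for your network
# mixture = mixture.to(device)
```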
Are there any hardware or time constraints?
Your submission will need to complete predictions on each soundtrack within 1.57 × the duration of the track. Make sure you take advantage of all the cores by parallelizing your code if needed. Incomplete submissions will fail.
The machine where the submission will run will have the following specifications:
- 4 vCPUs
- 16GB RAM
- (Optional) 1 NVIDIA T4 GPU with 16 GB VRAM - this needs setting "gpu": true in aicrowd.json
📎 Important links
You may also like the new Music Separation track
Best of Luck 🎉 🎉