
Sound Demixing Challenge 2023 - Music Demixing Track - Starter Kit

Discord

This repository is the Sound Demixing Challenge 2023 - Music Demixing Track Starter kit! It contains:

  • Documentation on how to submit your models to the leaderboard
  • Best practices and information on how we evaluate your submission, etc.
  • Starter code for you to get started!


📝 Table of Contents

  1. About the Sound Demixing Challenge 2023
  2. Evaluation
  3. Baselines
  4. How to test and debug locally
  5. How to submit
  6. Dataset
  7. Setting up your codebase
  8. FAQs

🎶 About the Sound Demixing Challenge 2023

Have you ever sung using a karaoke machine or made a DJ music mix of your favourite song? Have you wondered how hearing aids help people listen more clearly or how video conference software reduces background noise?

They all use the magic of audio separation.

Music source separation (MSS) attracts professional music creators as it enables remixing and revising songs in a way traditional equalisers don't. Suppressed vocals in songs can improve your karaoke night and provide a richer audio experience than conventional applications.

The Sound Demixing Challenge 2023 (SDX23) is an opportunity for researchers and machine learning enthusiasts to test their skills by creating a system to perform audio source separation.

Given an audio signal as input (referred to as a "mixture"), you must decompose it into its different parts.

separation image

🎻 About the Music Demixing Track

This task will focus on music source separation. Participants will submit systems that separate a song into four instruments: vocals, bass, drums, and other (the instrument "other" contains signals of all instruments other than the first three, e.g., guitar or piano).

Karaoke systems can benefit from audio source separation technology, as users can sing over any original song with the vocals suppressed, instead of picking from a set of "cover" songs specifically produced for karaoke.

Similar to the Music Demixing Challenge 2021, this task will have two leaderboards.

Leaderboard A (MUSDB18)

Participants in Leaderboard A will be allowed to train their system exclusively on the training part of the MUSDB18-HQ dataset. This dataset has become the standard in the literature, as it is free to use and allows anyone to start training source separation models.

The label swaps are included in the dataset for this leaderboard.

Leaderboard B (No holds barred)

Leaderboard B places no restrictions on the training data, including data with bleeding or additional mixtures: you can train on any data you like. For both leaderboards, the winning teams will be required to publish their training code to receive a prize, as the challenge is about the training method.

🚨 NOTE

To participate in Leaderboard B, you need to set "external_dataset_used": true in the aicrowd.json file.
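
For example, aicrowd.json might look like the following. Only `external_dataset_used` is prescribed above; the other fields are illustrative placeholders, so keep the schema of the file shipped with this starter kit:

```json
{
  "challenge_id": "music-demixing-track",
  "authors": ["your-aicrowd-username"],
  "description": "Brief description of your model",
  "external_dataset_used": true
}
```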

✅ Evaluation

As an evaluation metric, we are using the signal-to-distortion ratio (SDR), which is defined as,

$$SDR_{instr} = 10\log_{10}\frac{\sum_n \left(s_{instr,left\ channel}(n)\right)^2 + \sum_n \left(s_{instr,right\ channel}(n)\right)^2}{\sum_n \left(s_{instr,left\ channel}(n) - \hat{s}_{instr,left\ channel}(n)\right)^2 + \sum_n \left(s_{instr,right\ channel}(n) - \hat{s}_{instr,right\ channel}(n)\right)^2}$$

where $s_{instr}(n)$ is the waveform of the ground truth and $\hat{s}_{instr}(n)$ denotes the waveform of the estimate. The higher the SDR score, the better the output of the system is.

In order to rank systems, we will use the average SDR computed by

$$SDR_{song} = \frac{1}{4}\left(SDR_{bass} + SDR_{drums} + SDR_{vocals} + SDR_{other}\right)$$

for each song. Finally, the overall score is obtained by averaging $SDR_{song}$ over all songs in the hidden test set.
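
As an illustration only (not the official evaluation code), the metric above could be computed for one song roughly as follows, assuming each stem is a NumPy array of shape `(num_samples, 2)` holding the two stereo channels:

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """SDR for one instrument; arrays of shape (num_samples, 2) hold both channels."""
    num = np.sum(reference ** 2)               # energy of the ground-truth signal (both channels)
    den = np.sum((reference - estimate) ** 2)  # energy of the estimation error
    return 10 * np.log10((num + eps) / (den + eps))  # eps avoids division by zero (an assumption)

def sdr_song(references: dict, estimates: dict) -> float:
    """Average SDR over the four stems of one song, as in the ranking formula."""
    stems = ("bass", "drums", "vocals", "other")
    return float(np.mean([sdr(references[s], estimates[s]) for s in stems]))
```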

🤖 Baselines

We use the Open-Unmix library for the baseline. Specifically, we provide trained checkpoints for the UMXL model. You can use the baseline by switching to the openunmix-baseline branch on this repository. To test the models locally, you need to install git-lfs.

When submitting your own models, you need to submit the checkpoints using git-lfs. Check the instructions shared in the inference file here.
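
For example, fetching and testing the baseline locally might look like this (assuming git-lfs is already installed on your machine):

```bash
git lfs install                   # enable git-lfs for this repository
git checkout openunmix-baseline   # switch to the baseline branch
git lfs pull                      # download the UMXL checkpoints
python evaluate_locally.py        # run the baseline locally
```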

💻 How to Test and Debug Locally

The best way to test your models is to run your submission locally.

You can do this by simply running `python evaluate_locally.py`. Note that your local setup and the server evaluation runtime may vary. Make sure you set up your runtime according to the section: How do I specify my dependencies?

🚀 How to Submit

You can use the submission script: `source submit.sh <submission_text>`

More information on submissions can be found in SUBMISSION.md.

A high-level description of the challenge procedure:

  1. Sign up to join the competition on the AIcrowd website.
  2. Clone this repo and start developing your solution.
  3. Train your models for the music demixing task and check locally that your submission produces the separated stems (for example with `python evaluate_locally.py`).
  4. Submit your trained models to AIcrowd GitLab for evaluation (full instructions below). The automated evaluation setup will run your submission on the hidden test set and report the metrics on the leaderboard of the competition.

💽 Dataset

Download the public dataset for this task using this link; you'll need to accept the rules of the competition to access the data. The data is the same as the well-known MUSDB18-HQ dataset and its compressed version.

📑 Setting Up Your Codebase

AIcrowd provides great flexibility in the details of your submission!
Find the answers to FAQs about submission structure below, followed by the guide for setting up this starter kit and linking it to the AIcrowd GitLab.

FAQs

How do I submit a model?

In short, you should push your code to AIcrowd's GitLab with a specific git tag, and the evaluation will be triggered automatically. More information on submissions can be found in our submission.md.

How do I specify my dependencies?

We accept submissions with custom runtimes, so you can choose your favourite! The configuration files typically include requirements.txt (PyPI packages), apt.txt (apt packages) or even your own Dockerfile.

You can check detailed information about this in runtime.md.
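
For instance, requirements.txt is just a list of PyPI packages, one per line; the packages below are purely illustrative, not a required or recommended set:

```text
numpy
soundfile
torch
```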

What should my code structure look like?

Please follow the example structure of the starter kit for your code. The different files and directories have the following meaning:

.
├── aicrowd.json                # Add any descriptions about your model, and set `external_dataset_used`
├── apt.txt                     # Linux packages to be installed inside docker image
├── requirements.txt            # Python packages to be installed
├── evaluate_locally.py         # Use this to check your model evaluation flow locally
└── my_submission               # Place your models and related code here
    ├── <Your model files>      # Add any models here for easy organization
    ├── aicrowd_wrapper.py      # Keep this file unchanged
    └── user_config.py          # IMPORTANT: Add your model name here

How can I get going with a completely new model?

Train your model as you like, and when you’re ready to submit, implement the inference class and import it to my_submission/user_config.py. Refer to my_submission/README.md for a detailed explanation.
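
As a hedged sketch only (the class name, method name, and wiring below are illustrative assumptions; the exact interface expected by the evaluator is described in my_submission/README.md):

```python
# my_submission/my_model.py  (illustrative; names are assumptions, not the required API)
import numpy as np

class MySeparationModel:
    def separate_music_file(self, mixture: np.ndarray, sample_rate: int) -> dict:
        """Take a stereo mixture of shape (num_samples, 2) and return one array per stem."""
        # Replace this passthrough with your trained model's inference code.
        return {stem: mixture.copy() for stem in ("bass", "drums", "vocals", "other")}
```

You would then expose this class in my_submission/user_config.py (check the starter kit for the exact variable name it expects).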

Once you are ready, test your implementation with `python evaluate_locally.py`.

How do I actually make a submission?

You can use the submission script: `source submit.sh <submission_text>`

The submission is made by adding everything including the model to git, tagging the submission with a git tag that starts with submission-, and pushing to AIcrowd's GitLab. The rest is done for you!

For large model weight files, you'll need to use git-lfs.
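
For example, a manual submission flow might look like the following (the tag name and the `aicrowd` remote name are illustrative; only the `submission-` prefix of the tag is required):

```bash
git lfs install
git lfs track "my_submission/*.pth"   # example pattern for large checkpoint files
git add .
git commit -m "my submission"
git tag submission-v0.1               # the tag must start with "submission-"
git push aicrowd master
git push aicrowd submission-v0.1      # pushing the tag triggers the evaluation
```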

More details are available at docs/submission.md.

When you make a submission, browse to the issues page on your repository; a successful submission should look like this.

submission image

Are there any hardware or time constraints?

Your submission will need to complete predictions on all the tracks within 120 minutes. Make sure you take advantage of all the cores by parallelizing your code if needed; a minimal sketch follows the machine specifications below. Incomplete submissions will fail.

The machine where the submission will run has the following specifications:

  • 4 vCPUs
  • 16GB RAM
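
A minimal sketch of such parallelism (the per-track helper `separate_track` is hypothetical; replace it with your own inference code):

```python
from concurrent.futures import ProcessPoolExecutor

def separate_track(track_path: str) -> str:
    # Hypothetical per-track inference; replace with your model's separation code.
    return track_path

def separate_all(track_paths, num_workers=4):
    # Spread independent tracks over the 4 vCPUs of the evaluation machine.
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(separate_track, track_paths))

if __name__ == "__main__":
    print(separate_all(["song1.wav", "song2.wav"]))
```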

📎 Important links

You may also like the new Cinematic Sound Demixing track

Best of Luck 🎉 🎉