ZEW Data Purchasing Challenge 2022 - Starter Kit

This repository is the main Data Purchasing Challenge template and Starter kit. Clone the repository to compete now!

This repository contains:

  • Documentation on how to submit your models to the leaderboard
  • Best practices and information on how we evaluate your submissions
  • Starter code for you to get started!

Coming Soon

  • Baselines
  • Notebook Submissions

🏆 About the Challenge

This is a multi-label image classification task on synthetically generated images of painted metal sheets. A classifier has to predict whether a sheet has production damage and, if so, which kind. Participants have access to a set of images, only a subset of which is labeled with respect to production damage. They have to decide which of the unlabeled images should be labeled. Labeling is assumed to be costly, so the selection has to respect a given budget. In this sense, participants have to make a data purchasing decision.

Each image has a 4-dimensional label indicating the presence or absence of ['scratch_small', 'scratch_large', 'dent_small', 'dent_large'] in the image.
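
To make the label layout concrete, here is a minimal sketch of a 4-dimensional multi-hot label. It assumes the vector follows the order in which the defect names are listed above and uses 0/1 entries; the helper names `encode_label` and `decode_label` are illustrative and are not part of the starter kit.

```python
# Label order as listed in the challenge description (assumed to match the data).
LABEL_NAMES = ["scratch_small", "scratch_large", "dent_small", "dent_large"]


def encode_label(defects):
    """Turn a collection of defect names into a 4-dimensional 0/1 vector."""
    unknown = set(defects) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unknown defect names: {sorted(unknown)}")
    return [1 if name in defects else 0 for name in LABEL_NAMES]


def decode_label(vector):
    """Turn a 4-dimensional 0/1 vector back into a list of defect names."""
    if len(vector) != len(LABEL_NAMES):
        raise ValueError("expected one entry per label")
    return [name for name, flag in zip(LABEL_NAMES, vector) if flag]


print(encode_label({"scratch_small", "dent_large"}))  # [1, 0, 0, 1]
print(decode_label([0, 0, 0, 0]))  # [] (an undamaged sheet)
```

Because several entries can be 1 at once, a sheet may carry more than one kind of damage, which is what makes the task multi-label rather than multi-class.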

What's special about this challenge?

As you may have noticed, the challenge is called the "Data Purchasing Challenge". Wonder why? 😉

This challenge features online evaluation in which your submissions don't only train and predict online, but go through a purchase phase as well.

What is a Purchase Phase? 🤔

A subset of the dataset in this challenge is unlabelled. During the purchase phase, your model is given a fixed budget, which it can spend on requesting labels for unlabelled images using the purchase_label function.
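
The budget mechanics can be sketched with a toy stand-in. Everything below is an assumption for illustration: `ToyUnlabelledDataset`, its one-unit-per-label cost model, and `purchase_random` are hypothetical, and the real `purchase_label` signature in the starter kit may differ. Check dataset.py for the actual interface.

```python
import random


class ToyUnlabelledDataset:
    """Hypothetical stand-in for the unlabelled dataset; not the starter kit's class."""

    def __init__(self, hidden_labels, budget):
        self._hidden_labels = hidden_labels  # labels the participant cannot see
        self.budget = budget                 # number of labels that may still be bought
        self.purchased = {}                  # index -> revealed label

    def purchase_label(self, index):
        """Reveal one label and charge one unit of budget (assumed cost model)."""
        if index in self.purchased:
            return self.purchased[index]     # assumed: asking again is free
        if self.budget <= 0:
            raise RuntimeError("purchase budget exhausted")
        self.budget -= 1
        self.purchased[index] = self._hidden_labels[index]
        return self.purchased[index]


def purchase_random(dataset, num_images, seed=0):
    """Baseline strategy: buy labels for random images until the budget runs out."""
    rng = random.Random(seed)
    order = list(range(num_images))
    rng.shuffle(order)
    for index in order:
        if dataset.budget == 0:
            break
        dataset.purchase_label(index)
    return dataset.purchased


hidden = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0]]
toy = ToyUnlabelledDataset(hidden, budget=3)
bought = purchase_random(toy, num_images=len(hidden))
print(len(bought), toy.budget)  # 3 0
```

Random selection is only a baseline. The interesting part of the challenge is replacing it with a strategy that picks the images whose labels are likely to help the model most.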

We hope you are as excited as we are!! 🤩

💪 Getting Started

Download Dataset

# Listing dataset files
aicrowd dataset list -c data-purchasing-challenge-2022

# Downloading all dataset files (~1G)
aicrowd dataset download -c data-purchasing-challenge-2022

# Downloading debug dataset (6MB)
aicrowd dataset download -c data-purchasing-challenge-2022 debug.tar.gz

Don't have AIcrowd CLI installed? 🥺
You can install it here or Download Datasets without CLI.

Using this repository

This repository contains the submission template that your solution is expected to follow.

# Clone the repository
git clone https://gitlab.aicrowd.com/zew/data-purchasing-challenge-2022-starter-kit.git
cd data-purchasing-challenge-2022-starter-kit

# Install dependencies
conda env create --file environment.yaml

# Download the dataset, and place it in `data/` folder

# Run codebase locally
python run.py

This will run all the phases (pre_training, purchase & prediction) locally and report your scores.

👥 Participation

The participation flow looks as follows:

A quick description of all the phases:

  • Runtime Setup
    You can use environment.yaml to declare all your package requirements from Conda and PyPI. If you are an advanced developer and need more freedom, check out all the other supported runtime configurations here.
  • Pre-Train Phase
    This is your typical training phase. You need to implement the pre_training_phase function, which has access to training_dataset (an instance of ZEWDPCBaseDataset). Learn more about it by referring to the inline documentation here.
  • Purchase Phase
    In this phase you also have access to the unlabelled dataset, which you can probe until your budget runs out. Learn more about it by referring to the inline documentation here.
  • Prediction Phase
    In this phase you have access to a test set, and you are expected to make predictions using your trained models. Learn more about it by referring to the inline documentation here.
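
The three phases can be pictured as three methods called in order. The sketch below is a toy, not the starter kit's `ZEWDPCBaseRun`: only the name `pre_training_phase` comes from this README, while `ToyRun`, `purchase_phase`, `prediction_phase`, and all arguments are assumptions made for illustration. The "model" simply predicts each defect if at least half of the labels seen so far contain it.

```python
class ToyRun:
    """Hypothetical skeleton mirroring the three phases; not the real ZEWDPCBaseRun."""

    def __init__(self, threshold=0.5):
        self.labels = []            # every label vector seen so far
        self.threshold = threshold  # minimum positive rate needed to predict a defect

    def pre_training_phase(self, training_labels):
        # "Training" here just memorises the labelled examples.
        self.labels.extend(training_labels)

    def purchase_phase(self, buy_label, num_unlabelled, budget):
        # Spend the budget on the first images; buy_label(index) returns a label.
        for index in range(min(budget, num_unlabelled)):
            self.labels.append(buy_label(index))

    def prediction_phase(self, num_test_images):
        # Predict the same label vector for every test image.
        n = len(self.labels)
        rates = [sum(column) / n for column in zip(*self.labels)]
        prediction = [1 if rate >= self.threshold else 0 for rate in rates]
        return [list(prediction) for _ in range(num_test_images)]


train = [[1, 0, 0, 0], [1, 0, 0, 1]]
hidden = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]

run = ToyRun()
run.pre_training_phase(train)
run.purchase_phase(lambda i: hidden[i], num_unlabelled=len(hidden), budget=2)
predictions = run.prediction_phase(num_test_images=2)
print(predictions)  # [[1, 1, 0, 0], [1, 1, 0, 0]]
```

The point of the sketch is the data flow: labels bought during the purchase phase are added to what was learned in pre-training, and only then are predictions made on the test set.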

🧩 Repository structure

Required files

File | Description
ZEWDPCBaseRun (class in run.py) | Entry point to your implementation
aicrowd.json | A configuration file used to identify the challenge and the resources needed for evaluation
environment.yaml | List of Python packages that should be installed (including pip packages) for your code to run

Other important files

File | Description
evaluation_metrics.py | Helps you generate scores for your run locally
data/ | Directory containing the dataset (you don't need to upload the dataset with your submissions)
dataset.py | Dataset wrapper through which you can access the dataset easily and purchase labels during the purchase phase
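
For a feel of what local scoring of multi-label predictions can look like, here is a small sketch of two common multi-label measures. This is an assumption for illustration only: this README does not state which metrics the challenge uses, so refer to evaluation_metrics.py for the actual ones. The function names below are hypothetical.

```python
def exact_match_ratio(y_true, y_pred):
    """Fraction of images whose full label vector is predicted correctly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label entries that are predicted incorrectly."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    total = sum(len(t) for t in y_true)
    return wrong / total


y_true = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

print(round(exact_match_ratio(y_true, y_pred), 3))  # 0.333
print(round(hamming_loss(y_true, y_pred), 3))       # 0.167
```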

🚀 Submission

  • Prepare your runtime environment
  • Make submissions by pushing your code repository
  • Get scores, iterate and improve! 💪

More details on active participation are available in SUBMISSION.md

📎 Important links

✍️ Maintainers