ZEW Data Purchasing Challenge 2022 - Starter Kit
This repository is the main Data Purchasing Challenge template and Starter kit. Clone the repository to compete now!
This repository contains:
- Documentation on how to submit your models to the leaderboard
- Best practices and information on how we evaluate your submission
- Starter code for you to get started!
Coming Soon
- Baselines
- Notebook Submissions
Table of contents
- 🏆 About the Challenge
- 💪 Getting Started
- 👥 Participation
- 🧩 Repository Structure
- 🚀 Submission
- 📎 Important Links
🏆 About the Challenge

In this multi-label image classification task, the dataset consists of synthetically generated images of painted metal sheets. A classifier is meant to predict whether the sheets have production damage and, if so, which kinds. Participants have access to a set of images, a subset of which are labeled with respect to production damage. They have to decide which of the unlabeled images should be labeled. Labeling is assumed to be costly. Therefore, the selection has to respect a given budget. In this sense, participants have to make a data purchasing decision.
Each image has a 4-dimensional label representing the presence or absence of ['scratch_small', 'scratch_large', 'dent_small', 'dent_large'] in the image.
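To make the label format concrete, here is a minimal Python sketch of how such a label could be decoded. It assumes the label is a 0/1 vector whose positions follow the order of the class list above; that ordering and the helper names `decode_label` and `present_damages` are illustrative assumptions, not part of the starter kit.

```python
from typing import Dict, List, Sequence

# Class order as listed in the challenge description (assumed to match the label vector).
LABEL_NAMES: List[str] = ["scratch_small", "scratch_large", "dent_small", "dent_large"]


def decode_label(label: Sequence[int]) -> Dict[str, bool]:
    """Map a 4-dimensional 0/1 label vector to {class name: present?}."""
    if len(label) != len(LABEL_NAMES):
        raise ValueError(f"expected {len(LABEL_NAMES)} values, got {len(label)}")
    return {name: bool(value) for name, value in zip(LABEL_NAMES, label)}


def present_damages(label: Sequence[int]) -> List[str]:
    """Return only the names of the damages that are present."""
    return [name for name, present in decode_label(label).items() if present]


if __name__ == "__main__":
    # A sheet with a small scratch and a large dent.
    print(present_damages([1, 0, 0, 1]))
```

Because the task is multi-label, any combination of the four damages can occur, including none at all (an all-zero vector).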
What's special about this challenge? ⭐
As you may have noticed, the challenge is called the "Data Purchasing Challenge". Wonder why? 😉
This challenge features online evaluation in which your submissions not only train and predict online, but also go through a purchase phase.
What is a Purchase Phase? 🤔
A subset of the dataset in this challenge is unlabelled. During the purchase phase, your model is given a fixed budget, which it can spend to request labels for unlabelled images using the purchase_label function.
In this sense, participants have to make a data purchasing decision.
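One possible purchasing strategy is to spend the budget on the images the current model is least sure about. The sketch below illustrates this in Python under several assumptions: `ToyUnlabelledDataset` is a hypothetical stand-in for the starter kit's dataset wrapper, its `purchase_label(index)` signature is guessed (the real one is documented in dataset.py and may differ), and each label is assumed to cost one unit of budget.

```python
from typing import Dict, List, Sequence


class ToyUnlabelledDataset:
    """Hypothetical stand-in for the real dataset wrapper (see dataset.py)."""

    def __init__(self, hidden_labels: Sequence[Sequence[int]], budget: int) -> None:
        self._hidden_labels = [list(label) for label in hidden_labels]
        self.budget = budget  # assumption: one label costs one unit
        self.purchased: Dict[int, List[int]] = {}

    def purchase_label(self, index: int) -> List[int]:
        """Reveal the label of one image, charging the budget once per image."""
        if index in self.purchased:
            return self.purchased[index]
        if self.budget <= 0:
            raise RuntimeError("budget exhausted")
        self.budget -= 1
        self.purchased[index] = self._hidden_labels[index]
        return self.purchased[index]


def uncertainty(probs: Sequence[float]) -> float:
    """Mean closeness of per-class probabilities to 0.5 (1.0 = maximally unsure)."""
    return sum(1.0 - 2.0 * abs(p - 0.5) for p in probs) / len(probs)


def purchase_most_uncertain(
    dataset: ToyUnlabelledDataset,
    predicted_probs: Sequence[Sequence[float]],
    budget: int,
) -> Dict[int, List[int]]:
    """Buy labels for the `budget` images with the most uncertain predictions."""
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: uncertainty(predicted_probs[i]),
        reverse=True,
    )
    return {i: dataset.purchase_label(i) for i in ranked[:budget]}


if __name__ == "__main__":
    toy = ToyUnlabelledDataset([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]], budget=1)
    probs = [[0.9, 0.1, 0.1, 0.1], [0.5, 0.5, 0.5, 0.5], [0.8, 0.2, 0.9, 0.1]]
    print(purchase_most_uncertain(toy, probs, budget=1))
```

Uncertainty sampling is only one option; random selection or diversity-based selection fit the same budgeted loop.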
We hope you are as excited as we are!! 🤩
💪 Getting Started
Download Dataset
# Listing dataset files
aicrowd dataset list -c data-purchasing-challenge-2022
# Downloading all dataset files (~1G)
aicrowd dataset download -c data-purchasing-challenge-2022
# Downloading debug dataset (6MB)
aicrowd dataset download -c data-purchasing-challenge-2022 debug.tar.gz
Don't have AIcrowd CLI installed? 🥺
You can install it here or Download Datasets without CLI.
Using this repository
This repository contains the submission template in which your solution is expected to be implemented.
# Clone the repository
git clone https://gitlab.aicrowd.com/zew/data-purchasing-challenge-2022-starter-kit.git
cd data-purchasing-challenge-2022-starter-kit
# Install dependencies
conda env create --file environment.yaml
# Download the dataset, and place it in `data/` folder
# Run codebase locally
python run.py
This will run all the phases (pre_training, purchase & prediction) locally and report your scores.
👥 Participation
The participation flow looks as follows:

Quick description about all the phases:
- Runtime Setup
  You can use environment.yaml for all your package requirements from Conda and PyPI. If you are an advanced developer and need more freedom, check out the other supported runtime configurations here.
- Pre-Train Phase
  This is your typical training phase. You need to implement the pre_training_phase function, which has access to training_dataset (an instance of ZEWDPCBaseDataset). Learn more about it by referring to the inline documentation here.
- Purchase Phase
  In this phase you also have access to the unlabelled dataset, which you can probe until your budget runs out. Learn more about it by referring to the inline documentation here.
- Prediction Phase
  In this phase, you have access to a test set, and you are supposed to make predictions using your trained models. Learn more about it by referring to the inline documentation here.
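The three phases above can be sketched as a single class. This is an illustration only, not the starter kit's code: the real entry point is ZEWDPCBaseRun in run.py, and apart from pre_training_phase and training_dataset, every name here (`SketchRun`, `purchase_phase`, `prediction_phase`, `ToyUnlabelled`, the argument lists) is an assumption. The "model" is a trivial per-class majority vote so that the example stays self-contained.

```python
from typing import Dict, List, Sequence, Tuple

Label = Tuple[int, ...]  # 4 values: scratch_small, scratch_large, dent_small, dent_large


class ToyUnlabelled:
    """Hypothetical unlabelled dataset; the real wrapper lives in dataset.py."""

    def __init__(self, hidden_labels: Sequence[Label]) -> None:
        self._hidden_labels = list(hidden_labels)

    def __len__(self) -> int:
        return len(self._hidden_labels)

    def purchase_label(self, index: int) -> Label:
        return self._hidden_labels[index]


class SketchRun:
    """Illustrative three-phase flow with a majority-vote 'model'."""

    def __init__(self) -> None:
        self.labelled: List[Label] = []
        self.majority: Label = (0, 0, 0, 0)

    def _fit(self) -> None:
        # Predict a damage as present if more than half of the known labels have it.
        if not self.labelled:
            return
        n = len(self.labelled)
        self.majority = tuple(
            int(2 * sum(label[k] for label in self.labelled) > n) for k in range(4)
        )

    def pre_training_phase(self, training_dataset: Sequence[Tuple[object, Label]]) -> None:
        # Train on the labelled (image, label) pairs provided up front.
        self.labelled = [label for _image, label in training_dataset]
        self._fit()

    def purchase_phase(self, unlabelled_dataset: ToyUnlabelled, budget: int) -> Dict[int, Label]:
        # Naive strategy: buy the first `budget` labels, then retrain.
        purchased = {
            index: unlabelled_dataset.purchase_label(index)
            for index in range(min(budget, len(unlabelled_dataset)))
        }
        self.labelled.extend(purchased.values())
        self._fit()
        return purchased

    def prediction_phase(self, test_images: Sequence[object]) -> List[Label]:
        # One 4-dimensional prediction per test image.
        return [self.majority for _ in test_images]
```

In a real submission, `_fit` would train an image classifier and `purchase_phase` would choose which images to buy more carefully than taking the first ones.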
🧩 Repository structure
Required files
File | Description |
---|---|
ZEWDPCBaseRun (class in run.py) | Entry point to your implementation |
aicrowd.json | A configuration file used to identify the challenge and resources needed for evaluation |
environment.yaml | List of python packages that should be installed (including pip packages) for your code to run |
Other important files
File | Description |
---|---|
evaluation_metrics.py | Helps you generate a score for your run locally |
data/ | Directory containing the dataset (you don't need to upload the dataset for submissions) |
dataset.py | Dataset wrapper implementation that lets you access the dataset easily and purchase labels during the purchase phase |
🚀 Submission
- Prepare your runtime environment
- Make submissions by pushing your code repository
- Get scores, iterate and improve! 💪
More details on active participation are available in SUBMISSION.md
📎 Important links
- 💪 Challenge Page: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022
- 🗣️ Discussion Forum: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022/discussion
- 🏆 Leaderboard: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022/leaderboards
- 👥 Find Teammates: https://discourse.aicrowd.com/t/looking-for-teammates-reply-here/7225/1
- 💬 Chat with other participants: https://discord.gg/ZHBDGEZabY