ZEW Data Purchasing Challenge 2022 - Starter Kit
This repository is the main Data Purchasing Challenge template and Starter kit. Clone the repository to compete now!
This repository contains:
- Documentation on how to submit your models to the leaderboard
- Best practices and information on how we evaluate your submissions
- Starter code for you to get started!
Coming Soon
- Baselines
Note: You can also make submissions online using the notebook present here.
Table of contents
- ZEW Data Purchasing Challenge 2022 - Starter Kit
- Table of contents
- 🏆 About the Challenge
- 💪 Getting Started
- 👥 Participation
- 🧩 Repository structure
- 🚀 Submission
- 📎 Important links
- ✍️ Maintainers
🏆 About the Challenge

In short: You have to classify images. Some images in your training set are labelled but most of them aren't. How do you decide which images to label if you have a limited budget to do so?
In more detail: You face a multi-label image classification task. The dataset consists of synthetically generated images of painted metal sheets. A classifier is meant to predict whether the sheets have production damages and if so which ones. You have access to a set of images, a subset of which are labeled with respect to production damages. Because labeling is costly and your budget is limited, you have to decide for which of the unlabeled images labels should be purchased in order to maximize prediction accuracy.
Each image has a 4-dimensional label representing the presence or absence of ['scratch_small', 'scratch_large', 'dent_small', 'dent_large'] in the image.
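To make the label format concrete, here is a minimal sketch of how such a 4-dimensional label could be encoded and decoded. It assumes a 0/1 (multi-hot) encoding in the order the classes are listed above; the helper names `encode_label` and `decode_label` are illustrative and are not part of the starter kit.

```python
# Class order as listed in the challenge description (assumed to be the label order).
LABEL_NAMES = ["scratch_small", "scratch_large", "dent_small", "dent_large"]


def encode_label(present):
    """Turn a collection of damage names into a 4-dimensional 0/1 vector."""
    unknown = set(present) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unknown damage types: {sorted(unknown)}")
    return [1 if name in present else 0 for name in LABEL_NAMES]


def decode_label(vector):
    """Turn a 4-dimensional 0/1 vector back into the list of damage names."""
    if len(vector) != len(LABEL_NAMES):
        raise ValueError("expected one entry per damage type")
    return [name for name, flag in zip(LABEL_NAMES, vector) if flag]


example = encode_label({"scratch_small", "dent_large"})
print(example)                      # [1, 0, 0, 1]
print(decode_label([0, 0, 0, 0]))   # [] (a sheet with no damage)
```

Because several entries can be 1 at the same time, this is a multi-label problem rather than a 4-way multi-class one.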
What's special about this challenge? ⭐
As you may have noticed, the challenge is called the "Data Purchasing Challenge". Wondering why? 😉
This challenge features online evaluation in which your submissions not only train and predict online but also go through a purchase phase.
What is a Purchase Phase? 🤔
A subset of the dataset in this challenge is unlabelled. During the purchase phase, your model is given a fixed budget, which it can spend to request labels for images using the purchase_label function.
In this sense, participants have to make a data purchasing decision.
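One possible way to spend the budget is sketched below, purely as an illustration. `ToyUnlabelledDataset` is a hypothetical stand-in for the challenge's dataset wrapper: that `purchase_label` takes an image index and that each label costs one unit of budget are both assumptions, so check the inline documentation for the real interface. The selection rule (buy the images whose predicted probabilities are closest to 0.5) is plain uncertainty sampling, chosen here as an example and not a strategy recommended by the organisers.

```python
class ToyUnlabelledDataset:
    """Hypothetical stand-in for the unlabelled dataset wrapper."""

    def __init__(self, hidden_labels, budget):
        self._hidden_labels = hidden_labels
        self.budget = budget
        self.purchased = {}

    def __len__(self):
        return len(self._hidden_labels)

    def purchase_label(self, index):
        """Reveal one label and spend one unit of budget (assumed cost model)."""
        if index in self.purchased:
            return self.purchased[index]
        if self.budget <= 0:
            raise RuntimeError("label budget exhausted")
        self.budget -= 1
        self.purchased[index] = self._hidden_labels[index]
        return self.purchased[index]


def uncertainty(probabilities):
    """Higher when the per-class probabilities sit closer to 0.5."""
    return sum(1.0 - abs(2.0 * p - 1.0) for p in probabilities)


def purchase_most_uncertain(dataset, predicted_probabilities):
    """Buy labels for the images the current model is least sure about."""
    ranked = sorted(
        range(len(dataset)),
        key=lambda i: uncertainty(predicted_probabilities[i]),
        reverse=True,
    )
    to_buy = ranked[: dataset.budget]  # never request more than we can afford
    return {i: dataset.purchase_label(i) for i in to_buy}


hidden = [[1, 0, 0, 0], [1, 0, 0, 1], [0, 1, 0, 0],
          [1, 0, 1, 0], [0, 0, 0, 0], [0, 1, 1, 1]]
probs = [[0.5, 0.5, 0.5, 0.5], [0.99, 0.01, 0.02, 0.98], [0.6, 0.4, 0.5, 0.5],
         [0.9, 0.1, 0.9, 0.1], [0.05, 0.05, 0.05, 0.05], [0.45, 0.55, 0.5, 0.6]]
dataset = ToyUnlabelledDataset(hidden, budget=3)
bought = purchase_most_uncertain(dataset, probs)
print(sorted(bought))   # [0, 2, 5]
print(dataset.budget)   # 0
```

The newly purchased labels would then be merged into the training set before the model is retrained.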
We hope you are as excited as we are!! 🤩
💪 Getting Started
Download Dataset
# Go to the data directory
cd data/
# Listing dataset files
aicrowd dataset list -c data-purchasing-challenge-2022
# Downloading debug dataset (6MB)
aicrowd dataset download -c data-purchasing-challenge-2022 debug.tar.gz
# Downloading all dataset files (~1G)
aicrowd dataset download -c data-purchasing-challenge-2022
Don't have AIcrowd CLI installed? 🥺
You can install it here or Download Datasets without CLI.
Dataset Distribution
A quick distribution of the dataset is as follows:

The publicly released dataset is for local experiments and validating your code base. The private dataset which is used for all the phases during evaluation is different from the publicly released one.
Using this repository
This repository contains a submission template.
# Clone the repository
git clone https://gitlab.aicrowd.com/zew/data-purchasing-challenge-2022-starter-kit.git
cd data-purchasing-challenge-2022-starter-kit
# Install dependencies
pip install -r requirements.txt
# Download the dataset and place it in the `data/` folder
# See the Download Dataset section above.
# Run codebase locally
python run.py
This runs all the phases (pre_training, purchase & prediction) locally and returns your score.
👥 Participation
The participation flow looks as follows:

A quick description of all the phases:
- Runtime Setup
  You can use requirements.txt for all your Python package requirements. If you are an advanced developer and need more freedom, check out all the other supported runtime configurations here.
- Pre-Train Phase
  This is your typical training phase. You need to implement the pre_training_phase function, which has access to training_dataset (an instance of ZEWDPCBaseDataset). Learn more about it by referring to the inline documentation here.
- Purchase Phase
  In this phase you also have access to the unlabelled dataset, which you can probe until your budget runs out. Learn more about it by referring to the inline documentation here.
- Prediction Phase
  In this phase you have access to a test set, and you are expected to make predictions using your trained models. Learn more about it by referring to the inline documentation here.
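The order in which the phases feed into each other can be sketched with a toy run class. This is not the starter kit's ZEWDPCBaseRun: `ToyRun`, the method names `purchase_phase` and `prediction_phase`, and all argument lists are assumptions made for illustration, and the "model" is just a per-class majority vote over 0/1 label vectors so that the example runs without any dependencies.

```python
def majority_vector(labels):
    """Per-class majority vote over a list of 0/1 label vectors (ties go to 0)."""
    n_classes = len(labels[0])
    return [
        1 if 2 * sum(label[c] for label in labels) > len(labels) else 0
        for c in range(n_classes)
    ]


class ToyRun:
    """Hypothetical three-phase run; names and signatures are assumed."""

    def __init__(self):
        self.known_labels = []
        self.prediction = None

    def pre_training_phase(self, training_labels):
        # Phase 1: fit on the labelled data that is available for free.
        self.known_labels = list(training_labels)
        self.prediction = majority_vector(self.known_labels)

    def purchase_phase(self, purchase_fn, n_unlabelled, budget):
        # Phase 2: spend the budget, then refit on the enlarged labelled set.
        for index in range(min(budget, n_unlabelled)):
            self.known_labels.append(purchase_fn(index))
        self.prediction = majority_vector(self.known_labels)

    def prediction_phase(self, n_test):
        # Phase 3: emit one 4-dimensional prediction per test image.
        return [list(self.prediction) for _ in range(n_test)]


train = [[1, 0, 0, 0], [1, 0, 0, 1]]
hidden = [[0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 0]]

run = ToyRun()
run.pre_training_phase(train)
before = list(run.prediction)
run.purchase_phase(lambda i: hidden[i], len(hidden), budget=2)
predictions = run.prediction_phase(n_test=3)
print(before)          # [1, 0, 0, 0]
print(predictions[0])  # [0, 0, 0, 1]
```

The prediction changes after the purchase phase because the two purchased labels shift the per-class majorities, which is the effect a good purchasing strategy tries to maximise.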
🧩 Repository structure
Required files
File | Description |
---|---|
ZEWDPCBaseRun (class in run.py) | Entry point to your implementation. Your code goes here |
local_evaluation.py | Run your codebase locally on all the phases |
aicrowd.json | A configuration file used to identify the challenge and resources needed for evaluation |
requirements.txt | List of PyPI packages that should be installed for your code to run |
submit.sh | Utility script to submit your codebase as a submission to this challenge |
Other important files
File | Description |
---|---|
data/ | Directory containing the dataset (you don't need to upload the dataset with your submissions) |
evaluator/evaluation_metrics.py | Helps you generate the score for your run locally |
evaluator/dataset.py | Dataset wrapper implementation through which you can access the dataset easily and purchase labels during the purchase phase |
🚀 Submission
- Prepare your runtime environment
- Make submissions by pushing your code repository
- Get scores, iterate and improve! 💪
GitLab submission
We have added a quick submission utility script as part of this starter kit to keep things simple. You can make a submission as follows:
./submit.sh <unique-submission-name>
Example: ./submit.sh "bayes v0.1"
If the utility script above does not fit your use case, detailed information about the submission process is available in SUBMISSION.md.
Notebook submission
You can also make submissions online using the notebook present here.
Evaluation hardware and timeouts
In Round 1, your code will have access to a machine with 4 CPUs, 16 GB RAM, 1 NVIDIA T4 GPU, and 3 hours of runtime per submission. In Round 2 of this competition, your code will be evaluated across multiple budget-runtime constraints, which will be announced later.
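Since a submission that exceeds the runtime limit cannot finish, it may help to track elapsed time inside your own code. The sketch below is one way to do that, under stated assumptions: `TimeBudget` and `train_until_deadline` are hypothetical helpers, not part of the starter kit, and the 15-minute safety margin is an arbitrary example value.

```python
import time


class TimeBudget:
    """Tracks how much of a wall-clock allowance is left."""

    def __init__(self, limit_seconds, safety_margin_seconds=0.0, clock=time.monotonic):
        self._clock = clock
        # Stop early by the safety margin to leave room for later phases.
        self._deadline = clock() + limit_seconds - safety_margin_seconds

    def remaining(self):
        return max(0.0, self._deadline - self._clock())

    def expired(self):
        return self.remaining() <= 0.0


def train_until_deadline(budget, train_one_epoch, max_epochs):
    """Run epochs until max_epochs is reached or the time budget runs out."""
    completed = 0
    for _ in range(max_epochs):
        if budget.expired():
            break
        train_one_epoch()
        completed += 1
    return completed


# 3 hours per submission in Round 1, minus an assumed 15-minute margin.
round_one_budget = TimeBudget(limit_seconds=3 * 60 * 60, safety_margin_seconds=15 * 60)
print(f"{round_one_budget.remaining() / 60:.0f} minutes available")
```

The check happens only between epochs, so a single epoch should be short compared with the margin you leave.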
📎 Important links
- 💪 Challenge Page: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022
- 🗣️ Discussion Forum: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022/discussion
- 🏆 Leaderboard: https://www.aicrowd.com/challenges/data-purchasing-challenge-2022/leaderboards
- 👥 Find Teammates: https://discourse.aicrowd.com/t/looking-for-teammates-reply-here/7225/1
- 💬 Chat with other participants: https://discord.gg/ZHBDGEZabY