🛒 Amazon KDD CUP 2024: Multi-Task Online Shopping Challenge for LLMs Starter Kit
This repository is the Amazon KDD Cup 2024 submission template and starter kit. Clone the repository to compete now!
This repository contains:


Documentation on how to submit your models to the leaderboard

Best practices and information on how we evaluate your model

Starter code for you to get started!


Table of Contents

Competition Overview
Dataset
Tasks
Evaluation Metrics

Getting Started

How to write your own model?

How to start participating?

Setup
How to make a submission?
What hardware does my code run on?
How are my model responses parsed by the evaluators?


Frequently Asked Questions
Important Links


📖 Competition Overview
Online shopping is complex, involving various tasks from browsing to purchasing, all of which require insight into customer behavior and intentions. This calls for multi-task learning models that can leverage shared knowledge across tasks. Yet many current models are task-specific, which increases development costs and limits effectiveness. Large language models (LLMs) have the potential to change this by handling multiple tasks with a single model and only minor prompt adjustments. LLMs can also improve customer experiences by providing interactive and timely recommendations. However, online shopping is a highly specialized domain with a wide range of domain-specific concepts (e.g. brands, product lines) and knowledge (e.g. which brand produces which products), which makes it challenging to adapt existing powerful general-domain LLMs to online shopping.
Motivated by the potential and challenges of LLMs, we present ShopBench, a massive challenge for online shopping with 57 tasks and ~20000 questions derived from real-world Amazon shopping data. All questions in this challenge are reformulated into a unified text-to-text generation format to accommodate the exploration of LLM-based solutions. ShopBench focuses on four key shopping skills (which will serve as Tracks 1-4):

shopping concept understanding
shopping knowledge reasoning
user behavior alignment
multi-lingual abilities

In addition, we set up Track 5: All-around to encourage more versatile, all-around solutions. Track 5 requires participants to solve all questions in Tracks 1-4 with a single solution, which is expected to be more principled and unified than track-specific solutions to Tracks 1-4. We will correspondingly assign larger awards to Track 5.

📊 Dataset
ShopBench, the dataset used in this challenge, is an anonymized, multi-task dataset sampled from real-world Amazon shopping data. Statistics of ShopBench are given in the following table.


| # Tasks | # Questions | # Products | # Product Categories | # Attributes | # Reviews | # Queries |
| --- | --- | --- | --- | --- | --- | --- |
| 57 | 20598 | ~13300 | 400 | 1032 | ~11200 | ~4500 |


ShopBench is split into a few-shot development set and a test set to better mimic real-world applications, where you never know the customer's questions beforehand. With this setting, we encourage participants to build their solutions from any publicly available resource (e.g. pre-trained models, text datasets) rather than overfitting to the given development data (e.g. by generating pseudo data samples with GPT).
The development datasets will be provided in JSON format with the following fields.


input_field: This field contains the instructions and the question that should be answered by the model.

output_field: This field contains the ground truth answer to the question.

task_type: This field contains the type of the task (Details in the next Section, "Tasks")

task_name: This field contains the name of the task. However, the exact task names are redacted, and we only provide participants with hashed task names (e.g. task1, task2).

metric: This field contains the metric used to evaluate the question (Details in Section "Evaluation Metrics").

track: This field specifies the track the question comes from.
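As a minimal sketch of how a development record might be read and checked, the snippet below parses one JSON object and verifies that the six fields listed above are present. The helper name `parse_dev_record` and all sample values are hypothetical placeholders, not real ShopBench data, and the sketch assumes each record is a single JSON object; the actual file layout may differ.

```python
import json

# Field names taken from the field list above.
DEV_FIELDS = {
    "input_field",
    "output_field",
    "task_type",
    "task_name",
    "metric",
    "track",
}


def parse_dev_record(text: str) -> dict:
    """Parse one JSON-encoded development record and check its fields.

    Hypothetical helper: assumes one record is one JSON object.
    """
    record = json.loads(text)
    missing = DEV_FIELDS - set(record)
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    return record


# Placeholder record for illustration only; the values are NOT real data.
sample_text = json.dumps(
    {
        "input_field": "Example instruction followed by an example question.",
        "output_field": "example answer",
        "task_type": "example-task-type",
        "task_name": "task1",
        "metric": "example-metric",
        "track": "example-track",
    }
)

record = parse_dev_record(sample_text)
print(record["task_name"], record["metric"])
```

Checking the field set up front makes it easier to notice early if a local copy of the data does not match the format described here.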

However, the test dataset (which will be hidden from participants) will have a different format with only two fields:


input_field, which is the same as above.

is_multiple_choice: This field contains True or False, indicating whether the question is multiple choice. The detailed task_type will not be given to participants.
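Because a test record exposes only `input_field` and `is_multiple_choice`, a solution can branch on that single flag. The sketch below shows one possible way to do so. The `respond` function, the `generate` callable signature, and the token budgets (1 and 100) are illustrative assumptions, not part of the starter kit or the evaluator, and `dummy_generate` is a stand-in for a real model.

```python
from typing import Callable


def respond(record: dict, generate: Callable[[str, int], str]) -> str:
    """Route a test record to a generation call based on is_multiple_choice.

    Hypothetical helper: `generate(prompt, max_new_tokens)` is an assumed
    interface for whatever model the participant uses.
    """
    prompt = record["input_field"]
    # Assumption: a very small budget is enough for a multiple choice answer,
    # while free-form questions need a longer one. Tune both for your model.
    max_new_tokens = 1 if record["is_multiple_choice"] else 100
    return generate(prompt, max_new_tokens).strip()


def dummy_generate(prompt: str, max_new_tokens: int) -> str:
    """Stand-in for a real model: returns a fixed reply cut to the budget."""
    words = "3 is the best option".split()
    return " ".join(words[:max_new_tokens])


# Placeholder test records for illustration only.
mc_record = {"input_field": "Which option fits best?", "is_multiple_choice": True}
open_record = {"input_field": "Explain your choice.", "is_multiple_choice": False}

mc_answer = respond(mc_record, dummy_generate)
open_answer = respond(open_record, dummy_generate)
print(mc_answer)
print(open_answer)
```

Keeping the routing decision in one small function means the rest of the pipeline does not need to know the hidden `task_type`.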