Track 1: The 100 prompts for testing have been uploaded to the starter kit at “jailbreaking_attack_track/prompt_test.jsonl”. In the testing phase, we evaluate the submitted jailbreak prompts on two models. One morel is at Gemma-2b-it and the other model will be held out until the end of the competition. The judging model is also unreleased. However, we set a hard limit of 1024 tokens for both testing models (see the “model inference pipeline” in “jailbreaking_attack_track/jailbreak_track.ipynb”) to avoid exceeding the judging model’s input limit. The same evaluation metrics in the development phase are used for both testing models (we show the combined score for both models in the leaderboard). The tokenizer for Gemma-2b-it is used to enforce the limitation on the number of added tokens.
Track 2: The backdoor targets have been uploaded to the starter kit at “backdoor_trigger_recovery_for_model/ref/target_list_testing.json”. The ground truth triggers and the test data will not be released. The backdoored model have been uploaded to “https://huggingface.co/Zhaorun/CodeQwen1.5-7B-trojan-clas2024-test”.
Track 3: The backdoor targets have been uploaded to the starter kit at “backdoor_trigger_recovery_for_agent/dataset/target_testing.txt”. The ground truth triggers and the test data will not be released. The backdoored core model of the agent have been uploaded to “https://huggingface.co/PetterLee/Meta-Llama-3-8B-Instruct-finetuned-backdoor-100”
Please see theĀ GitHub repository for the competition starter kit, including code for loading the datasets and the models, running baselines, and creating a submission.
All participants are required to register and agree to the rules before submitting. Please complete this Google form for registration. Registration will be reviewed in 48 hours. If approved, a submission code will be returned. Please keep the code secret.
Each team can make 15 submissions per track during the entire development phase and 5 submissions per track during the entire testing phase. Please follow the instructions in the starter kit to submit your work here.