MegaBugInject / README.md

metadata

library_name: peft
base_model: WizardLMTeam/WizardCoder-Python-13B-V1.0
license: apache-2.0

Model Card for MegaBugInject

MegaBugInject is a model that injects bugs into correct Python programs. It is distributed as a PEFT adapter for WizardLMTeam/WizardCoder-Python-13B-V1.0, and the buggy programs it generated from correct ones form the core of the MegaBugFix benchmark.

Model Details

Developed by: Balázs Szalontai
Model type: Decoder-only Language Model
Language(s) (NLP): None
License: Apache License 2.0
Finetuned from model: WizardLMTeam/WizardCoder-Python-13B-V1.0

Uses

You may use the model in the following way:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import re

model_id_pretrained = 'WizardLMTeam/WizardCoder-Python-13B-V1.0'
model_id_finetuned  = 'szalontaib/MegaBugInject'

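# Load the base model's tokenizer; no EOS token is appended to the prompt.
tokenizer = AutoTokenizer.from_pretrained(model_id_pretrained, add_eos_token=False)
# Load the base model in half precision (older transformers releases name this keyword torch_dtype).
model = AutoModelForCausalLM.from_pretrained(model_id_pretrained, device_map='auto', dtype=torch.float16, trust_remote_code=True)
# Apply the fine-tuned bug-injection adapter on top of the base model.
model = PeftModel.from_pretrained(model, model_id_finetuned)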


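def extract_diff(model_output):
    """Return the text inside the first [DIFF]...[/DIFF] pair, or None if there is none."""
    pattern = re.compile(r'\[DIFF\](.*?)\[/DIFF\]', re.DOTALL)
    match = pattern.search(model_output)
    if match:
        return match.group(1).strip('\n')
    return None

def diff2code(diff: str) -> str:
    """Rebuild the corrupted program from a diff.

    Lines starting with '-' exist only in the original program and are dropped;
    every other line loses its two-character diff prefix.
    """
    return '\n'.join(
        line[2:] for line in diff.splitlines()
        if not line.startswith('-')
    ).strip()

def corrupt(program, model, tokenizer, **generation_kwargs):
    """Generate corrupted variants of `program`; outputs without a complete diff are skipped."""
    # The prompt ends with an opening [DIFF] tag, which the model continues with a diff.
    prompt = f'[PYTHON]\n{program.strip()}\n[/PYTHON]\n[DIFF]\n'
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    generated_ids = model.generate(**model_inputs, **generation_kwargs)
    # Each decoded output contains the prompt followed by the generated diff.
    outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    diffs = [extract_diff(output) for output in outputs]
    corrupted_programs = [diff2code(diff) for diff in diffs if diff is not None]
    return corrupted_programs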


test_code = '''
def bitcount(n):
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count
'''.strip()

corrupted_programs = corrupt(
    test_code, model, tokenizer, 
    do_sample=True, 
    temperature=0.5, 
    max_new_tokens=4096, 
    num_return_sequences=5,
)

for corrupted_program in corrupted_programs:
    print('-'*30)
    print(corrupted_program)
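
The post-processing step can be checked without downloading the 13B model. The sketch below repeats `extract_diff` and `diff2code` from the example above and runs them on a hand-written string. The diff layout (a one-character tag, a space, then the code line) is an assumption inferred from how `diff2code` slices each line. `diff_rows` and `sample_output` are hypothetical illustration data, not real model output.

```python
import re

def extract_diff(model_output):
    pattern = re.compile(r'\s*\[DIFF\](.*?)\[/DIFF\]\s*', re.DOTALL)
    matches = pattern.findall(model_output)
    if matches:
        return matches[0].strip('\n')
    return None

def diff2code(diff):
    return '\n'.join(
        line[2:] for line in diff.splitlines()
        if not line.startswith('-')
    ).strip()

# Hypothetical diff rows as (tag, code line); written by hand, NOT produced by the model.
# ' ' = unchanged line, '-' = line only in the original, '+' = line only in the corrupted program.
diff_rows = [
    (' ', 'def bitcount(n):'),
    (' ', '    count = 0'),
    (' ', '    while n:'),
    ('-', '        n &= n - 1'),
    ('+', '        n ^= n - 1'),
    (' ', '        count += 1'),
    (' ', '    return count'),
]

# Assumed layout: tag, one space, then the code line.
sample_diff = '\n'.join(f'{tag} {code}' for tag, code in diff_rows)
original = '\n'.join(code for tag, code in diff_rows if tag != '+')

# Mimic a decoded output: the prompt followed by a diff closed with [/DIFF].
sample_output = f'[PYTHON]\n{original}\n[/PYTHON]\n[DIFF]\n{sample_diff}\n[/DIFF]'

corrupted = diff2code(extract_diff(sample_output))
print(corrupted)
```

Lines tagged `-` belong only to the original program and are dropped, while unchanged lines and lines tagged `+` make up the corrupted program. An output that lacks a closing `[/DIFF]` tag makes `extract_diff` return `None`, which is why `corrupt` may return fewer programs than `num_return_sequences`.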

Citation

If you use our benchmark or bug injection model, please cite our paper.

@misc{szalontai2026diffbasedcodecorruptionusing,
      title={Diff-Based Code Corruption using LLMs for Large-Scale Bugfix Benchmarking}, 
      author={Balázs Szalontai and Ábel Szauter and Balázs Márton and Péter Verebics and Balázs Pintér and Tibor Gregorics},
      year={2026},
      eprint={2606.29088},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2606.29088}, 
}