---
library_name: peft
base_model: WizardLMTeam/WizardCoder-Python-13B-V1.0
---

# Model Card for MegaDiffInject

This model injects bugs into correct Python programs. It was used to corrupt correct programs, which form the core of the MegaBugFix benchmark.

## Model Details

- **Developed by:** Balázs Szalontai
- **Model type:** Decoder-only language model (PEFT adapter on top of the base model)
- **Language(s) (NLP):** None (the model operates on Python source code)
- **License:** Apache License 2.0
- **Finetuned from model:** WizardLMTeam/WizardCoder-Python-13B-V1.0

## Uses

The model takes a Python program wrapped in `[PYTHON]` and `[/PYTHON]` tags, followed by a `[DIFF]` tag, and generates a diff terminated by `[/DIFF]`. The corrupted program is recovered from the diff by dropping the removed lines. The following example loads the base model, attaches the fine-tuned adapter, and samples five corrupted versions of a small function:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel

model_id_pretrained = 'WizardLMTeam/WizardCoder-Python-13B-V1.0'
model_id_finetuned = 'szalontaib/MegaDiffInject'

# Load the base model in half precision, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(model_id_pretrained, add_eos_token=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id_pretrained,
    device_map='auto',
    dtype=torch.float16,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, model_id_finetuned)


def diff2code(diff: str) -> str:
    """Rebuild the corrupted program from a diff: drop removed ('-') lines
    and strip the two-character prefix from the remaining lines."""
    return '\n'.join(
        line[2:] for line in diff.splitlines() if not line.startswith('-')
    ).strip()


def corrupt(program, tokenizer, model, temperature=0.5, sample_size=1):
    """Sample `sample_size` corrupted versions of `program`."""
    prompt = f'[PYTHON]\n{program.strip()}\n[/PYTHON]\n[DIFF]\n'
    generator = pipeline(
        task='text-generation',
        model=model,  # already placed on devices by device_map='auto'
        tokenizer=tokenizer,
        do_sample=(temperature > 0),  # greedy decoding when temperature is 0
        temperature=temperature,
        num_return_sequences=sample_size,
        eos_token_id=tokenizer.eos_token_id,
    )
    # return_full_text=False returns only the generated continuation (the diff).
    outputs = generator(prompt, max_new_tokens=4096, return_full_text=False)
    diffs = [output['generated_text'].removesuffix('\n[/DIFF]') for output in outputs]
    return [diff2code(diff) for diff in diffs]


test_code = '''
def bitcount(n):
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count
'''.strip()

corrupted_programs = corrupt(test_code, tokenizer, model, temperature=0.5, sample_size=5)
for corrupted_program in corrupted_programs:
    print(corrupted_program)
    print('-' * 30)
```
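The prompt format and the diff post-processing can be checked without loading the 13B model. The sketch below assumes the model emits an `ndiff`-style diff in which every line carries a two-character prefix (`'  '`, `'- '`, `'+ '`); this is inferred from the `line[2:]` slicing in `diff2code`, not confirmed by this card. It uses Python's `difflib.ndiff` and a hand-written bug (hypothetical, not actual model output) to build such a diff, then recovers the corrupted program. Because `ndiff` also emits `'? '` hint lines, this version of `diff2code` skips them as well; whether the model ever produces such lines is an assumption left open.

```python
import difflib


def build_prompt(program: str) -> str:
    # Prompt template copied from `corrupt` in the example above.
    return f'[PYTHON]\n{program.strip()}\n[/PYTHON]\n[DIFF]\n'


def diff2code(diff: str) -> str:
    # Same logic as the card's diff2code, plus skipping difflib's '? ' hint
    # lines (an assumption; the model's exact diff format is not documented).
    return '\n'.join(
        line[2:] for line in diff.splitlines()
        if not line.startswith(('-', '?'))
    ).strip()


original = '''def bitcount(n):
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count'''

# Hypothetical hand-written bug standing in for a model-generated corruption.
buggy = original.replace('n &= n - 1', 'n &= n + 1')

# Build an ndiff-style diff as a stand-in for the model's generated text.
diff = ''.join(difflib.ndiff(original.splitlines(keepends=True),
                             buggy.splitlines(keepends=True)))
generation = diff + '\n[/DIFF]'

# Post-process exactly as `corrupt` does: strip the closing tag, apply the diff.
recovered = diff2code(generation.removesuffix('\n[/DIFF]'))

print(build_prompt(original))
print(diff)
print(recovered)
```

Keeping only the `'  '` and `'+ '` lines of an `ndiff` delta reconstructs the second sequence, which is why the round trip returns the buggy program; `str.removesuffix` requires Python 3.9 or later.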