Upload folder using huggingface_hub

Files changed (3)

README.md +74 -1
adapter_config.json +29 -0
adapter_model.bin +3 -0

README.md CHANGED

@@ -1,3 +1,76 @@
 ---
-license: apache-2.0
+library_name: peft
+base_model: WizardLMTeam/WizardCoder-Python-13B-V1.0
 ---
+# Model Card for MegaDiffInject
+This model injects bugs into correct Python programs. It was used to generate the buggy programs that form the core of the MegaBugFix benchmark.
+## Model Details
+- **Developed by:** Balázs Szalontai
+- **Model type:** Decoder-only Language Model
+- **Language(s) (NLP):** None
+- **License:** Apache License 2.0
+- **Finetuned from model:** WizardLMTeam/WizardCoder-Python-13B-V1.0
+## Uses
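+Load the base model, attach the adapter with PEFT, and prompt it with a program wrapped in `[PYTHON]` tags. The model answers with a diff, which `diff2code` turns into the corrupted program: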
+```python
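+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+from peft import PeftModel
+model_id_pretrained = 'WizardLMTeam/WizardCoder-Python-13B-V1.0'
+model_id_finetuned  = 'szalontaib/MegaDiffInject'
+# The tokenizer comes from the base model; no EOS token is appended to the prompt.
+tokenizer = AutoTokenizer.from_pretrained(model_id_pretrained, add_eos_token=False)
+# Load the base model in half precision (older transformers releases may expect torch_dtype instead of dtype).
+model = AutoModelForCausalLM.from_pretrained(model_id_pretrained, device_map='auto', dtype=torch.float16, trust_remote_code=True)
+# Attach the LoRA adapter from this repository on top of the base model.
+model = PeftModel.from_pretrained(model, model_id_finetuned)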
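+def diff2code(diff: str) -> str:
+    # Drop the removed ('-') lines and strip the two-character marker from the rest,
+    # which leaves the program as it looks after the diff is applied.
+    return '\n'.join(
+        line[2:] for line in diff.splitlines()
+        if not line.startswith('-')
+    ).strip()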
+def corrupt(program, tokenizer, model, temperature=0.5, sample_size=1):
+    # The program goes between [PYTHON] tags; the model continues after the opening [DIFF] tag.
+    prompt = f'[PYTHON]\n{program.strip()}\n[/PYTHON]\n[DIFF]\n'
+    generator = pipeline(
+        model=model,
+        tokenizer=tokenizer,
+        task="text-generation",
+        dtype=torch.float16,
+        device_map="auto",
+        temperature=temperature,
+        do_sample=(temperature > 0),  # greedy decoding when temperature is 0
+        num_return_sequences=sample_size,
+        eos_token_id=tokenizer.eos_token_id
+    )
+    outputs = generator(prompt, max_new_tokens=4096)
+    # Keep only the newly generated text, not the echoed prompt.
+    outputs = [output['generated_text'][len(prompt):] for output in outputs]
+    # Strip the closing tag (str.removesuffix needs Python 3.9+).
+    diffs = [output.removesuffix('\n[/DIFF]') for output in outputs]
+    # Apply each generated diff to obtain one corrupted program per sample.
+    corrupted_programs = [diff2code(diff) for diff in diffs]
+    return corrupted_programs
+test_code = '''
+def bitcount(n):
+    count = 0
+    while n:
+        n &= n - 1
+        count += 1
+    return count
+'''.strip()
+corrupted_programs = corrupt(test_code, tokenizer, model, temperature=0.5, sample_size=5)
+for corrupted_program in corrupted_programs:
+    print(corrupted_program)
+    print('-'*30)
+```
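
The `diff2code` helper in the README can be tried without loading the model. The sketch below is a minimal illustration and assumes the generated diff marks every line with a two-character prefix (two spaces for unchanged lines, `- ` for removed, `+ ` for added), which is what the `line[2:]` slice suggests. The `sample_diff` string is hand-written for this example and is not real model output.

```python
def diff2code(diff: str) -> str:
    # Same logic as the README helper: drop '-' lines, strip the 2-char marker.
    return '\n'.join(
        line[2:] for line in diff.splitlines()
        if not line.startswith('-')
    ).strip()

# Hypothetical diff (an assumption, not actual model output):
# it swaps '&=' for '^=' in the bitcount example.
sample_diff = (
    "  def bitcount(n):\n"
    "      count = 0\n"
    "      while n:\n"
    "-         n &= n - 1\n"
    "+         n ^= n - 1\n"
    "          count += 1\n"
    "      return count"
)

corrupted = diff2code(sample_diff)
print(corrupted)
```

Because removed lines are filtered out before the slice, the result is the program with the change applied, i.e. the buggy variant rather than the original.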

adapter_config.json ADDED

@@ -0,0 +1,29 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/home/bszalontai/balazs_munka/codellama/models_hf/wizard-coder-13b-python",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 1024,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 512,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "up_proj",
+    "v_proj",
+    "gate_proj",
+    "k_proj",
+    "q_proj",
+    "lm_head",
+    "down_proj"
+  ],
+  "task_type": "CAUSAL_LM"
+}
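
To get a feel for what `r`, `lora_alpha`, and `target_modules` imply, the sketch below computes the LoRA scaling factor and a rough count of adapter parameters. The layer shapes (hidden size 5120, MLP size 13824, 40 layers, vocabulary 32001) are assumptions about the base model and are not stated anywhere in this commit. It also assumes the standard LoRA layout of one `r x in` and one `out x r` matrix per targeted layer.

```python
import json

# Values copied from adapter_config.json.
config = json.loads('''
{
  "r": 512,
  "lora_alpha": 1024,
  "target_modules": ["o_proj", "up_proj", "v_proj", "gate_proj",
                     "k_proj", "q_proj", "lm_head", "down_proj"]
}
''')

# ASSUMPTION: base-model shapes, not given in this commit.
HIDDEN, MLP, LAYERS, VOCAB = 5120, 13824, 40, 32001

# (in_features, out_features) of each targeted linear layer.
shapes = {
    "q_proj": (HIDDEN, HIDDEN), "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN), "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP), "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN), "lm_head": (HIDDEN, VOCAB),
}

def lora_params(name: str, r: int) -> int:
    # LoRA adds A (r x in) and B (out x r) next to each targeted weight.
    fan_in, fan_out = shapes[name]
    return r * (fan_in + fan_out)

r = config["r"]
scaling = config["lora_alpha"] / r  # factor applied to the low-rank update
per_layer = sum(lora_params(m, r) for m in config["target_modules"] if m != "lm_head")
total = LAYERS * per_layer + lora_params("lm_head", r)  # lm_head occurs once

print(f"scaling = {scaling}")
print(f"adapter parameters = {total:,}")
print(f"size at 4 bytes each = {total * 4:,} bytes")
```

Under these assumptions the adapter holds about 2.02 billion parameters, or roughly 8.09 GB at four bytes each, which is close to the size recorded in the LFS pointer for `adapter_model.bin`.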

adapter_model.bin ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:da29e0cbbf68e9b0141f8826d39386aed16b2c184fa5873cc2ab6e18880dd0e2
+size 8087351638