Woyoung21 commited on
Commit
e8bf49f
·
verified ·
1 Parent(s): a8e174f

Update Readme

Browse files
Files changed (1) hide show
  1. README.md +56 -1
README.md CHANGED
@@ -11,7 +11,62 @@ base_model:
11
 
12
  # qwen-honeypot-GGUF - GGUF
13
 
14
- This model was finetuned and converted to GGUF format using [Unsloth](https://github.com/unslothai/unsloth).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15
 
16
  **Example usage**:
17
  - For text only LLMs: **llama-cli** **--hf** repo_id/model_name **-p** "why is the sky blue?"
 
11
 
12
  # qwen-honeypot-GGUF - GGUF
13
 
14
+ This repository contains a fine-tuned Qwen2.5-Coder-1.5B-Instruct model converted to GGUF format
15
+ for fast inference via llama.cpp, Ollama, or cloud hosting platforms such as Anyscale or RunPod.
16
+
17
+ The model is designed to simulate a realistic Linux terminal for use in cybersecurity deception
18
+ systems and dynamic honeypots. It produces raw shell output only—no explanations, refusals, or commentary.
19
+
20
+ # Purpose
21
+
22
+ This model powers a dynamic honeypot that forwards attacker-issued shell commands to an LLM, which responds with realistic terminal output. The intent is to:
23
+
24
+ Engage attackers for longer,
25
+ Provide threat intelligence,
26
+ Waste adversary time on a convincingly simulated environment,
27
+ Prevent interaction with real systems,
28
+ The model exposes believable filesystem structures, logs, credentials, keys, cloud metadata, and exploitable artifacts commonly encountered on real Linux servers.
29
+
30
+ # Model Details
31
+ Base Model
32
+
33
+ -Qwen2.5-Coder-1.5B-Instruct
34
+ -Fine-tuned using Unsloth LoRA
35
+ -Converted to GGUF (Q4_K_M) for efficient inference
36
+
37
+
38
+ # Training Data
39
+
40
+ A combination of:
41
+
42
+ # Custom Honeypot Dataset (~100 examples)
43
+
44
+ Includes realistic outputs for:
45
+
46
+ Filesystem enumeration
47
+ Logs, SSH keys, sensitive documents
48
+ Cloud metadata
49
+ Misconfigurations and leftover artifacts
50
+ Credential scraping
51
+ Persistence attempts
52
+ Reverse shells and malware staging
53
+ Web server enumeration
54
+ System information commands
55
+ MySQL interactions
56
+ Log tampering
57
+
58
+ # Public Dataset
59
+
60
+ mrheinen/linux-commands (cleaned to remove multi-command sequences)
61
+
62
+ # Training Summary
63
+ Metric Value
64
+ Epochs 8
65
+ Total steps 472
66
+ Trainable parameters 18.4M (LoRA)
67
+ Final average train loss ~0.55–0.65
68
+ Eval loss ~0.72
69
+ Training time (Colab T4) ~25 minutes
70
 
71
  **Example usage**:
72
  - For text only LLMs: **llama-cli** **--hf** repo_id/model_name **-p** "why is the sky blue?"