Is the dataset available?
Excellent work! I am working to build one myself for our internal security team (offensive, defensive, and compliance). However, I have yet to find a decent dataset to build from. Do you mind sharing yours? I thought of building one myself by feeding text documents into Mistral and having it output input/output pairs, but a head start on a dataset would be appreciated :)
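For readers wondering what the "feed documents into Mistral and output input/output pairs" idea might look like, here is a minimal sketch. It is not the method used to build Lily's dataset, which the thread does not describe. The names `chunk_text`, `make_pairs`, `PROMPT`, and the `generate` callable are all hypothetical: `generate` stands in for whatever call sends a prompt to your model and returns its text reply.

```python
import json
from typing import Callable, Dict, Iterator, List

# Hypothetical prompt template; adjust the wording for your own documents.
PROMPT = (
    "Read the passage below and write one question a security analyst might "
    "ask about it, plus an answer grounded in the passage. Reply with JSON "
    'only, using the keys "input" and "output".\n\nPassage:\n{passage}'
)


def chunk_text(text: str, max_chars: int = 1500) -> Iterator[str]:
    """Yield paragraph-aligned chunks, starting a new one when max_chars would be exceeded."""
    buf = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if buf and len(buf) + len(para) + 2 > max_chars:
            yield buf
            buf = para
        else:
            buf = f"{buf}\n\n{para}" if buf else para
    if buf:
        yield buf


def make_pairs(
    text: str,
    generate: Callable[[str], str],
    max_chars: int = 1500,
) -> List[Dict[str, str]]:
    """Ask the model for one input/output pair per chunk; skip malformed replies."""
    pairs = []
    for chunk in chunk_text(text, max_chars):
        raw = generate(PROMPT.format(passage=chunk))
        try:
            pair = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the model did not return valid JSON for this chunk
        if isinstance(pair, dict) and pair.get("input") and pair.get("output"):
            pairs.append({"input": pair["input"], "output": pair["output"]})
    return pairs


if __name__ == "__main__":
    # Stub generator so the sketch runs without a model; swap in a real call.
    def fake_generate(prompt: str) -> str:
        return json.dumps({"input": "What does the passage cover?", "output": "A stub answer."})

    doc = "Firewalls filter traffic.\n\nIDS tools detect intrusions."
    for row in make_pairs(doc, fake_generate, max_chars=30):
        print(json.dumps(row))
```

Generated pairs would still need manual review and deduplication before fine-tuning, since the model can return answers that are not supported by the passage.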
Hello, good job. I have the same request. Thank you so much.
Hi there! 🤗
Great work on the dataset! Could you share insights into how the data pairs were collected? Also, any plans to release the dataset publicly? I'm currently working on building a cybersecurity chatbot similar to Lily and would find this data incredibly useful. Thanks!
Thanks for the comments. I am working on cleaning this dataset so I can release it. I am also in the process of creating a new model and dataset that uses about 3 million pairs.
Great work! I am also interested in the dataset. Thanks
Really nice work!!! Is the dataset available? Thank you.
Hi @unshadow,
I appreciate your work on creating the fine-tuned LLM for cyber security. Can you please let me know the size of the dataset used for fine-tuning? Also, is the dataset available, and when are you planning to release it? I appreciate your time and response. Thank you!
Hey there, I'm new to this and I have been assigned to make a model like this. I downloaded the Lexi Llama 3 uncensored model to test whether it could help, and it did. But how can I fine-tune it on the free tier of Google Colab? Which model should I use: Llama 3 directly (though it is censored), or some other model? It would be lovely if you all could guide me through this, please! I don't really know what the dataset should look like or what to do.
Hey, I was also wondering the same thing, is the dataset available? Thank you for creating this tool!
+1 from me too.
Life moves on, but could you release the dataset as it is? We will do the cleaning for you :)
Hi @unshadow
I really appreciate the work you've done on the fine-tuned LLM for cybersecurity. It's impressive! I was curious about a couple of things:
1. How big was the dataset you used for fine-tuning?
2. Is the dataset available, or are there any plans to release it?
Thanks so much for your time! Looking forward to hearing back.
Hey @unshadow,
Just checking to see if the dataset was ever made available. I'm looking to build on what you've done by adding more extensive detail on Palo Alto, Check Point, Cisco, Go, Rust, C++, and Python.