
QuantFactory/Turkish-Llama-8b-Instruct-v0.1-GGUF

This is a quantized version of ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1, created using llama.cpp.
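
For a rough sense of what quantization saves, the sketch below estimates weight storage at several bit widths. It assumes a nominal 8 billion parameters (read off the "8b" in the model name) and one uniform bit width for every weight. Real GGUF files mix precisions across tensors and carry metadata, so actual file sizes will differ. The helper name `approx_weight_size_gb` is hypothetical.

```python
def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9  # assumption: nominal count taken from the "8b" in the model name

# 16 bits corresponds to the bfloat16 weights used in the Transformers examples.
for bits in (16, 8, 6, 5, 4, 3, 2):
    print(f"{bits:>2}-bit: ~{approx_weight_size_gb(N_PARAMS, bits):.1f} GB")
```

Treat these numbers as lower-bound estimates for planning memory, not as the sizes of the published files.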

Model Description

This model is a fully fine-tuned version of the "meta-llama/Meta-Llama-3-8B-Instruct" model with a 30GB Turkish dataset.

Cosmos LLaMa Instruct is designed for text generation: it continues a given text snippet coherently and in context. Because the training data is drawn from diverse sources, including websites, books, and other text, the model can exhibit biases. Users should be aware of these biases and use the model responsibly.

Transformers pipeline

import transformers
import torch

model_id = "ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak. Görevi yerine getirirken adım adım düşün ve adımlarını gerekçelendir."},
    {"role": "user", "content": "Soru: Bir arabanın deposu 60 litre benzin alabiliyor. Araba her 100 kilometrede 8 litre benzin tüketiyor. Depo tamamen doluyken araba kaç kilometre yol alabilir?"},
]

# Stop generation at either the tokenizer's EOS token or the end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# generated_text holds the full chat; its last entry is the assistant's reply message.
print(outputs[0]["generated_text"][-1])
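
The sample question asks how far a car with a 60-liter tank can travel on a full tank when it uses 8 liters per 100 km. Because sampling is enabled, the model's reply can vary between runs, so it helps to know the expected result. The snippet below does that arithmetic. `max_range_km` is a hypothetical helper and not part of the model card's code.

```python
def max_range_km(tank_liters: float, liters_per_100km: float) -> float:
    """Distance a full tank covers at a constant consumption rate."""
    return tank_liters / liters_per_100km * 100


# Values from the sample question: 60-liter tank, 8 liters per 100 km.
expected_km = max_range_km(60, 8)
print(expected_km)  # 750.0
```

A correct reply from the model should therefore arrive at 750 kilometers.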

Transformers AutoModelForCausalLM

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak. Görevi yerine getirirken adım adım düşün ve adımlarını gerekçelendir."},
    {"role": "user", "content": "Soru: Bir arabanın deposu 60 litre benzin alabiliyor. Araba her 100 kilometrede 8 litre benzin tüketiyor. Depo tamamen doluyken araba kaç kilometre yol alabilir?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop generation at either the tokenizer's EOS token or the end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
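
The slicing step works because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. The sketch below mimics that layout with plain Python lists, so it runs without downloading the model. `strip_prompt` and the token IDs are made up for illustration and do not correspond to real vocabulary entries.

```python
def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Return only the tokens that follow the prompt."""
    return output_ids[prompt_len:]


# Made-up token IDs, for illustration only.
prompt_ids = [101, 7, 42, 9]
generated_ids = [55, 56, 57]

# Same layout as one sequence returned by generate(): prompt, then completion.
output_ids = prompt_ids + generated_ids

response_ids = strip_prompt(output_ids, len(prompt_ids))
print(response_ids)  # [55, 56, 57]
```

Without this step, decoding the full output would repeat the system and user messages before the answer.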

Model Contact

COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department
https://cosmos.yildiz.edu.tr/
cosmos@yildiz.edu.tr


license: llama3

GGUF
Model size: 8B params
Architecture: llama
Quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
