SandLogicTechnologies committed · Commit 42d5b94 · verified · 1 parent: 8bdfd5a

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ granite-guardian-4.1-8b_IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
37
+ granite-guardian-4.1-8b_IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
38
+ granite-guardian-4.1-8b_IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,148 @@
1
  ---
2
  license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - ibm-granite/granite-guardian-4.1-8b
7
+ tags:
8
+ - text-generation
9
+ - safety
10
+ - content-moderation
11
+ - guardrails
12
+ - instruction-following
13
+ - reasoning
14
+ - efficient-model
15
  ---
16
+
17
+ ## Granite-Guardian-4.1-8B
18
+
19
+ Granite-Guardian-4.1-8B is a safety-focused large language model developed for content moderation, policy evaluation, risk detection, and safe conversational workflows. This repository contains GGUF quantized variants of the model optimized for efficient local inference using llama.cpp.
20
+
21
+ The quantized formats cut model size by roughly 71% to 77% relative to the 16-bit weights while aiming to preserve classification and moderation performance, enabling practical deployment on consumer hardware and edge environments.
22
+
23
+ ---
24
+
25
+ ## Model Overview
26
+
27
+ - **Model Name**: Granite-Guardian-4.1-8B
28
+ - **Base Model**: ibm-granite/granite-guardian-4.1-8b
29
+ - **Architecture**: Decoder-only Transformer
30
+ - **Parameter Count**: 8 billion
31
+ - **Modalities**: Text
32
+ - **Primary Languages**: English
33
+ - **Developer**: IBM Granite
34
+ - **License**: Apache 2.0
35
+
36
+ ---
37
+
38
+ ## Quantization Formats
39
+
40
+ This repository provides several GGUF quantized versions of the `Granite-Guardian-4.1-8B` model for local inference with `llama.cpp`. The available IQ-series (i-quant) formats are detailed below.
41
+
42
+ ### IQ3_M
43
+
44
+ - File size: 3.64 GB, approx. 76.68% smaller than the 16-bit model (15.61 GB)
45
+ - Aggressive 3-bit quantization optimized for maximum memory reduction
46
+ - Suitable for low-memory systems and CPU-based inference
47
+ - Maintains lightweight deployment capability for moderation pipelines
48
+ - Output quality may degrade on nuanced reasoning or complex safety classification tasks
49
+
50
+ ### IQ4_NL
51
+
52
+ - File size: 4.54 GB, approx. 70.92% smaller than the 16-bit model (15.61 GB)
53
+ - Advanced 4-bit non-linear quantization designed to better preserve output quality
54
+ - More suitable for structured moderation workflows and detailed classification tasks
55
+ - Typically provides stronger consistency compared to lower-bit formats
56
+ - Slightly increased computational overhead during inference
57
+
58
+ ### IQ4_XS
59
+
60
+ - File size: 4.32 GB, approx. 72.33% smaller than the 16-bit model (15.61 GB)
61
+ - Balanced 4-bit quantization focused on efficiency and stable inference performance
62
+ - Good trade-off between model size, speed, and moderation quality
63
+ - Suitable for general-purpose deployment across constrained hardware
64
+ - Maintains reliable generation and classification behavior for most practical workloads
65
+
66
+ ---
67
+
68
+ ## Training Background (Original Model)
69
+
70
+ Granite-Guardian-4.1-8B is trained with an emphasis on AI safety, risk evaluation, and policy-aware conversational analysis.
71
+
72
+ ### Pretraining
73
+
74
+ - Large-scale language pretraining across diverse textual domains
75
+ - Focus on contextual understanding and robust text representations
76
+ - Optimized for downstream moderation and classification workflows
77
+
78
+ ### Alignment and Safety Tuning
79
+
80
+ - Refined using safety-focused datasets and moderation objectives
81
+ - Enhanced for harmful content detection and policy evaluation
82
+ - Improved reliability for instruction compliance and risk-aware outputs
83
+
84
+ ---
85
+
86
+ ## Key Capabilities
87
+
88
+ - **Content Moderation**
89
+ Detects unsafe, harmful, or policy-violating content across diverse inputs.
90
+
91
+ - **Risk and Safety Evaluation**
92
+ Supports moderation pipelines and conversational safety workflows.
93
+
94
+ - **Instruction Understanding**
95
+ Handles structured prompts and classification-oriented tasks effectively.
96
+
97
+ - **Efficient Local Deployment**
98
+ Quantized variants enable practical offline inference on consumer hardware.
99
+
100
+ - **Reliable Text Classification**
101
+ Suitable for filtering, moderation, and safety-oriented NLP applications.
102
+
103
+ ---
104
+
105
+ ## Usage Example
106
+
107
+ ### Using llama.cpp
108
+
109
+ ```
110
+ ./llama-cli \
111
+ -m SandLogicTechnologies/granite-guardian-4.1-8b_IQ4_NL.gguf \
112
+ -p "Explain the concept of knowledge distillation in detail"
113
+ ```
114
+
115
+ ---
116
+
117
+ ## Recommended Use Cases
118
+
119
+ - **AI Safety and Moderation Systems**
120
+ Build local moderation and filtering pipelines without cloud dependencies.
121
+
122
+ - **Risk Classification Workflows**
123
+ Analyze prompts and outputs for harmful or unsafe content patterns.
124
+
125
+ - **Enterprise Safety Layers**
126
+ Integrate guardrails into conversational AI systems and assistants.
127
+
128
+ - **Research and Evaluation**
129
+ Study model alignment, moderation behavior, and safety-focused prompting strategies.
130
+
131
+ ---
132
+
133
+ ## Acknowledgments
134
+
135
+
136
+ These quantized models are based on the original work by the **IBM Granite** development team.
137
+
138
+ Special thanks to:
139
+
140
+ - The [IBM Granite](https://huggingface.co/ibm-granite) team for developing and releasing the [Granite-Guardian-4.1-8B](https://huggingface.co/ibm-granite/granite-guardian-4.1-8b) model.
141
+
142
+ - **Georgi Gerganov** and the `llama.cpp` open-source community for enabling efficient quantization and inference via the GGUF format.
143
+
144
+ ---
145
+
146
+ ## Contact
147
+
148
+ For questions, feedback, or support, please reach out at [support@sandlogic.com](mailto:support@sandlogic.com) or visit https://www.sandlogic.com/
granite-guardian-4.1-8b_IQ3_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:da4bf7d5eca8899dae49bbae727c7ef77a9acb5080237bd61643e37f92d37646
3
+ size 3912561728
granite-guardian-4.1-8b_IQ4_NL.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aa06ff4f3f46504c753e39792ab146bc556a0927c79bd3f006b16fb8a51b77ff
3
+ size 4878480448
granite-guardian-4.1-8b_IQ4_XS.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a3150672f8b6d333d6094ddacf296ece6ebdbb581c72b01d3efb67132a24c49
3
+ size 4642878528
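
The `size` fields of the three LFS pointers can be cross-checked against the file sizes and percentage reductions quoted in the README. The following is a minimal sketch of that arithmetic, not part of the commit. It assumes the README's "GB" figures are binary gigabytes (bytes divided by 1024³) and takes the 15.61 baseline from the README as given, since no 16-bit file is included in this commit. The helper names `size_in_gib`, `reduction_pct`, and `summarize` are hypothetical.

```python
# Byte counts copied from the `size` lines of the three LFS pointers in this commit.
POINTER_SIZES = {
    "IQ3_M": 3912561728,
    "IQ4_NL": 4878480448,
    "IQ4_XS": 4642878528,
}

# 16-bit baseline as quoted in the README. Assumption: it uses the same unit
# as the quantized sizes (treated here as GiB); it cannot be verified from this commit.
BASELINE_16BIT = 15.61

GIB = 1024 ** 3  # bytes per binary gigabyte


def size_in_gib(num_bytes: int) -> float:
    """Convert a byte count to binary gigabytes."""
    return num_bytes / GIB


def reduction_pct(size_gib: float, baseline_gib: float = BASELINE_16BIT) -> float:
    """Percentage by which `size_gib` is smaller than `baseline_gib`."""
    return (1.0 - size_gib / baseline_gib) * 100.0


def summarize() -> dict:
    """Map each format name to (size in GiB, reduction in percent)."""
    result = {}
    for name, num_bytes in POINTER_SIZES.items():
        gib = size_in_gib(num_bytes)
        result[name] = (gib, reduction_pct(gib))
    return result


if __name__ == "__main__":
    for name, (gib, pct) in summarize().items():
        print(f"{name}: {gib:.2f} GiB, {pct:.2f}% smaller than the 16-bit baseline")
```

Under these assumptions the sizes round to the README's 3.64, 4.54, and 4.32. The percentages computed from the unrounded sizes differ from the README's by a few hundredths of a point, which is consistent with the README figures having been derived from the rounded sizes.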