"""
NeuronSpark: SNN hidden state space language model, HuggingFace interface.

Usage:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "checkpoints_sft/", trust_remote_code=True,
    )
    tokenizer = AutoTokenizer.from_pretrained("checkpoints_sft/")
"""
from typing import Optional

import torch
from transformers import PreTrainedModel, GenerationMixin
from transformers.modeling_outputs import CausalLMOutputWithPast

from configuration_neuronspark import NeuronSparkConfig
from model import SNNLanguageModel


class NeuronSparkForCausalLM(PreTrainedModel, GenerationMixin):
    """
    SNN language model with a CausalLM interface.

    Wraps SNNLanguageModel and exposes the standard HuggingFace interface:
    - forward(input_ids, labels) -> CausalLMOutputWithPast
    - generate() support (GenerationMixin is inherited; generate() itself is
      overridden below and delegates to SNNLanguageModel.generate)
    """

    config_class = NeuronSparkConfig
    supports_gradient_checkpointing = True

    def __init__(self, config: NeuronSparkConfig):
        super().__init__(config)
        self.model = SNNLanguageModel(
            vocab_size=config.vocab_size,
            D=config.D,
            N=config.N,
            K=config.K,
            num_layers=config.num_layers,
            D_ff=config.D_ff,
            v_th_min=config.v_th_min,
        )

    def get_input_embeddings(self):
        return self.model.embed_tokens

    def set_input_embeddings(self, value):
        self.model.embed_tokens = value

    def get_output_embeddings(self):
        # Tied head: the output projection reuses embed_tokens.weight.
        return self.model.embed_tokens

    def forward(
        self,
        input_ids: torch.Tensor,
        labels: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        **kwargs,
    ) -> CausalLMOutputWithPast:
        """
        Forward pass.

        Args:
            input_ids: (batch, seq_len) token IDs
            labels: (batch, seq_len) target token IDs (optional; used to compute the loss)
            attention_mask: accepted for compatibility; ignored because the SNN has no attention
        """
        if labels is not None:
            out = self.model(input_ids, target_ids=labels)
            # Compute the masked loss: positions whose label is 0 are excluded.
            loss_mask = (labels != 0).float().view(-1)
            # clamp avoids a division by zero when every label is masked out.
            loss = (out.last_loss * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)
            # Add the ponder cost, weighted by 0.01.
            if out.ponder_cost is not None:
                loss = loss + 0.01 * out.ponder_cost
            # Only the loss is returned on this path; logits are not populated.
            return CausalLMOutputWithPast(loss=loss)
        else:
            out = self.model(input_ids)
            return CausalLMOutputWithPast(logits=out.logits)

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        """Prepare inputs for generate()."""
        return {"input_ids": input_ids}

    def generate(
        self,
        input_ids: torch.Tensor,
        max_new_tokens: int = 256,
        temperature: float = 1.0,
        top_k: int = 50,
        eos_token_id: Optional[int] = None,
        **kwargs,
    ) -> torch.Tensor:
        """
        Autoregressive generation (delegates directly to the SNN's generate method).

        Any extra keyword arguments are accepted for compatibility and ignored.
        """
        return self.model.generate(
            prompt_ids=input_ids,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_k=top_k,
            eos_token_id=eos_token_id,
        )
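
The masked loss computed in forward() can be illustrated without any dependencies. What follows is a minimal sketch, not the model's implementation. It assumes, as the `labels != 0` test suggests, that token ID 0 marks padding and that `last_loss` holds one loss value per flattened token position. The function name `masked_loss` and its parameters are hypothetical. Only the 0.01 ponder weight is taken from the code above, and the `max(..., 1.0)` guard against an all-padding batch is a defensive choice of this sketch.

```python
from typing import Optional, Sequence


def masked_loss(
    per_token_loss: Sequence[float],
    labels: Sequence[int],
    ponder_cost: Optional[float] = None,
    pad_id: int = 0,              # assumption: ID 0 marks padding
    ponder_weight: float = 0.01,  # weight used in forward()
) -> float:
    """Average the loss over non-padding positions, then add the weighted ponder cost."""
    if len(per_token_loss) != len(labels):
        raise ValueError("per_token_loss and labels must have the same length")

    # 1.0 where the label is a real token, 0.0 where it is padding.
    mask = [1.0 if label != pad_id else 0.0 for label in labels]

    # Sum only the unmasked losses and divide by the number of unmasked positions.
    total = sum(value * m for value, m in zip(per_token_loss, mask))
    count = max(sum(mask), 1.0)  # guard: avoid dividing by zero
    loss = total / count

    if ponder_cost is not None:
        loss += ponder_weight * ponder_cost
    return loss


if __name__ == "__main__":
    # Two real tokens followed by two padding positions.
    print(masked_loss([2.0, 4.0, 9.0, 9.0], [5, 7, 0, 0], ponder_cost=3.0))
```

In this example the two padding positions contribute nothing: the average is taken over the first two values only, and the ponder term is added afterwards, so it is not divided by the token count.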