How to convert

by quangdung - opened Mar 30

Mar 30

Could you share how you converted the model? When I converted it to 8-bit, the model loaded successfully but crashed during inference.

wangjazz

Owner Mar 30

I used MNN's llmexport.py (from the transformers/llm/export/ directory in the MNN repo).
Conversion command:

cd MNN/transformers/llm/export

python llmexport.py \
    --path /path/to/HY-MT1.5-1.8B \
    --dst_path /path/to/output \
    --export mnn \
    --quant_bit 4 \
    --quant_block 64 \
    --lm_quant_bit 4 \
    --act_bit 16 \
    --embed_bit 16 \
    --mnnconvert /path/to/MNNConvert
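
If you script the export, the same invocation can be assembled in Python so the bit width is a single parameter. This is only a minimal sketch: `build_export_cmd` is a hypothetical helper, not part of MNN, and it uses nothing beyond the flags and placeholder paths from the command above.

```python
import sys


def build_export_cmd(model_dir, out_dir, mnnconvert, bits=4):
    """Return the llmexport.py argument list from the command above.

    Hypothetical helper (not part of MNN). `bits` is applied to both
    --quant_bit and --lm_quant_bit; every other value is copied from
    the original command.
    """
    return [
        sys.executable, "llmexport.py",
        "--path", model_dir,
        "--dst_path", out_dir,
        "--export", "mnn",
        "--quant_bit", str(bits),
        "--quant_block", "64",
        "--lm_quant_bit", str(bits),
        "--act_bit", "16",
        "--embed_bit", "16",
        "--mnnconvert", mnnconvert,
    ]
```

To run it, pass the returned list to `subprocess.run(cmd, check=True)` from inside MNN/transformers/llm/export, since the script name is given as a relative path.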

Key notes for 8-bit conversion:
If you want 8-bit instead, set --quant_bit 8 and --lm_quant_bit 8. However, the
crash during inference might be caused by one of the following:

1. tie_word_embeddings: This model uses tied embeddings. The converter should
   detect it automatically, but check your config.json to ensure
   "tie_word_embeddings": true is set.

2. Model type mapping: MNN needs hunyuan_v1_dense as the model type. Make
   sure your MNN version includes this mapper (it was added relatively recently).
   I'd recommend using MNN >= 3.0.0 (I'm currently on 3.4.0).

3. llm_config.json settings: After conversion, make sure backend_type and
   precision are set correctly:

    {
        "llm_model": "llm.mnn",
        "llm_weight": "llm.mnn.weight",
        "backend_type": "cpu",
        "thread_num": 4,
        "precision": "low",
        "memory": "low"
    }

4. MNN version: If you're using an older MNN version that doesn't have
   hunyuan_v1_dense support in model_mapper.py, the export may succeed but
   inference will crash. Update to the latest MNN repo.
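
The config checks above can be automated with a few lines of Python. Treat this as a rough sketch under some assumptions: both function names are hypothetical, the model type is assumed to live in the standard "model_type" field of config.json, and llm_config.json is assumed to sit in the export output directory.

```python
import json
from pathlib import Path


def check_hf_config(model_dir):
    """Return a list of warnings about config.json in the source model directory."""
    cfg = json.loads((Path(model_dir) / "config.json").read_text(encoding="utf-8"))
    problems = []
    # The notes above expect tied embeddings to be declared explicitly.
    if cfg.get("tie_word_embeddings") is not True:
        problems.append("tie_word_embeddings is not true")
    # Assumption: the model type is stored under the "model_type" key.
    if cfg.get("model_type") != "hunyuan_v1_dense":
        problems.append(f"model_type is {cfg.get('model_type')!r}")
    return problems


def missing_llm_config_keys(out_dir):
    """Return the keys from the example llm_config.json that are absent."""
    cfg = json.loads((Path(out_dir) / "llm_config.json").read_text(encoding="utf-8"))
    wanted = ("llm_model", "llm_weight", "backend_type", "precision")
    return [key for key in wanted if key not in cfg]
```

An empty list from both functions only means the files look as described here; it does not confirm that your MNN build includes the hunyuan_v1_dense mapper.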

The export_args.json in the output directory records the exact parameters used,
so you can cross-check yours against mine.
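
For the cross-check, a small diff script is usually enough. The sketch below assumes export_args.json is a flat JSON object of parameter names and values; `diff_export_args` is a hypothetical helper and the two paths are whatever your setup uses.

```python
import json


def diff_export_args(mine_path, theirs_path):
    """Return {key: (mine, theirs)} for every parameter whose values differ.

    Assumes both files hold a flat JSON object; a key missing from one
    side is reported with None for that side.
    """
    with open(mine_path, encoding="utf-8") as f:
        mine = json.load(f)
    with open(theirs_path, encoding="utf-8") as f:
        theirs = json.load(f)
    return {
        key: (mine.get(key), theirs.get(key))
        for key in sorted(set(mine) | set(theirs))
        if mine.get(key) != theirs.get(key)
    }
```

For an 8-bit export compared against a 4-bit one, you would expect the quantization bit settings to show up in the result; anything else that differs is worth a closer look.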

quangdung

Mar 31

Thank you for your enthusiastic help.
