How to convert
Could you share how you converted the model? When I converted it to 8-bit, I successfully loaded the model but it crashed when I performed the inference.
I used MNN's llmexport.py (from the transformers/llm/export/ directory in the MNN repo).
Conversion command:
cd MNN/transformers/llm/export
python llmexport.py \
  --path /path/to/HY-MT1.5-1.8B \
  --dst_path /path/to/output \
  --export mnn \
  --quant_bit 4 \
  --quant_block 64 \
  --lm_quant_bit 4 \
  --act_bit 16 \
  --embed_bit 16 \
  --mnnconvert /path/to/MNNConvert
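If you want to try several bit widths, it can help to assemble the same command in Python so the bit width is a single parameter. This is only a sketch: `build_export_cmd` is a hypothetical helper of mine, not part of MNN, and it uses nothing beyond the flags shown in the command above.

```python
import shlex


def build_export_cmd(model_path, dst_path, mnnconvert, quant_bit=4):
    """Build the llmexport.py argument list shown above.

    Hypothetical helper (not part of MNN). quant_bit is applied to both
    --quant_bit and --lm_quant_bit; all other values are copied from the
    original command.
    """
    return [
        "python", "llmexport.py",
        "--path", model_path,
        "--dst_path", dst_path,
        "--export", "mnn",
        "--quant_bit", str(quant_bit),
        "--quant_block", "64",
        "--lm_quant_bit", str(quant_bit),
        "--act_bit", "16",
        "--embed_bit", "16",
        "--mnnconvert", mnnconvert,
    ]


if __name__ == "__main__":
    cmd = build_export_cmd(
        "/path/to/HY-MT1.5-1.8B",
        "/path/to/output",
        "/path/to/MNNConvert",
        quant_bit=8,
    )
    # Print a copy-pasteable shell line instead of running it directly.
    print(shlex.join(cmd))
```

Run the printed line from MNN/transformers/llm/export, or pass the list to `subprocess.run(cmd, check=True, cwd=...)` if you prefer to launch it from Python.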
**Key notes for 8-bit conversion:**
If you want 8-bit instead, set --quant_bit 8 and --lm_quant_bit 8. That said, the
crash during inference might be caused by one of the following:
- tie_word_embeddings: This model uses tied embeddings. The converter should
  detect this automatically, but check your config.json to ensure
  "tie_word_embeddings": true is set.
- Model type mapping: MNN needs hunyuan_v1_dense as the model type. Make
  sure your MNN version includes this mapper (it was added relatively recently).
  I'd recommend using MNN >= 3.0.0 (I'm currently on 3.4.0).
- llm_config.json settings: After conversion, make sure backend_type and
  precision are set correctly:
  {
    "llm_model": "llm.mnn",
    "llm_weight": "llm.mnn.weight",
    "backend_type": "cpu",
    "thread_num": 4,
    "precision": "low",
    "memory": "low"
  }
- MNN version: If you're using an older MNN version that doesn't have
  hunyuan_v1_dense support in model_mapper.py, the export may succeed but
  inference will crash. Update to the latest MNN repo.
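The first two checks can be scripted before you export. This is a rough sketch, assuming the source model directory contains a Hugging Face style config.json and that its "model_type" field holds the string MNN maps on (hunyuan_v1_dense here). `check_hf_config` is a hypothetical helper, not something MNN ships.

```python
import json
from pathlib import Path


def check_hf_config(model_dir, expected_model_type="hunyuan_v1_dense"):
    """Return a list of warnings about config.json in model_dir.

    Hypothetical helper. An empty list means both checks passed.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    warnings = []
    # Tied embeddings: the key must be present and exactly true.
    if config.get("tie_word_embeddings") is not True:
        warnings.append('"tie_word_embeddings" is not set to true')
    # Assumption: model_type is the string the MNN mapper looks up.
    if config.get("model_type") != expected_model_type:
        warnings.append(
            f'"model_type" is {config.get("model_type")!r}, '
            f"expected {expected_model_type!r}"
        )
    return warnings


if __name__ == "__main__":
    import sys

    # Usage: python check_config.py /path/to/HY-MT1.5-1.8B
    for message in check_hf_config(sys.argv[1]):
        print("WARNING:", message)
```

A clean result here does not prove that your MNN checkout has the mapper; it only rules out the config side.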
The export_args.json in the output directory records the exact parameters used,
so you can cross-check yours against mine.
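To make that comparison less error-prone, something like the following could list the differing keys. It is a sketch that assumes each export_args.json is a single top-level JSON object; I have not relied on any particular key names, and `diff_export_args` is a hypothetical helper.

```python
import json

MISSING = "<missing>"  # placeholder shown when a key exists on one side only


def diff_export_args(mine_path, theirs_path):
    """Return {key: (mine, theirs)} for every key whose values differ.

    Hypothetical helper. Assumes both files hold a top-level JSON object.
    """
    with open(mine_path, encoding="utf-8") as f:
        mine = json.load(f)
    with open(theirs_path, encoding="utf-8") as f:
        theirs = json.load(f)

    diffs = {}
    for key in sorted(set(mine) | set(theirs)):
        a = mine.get(key, MISSING)
        b = theirs.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs


if __name__ == "__main__":
    import sys

    # Usage: python diff_args.py my/export_args.json other/export_args.json
    for key, (a, b) in diff_export_args(sys.argv[1], sys.argv[2]).items():
        print(f"{key}: mine={a!r} theirs={b!r}")
```

Path-like values will naturally differ between machines, so those lines can be ignored when reading the output.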
Thank you for your enthusiastic help.