How to convert

#1
by quangdung - opened

Could you share how you converted the model? When I converted it to 8-bit, the model loaded successfully but crashed during inference.

I used MNN's llmexport.py (from the transformers/llm/export/ directory in the MNN repo).
Conversion command:

cd MNN/transformers/llm/export

python llmexport.py \
    --path /path/to/HY-MT1.5-1.8B \
    --dst_path /path/to/output \
    --export mnn \
    --quant_bit 4 \
    --quant_block 64 \
    --lm_quant_bit 4 \
    --act_bit 16 \
    --embed_bit 16 \
    --mnnconvert /path/to/MNNConvert
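
If you would rather script the export (for example, to try a different bit width), here is a minimal sketch that builds the same argument list. The helper name build_export_command and its weight_bits parameter are made up for illustration; the flags and values are the ones from the command above.

```python
import shlex

def build_export_command(model_dir, out_dir, mnnconvert, weight_bits=4):
    """Build the llmexport.py argument list shown above.

    weight_bits is a hypothetical convenience parameter: it is applied to
    both --quant_bit and --lm_quant_bit.
    """
    return [
        "python", "llmexport.py",
        "--path", model_dir,
        "--dst_path", out_dir,
        "--export", "mnn",
        "--quant_bit", str(weight_bits),
        "--quant_block", "64",
        "--lm_quant_bit", str(weight_bits),
        "--act_bit", "16",
        "--embed_bit", "16",
        "--mnnconvert", mnnconvert,
    ]

cmd = build_export_command(
    "/path/to/HY-MT1.5-1.8B", "/path/to/output", "/path/to/MNNConvert",
    weight_bits=8,
)
# Print a copy-pasteable shell command instead of running it.
print(shlex.join(cmd))
# To run it for real, from the export directory:
# subprocess.run(cmd, check=True, cwd="MNN/transformers/llm/export")
```

Printing the command first makes it easy to confirm the flags before starting a long export.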

**Key notes for 8-bit conversion:**
If you want 8-bit instead, set --quant_bit 8 and --lm_quant_bit 8. However, the
crash during inference might be caused by:

  1. tie_word_embeddings — This model uses tied embeddings. The converter should
    detect it automatically, but check your config.json to ensure
    "tie_word_embeddings": true is set.
  2. Model type mapping — MNN needs hunyuan_v1_dense as the model type. Make
    sure your MNN version includes this mapper (it was added relatively recently).
    I'd recommend using MNN >= 3.0.0 (I'm currently on 3.4.0).
  3. llm_config.json settings — After conversion, make sure backend_type and
    precision are set correctly:
    {
      "llm_model": "llm.mnn",
      "llm_weight": "llm.mnn.weight",
      "backend_type": "cpu",
      "thread_num": 4,
      "precision": "low",
      "memory": "low"
    }
  4. MNN version — If you're using an older MNN version that doesn't have
    hunyuan_v1_dense support in model_mapper.py, the export may succeed but
    inference will crash. Update to the latest MNN repo.
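
The config checks in the list above can be sketched in a few lines of Python. This is only a rough helper under some assumptions: that the mapper is selected from the model_type field of the Hugging Face config.json, and that the reference values for llm_config.json are the ones in my snippet. The function names are made up for illustration.

```python
import json
from pathlib import Path

# Settings taken from the llm_config.json snippet in this post.
REFERENCE_LLM_CONFIG = {"backend_type": "cpu", "precision": "low", "memory": "low"}

def load_json(path):
    """Read a JSON file (config.json or llm_config.json) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

def check_model_config(cfg):
    """Return a list of warnings for the contents of config.json."""
    problems = []
    if cfg.get("tie_word_embeddings") is not True:
        problems.append("tie_word_embeddings is not set to true")
    # Assumption: MNN picks its mapper from the model_type field.
    if cfg.get("model_type") != "hunyuan_v1_dense":
        problems.append(
            f"model_type is {cfg.get('model_type')!r}, expected 'hunyuan_v1_dense'"
        )
    return problems

def check_llm_config(cfg, reference=REFERENCE_LLM_CONFIG):
    """Return {key: (actual, expected)} for settings that differ from the reference."""
    return {k: (cfg.get(k), v) for k, v in reference.items() if cfg.get(k) != v}

# Inline demo with made-up values; use load_json(...) on your own files instead.
sample_model_cfg = {"model_type": "hunyuan_v1_dense", "tie_word_embeddings": False}
sample_llm_cfg = {"backend_type": "cpu", "precision": "high", "memory": "low"}
print(check_model_config(sample_model_cfg))
print(check_llm_config(sample_llm_cfg))
```

An empty list and an empty dict mean both files match what I described; anything else is worth a look before debugging the crash further.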

The export_args.json in the output directory records the exact parameters used
— you can cross-check with mine.
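
To compare two export_args.json files quickly, something like the following should work. It is a sketch that assumes the file is a flat JSON object; the key names in the demo are hypothetical and simply mirror the command-line flags.

```python
import json

def load_args(path):
    """Load an export_args.json file (assumed to be a flat JSON object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def diff_args(mine, theirs):
    """Return {key: (mine_value, theirs_value)} for every key whose value differs."""
    keys = sorted(set(mine) | set(theirs))
    return {k: (mine.get(k), theirs.get(k)) for k in keys if mine.get(k) != theirs.get(k)}

# Demo with made-up dicts; in practice use load_args("/path/to/output/export_args.json").
mine = {"quant_bit": 8, "lm_quant_bit": 8, "quant_block": 64}
theirs = {"quant_bit": 4, "lm_quant_bit": 4, "quant_block": 64}
for key, (a, b) in diff_args(mine, theirs).items():
    print(f"{key}: mine={a} theirs={b}")
```

Keys that appear in only one file show up with None on the other side, which helps spot options added or removed between MNN versions.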

Thank you for your enthusiastic help.
