Paper: Resolving Interference When Merging Models (arXiv:2306.01708)
Merged model produced by TIES-Merging (Yadav et al., NeurIPS 2023).
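Merge configuration:
- Instruct model: mistralai/Mistral-7B-Instruct-v0.3
- Safety model: wvnvwn/Mistral-7B-Instruct-v0.3-hhrlhf-v1, alpha = 0.9
- Downstream model: wvnvwn/Mistral-7B-Instruct-v0.3-gsm8k-v1, alpha = 0.09999999999999998
- Method: ties
- Further parameters (labels lost in extraction): 1.0, 0.2
- dtype: bfloat16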
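The odd-looking downstream weight is consistent with it being computed as the complement of the safety weight in double-precision floating point. The snippet below is a sketch under that assumption (the GTM merge_cli's actual code is not shown here); the variable names are hypothetical.

```python
# Hypothetical reconstruction: the downstream weight is assumed to be 1 - alpha_safety.
alpha_safety = 0.9
alpha_downstream = 1.0 - alpha_safety

# 0.9 has no exact binary representation, so the difference prints with a tiny error.
print(alpha_downstream)
```

In other words, the checkpoint uses a 0.9 / 0.1 split between the safety and downstream task vectors; the trailing digits are floating-point rounding, not a deliberately chosen value.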
The merge applies a method-specific operator M(.) to the task vectors tau_i = theta_i_sft - theta_base: M(.) is the identity for Task Arithmetic (TA), the Trim-Elect-Disjoint operator for TIES, and DARE's drop-and-rescale preprocessing (composed with TA) for DARE.
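Assuming the merge rule has the common task-arithmetic form theta_merged = theta_base + lambda * sum_i alpha_i * M(tau)_i, where M acts on the whole set of task vectors (TIES needs all of them at once to elect a sign per parameter), the three operators can be sketched in NumPy as below. This is an illustrative sketch, not the GTM merge_cli implementation: the function names, the `lam` scale, the default density of 0.2 and the DARE drop rate are assumptions. Note also that the TIES paper takes the mean over the sign-agreeing entries (disjoint mean), whereas this sketch combines them with the alpha weights.

```python
import numpy as np


def trim_top_k(tau: np.ndarray, density: float) -> np.ndarray:
    """Trim: keep the `density` fraction of entries with the largest magnitude."""
    k = max(1, int(round(density * tau.size)))
    # Threshold at the k-th largest magnitude; entries tied at the threshold are kept.
    thresh = np.sort(np.abs(tau).ravel())[-k]
    return np.where(np.abs(tau) >= thresh, tau, 0.0)


def ties_operator(taus: list, density: float = 0.2) -> list:
    """TIES-style M(.): trim each task vector, elect a sign per entry, drop disagreeing entries."""
    trimmed = [trim_top_k(t, density) for t in taus]
    # Elect: sign of the summed trimmed vectors (the sign with more total mass wins).
    elected = np.sign(np.sum(trimmed, axis=0))
    # Disjoint: zero every entry whose sign differs from the elected sign.
    return [np.where(np.sign(t) == elected, t, 0.0) for t in trimmed]


def dare_operator(taus: list, drop_rate: float = 0.9, seed: int = 0) -> list:
    """DARE-style M(.): randomly drop entries, rescale survivors by 1 / (1 - drop_rate)."""
    rng = np.random.default_rng(seed)
    out = []
    for t in taus:
        keep = rng.random(t.shape) >= drop_rate
        out.append(np.where(keep, t / (1.0 - drop_rate), 0.0))
    return out


def merge(theta_base, thetas_sft, alphas, method="ties", lam=1.0, **kw):
    """theta_base + lam * sum_i alpha_i * M(tau)_i, with tau_i = theta_i_sft - theta_base."""
    taus = [th - theta_base for th in thetas_sft]
    if method == "ta":
        processed = taus  # identity operator
    elif method == "ties":
        processed = ties_operator(taus, **kw)
    elif method == "dare":
        processed = dare_operator(taus, **kw)  # DARE preprocessing, then TA
    else:
        raise ValueError(f"unknown method: {method}")
    return theta_base + lam * sum(a * p for a, p in zip(alphas, processed))
```

For example, with a zero base vector, task vectors [1, -2, 0.5, 3] and [-1, 1, 0.1, 2], and alphas (0.9, 0.1), `merge(..., method="ties", density=0.5)` keeps only entries whose sign matches the elected sign, so the first coordinate comes solely from the second model and the second coordinate solely from the first.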
Produced by the GTM merge_cli (see the commit history of the GTM repo). Full
merge metadata is shipped as merge_metadata.json alongside the weights.
Research only. This checkpoint is part of a sweep designed to probe the safety / downstream-utility Pareto frontier; individual α values are NOT recommended for deployment without further evaluation.
Base model: mistralai/Mistral-7B-v0.3