Smaug - Japanese Language Support

#11

by msmmpts - opened Feb 8, 2024

Feb 8, 2024

Hi team,

I was wondering whether the Smaug model included Japanese datasets during its training phase. If yes, could you please share the Japanese content on which the Smaug model was trained?

ArkaAbacus

Feb 8, 2024

We did not utilise any Japanese datasets during the training of Smaug, and it does not appear as though the model we started from (https://huggingface.co/moreh/MoMo-72B-lora-1.8.7-DPO) did either.
Once we release our technique paper in a couple of weeks, though, you could try to replicate the process with some Japanese datasets added in :)

nonetrix

Feb 14, 2024

Yes, no decent Japanese open-source LLMs exist, so that would be nice.
