A compact 126M-parameter language model built with MEGA rather than a standard transformer stack.
Model Card Embed
Give your model card an architecture graph in 10 seconds.
Paste your model ID → copy HTML → paste into README.md.