A compact 126M-parameter language model built with MEGA rather than a standard transformer stack, with a 4096-token context length.
Interactive architecture graph for embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead, visualized from Hugging Face model metadata.