A custom 125M-parameter GPT-X2 language model with RoPE, SwiGLU, grouped-query attention, and curriculum-trained code/math normalization.
Interactive architecture graph for embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16, visualized from Hugging Face model metadata.