A compact 126M-parameter language model built on the MEGA architecture rather than a standard transformer stack, with a 4096-token context length.
Interactive architecture graph for deepseek-ai/DeepSeek-V4-Pro, visualized from Hugging Face model metadata.