Commentary

  • This was a great video explaining the key way DeepSeek played the LLM game differently: a concept called MoE (mixture of experts). Instead of pushing every token through the entire set of weights, the network routes it to a specific expert sub-network, so only a fraction of the weights is active at a time, which is far more efficient. Nice thinking; it's such a high-level view that it makes me wonder how exciting or frustrating it would be to do this at a low level, actually hands-on with the model (a rough sketch of the idea is below).
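
  To get a feel for what that routing looks like at a low level, here is a minimal sketch of an MoE layer in PyTorch. The expert count, layer sizes, and top-k value are made-up placeholders, not DeepSeek's actual architecture; the point is just that a router scores each token, only the top-k experts run, and the rest of the weights stay idle.

```python
# Minimal sketch of a mixture-of-experts (MoE) layer in PyTorch.
# Sizes, expert count, and top-k are illustrative placeholders,
# not DeepSeek's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                        # (batch, seq, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the remaining
        # experts' weights are never touched, which is the efficiency win.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check.
layer = MoELayer()
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```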