Commentary

  • This was a great video explaining the key way DeepSeek played the LLM game differently: a concept called MoE (mixture of experts). Instead of pushing every token through the entire set of weights, the network routes it to a specific expert sub-network, so only a fraction of the weights is active at a time, which is far more efficient. Nice thinking; it's such a high-level view that it makes me wonder how exciting or frustrating it would be to do this at a low level, actually hands-on with the model (a rough sketch of the idea is below).
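
  To get a feel for what that routing looks like at a low level, here is a minimal sketch of an MoE layer in PyTorch. The expert count, layer sizes, and top-k value are made-up placeholders, not DeepSeek's actual architecture; the point is just that a router scores each token, only the top-k experts run, and the rest of the weights stay idle.

```python
# Minimal sketch of a mixture-of-experts (MoE) layer in PyTorch.
# Sizes, expert count, and top-k are illustrative placeholders,
# not DeepSeek's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )
        # The router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                        # (batch, seq, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the remaining
        # experts' weights are never touched, which is the efficiency win.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check.
layer = MoELayer()
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```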