Chris Manning (one of the top 3 NLP/machine learning researchers in the world) believes DeepSeek's $6M training cost is plausible, given the optimizations discussed in their paper
While many of the claims in the DeepSeek paper have been verified, the training cost is what has drawn the most skepticism.
Chris Manning, who is highly regarded as one of the top 3-5 NLP researchers in the world, gave a talk yesterday, which was live-tweeted:
https://x.com/atroyn/status/1884700131884490762
"deepseek have succeeded at producing models with large numbers of experts (256 in v3). combined with multi-head latent attention, plus training in fb8, dramatically reduces training costs. @chrmanning buys the $6M training compute cost."
In short, he buys the claimed $6 million training cost.
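
For anyone wondering where the roughly $6M number comes from, here is a quick sketch of the arithmetic. The GPU-hour figures and the $2 per GPU-hour rental rate are not in the tweet; they are what the DeepSeek-V3 technical report states as far as I can tell, so treat them as assumptions and check them against the paper.

```python
# Rough reconstruction of the headline training-cost figure.
# ASSUMPTION: GPU-hour counts and the rental rate below are taken from the
# DeepSeek-V3 technical report, not from the tweet; verify against the paper.
GPU_HOURS = {
    "pre_training": 2_664_000,
    "context_extension": 119_000,
    "post_training": 5_000,
}
RENTAL_USD_PER_GPU_HOUR = 2.0  # assumed H800 rental price per GPU hour


def training_cost_usd(gpu_hours, usd_per_hour):
    """Total rental cost: sum of GPU hours across stages times the hourly rate."""
    return sum(gpu_hours.values()) * usd_per_hour


total_hours = sum(GPU_HOURS.values())
cost = training_cost_usd(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)
print(f"{total_hours:,} GPU hours -> ${cost:,.0f}")
```

Under those assumptions the total lands a bit under $6M. Note that this is only the rental-equivalent compute cost of the final training run. It does not include earlier research, ablation experiments, data, or staff costs.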