https://preview.redd.it/tzcr6fyh21fe1.png?width=933&format=png&auto=webp&s=972b4483626194424ba05e3ab466096c06abe344
You can RL post-train your small LLM (on simple tasks) with only 10 hours of H100 compute.
https://x.com/jiayi_pirate/status/1882839370505621655