What's the most effective approach for multi-GPU training: DeepSpeed, or Unsloth's multi-GPU support?
I've had an amazing time with Unsloth, but I've learned that it does not support DeepSpeed.
Is it faster to use DeepSpeed without Unsloth, or to use Unsloth with data parallelism?
If it makes a difference, I was planning on using ZeRO Stage 2 (optimizer state and gradient partitioning).
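For context, this is roughly the kind of DeepSpeed config I had in mind. It's a minimal sketch, not my exact setup, and the batch size and fp16 settings are just placeholders:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 1,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```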