What's the most effective approach for multi-GPU training: DeepSpeed, or Unsloth?

I've had an amazing time with Unsloth, but I've learned that Unsloth does not support DeepSpeed.

Is it faster to use DeepSpeed without Unsloth, or to use Unsloth with data parallelism?

If it makes a difference, I was planning on using ZeRO stage 2.
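For reference, this is roughly the kind of DeepSpeed config I had in mind — a minimal ZeRO stage 2 sketch, with batch sizes and precision left on "auto" so the trainer fills them in (the bucket size is just a placeholder value):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": 5e8
  }
}
```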