DGX Spark + Mac Studio: Disaggregated LLM Inference With EXO

How splitting prefill and decode across NVIDIA's Blackwell box and an M3 Ultra delivers a 2.8x speedup on Llama-3.1 8B.

May 1 · Shivam Malani · Hardware