A Hitchhiker's Guide to ML Training Infrastructure