Autoscaling for ML Inference
- IPA [2024]: autoscaling for ML inference pipelines
- Sponge [2024]: autoscaling for ML inference under the dynamic-SLO problem
The two systems make different assumptions.
InfAdapter: reduces costs while adapting to the dynamic workload. Using a set of model variants simultaneously provides higher average accuracy than any single variant, so inference-serving systems should consider accuracy, latency, and cost at the same time.
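The multi-variant claim can be illustrated with a small brute-force sketch. This is not the actual InfAdapter algorithm, and every number below (variant names, accuracies, per-replica costs and throughputs) is a hypothetical assumption: under a fixed cost budget, a mix of a cheap fast variant and an expensive accurate one can reach a higher traffic-weighted accuracy than any single variant the budget can afford.

```python
from itertools import product

# Hypothetical variants: (name, accuracy, cost per replica, throughput per replica in rps).
# All numbers are illustrative assumptions, not taken from IPA/Sponge/InfAdapter.
VARIANTS = [
    ("small", 0.70, 1.0, 25.0),
    ("large", 0.80, 4.0, 10.0),
]

def best_mix(variants, demand_rps, budget):
    """Brute-force the replica counts that maximize traffic-weighted accuracy
    under a cost budget, routing requests to the most accurate variant first."""
    max_replicas = [int(budget // cost) for (_, _, cost, _) in variants]
    best_counts, best_acc = None, -1.0
    for counts in product(*(range(m + 1) for m in max_replicas)):
        cost = sum(n * v[2] for n, v in zip(counts, variants))
        capacity = sum(n * v[3] for n, v in zip(counts, variants))
        if cost > budget or capacity < demand_rps:
            continue  # infeasible: over budget or cannot serve the demand
        # Route demand to the most accurate variants first.
        remaining, weighted = demand_rps, 0.0
        for n, (_, acc, _, rps) in sorted(zip(counts, variants),
                                          key=lambda p: -p[1][1]):
            served = min(remaining, n * rps)
            weighted += served * acc
            remaining -= served
        avg_acc = weighted / demand_rps
        if avg_acc > best_acc:
            best_counts, best_acc = counts, avg_acc
    return best_counts, best_acc
```

With a demand of 100 rps and a budget of 20, the all-"large" deployment is unaffordable and the all-"small" one yields 0.70 average accuracy, while a mix of both variants serves the same demand at about 0.74, which is the point the slide makes.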
Sponge: how can both scaling mechanisms, one more responsive and the other more cost-efficient, be used jointly under a dynamic workload to stay responsive and cost-efficient while guaranteeing SLOs?
Goals
- The variability space (design space) of (composed) systems is increasing exponentially.
- Systems operate in uncertain environments with imperfect and incomplete knowledge.
- Goal: enabling users to find the right quality tradeoff.
Testbeds: Lander Testbed (NASA), Turtlebot 3 (UofSC), Husky UGV (UofSC), CoBot (CMU).