Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

Published in NeurIPS, 2023