Building reliable LLM applications means handling the cascade of failures that emerge at scale. Drawing from experience processing billions