Neural Retrieval at Scale: Semantic Search for Large Language Models
Priya Mehta, Jordan Kim, Alex Rivera
We present a novel approach to web-scale neural retrieval that achieves state-of-the-art performance on standard benchmarks while keeping latency below 200 ms. Our method combines dense retrieval with a proprietary re-ranking stage trained on billions of real-world queries.
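
To make the two-stage structure concrete, the following is a minimal sketch of a generic retrieve-then-re-rank pipeline, not the system described here. Everything in it is an assumption made for illustration: the toy two-dimensional embeddings, the function names (dense_retrieve, rerank, overlap_scorer), and in particular the token-overlap scorer, which stands in for the proprietary re-ranking stage, whose details are not given.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_retrieve(query_vec: Vector, doc_vecs: List[Vector], k: int) -> List[int]:
    """Stage 1: return indices of the k documents most similar to the query.

    This brute-force scan is for illustration only; a web-scale system would
    typically use an approximate nearest-neighbor index instead.
    """
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]


def rerank(
    query: str,
    docs: List[str],
    candidates: List[int],
    scorer: Callable[[str, str], float],
) -> List[int]:
    """Stage 2: reorder the stage-1 candidates using a second scorer."""
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


def overlap_scorer(query: str, doc: str) -> float:
    """Hypothetical stand-in re-ranker: fraction of query tokens found in doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


# Toy corpus with hand-picked embeddings (assumed values, not learned).
docs = [
    "neural retrieval with dense vectors",
    "cooking pasta at home",
    "dense retrieval and re-ranking for search",
]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.2]]

query = "dense retrieval re-ranking"
query_vec = [1.0, 0.0]

candidates = dense_retrieve(query_vec, doc_vecs, k=2)
final_order = rerank(query, docs, candidates, overlap_scorer)
print(candidates, final_order)
```

In this sketch the first stage is cheap and recall-oriented, so it only narrows the corpus to k candidates; the second stage can then apply a more expensive scorer to that short list, which is the usual reason such pipelines can stay within a fixed latency budget.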