Google’s TurboQuant has the internet joking about Pied Piper from HBO's "Silicon Valley." The compression algorithm promises to shrink AI’s “working memory” by up to 6x, but it’s still just a lab experiment for now.
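For intuition on where savings like that come from, here is a minimal sketch of 4-bit uniform quantization of a float32 buffer, the general family of technique behind KV-cache compression. This is not TurboQuant's actual algorithm, and all names and parameters here are illustrative; note that pure 4-bit packing gives an ideal 8x over float32, while real schemes carry extra metadata (scales, offsets) that pulls the effective ratio down toward figures like 6x.

```python
# Illustrative only: NOT TurboQuant's method. Shows how quantizing a
# float32 buffer to 4-bit codes shrinks its memory footprint.
import random

def quantize_4bit(values):
    """Map floats to 4-bit codes (0..15) with a shared scale and offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0
    codes = [round((v - lo) / scale) for v in values]
    # Pack two 4-bit codes into each byte.
    packed = bytearray()
    for i in range(0, len(codes), 2):
        second = codes[i + 1] if i + 1 < len(codes) else 0
        packed.append((codes[i] << 4) | second)
    return bytes(packed), scale, lo

random.seed(0)
kv_cache = [random.uniform(-1, 1) for _ in range(4096)]  # stand-in "working memory"
raw_bytes = len(kv_cache) * 4  # float32 = 4 bytes per value
packed, scale, lo = quantize_4bit(kv_cache)
ratio = raw_bytes / len(packed)
print(f"raw: {raw_bytes} B, packed: {len(packed)} B, ratio: {ratio:.0f}x")
```

The trade-off is accuracy: each stored value is only recoverable to within half a quantization step, which is why lossy cache compression is an active research problem rather than a free lunch.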
Training those foundation models takes weeks to months on hardware worth tens to hundreds of millions of dollars. Inference is dirt cheap in comparison: hardware costing in the high thousands to low tens of thousands of dollars can serve hundreds to thousands of concurrent users.