Google Research Releases Compression Algorithm TurboQuant to Reduce AI Model Memory Usage
According to foreign media, Google Research on Tuesday (24th) released TurboQuant, a training-free compression algorithm that can compress the key-value (KV) cache of large language models (LLMs) to 3 bits without affecting model accuracy.

In benchmark tests on Nvidia (NVDA.US) H100 GPUs, 4-bit TurboQuant computed attention logits up to 8x faster than unquantized 32-bit keys, while reducing KV cache memory by at least 6x.
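To illustrate the general idea behind low-bit KV cache quantization, the sketch below shows a simple symmetric 4-bit scheme in NumPy. This is a hypothetical, minimal illustration, not TurboQuant's actual algorithm: the tensor shape, the per-row scaling, and the `quantize_4bit` helper are all assumptions for demonstration.

```python
import numpy as np

np.random.seed(0)

def quantize_4bit(x, axis=-1):
    # Symmetric per-row 4-bit quantization: map each value to an
    # integer code in [-7, 7] using the row's max absolute value.
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from codes and scales.
    return q.astype(np.float32) * scale

# Toy KV cache: 8 attention heads, 128 cached tokens, head dim 64.
kv = np.random.randn(8, 128, 64).astype(np.float32)
q, scale = quantize_4bit(kv)
err = np.abs(dequantize(q, scale) - kv).max()

# Storing 4-bit codes (two per byte) instead of 32-bit floats is an
# 8x raw reduction, before accounting for the small per-row scales.
print("max reconstruction error:", err)
```

Real systems like TurboQuant use more sophisticated transforms to keep accuracy at 3-4 bits, but the memory arithmetic is the same: fewer bits per cached key/value means a proportionally smaller KV cache.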

Memory stocks Sandisk (SDNK.US) and Micron Technology (MU.US) fell 3.5% and 3.4% respectively overnight (25th).
AASTOCKS Financial News
Website: www.aastocks.com