posts
In Pursuit of Fast KV-Cached Attention for Apple Neural Engine
Building a memory-friendly KV Cache with static shapes
October 2024
LLMs for your iPhone: Whole-Tensor 4 Bit Quantization
Shrinking models for Apple Silicon
March 2024
Inside Apple's 2023 Transformer Models
What can we learn from them?
November 2023
No Frills Time Series Compression That Also Works
So you have some time series data and you want to make it smaller?
August 2023