stephenpanaro.com - blog

posts

In Pursuit of Fast KV-Cached Attention for Apple Neural Engine

Building a memory-friendly KV Cache with static shapes

October 2024

LLMs for your iPhone: Whole-Tensor 4 Bit Quantization

Shrinking models for Apple Silicon

March 2024

Inside Apple's 2023 Transformer Models

What can we learn from them?

November 2023

No Frills Time Series Compression That Also Works

So you have some time series data and you want to make it smaller?

August 2023