Our low-end edge LLM demo video

We are working on running local LLMs on resource-constrained edge devices. This video demonstrates our KV cache sharing approach, in which two Raspberry Pi Zero 2 W devices share their KV caches through an intermediate cache server to reduce TTFT (time to first token) when running inference on similar prompts. [Paper]
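To illustrate the idea, here is a minimal toy sketch of prefix-based KV cache sharing. All names (`KVCacheServer`, `prefill`) and the storage scheme are illustrative assumptions, not the actual implementation from the video or paper; the point is only that a device reusing a shared KV cache for a matched prompt prefix needs to prefill far fewer tokens, which is what shortens TTFT.

```python
# Hypothetical sketch of KV cache sharing via an intermediate server.
# Names and data layout are illustrative, not the paper's implementation.

class KVCacheServer:
    """Toy intermediate server mapping tokenized prompt prefixes to KV caches."""

    def __init__(self):
        self.store = {}  # tuple(tokens) -> KV cache (here, a tuple of entries)

    def put(self, tokens, kv_cache):
        # Store every prefix so other devices can reuse partial matches.
        for n in range(1, len(tokens) + 1):
            self.store[tuple(tokens[:n])] = kv_cache[:n]

    def longest_prefix(self, tokens):
        # Return the longest stored prefix of `tokens` and its KV cache.
        for n in range(len(tokens), 0, -1):
            key = tuple(tokens[:n])
            if key in self.store:
                return list(key), self.store[key]
        return [], None


def prefill(tokens, server):
    """Prefill a prompt, reusing any shared KV cache for a matched prefix.

    Only the unmatched suffix needs a real prefill pass; returns how many
    tokens were actually computed (a proxy for TTFT)."""
    prefix, kv = server.longest_prefix(tokens)
    remaining = tokens[len(prefix):]
    # Stand-in for actually computing KV entries for the remaining tokens:
    new_kv = (kv or ()) + tuple(f"kv({t})" for t in remaining)
    server.put(tokens, new_kv)
    return len(remaining)
```

For example, if device A prefills "what is the capital of france" (6 tokens computed), device B asking "what is the capital of japan" matches a 5-token prefix on the server and only computes 1 token.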