FlexGen: running large language models like OPT-175B/GPT-3 on a single GPU, with a focus on high-throughput large-batch generation - Foundation Model Inference: https://github.com/FMInference/FlexGen (GitHub)
Simon Willison's summary: https://simonwillison.net/2023/Feb/21/flexgen/
Hacker News: https://news.ycombinator.com/item?id=34869960
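A minimal sketch of the core idea: when the weights don't fit in GPU memory, stream one layer at a time into a small fast buffer and amortize each transfer over a large batch of sequences. This is an illustrative toy in NumPy, not FlexGen's actual API; all names (`cpu_store`, `gpu_buffer`, `generate_step`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, batch = 4, 8, 32

# Weights live in slow "CPU/disk" memory (here: a plain Python list).
cpu_store = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
             for _ in range(n_layers)]

def generate_step(x):
    """Apply all layers to a large batch while holding only one layer
    at a time in the (simulated) fast GPU buffer."""
    for layer_id in range(n_layers):
        gpu_buffer = cpu_store[layer_id]   # simulate a CPU -> GPU transfer
        x = np.tanh(x @ gpu_buffer)        # compute over the whole batch at once
        del gpu_buffer                     # evict before loading the next layer
    return x

x = rng.standard_normal((batch, d_model))
out = generate_step(x)
print(out.shape)  # (32, 8)
```

The larger the batch, the better the one-transfer-per-layer cost is hidden, which is why this style of offloading favors throughput over per-request latency.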
I think nanoGPT excites me *more*, but still...