FlexGen: running large language models like OPT-175B/GPT-3 on a single GPU, with a focus on high-throughput large-batch generation - Foundation Model Inference: https://github.com/FMInference/FlexGen (GitHub)
Simon Willison's summary: https://simonwillison.net/2023/Feb/21/flexgen/
Hacker News: https://news.ycombinator.com/item?id=34869960
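A minimal sketch of the core idea: when the weights don't fit in GPU memory, stream one layer at a time into a small fast buffer and amortize each transfer over a large batch of sequences. This is an illustrative toy in NumPy, not FlexGen's actual API; all names (`cpu_store`, `gpu_buffer`, `generate_step`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_model, batch = 4, 8, 32

# Weights live in slow "CPU/disk" memory (here: a plain Python list).
cpu_store = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
             for _ in range(n_layers)]

def generate_step(x):
    """Apply all layers to a large batch while holding only one layer
    at a time in the (simulated) fast GPU buffer."""
    for layer_id in range(n_layers):
        gpu_buffer = cpu_store[layer_id]   # simulate a CPU -> GPU transfer
        x = np.tanh(x @ gpu_buffer)        # compute over the whole batch at once
        del gpu_buffer                     # evict before loading the next layer
    return x

x = rng.standard_normal((batch, d_model))
out = generate_step(x)
print(out.shape)  # (32, 8)
```

The larger the batch, the better the one-transfer-per-layer cost is hidden, which is why this style of offloading favors throughput over per-request latency.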
I think nanoGPT excites me *more*, but still...