@meso they're trying to work out a way to ban self-hosted AI but they can't so far because anybody with a good GPU can do it. But they're thinking hard.
@Moon@meso The amount of backpedaling and attempting to put the genie back in the bottle is insane, but not surprising. Seeing Tim Berners-Lee say if he could do the internet over again he'd make it easier to censor tells me everything I need to know about where tech is today.
@Moon@meso I tried to get ChatGPT to draw ASCII art of a cat with an ampersand in its mouth representing a mouse. It refused until I told it ampersands represented cinnamon rolls.
@meso there are youtube videos that walk through every step. I kind of muddled through it. I'm generating cute girls right now, but soon I'm gonna try generating text, like stories and stuff.
@Moon@Christmas_Man@meso This is the guy making 4-bit quantized models for home use: https://huggingface.co/TheBloke GPTQ models are for GPU-based inference, GGML models are for CPU-based inference (though you can get a speed boost by offloading some of the load to your GPU).
With 24GB of VRAM, you can run 13B to 20B GPTQ models with room to spare for extended (over 2048-token) context and for keeping Stable Diffusion loaded at the same time. Or you should be just about able to run 30B models with 2048 context on a headless Linux machine. Expect double-digit tokens per second; answers will pop up in seconds.
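If you want a rough idea of what the GPU route looks like, here's a minimal sketch using AutoGPTQ (the repo name and settings below are just placeholders, swap in whatever GPTQ model you actually grab from TheBloke and whatever fits your VRAM):

```python
# Rough sketch: GPU-only inference with a 4-bit GPTQ model.
# Assumes auto-gptq and transformers are installed; repo name is a placeholder.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo = "TheBloke/Llama-2-13B-chat-GPTQ"  # placeholder: pick any GPTQ repo that fits your card

tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    device="cuda:0",       # the whole model lives in VRAM
    use_safetensors=True,
)

prompt = "Write a two-sentence story about a cat."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```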
With GGML models your RAM is the limit, and speed depends on your CPU, GPU, RAM speed, and how much you can offload to GPU/VRAM. In general it's likely to be MUCH slower than GPTQ: if you're running as big a model as will fit in your machine, expect single-digit tokens per second, and sometimes waits of over a minute for an answer. Sometimes it's worth it, sometimes not. I've heard people say the returns from 30B to 70B are quite diminished (i.e. it's not really noticeably smarter, just different).
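For the GGML/CPU route, llama-cpp-python is one easy way in. Another rough sketch, where the file name, thread count, and layer count are placeholders you'd tune to your own RAM/VRAM:

```python
# Rough sketch: CPU inference with a GGML model, pushing some layers onto the GPU.
# Assumes llama-cpp-python is installed (built with GPU support if you want offload);
# the model file name and the numbers here are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.ggmlv3.q4_K_M.bin",  # any GGML file downloaded from TheBloke
    n_ctx=2048,       # context window
    n_threads=8,      # CPU threads to use
    n_gpu_layers=20,  # layers offloaded to VRAM; 0 = pure CPU
)

out = llm("Q: Explain GPTQ vs GGML in one sentence. A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```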