Conversation
Notices
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 05:58:39 JST Maybe 100 max_connections is too much for two instances on one machine. Decreased to 60, let's see if that makes any difference in CPU usage.
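For reference, a minimal sketch of how that setting can be inspected and changed from psql (an illustration, not the exact commands used above; max_connections only takes effect after a server restart):

-- check the current limit and where it came from
SHOW max_connections;
SELECT name, setting, source FROM pg_settings WHERE name = 'max_connections';
-- lower it; ALTER SYSTEM writes to postgresql.auto.conf and needs a restart to apply
ALTER SYSTEM SET max_connections = 60;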
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 06:03:02 JST It doesn't. The issues started cropping up after I merged upstream, but the only change of note is disabling JIT by default. Turning it back on did nothing.
[Attachment: Снимок экрана_20240401_000225.p…]
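To double-check which JIT setting new sessions actually pick up (a sketch; jit is a regular GUC on PostgreSQL 11 and later):

-- effective value and where it was set (default, config file, ALTER SYSTEM, ...)
SHOW jit;
SELECT name, setting, source FROM pg_settings WHERE name = 'jit';
-- flip it for the current session only, e.g. to compare plans side by side
SET jit = on;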
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 06:04:27 JST @munir Some old-ass AMD APU from 2012, but it doesn't matter as it worked just fine until recently.
munir (munir@fedi.munir.tokyo)'s status on Monday, 01-Apr-2024 06:04:28 JST munir @mint Is this for Postgres? What kinda CPU are you running?
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 06:07:10 JST @munir Already did pg_repack yesterday, no substantial improvement.
munir (munir@fedi.munir.tokyo)'s status on Monday, 01-Apr-2024 06:07:11 JST munir @mint maybe you need a db clean up or something?
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 06:16:43 JST Might as well ask the experts (@i). The "idle" postgres processes consume a shitton of CPU time. Ecto stats in the Phoenix dashboard always show the longest-running queries (up to 15 seconds, after which it's apparently either resolved or dropped) to be this:
SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at",
       o1."id", o1."data", o1."inserted_at", o1."updated_at"
FROM "activities" AS a0
INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = associated_object_id(a0."data")
INNER JOIN "users" AS u2 ON (a0."actor" = u2."ap_id") AND (u2."is_active" = TRUE)
INNER JOIN "users" AS u3 ON (a0."actor" = u3."ap_id") AND (u3."invisible" = FALSE)
WHERE ($1 && a0."recipients") AND (a0."actor" = $2) AND (a0."data"->>'type' = $3)
  AND (not(o1."data"->>'type' = 'Answer')) AND (not(o1."data"->>'type' = 'ChatMessage'))
ORDER BY a0."id" desc nulls last
LIMIT $4
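One way to confirm how much time that statement really accounts for, assuming the pg_stat_statements extension is loaded (column names below are the PostgreSQL 13+ ones; older versions use mean_time/total_time):

-- top statements by average execution time
SELECT calls, mean_exec_time, total_exec_time, left(query, 80) AS query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;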
:blank: (i@declin.eu)'s status on Monday, 01-Apr-2024 06:54:33 JST :blank: @mint run an analyze if you haven't, but at some point you have to give up and scale the shit heap vertically (less work or more dakka)
remember pleroma will only pool 20 connections by default, so not spawning too many idle worker forks is a good idea, since they'll just eat up ram and spread the cache thin in the pgtune math
you could go down to 20+20+5 and probably be fine, since no amount of extra queries waiting on IO will help get rid of the underlying stall.
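The 20+20+5 above is the usual pool arithmetic: 20 pooled Ecto connections per Pleroma instance, two instances, plus a handful reserved for superuser/maintenance sessions, i.e. roughly the 45 tried in the next post. A quick sanity check against what is actually connected (a sketch):

-- configured ceiling vs. live connections, broken down by state
SHOW max_connections;
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;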
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 07:05:22 JST @i Just to make sure, analyze is VACUUM ANALYZE, right? Decreasing max_connections to 45 still spawns a bunch of idle connections sitting at ~90% CPU. I doubt I'm at the point of no return, since all the CPU usage issues cropped up very suddenly (right around when I merged upstream a couple of days ago).
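To see what those supposedly idle backends were last doing, something like the following could be run (a sketch; for idle backends the query column shows the last statement they executed, not a running one):

-- list client backends with their state, wait info, and most recent statement
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS since_last_query, left(query, 60) AS last_query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY state, query_start;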
:blank: (i@declin.eu)'s status on Monday, 01-Apr-2024 07:19:37 JST :blank: @mint no vacuum since you just repacked it, ANALYZE VERBOSE; should be enough to recreate the statistics
try bisecting the code merge then, haven't run into it yet because i can't be bothered to merge my changes to pull in everything.
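For the statistics refresh suggested above, a minimal sketch (ANALYZE VERBOSE rebuilds planner statistics without rewriting the tables; the per-table form is an optional narrower variant):

-- refresh planner statistics for the whole database, with progress output
ANALYZE VERBOSE;
-- or only the tables involved in the slow query above
ANALYZE VERBOSE activities;
ANALYZE VERBOSE objects;
ANALYZE VERBOSE users;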
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 07:24:03 JST @i Rolled back to the merge before that, I'll see if that makes any difference. So far no change.
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 07:32:33 JST @i A couple minutes later, load average is at 2.05 compared to ~3.80-4.50 before. Something definitely got fucked up between bb0b17f4 and 987f44d8.
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 07:35:45 JST @i Now it's at 1.15; there are a few bursts of BIND/idle postgres processes, but I assume that's normal operation. :wereback:
:blank: (i@declin.eu)'s status on Monday, 01-Apr-2024 07:38:39 JST :blank: @mint time to actually read all of them then, at least the fire has been put out for now and the fact upstream are retards is once again cemented.
(mint@ryona.agency)'s status on Monday, 01-Apr-2024 07:48:01 JST @i Aside from disabling JIT by default and a changed notification query, nothing really jumps out at me. Load average definitely feels better than before the rollback; it still jumps to 3.5 before stabilizing, but maybe that's normal during burger hours.