> If it's targeted, that's great, but a meg of stuff, something useful should have been in there.
Not really. The only good stuff in there would be the Ecto stats but they're not granular enough to be useful. Someone sharing the raw Postgres stats from pg_exporter would have been better.
> but "the directory listing is 187MB" is a real problem (even if it's not a pointer-chase, you're still reading 187MB from the disk and you're still copying 187MB of dirents into userspace and `ls` takes 45s), and that gets marked as a dup of a "nice to have" S3 bug, but this is the default Pleroma configuration. It's stuff you can't write off, you know? You hit real constraints.
Where is the "ls" equivalent happening for Pleroma? Fetching a file by name is not slow even when there are millions in the same directory.
> Yeah, BEAM using CPU isn't the bottleneck, though. Like I said in the post you're replying to, it's I/O.
BEAM scheduler threads intentionally busy-wait (spin) after finishing work in the hope that more work will arrive. That's what causes the bottleneck: it might not look like high CPU usage percentage-wise, but it's preventing the kernel from context switching to another process.
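If anyone wants to test that theory on their own box, the busy-wait behavior is tunable with standard Erlang VM flags (in vm.args, or passed to erl directly); a sketch that disables spinning for the normal, dirty-CPU, and dirty-I/O schedulers:

```
# vm.args: disable scheduler busy-waiting entirely
+sbwt none
+sbwtdcpu none
+sbwtdio none
```

Shorter spin times (`very_short`, `short`, etc.) are also accepted if you want a middle ground instead of turning it off.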
And what IO? Can someone please send dtrace, systemtap, some useful tracing output showing that BEAM is doing excessive unnecessary IO? BEAM should be doing almost zero IO; we don't read and write to files except when people upload attachments. Even if you're not using S3 your media files should be served by your webserver, not Pleroma/Phoenix.
> That is cool, but if you could do that for fetching objects from the DB, you'd have a bigger bump.
patches welcome, but I don't have time to dig into this in the very near future.
> Anyway, I am very familiar with fedi's n*(n-1)/2 problems. (Some time in the future, look for an object proxy patch.)
plz plz send
> But you know, back-pressure, like lowering the number of retries based on the size of the table, that could make a big difference when a system gets stressed.
patches welcome. You can write custom backoff algorithms for Oban. It's supported.
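For reference, the hook is the `backoff/1` callback on an Oban worker: Oban calls it with the `%Oban.Job{}` struct and uses the returned number of seconds as the retry delay. A minimal sketch (module name and constants are invented, and a real worker would also `use Oban.Worker`); actual back-pressure could feed queue depth or table size into the formula:

```elixir
defmodule MyApp.Workers.RemoteFetcher do
  # Hypothetical worker module. In the real thing this would
  # `use Oban.Worker`; Oban invokes backoff/1 with the job struct
  # to compute the retry delay in seconds.
  # Exponential backoff, capped at one hour.
  def backoff(%{attempt: attempt}) do
    trunc(min(:math.pow(2, attempt) * 10, 3600))
  end
end
```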
> You could ping graf; it's easier to just ask stressed instances than to come up with a good way to do stress-testing.
Everyone I've asked for access to their struggling servers has refused except mint. Either everyone's paranoid over nothing or far too many people have illegal shit on their servers. I don't know what to think. It's not exactly a motivator to solve their problems.
> Oh, yeah, so 403s? What counts as permanent?
Depends on what it is. If you get a 403 on an object fetch or profile refresh, they're blocking you, so no point in retrying. If it was deleted you get a 404 or a 410, so no point in retrying that either... (There was a bug where a Delete for an activity you didn't even have would enqueue a job to fetch the activity it referenced... and keep trying to fetch it over and over and over...)
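That rule of thumb could be expressed as a tiny predicate (a hypothetical helper, not actual Pleroma code):

```elixir
defmodule Fetcher.Response do
  # Hypothetical helper: statuses worth giving up on immediately.
  # 403 -> the remote is blocking us; 404/410 -> the object is gone.
  def permanent_failure?(status) when status in [403, 404, 410], do: true
  # Anything else (timeouts, 5xx, rate limits) is worth retrying.
  def permanent_failure?(_status), do: false
end
```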
> You think you might end up with a cascade for those? Like, if rendering TWKN requires reading 10MB...
No, I mean it was hanging to fetch latest data from remote server before rendering the activity, which was completely unnecessary. Same with rich media previews -- if it wasn't in cache, the entire activity wouldn't render until it tried to fetch it. Stupid when it could be fetched async and pushed out over websocket like we do now.
> The schedule doesn't bug me. ... the following bug is a big problem, that's a basic thing that was broken in a release. Some kind of release engineer/QA situation could have caught it.
Again, it wasn't broken in a release. There were two different bugs: one was that we used the wrong source of truth for whether or not you were successfully following someone. The other bug became more prominent because more servers started federating Follow requests without any cc field, and for some reason our validator was expecting at least an empty cc field when it doesn't even make sense to have one on a Follow request.
You seem to have a lot of opinions and ideas on how to improve things but nobody else on the team seems to give a shit about any of this stuff right now. So send patches. I'll merge them. Join me.
@feld@p >Even if you're not using S3 your media files should be served by your webserver, not Pleroma/Phoenix. Pretty sure it was served by Phoenix for the longest time, until the docs changed to recommend a separate media subdomain in the wake of XSS paranoia. Also the frontend, with all its JSON blobs and emoji annotations in Enochian, is served by Phoenix by default as well (I'm using a bunch of hacky nginx rules to serve it directly from my VPS instead of fetching it from my homeserver every time a new tab is loaded).
@mint@feld Yeah, I forgot who said it, but really early on I remarked that Phoenix couldn't keep up and was chewing CPU: if Elixir is going to be some immutable stateless deal, it needs to have a call for IO.copy(fd1,fd2) *somewhere*. I got an objection from one of the devs, I forget who, but there's not really a way around it: there's one efficient way to get data out of one FD and into another, and in a pure-ish functional language you have to just write some C and wrap it in something that fits the language's semantics, because it's a fundamentally FP-unfriendly task. @j probably remembers this because we both came to the same conclusion around the same time and traded notes on nginx configuration.
@feld@p >Everyone I've asked for access to their struggling servers has refused except mint Well, giving someone access to the server is kinda like letting them into your bank deposit box to check whether your bills are counterfeit. The server I gave you access to isn't even mine, and its owner was essentially MIA, so I didn't have many qualms about it.
@feld@j@mint Yeah, that's why it's got to be lower-level; you don't want to put something semantically ugly in the higher level but you end up with a read giving you an immutable linked list of bytes instead of repeatedly refilling the same preallocated, aligned, contiguous region of memory. To do it without burning CPU, you can't allocate anything: it has to be loop{syscall; conditional break; syscall; conditional break;}. So the buffer can't cross into the user-level of the runtime unless you have an exact GC and you somehow have an internal allocator that can recognize that you're doing that. The former exists and I've never seen the latter, but you can tell it to copy one FD to the other without breaking the language and you can do that with a static buffer at a lower level where the semantics aren't expected to hold.
@p@j@mint I think this would be achieved with Stream.reduce or one of the other Stream functions. All IO-related stuff (anything unixy exposed as a file, network stuff, etc.) can be streamed. It obviously wouldn't be as efficient as C, because after so many reductions (function calls, mostly; internally Stream is going to be calling itself in a loop reading chunks) it will be preempted and another process will get to run.
@p@mint@j All the IO stuff in BEAM/Erlang/Elixir is written in C. Same with the networking stuff like the TCP/UDP functionality, cryptography, JSON in latest OTP release, etc
@feld@j@mint Yeah, but last I checked (which has admittedly been a minute), there's no "Just stream this FD to this other FD until you get EOF on the first one" call. You read, then write, you've gotta do it yourself.
@i@feld@j@mint How tricky is it to just call Erlang stuff from Elixir? Seems relatively easy, right? (Unless I am forgetting something, I think you can just call it directly.)
@feld@p@mint@j OTP will still let you do that FD-to-FD copy via sendfile(2), falling back to read and write only if the host platform doesn't support it
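And yeah, it's directly callable from Elixir, since Erlang modules are just atoms. A self-contained loopback sketch (path and size are made up) showing `:file.sendfile/5` pushing a raw file into a `:gen_tcp` socket; offset 0 and length 0 mean "the whole file":

```elixir
# Copy a file into a TCP socket with sendfile(2) where the OS supports
# it; OTP falls back to an internal read/write loop otherwise.
path = "/tmp/sendfile_demo.bin"
File.write!(path, String.duplicate("a", 1024))

# Loopback pair so the example is runnable end to end.
{:ok, listen} = :gen_tcp.listen(0, [:binary, active: false, reuseaddr: true])
{:ok, port} = :inet.port(listen)
{:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])
{:ok, server} = :gen_tcp.accept(listen)

# The file must be opened :raw for sendfile to use the fast path.
{:ok, fd} = :file.open(path, [:raw, :binary, :read])
{:ok, bytes_sent} = :file.sendfile(fd, server, 0, 0, [])
:ok = :file.close(fd)

{:ok, data} = :gen_tcp.recv(client, 1024)
```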
Presumptuously assuming here that you need Erlang and Elixir for #Pleroma and precisely nothing else, cuz I had literally never heard of either of those languages before getting involved in #fedi
> Presumptuously assuming here that you need Erlang and Elixir for #Pleroma and precisely nothing else,
You do need those to hack on Pleroma because Pleroma is written in Elixir, but there are dependencies that are written in C, too. There's probably other stuff in there.
> cuz I had literally never heard of either of those languages before getting involved in #fedi
I did some Erlang at an old job; I'd never done Elixir but I saw people talking about it.
It gets easier to pick up languages the more of them you've picked up, anyway. And you know, anyone can pick up awk in 30 minutes, and most of the shells anyone uses are scripting languages you can pick up gradually.