@phnt@nukie@Tij@mint if we can figure out what gets caught in the fast crash loop, we can change the way that service gets started to prevent it from crippling the app
@xianc78@mint@vhns A couple GEOM fixes I think? I don't remember anything else. They massively forked it by stripping down the kernel to remove a ton of syscalls and then they added new custom ones that addressed their game console needs
Bluecoat contributed a bunch of network stack improvements years ago, the most significant before Netflix came along.
Dell/EMC has realized their mistake and is now working to upstream their changes so they don't have to run really old kernels. They have storage and NFS fixes.
Microsoft donates code and drivers to make it work efficiently under Hyper-V. Citrix did the same for Xen.
Intel does a lot, AMD does some, Nvidia/Mellanox does a good amount for network drivers
@mint@vhns Honestly I don't understand when companies do this. I see it a lot and it seems petty and insulting. Like, they piss money away on the most worthless shit all the time. Who is going to notice $50k to each open source project they use? Who? Nobody. Except shareholders.
@mint@vhns WhatsApp gave FreeBSD a million dollars -- before Facebook acquired them I think.
Netflix gives a ton of code and employs people. I want to say Netflix has given the largest code contributions overall. I wonder if anyone has done an analysis on that specifically: what percent of the codebase was contributed by which corporation that uses it?
@phnt@mint I'm a little confused about why there are duplicate/retry jobs firing so quickly, as there are some things in the logs that shouldn't be happening in succession like that. They may not be the root cause, but they're definitely adding to the pressure and we can address it
@mint there may be a couple things we can move out of MRFs, but I think this discussion has to happen so everyone understands exactly what these do: they're basically a firewall/ACL/content-normalization policy set by the admin, and for security reasons it really shouldn't be possible to bypass them
@mint@phnt the I/O should really be minimal which is what's baffling
it is possible in newer releases to point Oban at a dedicated SQLite database (a dedicated Postgres database is possible too, of course)
but let's roll back to the Oban version you never had issues with and work from there. If it still has problems, we know it's something else (Postgrex?)
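To make the dedicated-database idea concrete, here's roughly the shape it could take. Nothing below exists in Pleroma today: `Pleroma.ObanRepo`, the SQLite path, and the queue sizes are placeholders for illustration, and it assumes the ecto_sqlite3 adapter plus the SQLite engine in newer Oban releases.

```elixir
# lib/pleroma/oban_repo.ex -- hypothetical dedicated repo just for the job queue
defmodule Pleroma.ObanRepo do
  use Ecto.Repo, otp_app: :pleroma, adapter: Ecto.Adapters.SQLite3
end

# config/config.exs (excerpt)
config :pleroma, Pleroma.ObanRepo,
  database: "/var/lib/pleroma/oban.sqlite3"

config :pleroma, Oban,
  repo: Pleroma.ObanRepo,
  # Oban's SQLite engine, available in newer Oban releases
  engine: Oban.Engines.Lite,
  queues: [federator_outgoing: 10, federator_incoming: 10]
```

The new repo would also have to be added to the supervision tree; the point is just that Oban's job churn stops hitting the main Postgres database.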
@mint@phnt okay, my feld/debugging branch has a clean rollback of Oban to version 2.13.6. There is a migration that needs to run. The Oban Live Dashboard is not compatible with it, so if you were using that, sorry. But let's see if this gives you stability again
@mint Like I understand why, but it baffles me this has not been like number one priority to solve. Bundle compatibility libs for older releases if you have to, I don't care. Just make it work.
@i@NonPlayableClown@mint@hj I think we could have a dedicated tombstones table so we can check there first for deleted objects; that would probably help. Needs deeper investigation.
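As a sketch of what that could look like (purely hypothetical, nothing like this exists yet): a tiny lookup table keyed by AP id that delete handling writes to and that the fetch path checks before touching the big objects table.

```elixir
# Hypothetical migration for the dedicated tombstones idea
defmodule Pleroma.Repo.Migrations.CreateTombstones do
  use Ecto.Migration

  def change do
    create table(:tombstones) do
      add :ap_id, :string, null: false
      add :deleted_at, :utc_datetime, null: false
    end

    # one tombstone per AP id, and fast lookups on it
    create unique_index(:tombstones, [:ap_id])
  end
end
```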
Some of the cascading delete work is annoying and I think we could do better there. I've already helped ease the pain by making deletes lower priority and giving them a narrower queue.
We have some triggers I am suspicious of and need to investigate more deeply, but my memory is that these are entirely Postgres FTS related. On that note, I think even if you choose another search backend we still waste energy on the GIN/RUM indexing...
I think also there is an expensive operation for things like the number of likes on a post. I swear I saw this count being embedded directly in the Object JSON, and that's crazy to me if that's how it was implemented. So deletes require rewriting that row? Holy shit, that's gonna bring pain and make your table a Swiss cheese mess over time.
Those post stats (likes, replies, quotes) should probably be in their own table, use FKs, etc. Counts could be cached, and a new activity or a delete should trigger a job to update the count asynchronously instead of blocking the commit (rough sketch below).
Stuff like that could help lower-resource instances tremendously.
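To make the stats idea concrete, a rough sketch of a separate counter table plus an async Oban job; the table, the worker, and the `Pleroma.ObjectStats.recount/1` helper are all made up for illustration.

```elixir
# Hypothetical counter-cache table, maintained out-of-band instead of
# rewriting the big jsonb row on every like/reply/delete
defmodule Pleroma.Repo.Migrations.AddObjectStats do
  use Ecto.Migration

  def change do
    create table(:object_stats) do
      add :object_id, references(:objects, on_delete: :delete_all), null: false
      add :like_count, :integer, default: 0, null: false
      add :reply_count, :integer, default: 0, null: false
      add :quote_count, :integer, default: 0, null: false
    end

    create unique_index(:object_stats, [:object_id])
  end
end

# Hypothetical Oban worker: a new activity or a delete enqueues this instead
# of blocking the commit on the recount
defmodule Pleroma.Workers.RefreshObjectStats do
  use Oban.Worker, queue: :background, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"object_id" => object_id}}) do
    # made-up helper that recomputes the counts and upserts the row
    Pleroma.ObjectStats.recount(object_id)
    :ok
  end
end
```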
Note I just fixed a bug where we appeared to have been updating the "unreachable_since" field for every instance on every successful activity published due to the test not matching reality. Imagine how many wasted queries that is😭😭😭
@i@NonPlayableClown@mint@hj I don't get why everyone's so mad about Pleroma's schema. Where's the core issue? Are people just upset that there's jsonb instead of a billion columns? Is there a pervasive impression that those jsonb columns are slow?
@mint@NonPlayableClown@i@hj I just ran into some very serious usability bugs on mobile and when I asked him about it he was like "sorry, working on the Nostr stuff for now" so I just stopped using it
@i@mint Biggest problem with AdminFE is the difficulty of parsing the config because it's not already JSON or something more approachable. It's got some pretty gross hardcoded stuff in it. I personally want to replace it with one written in LiveView, which could work directly with the native config data structures no problem. If we could just get an initial PoC up that controls even one settings group, the rest of the work is easy, if a little tedious, and it could happen fast
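Just to show the shape of it, a bare-bones sketch of that PoC; `AdminSettingsLive` and everything in it is hypothetical, and a real version would cast values back to their proper types and persist through Pleroma's config system instead of just holding the form params.

```elixir
# Hypothetical LiveView bound to a single settings group, reading the
# native keyword-list config directly instead of a JSON description of it
defmodule Pleroma.Web.AdminSettingsLive do
  use Phoenix.LiveView

  # the one settings group this PoC would control
  @group {:pleroma, :instance}

  def mount(_params, _session, socket) do
    {app, key} = @group
    {:ok, assign(socket, settings: Application.get_env(app, key, []), pending: nil)}
  end

  def handle_event("save", %{"settings" => params}, socket) do
    # stub: a real PoC would cast these strings back to their proper types
    # and write them through the config system, then reload the group
    {:noreply, assign(socket, pending: params)}
  end

  def render(assigns) do
    ~H"""
    <form phx-submit="save">
      <%= for {key, value} <- @settings do %>
        <label><%= key %></label>
        <input name={"settings[#{key}]"} value={inspect(value)} />
      <% end %>
      <button type="submit">Save</button>
    </form>
    """
  end
end
```

Put that behind admin authentication and the rest really is just tedium, one settings group at a time.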
@hj@i@mint We cannot rely on PleromaFE to be the one true FE for Pleroma forever. Soapbox isn't bad, and someone could take it over now that Gleason has mostly abandoned it for the fediverse. Phanpy is super cool. Something else could come out tomorrow, you could get hit by a bus, Lain could make an entirely new UI with his AI experiments, etc. Just can't predict the future.
I really wish I had time to do more work in this area but when I have time I gravitate towards problems I already know I can solve, and that's mostly backend work 😓
Admin of bikeshed.party, not-active-enough FreeBSD developer, and ports-secteam & portmgr alumnus. My thoughts are my own, unless they're not. 🧐 Team Pleroma 👯‍♀️ Posts are probably satire.