@mint if it's interesting to you, I forked the Oban Lifeline plugin into a new one called Lazarus, which is configured to revive a stuck/dead/orphaned job even if it was on its last attempt because of max_attempts: 1
if the job has failed multiple times though, it lets it go
the original Lifeline plugin would throw away a bunch of our jobs just because they're max_attempts: 1. it's not super necessary to have them retried, it's just dumb that a job which failed hard for a totally unhandled reason wouldn't get another try
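Rough sketch of the difference, in Python just to show the decision and not the actual plugin code. The `Job` class, the function names and the exact "first attempt" check are made up for illustration; the real plugins are Elixir and work on rows in the jobs table.

```python
from dataclasses import dataclass


@dataclass
class Job:
    attempt: int       # how many attempts have been started so far
    max_attempts: int  # configured limit for this job


def lifeline_action(job: Job) -> str:
    """Stock behavior as described above: an orphaned job that is out of attempts is dropped."""
    return "rescue" if job.attempt < job.max_attempts else "discard"


def lazarus_action(job: Job) -> str:
    """Assumed Lazarus rule: also revive a job that died on its first and only attempt."""
    if job.attempt < job.max_attempts:
        return "rescue"   # attempts left, same as the stock plugin
    if job.attempt <= 1:
        return "rescue"   # max_attempts: 1 and it never got a fair try
    return "discard"      # it has already failed multiple times, let it go
```

So a max_attempts: 1 job that got orphaned mid-run gets put back in the queue, while something that already burned through several attempts is still dropped.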
@mint I found it's possible for the background queue to get stuck because of a super long timeout (15 mins) and some other jobs that were missing timeouts (the default is infinity), so I've fixed these issues. Some other tweaks in here too.
These changes have nothing directly to do with the ReceiverWorker, but it may be possible that Oban is not scheduling those jobs because existing running jobs are stuck. This is unclear to me and doesn't feel like it should work that way on the BEAM, so it could be an Oban-specific behavior in how it chooses to execute available work.
Investigation is still ongoing until I am certain nothing else could be causing this.
@mint I am actively investigating this, trying to find any possible reason this is happening.
My best guess so far is orphaned jobs making Oban think it can't run more jobs because they're dead / stuck in "executing" state.
This should really never happen because Oban itself doesn't crash, but I guess if you restarted Pleroma and it didn't clean itself up gracefully this could happen.
Any chance some of these are Docker deployments or the service could have crashed and restarted automatically due to low resources (OOM, etc)?
@creamqueen @coolboymew btw if you only use it for cooking, buy unsalted butter. It works better. You'll have to compensate with a little salt in your recipes, but salted butter retains water which is not what you want when you're cooking/baking/frying.
@i @mint I think it could be reasonably common for there to be stale "executing" jobs in the table, left over from a crash/failure/unclean shutdown, that stay there indefinitely and should be recycled. I check mine occasionally but so far I haven't found any.
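For anyone who wants to check theirs, this is roughly the test I mean, sketched in Python over rows already pulled out of the jobs table. The `state` and `attempted_at` fields mirror the columns Oban stores; the function name and the 60 minute cutoff are just assumptions for the example, pick whatever is safely longer than your longest legitimate job.

```python
from datetime import datetime, timedelta, timezone


def stale_executing(rows, now, rescue_after=timedelta(minutes=60)):
    """Return rows still marked "executing" whose last attempt started too long ago.

    rows: iterable of dicts with "state" and "attempted_at" (aware datetime or None).
    The cutoff is an arbitrary assumption, not a recommendation.
    """
    return [
        row
        for row in rows
        if row["state"] == "executing"
        and row["attempted_at"] is not None
        and now - row["attempted_at"] > rescue_after
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "state": "executing", "attempted_at": now - timedelta(hours=3)},
    {"id": 2, "state": "executing", "attempted_at": now - timedelta(minutes=5)},
    {"id": 3, "state": "completed", "attempted_at": now - timedelta(hours=9)},
]
stale_ids = [row["id"] for row in stale_executing(rows, now)]
```

Anything this turns up on a node that restarted since `attempted_at` is almost certainly an orphan rather than a job that is genuinely still running.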
Admin of bikeshed.party, not-active-enough FreeBSD developer and ports-secteam & portmgr alumni. My thoughts are my own, unless they're not. 🧐 Team Pleroma 👯‍♀️ Posts are probably satire.