Conversation
-
@p
@TradeMinister
> Birthday problem or pigeonhole principle (depending if you are a fan of statistics or discrete math). Enough people in a room and the coincidences start being something you can rely on.
I forgot that you're using, if I understand, a content-addressable filesystem where the objects are hashed and thenceforth referenced by hash: two objects with the same hash (repeated images for example) are treated as one object (with a refcount of two if you're doing that). So collisions are good, except in the corner-case where two different objects collide. Disaster if you're a SpaceX rocket, but a filesystem for social media, no biggie.
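A minimal sketch of that idea, assuming SHA-256 keys and an in-memory dict; the `ContentStore` class and its method names are made up for illustration and are not taken from Revolver or any real filesystem:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: objects are keyed by their SHA-256 digest."""

    def __init__(self):
        self.blocks = {}     # hex digest -> bytes
        self.refcounts = {}  # hex digest -> number of references

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        # Two *different* objects with the same digest would also land here,
        # and the second one would silently be treated as the first.
        if key not in self.blocks:
            self.blocks[key] = data
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = ContentStore()
first = store.put(b"same image bytes")
second = store.put(b"same image bytes")
```

Storing the same bytes twice gives back the same key, one stored block, and a refcount of two.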
> Well, not necessarily; it has been a minute since I collected garbage!
If you're not actually deleting blocks, it may be more like cache management than what I think of as GC, but I never formally studied either. Actual GC, for me, would be looking for data blocks that nothing references, orphans with refcount < 1. If you're looking at what a given node should keep a copy of, it's a somewhat different question.
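Roughly what I mean by GC, as a sketch: count incoming references and report the blocks that nothing points at. The `children` mapping and the notion of protected `roots` (say, signed heads) are assumptions made for the example, not a description of how any real node does it.

```python
from collections import Counter


def unreferenced(children, roots):
    """Return block keys with refcount < 1 that are not protected roots.

    children maps each block key to the keys it references (hypothetical shape).
    """
    refcount = Counter()
    for kids in children.values():
        refcount.update(kids)
    return {key for key in children if refcount[key] < 1 and key not in roots}


children = {
    "head": ["a", "b"],
    "a": [],
    "b": ["c"],
    "c": [],
    "stale": ["d"],  # nothing references "stale"
    "d": [],
}
orphans = unreferenced(children, roots={"head"})
```

Only `stale` is reported on this pass; once it is deleted, `d` loses its last reference and would show up on the next pass.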
> It is actually intentional that the slush is a mess; you can reject some keys while still participating in block propagation, so what you do with your node affects only you.
It seems like a decentralized distributed filesystem would sort of require this. It's inefficient, but having a bunch of nodes requesting data by content hash, other nodes putting that data on the Net, all without a central manager or directory, is going to be messy.
> This would need dumbing down for me to comment.
> Once you've broken it into blocks, there are six lanes: blocks that a user has signed with his key ("Head of my tree is $x, witness my seal, here affixed in anno domini MMXXII"), explicit blocks (i.e., children of blocks signed by users of a given node or by users that the node is interested in), and network blocks (unaddressed slush that you have received and are passing on to other nodes: "I have received 4543a82f8f5a7914eaccaf626d311babaa0bd58cbaa9944748e8ba17c179ee8c and 302e3be1a049d2ad498d64af7c9d87ad64d7e41b7f077db80a67ff8516e2645b and 6ee94fe372d94918c75144ffa34c0fa0eb471bd0ee047241abaad0d31ea99298 and can relay those if you need them".). The other three lanes are the same, but for incoming data.
OK, I think I get this now.
> Yeah, my reasoning is if we just go by how reliably it cooperates, we don't need to treat flaky or malicious nodes differently.
Both state and nonstate actors will have nodes trying to crack the protocol, de-anonymize users and locate nodes, perhaps inject forged data. If the protocol is rugged enough, their nodes will reveal themselves.
> I am a big fan of ARM still.
It's a good architecture family.
> This is the benefit of content-addressed storage.
It's always been an interesting idea.
> That sounds like a fun story. (Also, to hear the contemporaries talk about BSD, you could probably have gotten stoned just by standing close enough to their door.)
I guess they were already growing super-sinsemilla in California then. Whatever they had was seriously one-hit, and nobody had just one. Don't remember alcohol being part of their room-party, but then I barely remember the party, or probably anything within a day or so either side of it.
> you can get away with posting with a really, *really* low-spec system and a couple of kilobytes of persistent storage. (You need more in order to read posts, or at least enough RAM.)
If one needs to run a node to participate, then nodes should run on phones and be economical about data.
-
@TradeMinister
> One might hope that after decades of browsers crashing with half an hour of input in some buffer, they would have recovery mechanisms. One would be disappointed.
My solution is to just use the external editor plugin that only runs in extremely deprecated versions of Firefox, before I stopped patching Firefox. acme is immune to Firefox crashing, and if everything comes down, I know how to find a text file.
(Speak of the devil, and he's in your midst: Firefox died because I kicked off a `make` in the background, and didn't realize that it was C++ and it ate a large amount of RAM.)
> I'd never thought about it before, but some degree of unreliability might be OK in some applications.
Yeah. I don't think it will occur. I don't think anyone's found a collision in IPFS yet, and the block sizes are even smaller in Revolver. (venti uses SHA-1 and because of venti's block-splitting algorithm, I've never heard of anyone finding a collision.) So I'm not worried about that, but if it does happen, the impact should be small.
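For a sense of scale, here is a rough sketch of the usual birthday-bound approximation for accidental collisions. The block count of 10**18 is an arbitrary assumption, and this says nothing about deliberately constructed collisions.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int = 256) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_blocks * (num_blocks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


p_sha256 = collision_probability(10**18)               # 256-bit digests
p_sha1 = collision_probability(10**18, hash_bits=160)  # 160-bit digests
```

Both probabilities come out vanishingly small, with the 160-bit case many orders of magnitude more likely than the 256-bit one.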
> That's an interesting idea: the network as storage, the raw data as caches of the network.
I wish I could say it's due to my own insight, but I've cribbed very heavily from other people's research.
> Handshakes could drive one nuts. How does one know one's ack was received? What happens if it isn't?
Ah, in this case, I meant "n*(n-1) connections between servers". Everyone has to talk to everyone for everyone to get everyone's posts. There isn't going to be 100% saturation for any given post, but at present, for 20,554 nodes, that's 422,446,362 edges. (Revolver actually grew out of a sketch for a fix for this: it started as half an experiment in developing an object proxy.)
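The arithmetic, as a quick check (the function name is just for illustration):

```python
def directed_edges(nodes: int) -> int:
    """Every node talks to every other node, counted in both directions."""
    return nodes * (nodes - 1)


edges = directed_edges(20_554)
print(edges)  # 422446362
```

Counting each pair of servers once instead of once per direction would halve that figure.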
> the way I *think* ACK worked was something like
It's got TCP sequence numbers (the initial ones are nowadays randomized, to prevent spoofing).
> Truly old-skool; that's how we used to roll, had to, given our ridiculous hardware constraints.
Thank you, sir.
> what phone specs, and more important, data access, the middle class of Argentina will have in a year.
This is actually a really good plan. I think we have some Argentinians on FSE; in any case we have a few Chileans, and although they're not quite the same culturally, I suspect it's roughly the same in terms of median phone specs.
-
@p
@TradeMinister
> (Speak of the devil, and he's in your midst: Firefox died because I kicked off a `make` in the background, and didn't realize that it was C++ and it ate a large amount of RAM.)
Both are such pigs that the kernel OOM code (out of memory) had to decide what to kill.
> Ah, in this case, I meant "n*(n-1) connections between servers". Everyone has to talk to everyone for everyone to get everyone's posts. There isn't going to be 100% saturation for any given post, but at present, for 20,554 nodes, that's 422,446,362 edges. (Revolver actually grew out of a sketch for a fix for this: it started as half an experiment in developing an object proxy.)
I've got this vague notion that some sort of ring architecture might work here, maybe hierarchical. Instead of every post being blasted to every node, have something more like frequent mail delivery.
-
@TradeMinister The browser crashed while I was writing this, so I am going to forget some obvious things.
> So collisions are good, except in the corner-case where two different objects collide.
The worst impact, if you can find a colliding SHA-256 (and find one in less than 8kB), is that there is a corrupted block referenced from a higher-level block. This is hopefully unlikely, but shouldn't be a major problem.
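A sketch of why the damage stays contained, assuming blocks of at most 8192 bytes (from the "less than 8kB" remark) and a made-up parent format that simply lists child digests one per line; the actual block layout is not described here, so treat every name below as hypothetical.

```python
import hashlib

BLOCK_SIZE = 8192  # assumption based on the "less than 8kB" remark


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_parent(blocks):
    """Hypothetical higher-level block: newline-joined child digests."""
    return "\n".join(hashlib.sha256(b).hexdigest() for b in blocks).encode()


def verify_children(parent: bytes, fetched: dict):
    """Return the digests whose fetched bytes are missing or do not hash back."""
    bad = []
    for digest in parent.decode().split("\n"):
        block = fetched.get(digest)
        if block is None or hashlib.sha256(block).hexdigest() != digest:
            bad.append(digest)
    return bad


# 20480 bytes of non-repeating data -> three blocks
data = b"".join(i.to_bytes(4, "big") for i in range(5120))
blocks = split_blocks(data)
parent = build_parent(blocks)
fetched = {hashlib.sha256(b).hexdigest(): b for b in blocks}
```

Ordinary corruption is caught because the bytes no longer hash to the digest in the parent. A true collision would pass this check, but it would still only affect the one block under that digest.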
> If you're not actually deleting blocks, it may be more like cache management
Yeah, blocks are being deleted in cases where storage matters. (Although, if there is enough redundancy, then notionally, all blocks are just cache of the network.)
> It's inefficient, but having a bunch of nodes requesting data by content hash
It is, yeah, but them's the breaks. I think it has a good chance of being more efficient than fedi's handshake explosion.
> Both state and nonstate actors will have nodes trying to crack the protocol, de-anonymize users and locate nodes, perhaps inject forged data.
If I'm lucky enough to succeed!
> If the protocol is rugged enough, their nodes will reveal themselves.
This is the hope, but people are endlessly inventive, and I've watched what happened with Tor, I2P, BitTorrent, IPFS. I think the best I can do is avoid the major architectural botches and try to play it conservative but adaptable.
> I barely remember the party, or probably anything within a day or so either side of it.
Ha!
> If one needs to run a node to participate, then nodes should run on phones and be economical about data.
Indeed. The node running on FSE is eating only 32MB RAM at present, an encouraging sign. The target is full-featured node on small ARM box (which should, at least vaguely, translate to nodes running on phones), and I think I can pull that off for the initial release. It's designed such that nodes can have multiple users (though it's better to use your own node).
-
@p
> The browser crashed while I was writing this, so I am going to forget some obvious things.
One might hope that after decades of browsers crashing with half an hour of input in some buffer, they would have recovery mechanisms. One would be disappointed.
> The worst impact, if you can find a colliding SHA-256 (and find one in less than 8kB), is that there is a corrupted block referenced from a higher-level block. This is hopefully unlikely, but shouldn't be a major problem.
Again, unacceptable in a Mars-probe landing system, but OK in a meme-swapping network. I'd never thought about it before, but some degree of unreliability might be OK in some applications.
> Yeah, blocks are being deleted in cases where storage matters. (Although, if there is enough redundancy, then notionally, all blocks are just cache of the network.)
That's an interesting idea: the network as storage, the raw data as caches of the network.
> It is, yeah, but them's the breaks. I think it has a good chance of being more efficient than fedi's handshake explosion.
Handshakes could drive one nuts. How does one know one's ack was received? What happens if it isn't? It's been many decades, but the original TCP spec thought about all this, and had a SYN/ACK system that was totally reliable and seemed fairly light on useless chitchat. I forget what SYN did (must have stood for Synchronize) but the way I *think* ACK worked was something like ACK 1024, meaning 'I have received the first 1024 bytes of your data'. And because the underlying IP did not guarantee much of anything, the cases of out-of-order data, ACKs not matching, etc., all had to be dealt with, and were. It was all so damn good that we're still using it, and the French philosophe X.25 never went very far.
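A toy sketch of that cumulative-ACK idea, as I understand it: the receiver always acknowledges the offset of the next byte it expects, buffering anything that arrives early. The `Receiver` class is invented for illustration and assumes segments never overlap; real TCP also has random initial sequence numbers, wraparound, retransmission timers and more.

```python
class Receiver:
    """Toy receiver with cumulative ACKs, in the spirit of TCP."""

    def __init__(self):
        self.next_expected = 0  # offset of the first byte not yet received in order
        self.pending = {}       # offset -> segment that arrived out of order
        self.data = bytearray()

    def on_segment(self, offset: int, payload: bytes) -> int:
        # Segments below next_expected are duplicates and are ignored.
        if offset >= self.next_expected:
            self.pending[offset] = payload
        # Drain every buffered segment that is now contiguous.
        while self.next_expected in self.pending:
            chunk = self.pending.pop(self.next_expected)
            self.data += chunk
            self.next_expected += len(chunk)
        return self.next_expected  # the ACK value sent back


rx = Receiver()
acks = [
    rx.on_segment(0, b"a" * 512),
    rx.on_segment(1024, b"c" * 512),  # arrives early, so the ACK does not move
    rx.on_segment(512, b"b" * 512),   # fills the gap, so the ACK jumps ahead
]
```

The sender can tell from the repeated ACK that something in the middle is missing, and from the final ACK that everything up to that offset arrived.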
> I think the best I can do is avoid the major architectural botches and try to play it conservative but adaptable.
Truly old-skool; that's how we used to roll, had to, given our ridiculous hardware constraints.
> The target is full-featured node on small ARM box (which should, at least vaguely, translate to nodes running on phones), and I think I can pull that off for the initial release. It's designed such that nodes can have multiple users (though it's better to use your own node).
I'd recommend considering what phone specs, and more important, data access, the middle class of Argentina will have in a year. I say Argentina because 1. 🏆🏐🇦🇷 with a team of 🇦🇷 and not 🐵; 2. It's an intelligent population with very limited budgets and cell data access; 3. Maldacena! Quantum gravity Goooooaal!; 4. Argentiiiiiiiina!; 5. I don't gaf about Africa, India etc or care what they have to say (except Boers: Boers are cool).
-
@TradeMinister
> Both are such pigs that the kernel OOM code (out of memory) had to decide what to kill.
If I didn't have this browser, I think most of my RAM would sit idle.
> I've got this vague notion that some sort of ring architecture might work here, maybe hierarchical. Instead of every post being blasted to every node, have something more like frequent mail delivery.
FIDOnet!
The way the propagation works as things stand has to do with who is following who: posts associated with the activities of anyone that someone on FSE is following make their way here. So posts, replies, likes, etc., the idea being that the posts likely to be of interest to people on this instance are the posts of interest to the people we're following. It's a really carefully designed system.
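A very rough sketch of that follow-driven delivery, covering posts only. The `follows` mapping, the `user@instance` addressing, and the `.example` domains are all assumptions made for the example; this is not how any particular fedi server implements it.

```python
def destinations(author: str, follows: dict) -> set:
    """Instances that should receive a post by `author`.

    follows maps 'user@instance' to the set of accounts that user follows
    (hypothetical shape).
    """
    return {
        follower.split("@", 1)[1]
        for follower, followed in follows.items()
        if author in followed
    }


follows = {
    "alice@fse.example": {"bob@other.example"},
    "carol@third.example": {"bob@other.example", "alice@fse.example"},
    "dave@fourth.example": {"carol@third.example"},
}
targets = destinations("bob@other.example", follows)
```

A post by `bob` goes to the two instances that have a follower of his; `fourth.example` never sees it unless someone there follows him too.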