-
@pernia index it with a real db for internal use, if you're concerned or implementing some fluoridated shit. OTHERWISE, the thing is laid out so nginx just serves everything straight off the disk. Doesn't get a lot faster than that.
-
@pernia no you dummy there is very little need to read from disk at all when serving AP objects, you just put the json in a file, then when faggot a requests /objects/cocksucker you just print it out at him. The EXACT same thing (plus more work) has to happen to read the json out of the db
outgoing federation is written just once to disk and sent out from where it is
you can literally just be clever about how you save the posts you made and the posts you are interested in reading, and you're done. no seeking, no nothing. webfinger can even be a static file. everything can be static files if you're clever, that's how you get AP servers written in such ridiculous ways
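A minimal sketch of the "everything is a static file" idea (the layout, names, and webroot here are assumptions for illustration, not snac2's actual scheme): the object's json is written once at creation, at a path that mirrors its URL, so the web server can hand it back later without the backend being involved at all.

```python
import json
from pathlib import Path

def save_object(webroot: Path, obj_id: str, obj: dict) -> Path:
    """Write an outgoing ActivityPub object to disk exactly once, at a
    path mirroring its URL (/objects/<id>), so it can be served statically."""
    path = webroot / "objects" / obj_id
    path.parent.mkdir(parents=True, exist_ok=True)  # lay out dirs to match URL paths
    path.write_text(json.dumps(obj))                # one write, at creation time
    return path
```

Serving `/objects/<id>` is then just a filesystem read keyed by the request path, with zero lookup work.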
-
@pwm well for writes and lookups i know relational databases use write-optimized and read-optimized data structures, specifically so you don't strain the disk so much and get more out of it. doing it straight off the disk would mean doing sweeps and shit to look stuff up, which is slower
-
@pernia @vic caddy is for zoomers who are scared of config files longer than 6 lines
-
@vic @pwm caddy is for NIGGERS.
nah but i've heard its good. just wanna do shit the autistic way and use as much from base as possible
-
@pernia @pwm give caddy a shot
-
@pwm damn, ok that's really cool. i forgor the B-tree is just for index shit mb.
snac2 is fucking BASED then damn. and is nginx the only webserver that streams files? does openbsd's httpd have those funi optimizations u think?
-
@pwm ok so if i understand correctly:
>backend searched object by id in db
>db check with page manager to get data off disk into memory
>once in memory, u can send the data thru internetz
that's scenario a. so i assume by page manager u mean the mmu? or the one provided by the OS (don't know terminology) and not something else. in that case, wouldn't moving json from disk to memory have to happen anyway? why would it be slower in a db than from disk?
and wouldn't reading the data from disk be faster since its a B tree, rather than reading the file sequentially?
then in scenario b, that would mean reading the file sequentially to load it from disk to memory, with the page manager/mmu doing its thing, but having nginx do it directly rather than it going thru snac2 first. so to be faster the db+backend overhead would have to be greater than the savings u get from the B tree.
i'm sure i'm missing a few things here. idk what u mean by "page manager" and which tuples ur talking abt.
-
@pernia
> i assume by page manager u mean the mmu?
the page manager is the component of the database (it's part of the software, not the OS) responsible for reading and writing pages. It usually has an LRU cache of pages it has recently fetched from disk so it can sometimes return them quicker. Pages come in several types that indicate what information is stored in them (data tuples, table definitions, indexes, mappings of tables to which pages contain data for that table), but the big one here is the data page. Pages are addressed by their page number, which is literally just the order they are in (usually). A data page holds data tuples. Data tuples are the rows in a table, and they can be logically addressed by (page_number, row_id).
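That description boils down to something like this toy page manager (a simplified sketch, not any real database's implementation; the page size, cache size, and `disk_reads` counter are illustrative assumptions):

```python
from collections import OrderedDict

PAGE_SIZE = 8192  # assumed page size, a common default

class PageManager:
    """Toy page manager: pages addressed by page number (their order in
    the file), with an LRU cache of recently fetched pages."""
    def __init__(self, file, cache_pages=128):
        self.file = file
        self.cache = OrderedDict()      # page_no -> page bytes, LRU order
        self.cache_pages = cache_pages
        self.disk_reads = 0             # count real reads to show the cache working

    def get_page(self, page_no: int) -> bytes:
        if page_no in self.cache:
            self.cache.move_to_end(page_no)   # refresh LRU position, no disk touched
            return self.cache[page_no]
        self.file.seek(page_no * PAGE_SIZE)   # page number is literally a file offset
        data = self.file.read(PAGE_SIZE)
        self.disk_reads += 1
        if len(self.cache) >= self.cache_pages:
            self.cache.popitem(last=False)    # evict least recently used page
        self.cache[page_no] = data
        return data
```

Repeated fetches of a hot page hit the cache; cold pages cost a seek and a read each.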
> wouldn't moving json from disk to memory have to happen anyway? why would it be slower in a db than from disk?
It does have to happen anyway, but when you get it from a database instead of ripping it straight from a file, first you have to go and find the data you want before you can read it. If you just know which file you want to rip the json out of, you can skip all the work of locating it and just call fopen.
> and wouldn't reading the data from disk be faster since its a B tree, rather than reading the file sequentially?
indexes are b-trees; data tuples are just sort of chucked in there in the order they are created, usually, unless you are doing something fancy like maintaining a physical sort order within the pages, which would be really expensive for CRUD operations as you would potentially have to shuffle your entire table around for every insert.
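A toy picture of that layout, with a plain dict standing in for the b-tree index and tuples appended in creation order (all names and the tiny page size are made up for illustration):

```python
TUPLES_PER_PAGE = 4   # assumed, unrealistically tiny so pages fill up quickly

pages = []   # heap table: list of data pages, each a list of row tuples
index = {}   # "index": post_id -> (page_number, row_id), a dict in place of a b-tree

def insert(post_id: str, row: dict) -> None:
    """Append the tuple to the last page (creation order), start a new page
    when full, and record its logical address in the index."""
    if not pages or len(pages[-1]) == TUPLES_PER_PAGE:
        pages.append([])
    pages[-1].append(row)
    index[post_id] = (len(pages) - 1, len(pages[-1]) - 1)

def lookup(post_id: str) -> dict:
    page_no, row_id = index[post_id]   # fast index probe...
    return pages[page_no][row_id]      # ...then fetch from the data page itself
```

The index probe is cheap; the point is that you still land on a data page afterwards and have to read it.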
> then in scenario b, that would mean reading the file sequentially to load it from disk to memory,
nginx does this and it does it in fancy optimized ways that stream the file, rather than load the entire file into memory in one big buffer and then flush it out.
Scenario b is faster if you engineer the files to be laid out in such a way that you don't have to look for them. Placing them strategically means that you just know where they are based on filename. If you did have to search them with like grep and shit then yes that would be much slower.
You have some misconceptions about where exactly the b-tree comes into play. The b-tree powers indexes. To fetch indexed data you first consult the index by traversing its b-tree (fast), and then you still have to fetch the data from its data page if the index tuple wasn't indexing the field you wanted in the first place (which it wasn't in our scenario). The index IS way faster than doing a sequential scan of every data page that has data for a given table and checking each tuple in it for the one or however many your query wants. With an index you know the address of the data you want, but you still have to fetch it off the disk (unless it's cached by the page manager, but let's pretend it isn't).
The database CAN'T be faster than simply reading off a static file. It is simply more work to be done, work that is a superset of the work done by just ripping the file off the disk and out onto the network.
The limitation is that not every scenario allows you to engineer the database out of the picture. This is not a universally applicable strategy. The database offers flexibility and makes difficult things possible, but the realization here is that all you're really doing is serving a static file, and that this isn't necessarily a difficult thing (if you're clever about it).
-
@pernia
okay so scenario a:
You need json from a row in a database (one of your posts) because someone wanted you to serve it so that it federates or some shit. we also suppose the posts are indexed by, like, id, and that we have the id from the request. The database has to check an index for the id of that post, which is pretty quick, BUT then it has to actually go get the json (it probably wouldn't be stored in the index itself, because that would effectively double up the size of the object data -- an index that does carry all the information you need is called a covering index), so it looks at the page number and row id pointed to by the index (this is the logical location of the data in the database).
Then the database asks the page manager for the page it needs. In this scenario that page is not in memory, and the page manager must read it from disk. with this page loaded into memory, we then grab the tuple we want BUT WAIT, the json data is bigger than the remaining size available in the page and spills over into another page so we have to ask the page manager for that page, and any subsequent spillover pages until we are done reading all the json we want (each page fetch necessitates a new disk read if that page is not already in memory, and, since these are pages full of nothing but json from one row, it's highly likely that they are not already in memory).
THEN after all that we can stream the json data out to whatever is handling the http response
OR
scenario b:
We receive a request for an object. We have cleverly named all our objects so that the file path maps to the url path, and have told nginx about this mapping and where to look.
nginx just serves the file (it is very fast at this), and the request never touches our backend.
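Scenario a's chain can be sketched in a few lines over toy structures (all names here are illustrative, not any real database's API); scenario b skips every one of these steps except the final read:

```python
def fetch_json(post_id, index, get_page):
    """Scenario a: index probe, then page fetches (including spillover
    pages) before a single byte can go out on the network."""
    page_no, row_id = index[post_id]            # 1. b-tree probe (fast)
    page = get_page(page_no)                    # 2. disk read if not cached
    chunks = [page["rows"][row_id]["json"]]
    overflow = page["rows"][row_id].get("overflow")
    while overflow is not None:                 # 3. follow spillover pages,
        page = get_page(overflow)               #    each one a fresh disk read
        chunks.append(page["json"])
        overflow = page.get("overflow")
    return "".join(chunks)                      # 4. NOW we can stream it out
```

Every step here is extra work layered on top of the read that scenario b does directly.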
-
@pwm hmm damn. explain to me the "+more work" i didn't think it would be like that
-
@pwm @pernia @vic "Yeah I Want My WEB SERVER To Also Be An ACME CLIENT" - statements dreamt up by the utterly deranged.
-
@pwm @pernia @vic Just put certbot/acme.sh/whatever into crontab, how fucking hard could that be.
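For the record, that crontab line is about this simple (schedule and deploy hook are assumptions; adjust for your setup):

```shell
# Run daily at 03:00; certbot only renews certs that are close to expiry,
# and the deploy hook reloads nginx so it picks up the new cert files.
0 3 * * * certbot renew --quiet --deploy-hook "nginx -s reload"
```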