wait a fucking second...
it has been ZERO DAYS since I found an off-by-one error in a game from 1990
So here's the zoomed out (F1) view in Railroad Tycoon.
It's supposed to show the whole map. It doesn't.
Here's some lakes north of Winnipeg. We can see that they go right up to the edge of the map, right?
So let's hit F4 and zoom in and NOPE.
The eastern part of the lake that goes off the map actually stops before touching it!
SIDNEY K. MEIER, THIS IS UNACCEPTABLE
The zoomed-out view is missing a single line of tiles from the top of the map!
@flyingsaceur heh
@foone (*squints) So you found one yesterday?
It does it on all 4 cardinal directions.
I notice there's a white rectangle around the viewport... did they accidentally make the 1-pixel-wide border cover up the edges of the map, instead of bordering it?
I noticed another error: the code that determines which shorelines to draw breaks at the edge of the map, making it look like there's a big vertical island out in the pacific
This only happens on the West and South borders. The North and East borders are phantom-island free.
Sounds like AN OFF BY ONE ERROR TO ME
apparently the save file grows by 4 bytes for every single piece of track you put down
I think I just found an off-by-two error?
if you are in F4 mode and right click on the (1,1) tile, it ends up focusing on the top-right instead of top-left
interesting: The game's internal coordinate system is a 16-bit unsigned integer, which starts in the top-left at 0,0, then it continues vertically down and then to the right.
although the coordinates are... weird. Some of them aren't on the map, I think.
yeah.
So the point I'd call (1,0) is point 200.
So 199 should be at (0,199) right? the bottom left coordinate?
NOPE! that point is 191.
Where's 192-199? NOT ON THE MAP
this would make some amount of sense if they were just stuffing the coordinates into a bitfield but that's 200 decimal, not hex, so... no.
also why combine the X+Y coordinates into this idea of point numbers? the map is 256x192.
You can store coordinates in two bytes and it'll make more sense and be simpler to deal with.
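a rough sketch of the numbering as I read it, in Python. It assumes point = x*200 + y (down each column first, 200 numbers per column), which fits the two data points above: (1,0) is 200 and the bottom-left tile is 191. The function names are mine, not the game's.

```python
MAP_WIDTH, MAP_HEIGHT = 256, 192   # playable map size
COLUMN_STRIDE = 200                # assumption: each column uses up 200 point numbers

def xy_to_point(x, y):
    """Pack (x, y) into a point number, counting down each column first."""
    if not (0 <= x < MAP_WIDTH and 0 <= y < MAP_HEIGHT):
        raise ValueError("off the map")
    return x * COLUMN_STRIDE + y

def point_to_xy(point):
    """Unpack a point number; returns None for numbers that aren't on the map."""
    if point < 0:
        return None
    x, y = divmod(point, COLUMN_STRIDE)
    if x >= MAP_WIDTH or y >= MAP_HEIGHT:
        return None
    return x, y
```

under that assumption the last 8 numbers of every column are the ones that fall off the map, and the biggest valid point still fits in 16 bits.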
okay, the 4 bytes added are encoded like this:
16-bit ushort: tile number
8-bit int: track tile to use (from tracks.pic, 1-based)
8-bit flag:
1: bridge-type 1
2: bridge-type 2
4: has-bridge
16: double-wide track
bridge types are set up like:
00: tunnel
01: wood
10: stone
11: iron
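here's how I'd unpack one of those 4-byte records, as a sketch. Little-endian is an assumption (a safe one for a DOS game), and so is reading the two bridge-type bits as one 2-bit number. The field names are made up by me.

```python
import struct

# Assumption: flag bits 1 and 2 form a 2-bit bridge type, with bit 2 as the high bit.
BRIDGE_TYPES = {0: "tunnel", 1: "wood", 2: "stone", 3: "iron"}

def parse_track_record(record):
    """Decode one 4-byte track record into a dict."""
    # "<HBB" = little-endian 16-bit tile number, then two single bytes
    tile_number, track_tile, flags = struct.unpack("<HBB", record)
    return {
        "tile_number": tile_number,
        "track_tile": track_tile,                   # byte as stored on disk
        "bridge_type": BRIDGE_TYPES[flags & 0x03],
        "has_bridge": bool(flags & 0x04),
        "double_track": bool(flags & 0x10),
    }
```

whether the bridge type means anything when has-bridge is off is not something this sketch tries to answer.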
@foone What's the largest possible valid save file?
@crobbler something like 212,576 bytes?
track tiles are <63
if the number has the 64 bit on, it means it's for the second player.
I'm guessing/hoping this is encoded the sensible way, with the third player having 128 on, and fourth player being 64|128?
arg. all this starts at a fixed offset, except it stops being fixed if you name a train
@crobbler it gets tight if you have more than one save, though, especially since you might be using 360kb 5.25" disks
@foone That's much better than I expected. Still fits on a floppy.
if bridge types 1 and 2 are on (11), it means there's a station there.
trying to figure out the naming thing in a highly scientific way:
I keep rewinding time and naming my train with increasingly long names, one character at a time
I have ASS, FUCK, and BITCH so far
crap. I can't find any obvious header stuff that changes length based on the name of your train.
and I can't read it backwards because I don't know how many tracks there are!
lemme hit it from the other end:
maybe there's a header field that says how many tracks there are? then I can just read backwards from EOF
got it. offset 0x3738 is a 16-bit int of how many tracks there are.
so we just need to SEEK_END to -4*num_tracks, and it works fine
huh. I somehow bugged the game without even trying!
I started a new map, connected La Crosse to Waterloo, and built a train at La Crosse to take mail and passengers between the two.
It drives to Waterloo, then complains it can't get back to La Crosse. The path is "impossible" it says.
for some reason it thinks they both go way off to the west.
I want to test my theory on how railroad player numbering works, but player 3 refuses to build any fucking track!
STOP PLAYING THE STOCK MARKET AND BUILD SOME FUCKING TRAINS
FINALLY.
yeah, it works how I expected.
also my track number thing is wrong.
that seems to just be for P1 tracks, so I need to figure out where the other player track counts are and sum them together
I found the other player's track count, which seems to store all the other players together... plus 1024.
I don't know why. But yeah, it stores it as a 16bit int right after the player's count (at 0x3738), but it's offset by 1024. Strange.
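putting the two counts together, finding the track list from the end of the file could look something like this. It's a sketch: the 0x373A offset is just "the 16-bit value right after 0x3738", the total is assumed to be P1's count plus (the other value minus 1024), and the function name is mine.

```python
import struct

P1_COUNT_OFFSET = 0x3738      # 16-bit count of player 1's track records
OTHERS_COUNT_OFFSET = 0x373A  # next 16-bit value: all other players, plus 1024
OTHERS_BIAS = 1024            # no idea why, but it's there

def read_track_records(save):
    """Return the raw 4-byte track records from the end of an .SVE file image."""
    (p1_count,) = struct.unpack_from("<H", save, P1_COUNT_OFFSET)
    (others_raw,) = struct.unpack_from("<H", save, OTHERS_COUNT_OFFSET)
    total = p1_count + (others_raw - OTHERS_BIAS)
    start = len(save) - 4 * total            # same idea as SEEK_END to -4*num_tracks
    if total < 0 or start < OTHERS_COUNT_OFFSET + 2:
        raise ValueError("track counts don't fit this file")
    return [save[i:i + 4] for i in range(start, len(save), 4)]
```

you'd call it on the whole file, e.g. `read_track_records(open("RR0.SVE", "rb").read())`.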
I also found another bug: There's a limit to how long a train can pathfind. I set up a Grand Rapids MI to Mobile, AL route, which goes through San Francisco, and the train tells me it's an impossible route. No, it's not, it just can't pathfind that far
also the "miles of track" indication on your end of fiscal period summary is incorrect. that route is well above 4500 miles, but my summary says I've built 1314 miles of track
interesting: the game doesn't defragment the track list.
so if you build 8 track segments, then delete some in the middle, it doesn't re-order them, it just overwrites those track segments with all zeros
and it never fills those spots back in, it seems. it just adds them to the end.
OH MY GOD IT'S A BLOCKCHAIN!
(/joke)
I have a bad idea:
so the game stores saves in two files.
the RR?.SVE file, and the RR?.MAP file. (? is 0-3, for each of the four slots)
What if I swap them around? like, save to slot 1, then make a new game in slot 2, then swap the maps around? how will that work, if at all?
so I've connected Chișinău, Moldova to Vinnytsia, Ukraine.
or Kisinev to Vinnica, as the game says.
and now I'm connecting Knoxville, Tennessee to Lexington, Kentucky.
TIME FOR A SWITCHAROO!
So, it turns out stations remember what city they're connected to, even when they're now in the ocean. The tracks instantly turned to ferries, and Vinnytsia still accepts all cargo (as if there's a city connected) but doesn't produce any mail or passengers, because there are no mermaids in the atlantic
Belgrade is in Maryland now. That seems normal.
I think it's because there's some data in the .SVE file that tells it what map we're on, so it's using Europe city locations. Even though there's no city tiles there (because that's stored in the map), it knows this is the general Belgrade location
(notice that this is clearly eastern america but my currency is in pounds)
uhhh. I think this one is my fault
found it: the byte at 0x38E4.
00: Eastern US
01: Western US
02: England
03: Europe
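as a tiny sketch, reading that byte back out of a save looks like this. It assumes the 0x38E4 offset holds for your save too (I've only checked mine), and the function name is made up.

```python
MAP_ID_OFFSET = 0x38E4
MAP_NAMES = {0: "Eastern US", 1: "Western US", 2: "England", 3: "Europe"}

def map_name(save):
    """Look up which map an .SVE file image says it belongs to."""
    return MAP_NAMES.get(save[MAP_ID_OFFSET], "unknown")
```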
@kawa ahh, yes. Project Azorian.
@foone True, there aren't any mermaids in the Atlantic.
Not since The Incident.
@mos_8502 I think the video card is in a text mode, but the game thinks it's in a graphics mode
@foone Did the terminal driver explode?
@foone new pride flag dropped
@lilbatscholar don't even joke about that. I'm always on the hunt for more pride flags to add to my pride flag program
ahh, I figured out why my old hacking attempt died out: I don't know how to decode the images in the PIC/MAP files.
(it uses the same format for maps as for images, yes)
but I have a workaround where I use a hacked version of the EXE which renders a given image as the startup logo
the problem is that this renders it slightly wrong, I think. some palette indices are mapped to the same colors
which wouldn't be a problem, it looks fine, except the MAP variant of the PIC format uses that to distinguish different map features
so like, here on my test map we've got a lumber mill and a coal mine
but on the MAP file, both of those end up as that purple color
so I need to figure out on my own how to decode the PIC/MAP files.
which is a problem because they're compressed.
and you'd think they'd skip out on compressing saved games, but NOPE! they do.
I think it's some kind of LZ* compression, given how it seems back-references change when I save two maps with one pixel difference
it's also possible this is a wild goose chase and the different industry blocks are stored in the SVE file, not the MAP
nope. it's definitely in the map. I did the SVE/MAP switcheroo and it still knows a factory is a factory.
back to the decompilation mines
oh goody function pointers, my favorite
oh yeah they interleaved the write-file and read-file functions by misusing JMP
because to have fread() and fwrite() as separate functions would waste something like 33 bytes of code!
so it opens the file, sets a pointer to a function, then calls another function which calls the pointed-at-function, which reads 512 bytes.
that could have been simpler
oh god I think I found the compression functions but it seems it involves abusing the stack. like, it switches to a separate stack when it's time to decompress
and it's not a normal decompression function where there's an in-buffer and out-buffer.
instead it's interleaved with the read function, because the data to decompress could be as big as 14 kilobytes, and we can't waste that much RAM
I got distracted and extracted the text out of an apple II game about sex.
this is what happens when you have ADHD and a reverse engineering soul
"JMP BP"
yeah, this assembly was written by a human.
emphasis on ASSembly
WHY WOULD YOU JUMP TO THE FRAME POINTER?
it's at the start of the function, too.
like, you just called a function, why not jump to the frame pointer?
that makes no sense!
I set a breakpoint at the address 1000:0AD6
it hit at 093D:7D0A
bug in DOSBox?
No! just the weirdness that is 16-bit segmented x86!
I'm afraid, Dave. Dave, my mind is going. I can feel it. I can feel it. My mind is going. There is no question about it. I can feel it.
the whole thing about call BP makes even less sense when you follow up the call chain and find out that it sets BP to SP.
So the frame pointer is set to the stack pointer. So I guess they're just pushing a short call address to the stack, and jumping to that?
oh god is this some kind of weird recursion?
oh okay, I think the (de)compression routine is line based.
I found that the "decompress wrapper" function is getting called with CX set to 320, which is how wide the screen (and the saved maps) are
A fun thing about the saved maps is that they're 320x200.
The game maps are actually only 256x192, but it saves more than that
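once the image is decoded, pulling just the playable map out of it would be something like this sketch. Assumptions: one byte per pixel, rows stored one after another at 320 bytes each, and the 256x192 map sitting in the top-left corner. The function name is mine.

```python
IMAGE_WIDTH, IMAGE_HEIGHT = 320, 200   # what gets saved
MAP_WIDTH, MAP_HEIGHT = 256, 192       # what the game actually uses

def crop_map(pixels):
    """Return the 192 map rows (256 tiles each) from a decoded 320x200 image."""
    if len(pixels) != IMAGE_WIDTH * IMAGE_HEIGHT:
        raise ValueError("expected a 320x200 image, one byte per pixel")
    return [
        pixels[row * IMAGE_WIDTH : row * IMAGE_WIDTH + MAP_WIDTH]
        for row in range(MAP_HEIGHT)
    ]
```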
ahh, 0x2E96, a hex number that makes floating point calculators confused
naturally in the middle of the "decompress the saved-game map" function it thunks out to another EXE. I'm not even sure how that's possible on DOS, unless they implemented their own EXE loader?
but yeah it happens because they have the graphics code implemented in separate EXE files, and the game thunks into them. Even when loading a map
wait the code is loading a 320x200 pixel bitmap and immediately drawing it into VRAM
but this game supports CGA
does it have to use a completely different map loading routine for CGA mode!?
I don't understand. It's loading the map file and writing it right out to VRAM.
It's not saved in RAM anywhere that I see. So does it just read the map back out VRAM to be able to process it?
rails
why is there an EXE header at offset 0x17400 in your binary?
and 0x15800
and 0x16600
and 0x18600
and 0x19000
and 0x19800
and 0x1A000
and 0x1AC00
and 0x1B400
and 0x1C200
and 0x1CE00
and... THEY KEEP GOING
this file has INTERNAL OVERLAYS?!
@tthbaltazar no, a dos header, and then many more DOS headers
@foone you mean a dos header pointing to a coff header?
I thought the UNP extraction stuff had removed those and merged it all together.
apparently not.
this is a slight nightmare.
my program has RET'd into a function that simply doesn't exist in the version my ghidra sees
in other words, TIME FOR GO TO BED
at least I know where the decompression functions are. I can extract those and analyze them better.
but NOT TODAY
@foone Player 3 is late-stage capitalism?
@smammy which is impressive, because this game takes place at the height of robber barons! that was pretty late-stage capitalism in itself
@foone railroad tycoon. Recently covered this on my amiga cracking stream. Easy to break, which confused me because by most accounts the crackers fucked this one up royally back in the day.
@h0ffman neat!
hopefully that's not the case here, because just about every DOS copy I've seen in the wild has been cracked, and who knows what they broke along the way?
okay manually stepping through the decompression confirms my theory that the decompressed MAP file contains data specifying which types of tile there are, data that's subsequently discarded by the renderer.
So far, I've got this:
00
01 Ocean
02 Clear
03 Forest
04
05 Coal Mine
06
07 FootHills
08 Oil Well
09 River
0A
0B Hills
0C Village
0D
0E
0F Mountains
okay I've made a key discovery that makes some of the weirdness make more sense:
it doesn't just read the map file once.
It reads it at least twice.
So I think it reads it to show you, then it reads it again to populate the internal map storage
I found this out by modifying it while it is being loaded: the changes appear on the preview image, but not on the map itself, and my breakpoint for the decode-this-memory-chunk function kept getting hit after the map was done loading
well I stepped through all the invocations of this function (which I have insightfully named fopen_wrapper_wrapper1) from when you select a save game to load, and it turns out it loads the map, then several static PIC files, and that's all.
So... the map is loaded elsewhere.
so it loads the MAP file
then faces.pic
sprites.pic
track.pic
locos1.pic...
and nothing else.
So either it's smuggling out the map data and I'm wrong about it being loaded twice, or there's a separate fopen function somewhere else.
time to drop down to watching interrupts!
so from selecting save, it calls int 21,AH=3D with:
RR0.SVE
RR0.MAP
its own EXE (overlays!!!! *shakes fist*)
FACES.PIC
CITIES1.DTA
SPRITES.PIC
TRACKS.PIC
the EXE again
LOCOS1.PIC
and that's all.
No double-load. So it must be doing some tricky map data smuggling that I'm not aware of
but wait, if it's smuggling out the map data, why don't my changes get reflected in the loaded level?
Is it re-decompressing it, maybe?
if so, there's only one place it could do that... and it's in an overlay.
fuck.
and that overlay is NEVER CALLED!
please, PLEASE tell me they don't save the map inside the graphics overlay(s)
that would be a terrible separation of functionality.
this function is DOUBLE THUNKED?!
whelp, I dug into the EGRAPHIC.EXE file and found what was getting called, and it turns out... it's writing the pixels to VRAM.
that's all. No map smuggling here.
EGRAPHIC.EXE is amusing. The entry pointer leads to a function at 1000:0010, which starts with:
INC BP
INC DI
PUSH DX
INC CX
PUSH AX
DEC AX
DEC CX
INC BX
INC BP
POP AX
INC BP
which doesn't make a lot of sense, honestly.
that's because it's not x86 code.
it's the ASCII string "EGRAPHIC.EXE"
to which all I can say is:
INC ESI
PUSH EBP
INC EBX
DEC EBX
POP EDI
POP ECX
DEC EDI
PUSH EBP
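if you want to check the trick yourself: INC, DEC, PUSH and POP on a plain register are all one-byte opcodes (a base value plus the register number), and they happen to land in the ASCII capital-letter range. Here's a sketch that turns a listing like the ones above back into text. The function name is mine, and it only knows these four instructions.

```python
# One-byte x86 opcodes: base value + register number.
REGISTERS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
BASES = {"INC": 0x40, "DEC": 0x48, "PUSH": 0x50, "POP": 0x58}

def opcodes_to_text(listing):
    """Turn lines like 'INC BP' back into the ASCII bytes they were decoded from."""
    out = bytearray()
    for line in listing.strip().splitlines():
        mnemonic, register = line.split()
        # 'EBP' and 'BP' share the same opcode byte, so keep the last two letters
        register = register.upper()[-2:]
        out.append(BASES[mnemonic.upper()] + REGISTERS.index(register))
    return out.decode("ascii")
```

the "." of the filename never shows up as its own line because 0x2E is the CS segment-override prefix, so the disassembler folds it into the next instruction. Feeding it the second listing is left as an exercise.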
okay, time to take another approach:
I'm gonna go into debug mode, then capture one line of decompressed image. Then I'll finish the decompression, and dump the RAM of DOSBox, and search to see if that line still exists in RAM, and if so, where.
this'll obviously fail if they store it in a modified form, however
or worse, they never store it uncompressed at all.
maybe they just re-decompress the needed lines for the map every time you pan around?
this game WILL run on an original IBM PC (if upgraded to 512kb of ram)
maybe they just really slowly decode it to save RAM?
unrelated note, I should hack the amiga version too
it looks like they went all-out on redrawing a LOT of the graphics
interestingly, Railroad Tycoon Deluxe seems to use basically the same save/map format... except they don't compress the maps.
I have no idea why they'd change that.
See? Uncompressed.
if I can figure out the compression method I may be able to re-inject the Deluxe maps into the regular game.
that'd be good, because Railroad Tycoon Deluxe is a sin against anyone with eyes
it's not a question anyone has asked, but now you know:
you can't copy the xgraphic.exe file from Deluxe into the original game. it doesn't work
and it seems that railroad tycoon and railroad tycoon deluxe cannot read each other's PIC formats.
@viraptor just the usual, wanting to look inside games I spent a lot of time with
@foone what's the end goal? Extracting assets, or full recompilation?
okay it definitely doesn't store the maps in memory as-decoded. I did a search: nothing.
@mima I know, right? it ruins all my plans!
@foone oh, I guess I need new Friday night plans.
I realized my plan to try and VM-hax the decompression routine won't really work simply, because of the annoying side-effect of how the disk-reading works:
basically, while decompressing, it'll run out of data to decompress and then jump back into the read-file function through a function pointer
so realistically to VM-hax the decompression routine, I'd need to stuff not only the decompression routine into a VM, but my own code to set up the decompression routine and to emulate file access
so what if I attack it in a completely different way?
I've got it running in DOSBox already... what if I attack THE VIDEO DRIVER? it gets called with the pointer to each decoded line! I could modify it to write that out to somewhere I control
or I could actually figure out the compression algorithm.
like an adult.
but I don't wanna!!!!
okay so I have extracted some data. I have a 48-byte file that I know extracts to a 320-byte map line.
it's actually more like 160 bytes since only the lower nibble is used, but still.
20 61 in the input results in a 01 written to the output.
how.
why?
we may never know
OH GOD I DESTROYED THE WEST COAST AGAIN
and now the area around Vancouver is a desert that looks like a swamp
@SteveSyfuhs it is now, but it wasn't when I started!
@foone Foone I believe that's the east coast
So, this decompression function:
3+1 functions interleaved. (The +1 is for the external read-disk which can be called at any time.)
It uses 13 global variables
SI, DI, DX, and CX are used to track data between the functions.
And if that wasn't bad enough, it features TWO STACKS which it swaps a few times
of the 13 global variables, I currently understand one of them, and maybe half of another
if that wasn't bad enough, the middle of the three functions DOES NOT HAVE A RETURN INSTRUCTION.
Instead it has some code which does some weird things with popping the frame pointer off the stack, then another variable, then jumping to the frame pointer
it's possible this code doesn't make sense because it's not all the code.
with these weird push/pop/jmp instructions, it might be effectively encoding short JMPs into the compressed data
I have figured out TWO global variables now.
one of them specifies if we're decoding 8-bit bytes or 16-bit words.
so the for loop inside the internal decompress, it starts at 8, compares it is less than 9, and then every loop through it adds 16?
WHAT
I think I'm up to three
but I also found a table that's 768 bytes long. so I'm also kinda down by 767
and experimentally that weird stack-jump only goes to two locations in the calling method. so not as nightmarish as I feared
I've run out of brains for today. But I will continue
so I think this compression format is specialized to writing nibbles. like, I think you always get results that are of the format 0x0N where N=0-F.
and the 16bit mode actually works by writing 0x0N00.
fortunately I don't think I need to care about 16bit mode? it's available there but there's no code that can turn it on, I think
it might actually be 0x0N0N? but again, it doesn't seem to be possible to turn it on, so... who cares
god this is terrible.
so there's like three ways things can get pushed onto the secondary stack, not counting the one ACTUAL CALL that still happens, despite this not being a "real" stack, but some of them are out of the "dictionary" which is even more confusing, and some are seemingly... from saved values? saved when?
I can't easily tell WHEN they get saved because, of course, ghidra can't track when this code accesses [8112] because it doesn't know when the data segment changes. for this code the answer is "never" but it doesn't know that
and the dictionary makes no sense.
it's 768 bytes long, and it's formatted as:
FF FF 00
FF FF 01
FF FF 02
FF FF 03
...
FF FF FE
FF FF FF
okay great. but then there's code that searches it to find a "slot", looking for one where the FF FF is FF FF. So clearly that's a marker saying it's an empty slot, and it's trying to find a free slot, right?
except I step through the decompression of the first line of code and THE DICTIONARY NEVER CHANGES
I think I need to officially stop looking at the decompilation for this code. it is SO misleading
REVERSE ENGINEERING COMPRESSION CODE IS THE WORST
this is actually decompression code, to be accurate.
and I know there IS compression code somewhere in this binary.
I should probably go find that. maybe it'll be easier to understand that, then write my own decompression code based on it
I hope I'm misunderstanding this code, because if I'm not, they fucked up their map-writing code by allocating 24 bytes less than they needed
the programmers of this code don't seem to believe in passing arguments between functions except in extreme conditions
or they pass width/height of a bitmap into fopen() and then fopen() does nothing with them, just stuffs them into global variables
<Garth Marenghi> I KNOW PROGRAMMERS WHO USE PURE FUNCTIONS AND THEY'RE ALL COWARDS
they're also checking the return code of this function after calling it, but it always returns 0.
if it fails, the call to allocate memory leaves 00 in AX. If it doesn't fail, it explicitly sets AX to 0.
so no matter what happens, this function returns zero.
Did it fail to allocate ram? TOO BAD, WE CONTINUE ON!
@sekoiatree yes! some of which are reused for different purposes in different situations
@foone@digipres.club wh.. what do they do instead, global variables?
@foone Have you written more software than you’ve used?
@michaelgemar that would be hard to do!
THE SAVE GAME FUNCTION THUNKS INTO THE VIDEO DRIVER OVERLAY!?
I need to go find my copy of Sid Meier's autobiography. maybe there's a whole section on what exactly he was smoking back in the late 80s
oh sweet lord jesus it's reading from the video ram
@aburka AHHHHHHHHHHHHHHHH
@foone The sample code for the API I'm using just puts parameters into environment variables and then calls the pure function
WHY WOULD YOU STORE YOUR MAP IN THE VIDEO RAM YOU FUCKING WEIRDO
dosbox-x needs a "dump vram" option
and the weird thing is that I don't think the map is IN vram when it goes to save it.
it specifically draws it to vram, then reads it back out
but I found the compression code. it's... simpler? slightly? at least it seems to only use ONE goddamn stack
AHH THE SAVE GAME FUNCTION IS CALLED FROM ANOTHER OVERLAY
if it's the fucking graphics driver I'm going to set this game on fire
thank fuck, it's not, it's one of the other internal overlays.
the ones I don't yet have analyzed in any way. I need to figure out how to easily extract all of those and stuff them into ghidra in some fashion
minor breakthrough:
the save_game_map function repeatedly calls compress_map(N,M):
N is a repeat count, M is the tile it's saving!
So there's some very basic RLE going on here.
now, compress_map does the following:
first, it checks for the tile being 0x90. I don't know what 0x90 means. ignoring that.
it checks to see if the repeat_count is > 3: if so, it calls inner_compress with the tile, then again with 0x90, then again with a calculation based on the repeat count. It does this in a loop, counting down to repeat_count = 3 (decrementing it every time)
then it loops through the last 3 repeats, calling inner_compress with the tile.
and if the repeat count is less than (or equal to) 3, it just calls inner_compress with the tile, that number of times.
which kinda seems like it'd write a file like this:
all bytes are written as themselves, except 0x90, which is written as 0x90 00.
repeats are written as the tile, then 0x90, then a repeat count, which I think might just be the number minus one
very simple RLE, right?
except that's absolutely not what the file ends up looking like.
I think this compression looks so weird because it's actually two compression methods on top of each other. They implemented a basic RLE format, then after it didn't compress well enough, they plugged a second layer on top of (or under?) it
because that's the thing with this format: as is, it'd make pretty bad use of the space. most bytes are going to be 0x0N or 0x90 or occasional spans longer than 16.
so I think they added another layer of indirection, doing something like encoding the 4 bits used for tiles into less than that, with special handling of some sort for the 90 and the spans. like some bit-based encoding with variable-length integers
@foone I believe that in one episode Garth said he had written more books than he had read.
@michaelgemar oh right, I forgot the reference!
@clairely_undaunted I think it just means that you too have been slightly damaged by programming for so long
@foone i guess after a while one has “seen it all”
@foone it’s hilarious that nearly all the things you find astounding in this process make sense to me
okay so, the first level of encoding is RLE.
The map is RLE'd into the following stream of tuples:
18 01
0b 03
02 01
03 03
01 0c
05 03
01 09
where it's the repeat count followed by the tile.
that gets turned into a stream of bytes passed into inner_compress, where bytes are represented as themselves, and repeats (>3) are NN 90 MM, where NN is the byte and MM is the repeat count.
So the above turns into:
01 90 18
03 90 0B
01 01
03 03 03
0C
03 90 05
09
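that second step is small enough to sketch in Python. This reads each pair as (repeat count, tile), which is the order that matches the bytes above. The 90 00 escape for a literal 0x90 is my guess from reading compress_map, runs longer than 255 are simply not handled, and the function name is mine.

```python
RLE_MARKER = 0x90

def tuples_to_bytestream(runs):
    """Expand (repeat_count, tile) runs into the byte stream fed to inner_compress."""
    out = bytearray()
    for count, tile in runs:
        if tile == RLE_MARKER:
            # Assumption: a literal 0x90 is escaped as 90 00, once per repeat.
            out += bytes([RLE_MARKER, 0x00]) * count
        elif count > 3:
            if count > 0xFF:
                raise ValueError("runs longer than 255 aren't handled in this sketch")
            out += bytes([tile, RLE_MARKER, count])   # NN 90 MM
        else:
            out += bytes([tile]) * count              # short runs are just repeated
    return bytes(out)

runs = [(0x18, 0x01), (0x0B, 0x03), (0x02, 0x01), (0x03, 0x03),
        (0x01, 0x0C), (0x05, 0x03), (0x01, 0x09)]
print(tuples_to_bytestream(runs).hex(" "))
```

this only covers the part I understand. The last transformation, from this byte stream to what's on disk, is still a mystery.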
now the final transformation is the one I don't understand, but I can cheat and look at the output file to know what it turns into.
It turns into this, although I'm not sure where it stops:
20 61 18 00 69 41 80 00 03 12 32 20 58 20 01 41 07 0E 21 11 10 40 F0 40
so yeah. Compression is a 3 step process:
Map -> RLE tuples -> RLE Bytestream -> Encoded bits on disk
I understand the first and second transformations, and I don't understand the third.
so we've got a function that should write 3 16-bit words to memory. where do we store them?
1. on the stack?
2. AX, BX, CX?
3. pass a pointer to memory?
4. FUCK THAT NOISE, we store 1 in AX, the other in BP (yes the frame pointer), and the third? it's a hardcoded memory location. Fuck you.
guess which one railroad tycoon picked?
well I can at least put aside the brainhurty figuring out stuff for some bookkeeping. Now that I know where to inject, I can inject my own tiles into the encoding process, and map out what the rest of the tile types are
@SvenGeier as MST3K said, "It's the Eighties. Do a lot of coke and vote for Ronald Reagan!"
@foone this may be the right moment to note that much of what we call "the eighties" today was mostly an epiphenomenon of the production, distribution, and consumption of cocaine...
I've Trantored/Coruscanted the west coast!
it's all cities, all the time.
well, mostly. some tiles didn't get changed to cities (I'm not sure why) and some cities are actually industries. It seems it uses the same tile number for cities and industries, and determines what's what based on their position?
ugh I keep crashing this game that I'm hacking the code of at runtime
who could have foreseen this?
okay finally got it working.
Here we go. From Alabama to the Pacific ocean (and beyond, technically), it's all city.
And here's what happens when you zoom in: It's not ACTUALLY all city, every 8th tile is actually industry
@Nekoplanet Nah, it came out on PC first. There were ports, but they were all to 16bit systems like the PC (and came years later).
Although who knows? maybe they prototyped it on something else and then ported to PC
@foone the more you describe the system, the more I think this was an app not designed originally on pc. Maybe it started on an 8 bit system?
now here's the question: does the industry pattern break at the bottom of the screen? because that'd mean it's going by tile numbers (which are discontinuous), not the x/y coords
it's discontinuous!
so yeah, it uses tile numbers.
so it turns out it uses the positional thing for more than just cities.
tile_type 5 (which I previously had marked as "coal mine") turns out to be a patchwork of lumber mills, coal mines, and oil wells!
here's the really weird part. This map, despite looking like that, is actually all just the same tile. it's tons and tons of the same tile.
So it should compress really well and look sensible in the output file, right?
NOPE. it's 5k and makes no more sense.
here's the 0A ("farm") tile: it turns out it's a mix of farms, ranches, and grain elevators.
@clairely_undaunted I'm sure that's completely unrelated to the instance you're on. It's not like you'd have any interest in "no thoughts, head empty" :)
@foone i just know how the sausage was/is made, i look forward to blissful ignorance someday
0b (hills) are not special. set it to all 0b, and you really get all 0b
villages are also paper mills, stock yards, factories, and food proc.
@clairely_undaunted heh. makes me think of a security engineer who is really into hypnosis because it's just finding exploits in human brains!
@foone absofuckinglutely though tbh it’s just a different type of programming
okay checked the rest (except rivers, because... no)
so 05 and 08 are mines, 0a is farms/ranches/grain elevators, 0c is village/industries, 0e is cities and industries, and the rest (at least in 16 and under) are single-types.
judging by how this works
I think it might be writing data files in 47-bit chunks?
so, a datapoint:
a map entirely full of 01 and a map entirely full of 02 are different by 4 bytes, scattered around the 5k file.
it's possible the intermediate RLE is making this weirder than it should be. I should try hacking the map to be fully one type PRIOR to the intermediate RLE step
so I need to write 256 bytes into RAM, without a pointer to it, in... 5 bytes.
oh boy
okay maybe I can take out a function call
that gives me like, 45 bytes. more than enough
MOV AL, 01
MOV DI, [BP-6]
MOV CX, 320
REP STOSB
that's 10 bytes.
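checking the arithmetic by hand-assembling it, as a sketch. These are the 16-bit encodings as I understand them, assuming the 320 is decimal (it's three bytes either way).

```python
PATCH = [
    ("MOV AL, 01",     bytes([0xB0, 0x01])),
    ("MOV DI, [BP-6]", bytes([0x8B, 0x7E, 0xFA])),   # ModRM 7E = [BP+disp8], reg DI; FA = -6
    ("MOV CX, 320",    bytes([0xB9, 0x40, 0x01])),   # 320 = 0x0140, stored little-endian
    ("REP STOSB",      bytes([0xF3, 0xAA])),
]

def patch_size(patch):
    """Total encoded length of a list of (mnemonic, bytes) pairs."""
    return sum(len(encoding) for _, encoding in patch)

print(patch_size(PATCH))
```

a CLD (a single FC byte) in front would add one more, which still fits easily in the space freed up.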
well my code was supposed to generate a world that's entirely water, but I got the 05 or 08 world. strange.
@Computeum good point. I'll stick a CLD in there
@foone Mind the direction flag
oh I'm a doofus.
I was trying to hack the game to save a game where every tile is 01, so that it would be "pure" and not mixed 01 tiles and 01-spans.
but I have already made the mixed map: I can just load it, and SAVE IT BACK OUT
and done.
I was too busy trying to use my HACKER SKILLS to realize I didn't need to
okay so a map of all the same tile results in an 83 byte file, which only varies in one byte (0x7) which is the specified byte that it's full of.
now the reason it's not smaller is that a map actually looks like this:
It's a 320x200 pixel image, with only the top-left 256x192 actually used for the map. Most of those 83 bytes are probably encoding the rest of this area
NOPE! I decoded (using my load-it-as-a-logo trick) the all-1 and all-2 tiles and got this. They're fully one color.
so what the heck
well I interestingly corrupted it
yeah I'm randomly corrupting it, and this isn't any RLE. This is definitely some kind of LZ*-style compression
who wants to play Railroad Tycoon on Memory Uninitialized World?
fun fact: because the saved game map files are the same format as PIC, you can swap them around. Want to build a railroad on Lenin's face? YOU CAN!
@jtlg that's the left image!
@foone What does the zoomed-out version look like?
@tryst I'll have to do that next. Most of my TV stuff is analog ntsc but I'm sure I can hook that up somehow
@foone For a worrying moment I thought this was Railroad Tycoon over SSTV
god help me I don't want to have to write an extension to link ghidra and dosbox but that might be easier on my brain than figuring out this whole compression algorithm without being able to see all the RAM at once
I might just write my own debugger. the dosbox one is slightly terrible.
basically I need to watch the input buffer, the output buffer, about 15 random global variables, the half dozen or so local variables across 4 functions, and then a 768 byte dictionary
@foone that first picture looks like what I call Rainbow Death Snow when it appears on HD-SDI video sources that have Become Unhappy
Feed it to a Harmonic Electra X encoder and you receive... a Harmonic Electra X non-encoder, until you reboot it and are subjected to the longest five minutes of your entire life as you're dead in the water and off air
it's kinda telling about how my brain works that I keep running into reverse engineering projects that stall out with "to continue, I need to write my own debugger for the platform I'm targeting"
@curtmack yeah, something like that would be awesome
@foone An idea I've been mulling for a while is to hack up a copy of MAME/MESS so it dumps the entire state of the system whenever something changes. The idea is that you could then build external tools to sort through this data and trace cause and effect, which could be a huge boon to reverse engineers. And of course it would be able to keep track of things like memory segments and overlays (or for game consoles, RAM and ROM banking) as the code is executing.
@StompyRobot yes why not? why not turn a simple "hack one game" project into a multi-year epic I'll never finish?
@foone but what if you wrote a generic, retargetable, debugger construction kit?
@curtmack yeah. and even if slow, you could have it be togglable: you turn it on, hit GO on the one function you're reversing, wait for it to finish, then turn it off
@foone Obviously this would slow down emulation, but I think it would still be bearable, if not fullspeed. MESS could send a simple command to an external process telling it what changed, and the external process could manage recording to a database.
I think I'm gonna try abusing external memory reading into letting me watch the "dictionary" array through the course of a compression. that might tell me some interesting info
that's one of the problems with being a game hacker: you don't use "good" solutions. you might try editing the RAM of your program from an external program. and once you realize you can do that, you realize you can use it everywhere
here's why I hate reversing compression code:
there's no shortcuts.
I am a fucking MASTER at finding shortcuts. I have endless tools and tricks and "I know a guy"s and research methods and everything. I can get out of just about any work I supposedly need to do, by doing things in a different way.
not compression.
compression is the ultimate "you stare at this code until you understand it. good fucking luck"
and you can't avoid it. you just have to
and I have EXACTLY the wrong kind of brain to be any good at that. my brain is optimized for "let's take a different route and avoid the long slog"
I do not have a smart brain, I have a clever brain. they are not the same. and this is annoyingly one problem I can't clever my way around
@Sweetshark exactly
@foone So you are featuring a breadth-first search brain? Welcome to the club!
this debugger needs a "step backwards" command
@foone you know about debuggers that do have such a command then, yes
@thorsummoner yes. dosbox-x doesn't have it, though
ahh I've hit another function where ghidra has decided it can no longer rename variables
that certainly helps me reverse this shit! I have to remember what uVar5 means
it's physically impossible to stare at x86 code for long without wanting to design a RISC architecture
because good lord
although it's amusing when you see code like:
MOV BX, AX
ADD BX, AX
ADD BX, AX
SHL BX, 1
MOV AX, word ptr[BX+4]
to read data from AX*6+4.
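A quick way to convince yourself of that arithmetic is to replay the five instructions in Python with 16-bit wraparound. `scaled_offset` is a made-up name, and this only models the address calculation, not the memory read itself.

```python
def scaled_offset(ax: int) -> int:
    """Replay the instruction sequence and return the effective offset."""
    bx = ax & 0xFFFF            # MOV BX, AX   -> 1*AX
    bx = (bx + ax) & 0xFFFF     # ADD BX, AX   -> 2*AX
    bx = (bx + ax) & 0xFFFF     # ADD BX, AX   -> 3*AX
    bx = (bx << 1) & 0xFFFF     # SHL BX, 1    -> 6*AX
    return (bx + 4) & 0xFFFF    # word ptr [BX+4]


offsets = [scaled_offset(i) for i in range(4)]  # 6-byte records, field at +4
```

So this looks like indexing an array of 6-byte records and grabbing the word at offset 4 of record AX.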
like come on, x86. don't you have a way to do a multiplication+addition pointer-indirect read in a single instruction? you're losing your touch!
it probably does but this code is targeting 8086 so it couldn't use it yet
maybe I just need to write an x86 emulator
like you do
I haven't written an x86 emulator since... oh yeah the last time I tried to reverse engineer a compression algorithm
@alys it was at 3227 days! I was out, but they pulled me back in
@foone time for a DAYS SINCE FOONE WROTE AN X86 EMULATOR counter.
I've gotten weirder and I tried dumping a compressed map into a bitmap renderer, and there's a suspicious pattern at 9-bits wide.
is this a 9-bits-per-element format? why would you do this, sidney?
okay it only appears in my super-compressed files. not in the semi-compressed ones or the regular compressed files.
so it's probably nothing. sadly.
okay so I've done an interesting experiment: I turned off the RLE.
So the fully-compressed max-RLE version of the file is 83 bytes.
Turning off RLE means that the intermediate compression is just 01 for 64,000 times.
So how big is the output file, given that a standard map is something like 13k?
it's 423 bytes!
and there's again an interesting pattern at 9-bits wide.
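The "dump it into a bitmap renderer" trick boils down to reflowing the file's bits into rows of a chosen width and looking for columns that line up. A minimal sketch of that idea, assuming bits are taken most-significant-first within each byte (the real bit order in these files is not something I've confirmed); `bit_rows` is a made-up helper name.

```python
def bit_rows(data: bytes, width: int) -> list[str]:
    """Reflow a byte string into rows of `width` bits, MSB first per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return [bits[i:i + width] for i in range(0, len(bits), width)]


# Try a few candidate widths and eyeball which one makes columns line up.
sample = bytes([0xFF, 0x00, 0xAA])
for width in (8, 9, 10):
    rows = bit_rows(sample, width)
    print(width, rows)
```

If a format packs fixed-size 9-bit elements, the rows at width 9 show vertical structure that falls apart at 8 or 10.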
made a BIG discovery:
the labs.PIC file in railroad tycoon is identical to the one in Covert Action, released by Microprose the same year.
and you know WHY that's a big deal?
THERE'S TOOLS THAT CAN EXTRACT COVERT ACTION FILES!
it's LZW! the same one that powers GIF
SID MEIER'S RAILROAD TYCOON VIOLATED THE UNISYS GIF PATENT
well it's not decoding it exactly right, but it's... close?
And it works. We have decoding of Railroad Tycoon MAP files!
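For reference, here is what plain textbook LZW looks like in Python. This is NOT the PIC/MAP on-disk format: it works on a list of integer codes, has no code-width handling, no bit packing and no dictionary size limit, all of which the real files need. The function names are made up.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit one integer code per longest known prefix."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    out = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            out.append(table[current])
            table[candidate] = next_code  # learn the new, longer string
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out


def lzw_decompress(codes: list[int]) -> bytes:
    """Inverse of lzw_compress, rebuilding the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The one special case: the code refers to the entry being built.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


all_ones = b"\x01" * 64000            # the "RLE turned off" test input
codes = lzw_compress(all_ones)
restored = lzw_decompress(codes)
```

On a run of 64,000 identical bytes each code covers one more byte than the last, so the whole thing collapses to a few hundred codes. The first 256 codes are taken by single bytes, so codes start out needing 9 bits, which at least fits the 9-bit-wide pattern.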
this thread has been going for a week and I finally got the result I needed and the answer has been sitting in a github repo since 2018!
(okay not really. I had to modify the code, and the modifications were only possible using what I learned over this week)
@andrea yeah! I've thought about doing something like that for DOS games, plus a strings database to help match reused libraries
@foone I always dreamed of having a public hash database, where you put the hash of a file, and it tells you whether the file is used somewhere else
so I need to do some stuff to modify this to not write PNG files but instead write binary files I can easily parse, and then I need to reverse engineer the algorithm for handling the weird tiles, where what they are depends on their position
as well as put this code online and make a pull request.
but all that can wait. it's 11pm and I've been working on this project for A WEEK and my brain hurts
time to close every tab and program I have open.
it's over. I won.
all that's left is bookkeeping
@viraptor sigh.
I'LL DO IT TOMORROW
assuming I can. I have limited colors here!
@foone it's over? I honestly expected to see a trans flag as a map tomorrow 😂
@viraptor what do you want me to do, remap the palette used by the game?
actually I should research that
@andrea the image format isn't the problem, it's just that I don't need the conversion to colors. I want it as numbers 0-15. So I'm just gonna fwrite() it
@foone not an expert, but PPM might be the right image format for you: it's a short header followed by 8-bit encoding of each pixel, uncompressed
@NanoSector ahh, x86. the lingua franca of computers
@foone Is this how Xorg got its x86 emulator to run vbioses :blobcatthink:
@dpflug I sure did! I'll never learn!
@foone Did you just clever your way around this?
bah. dosbox doesn't let me set breakpoints on io ports.
WHY NOT?
ugh. so the palette-setup call is into an external overlay and called from an internal overlay.
I hate 16bit DOS programming
I'm sure that's fine
@Ephraim_Bane always!
@foone Is there more to do with this format?
so the EGA/VGA driver handles colors in a weird way: The main EXE passes a list of 16 colors, and then the driver does this to all of them:
((x&0xF)+8)&0x17
why not JUST CHANGE THE LIST OF COLORS YOU TELL IT TO SET?
that list, btw, is the following:
00 01 02 03 04 00 06 07 08 09 0A 0B 0C 0D 0E 0F
COLOR 5 KNOWS WHAT IT FUCKING DID
this translation means the actual colors end up being:
00 01 02 03 04 00 06 07 10 11 12 13 14 15 16 17
I don't know why they didn't want to use that last color slot.
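Here's that translation run over the list in Python, just to check the arithmetic. `DEFAULT_COLORS` and `driver_translate` are names I made up; the formula and both lists are the ones quoted above.

```python
# The 16 colors the main EXE hands to the EGA/VGA driver.
DEFAULT_COLORS = [
    0x00, 0x01, 0x02, 0x03, 0x04, 0x00, 0x06, 0x07,
    0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
]


def driver_translate(x: int) -> int:
    """What the driver does to each entry before programming the palette."""
    return ((x & 0xF) + 8) & 0x17


translated = [driver_translate(color) for color in DEFAULT_COLORS]
print(" ".join(f"{value:02X}" for value in translated))
```

The net effect is that bit 3 of each entry gets moved up to bit 4: values 00-07 pass through unchanged and 08-0F become 10-17.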
@Flux yep
@foone does it really drop the 4th bit?
Turning off the color-shifting logic makes the intro a bit uglier
it's still not working. I think something deeply fucky is happening inside my virtual VGA card that's making it not properly set EGA colors.
@HoustonDog I'm not sure. The game has separate graphics drivers for hercules
@foone can it do Hercules Graphics card mode … say, 640x350 with something resembling sixteen “shades” of crosshatching or brightness to differentiate?
* may look ‘better’ to modern eyes
(using “graphics” in their product name always seemed a stretch)
I don't get it.
I'm using the following:
INP 0x3CA; Clear 3c0 index
OUTP 0x3C0, 01 ; set 3c0 index to 1
OUTP 0x3C0, 0x27 ; set palette entry 1 to 0x27: 100111: blue/green/red on, secondary red on. This should result in color FFAAAA: a pink.
INP 0x3CA
OUTP 0x3C0, 0x20 ; re-enable video
but instead of a pink color for index 1, I'm getting grey. AAAAAA grey.
it's in EGA/VGA video mode 0D: that's 320x200, 16 colors, 8 pages.
I don't know why it'd be interpreting my colors as weird alternative things
that grey color is color index 07: so it's acting like it's truncating the high-bits.
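To make the "truncating the high bits" theory concrete, here's a small sketch of how a 6-bit EGA palette value maps to RGB: the three low bits are the primary blue/green/red lines (2/3 intensity each) and the three high bits are the secondary ones (1/3 intensity each). `ega6_to_rgb` is a made-up helper, not anything from the game or DOSBox.

```python
def ega6_to_rgb(value: int) -> tuple[int, int, int]:
    """Convert a 6-bit EGA palette value to an (R, G, B) triple.

    Bit layout: 0=blue, 1=green, 2=red, 3=secondary blue,
    4=secondary green, 5=secondary red.
    """
    def channel(primary_bit: int, secondary_bit: int) -> int:
        primary = (value >> primary_bit) & 1
        secondary = (value >> secondary_bit) & 1
        return 0xAA * primary + 0x55 * secondary

    return (channel(2, 5), channel(1, 4), channel(0, 3))


wanted = ega6_to_rgb(0x27)            # what I asked for
truncated = ega6_to_rgb(0x27 & 0x07)  # what I get if the high bits are dropped
print("%02X%02X%02X" % wanted, "%02X%02X%02X" % truncated)
```

Dropping the secondary bits from 0x27 leaves 0x07, which is exactly the AAAAAA grey showing up on screen.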
ugh. maybe the mode register has the shift register turned off? that supposedly enables "CGA graphics mode" which might mean 16-colors only.
unfortunately that's a read-only register
ah-ha! I found a hint:
EGA apparently only supports color remapping in 350-line mode!
yeah that's fine
there's my pink!
it's just rendering completely incorrectly because I'm in color mode 0D instead of 10, so I'm seeing WAY more than I should and things are kinda terrible and different.
so one way I could fix this is to stay in mode 0D (320x200x16) but use the VGA palette registers instead of the EGA registers.
That'd lose the compatibility with EGA, but it's 2023, I don't think I need to worry TOO hard about that.
it helps if you don't have your DOSBox set to emulate an EGA card, I'm sure
THAT WAS THE WRONG INDEX OH GOD
@foone I would listen to a podcast that is nothing but highly esoteric DOS era tribal knowledge which has since become LORE
@shram86 god yeah
but there we go.
There's my pink. Just needed to use the VGA registers instead of the EGA registers
there we go. Using my patches to pic2png/png2pic and modifying EGRAPHICS.EXE to program the colors differently and a minor fix to the GAME.EXE file to send post-processed colors instead of the default ones...
BEHOLD: THE Trans(continental) Railroad Tycoon!
now I just need to write this onto a floppy disk (or two) so I can take it down to my pentium and get a picture on a CRT monitor
I need to figure out where to stuff 60 bytes of VGA manipulation code though, my existing solution is kinda DOSBox-specific
turns out the copyright notice is 63 bytes long
@JLab8 No, the original. I have a lot of hate for deluxe
@foone deluxe?
@ranvel well, I mean, this is the video card where they programmed the BIOS to the EEPROM backwards because someone got the address pins backwards when designing the PCB, so they just reversed the order of the bytes
@foone it sounds like what is quickly becoming apparent is that EGA wasn't just poorly documented, someone was covering their tracks so that no one would discover all of their poor decisions
MY FLOPPY DISK HAS DEVELOPED A BAD TRACK ZERO?
TRANSPHOBIA!
and another.
I even tried on a different disk.
HOW MANY FLOPPIES ARE GONNA DIE ON ME TODAY?
I grabbed another random floppy disk and it turns out it has been formatted with NTFS.
I love that it's 2023 and I just did "copy *.* A:" like it's 1981.
DOS is forever. DOS will never die
There we go. The trans-continental railroad tycoon.
if you go into the game to see what's what, it turns out the blue is clear land, the pink is ocean, and the white is mountains
also, my trans agenda has apparently sunk the city of Redding, CA.
@foone totally here for the cool and refreshing pink lemonade ocean
@thepi that's the gender fluid!
so I figured out the patterns of the tile types 5 and 8 (they're identical!) and E (it repeats vertically every 32 vertical tiles and pretends the map is 205 tiles tall) but tile 0xA is kicking my ass.
it's... weird. like I think it might be repeating every... 224 tiles? or 219, depending on how tall it thinks the map is?
ugh. I'm gonna have to write some hacky code to explore the map and figure out what tiles are at what locations. that's gonna be a pain.
anyway here's my partial rendering of a full Railroad Tycoon map. All unhandled tiles (4, 9, A, C, D) are in black.
You ever stay up until 5:30 am and you're really really tired but you also kinda want to write some image-recognition code to extract Railroad Tycoon maps from the game?
I have taken 1,324 screenshots. that should help.
okay I have now similar-image recognized my 1324 screenshots.
There are apparently 347 positions where grain mills can spawn, and 353 locations where livestock farms can spawn.
now here's the real question: do I figure out the pattern? or do I just stuff that list into my script?
FUCK something in my recognition is wrong. Those numbers are not correct
FUCK
my recognition is right.
my screenshots were NOT.
I think my program was taking screenshots while the game was still redrawing the screen, and so it got partial map updates
yeah look.
two cursors. that's not right.
ARGH this process already takes 15 minutes to run and now I need to fix it by slowing it down more
I ran it again... and it didn't work. FUCK
okay. regenerated.
197 grains, 196 livestock.
(0, 118), (1, 137), (2, 156), (3, 175), (8, 14), (9, 33), (10, 52), (11, 71), (12, 90), (13, 109), (14, 128), (15, 147), (16, 166), (17, 185), (21, 5), (22, 24),
what kind of pattern even is this
okay I gave up and just stuck the list of known positions into the script.
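For what it's worth, the sixteen positions quoted above do fit a simple rule: y = (118 + 19*x) mod 256, with anything landing at y >= 192 (off the bottom of the 256x192 map) dropped. That is only a fit to those sixteen points. The constants 118, 19 and 256 are read off the list by eye, not found in the game's code, and I haven't checked the rule against the rest of the positions.

```python
MAP_HEIGHT = 192                   # visible map is 256x192
START, STEP, WRAP = 118, 19, 256   # fitted to the listed points, unverified

# The sixteen positions quoted in the thread.
LISTED = [
    (0, 118), (1, 137), (2, 156), (3, 175),
    (8, 14), (9, 33), (10, 52), (11, 71), (12, 90), (13, 109),
    (14, 128), (15, 147), (16, 166), (17, 185),
    (21, 5), (22, 24),
]


def fitted_positions(max_x: int) -> list[tuple[int, int]]:
    """Positions predicted by the fitted rule for columns 0..max_x."""
    positions = []
    for x in range(max_x + 1):
        y = (START + STEP * x) % WRAP
        if y < MAP_HEIGHT:          # rows 192..255 are not on the map
            positions.append((x, y))
    return positions


predicted = fitted_positions(22)
print(predicted == LISTED)
```

The gaps at x = 4..7 and x = 18..20 are the columns where the wrapped y value falls in the off-map rows.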
Here it is with 0xA (Farms/Grain/Ranch) added
this is terrible. did you even pay attention to what you were making, Sidney?
I of course mean these 4 stray pixels.
whatta heck
wow. the river layout code worked first try! I'm a better programmer than I thought.
The "landing"s aren't working yet though, because I don't know how they work. Some rivers are picked to be Landings instead of Rivers. Why? no idea
uh oh. there's some incorrectly placed blocks
hmm. I can't load up the riverworld (damn you Philip José Farmer!) because it hangs the game
okay I have a HALF RIVER WORLD, which does load properly.
@1000millimeter isn't that just Portal? :)
@foone wouldn't it be fun to have different cursors for left and right click?
okay I think I've found a problem.
I can generate single-tile maps and they work exactly the same in my renderer
but then I render a real map and a bunch of things are misaligned
so I think I've fundamentally misunderstood how this works.
my theory was that a given tile would default to a standard thing, but then it would pick a secondary thing depending on where it is on the map. Things like "every 6th tile can be a coal mine".
but I don't think this is actually how the game works. I think it's more like there's a single running accumulator which affects ALL types of special tiles.
meaning my individual per-tile work isn't helping. fuck.
I'm gonna need to either do a LOT of testing with variations... or find the code. Either is going to be hard
@1000millimeter well one of the main mechanics involves creating wormholes in space, a blue one and an orange one, which link to each other.
blue is left click, orange is right click.
@foone I fear I don't know portal
anyway I haven't touched that shit yet, but I finished up the ocean stuff:
gah this is weird.
the tracks tilesheet is 20x20, right?
but the rest of the tiles are 16x16.
found another bug:
the game doesn't use the same bridge logic for AI players as the human player. This means that they can make really really long bridges
completely normal train layout
oh god
you can have double tracks, right?
but where are the tiles for the double tracks stored?
answer: NOWHERE! they get generated at runtime!
I thought I might could cheat and just pull the generated tiles from RAM but I forgot, this is EGA: it's PLANAR
yeah I can't fully decode this (TOO PLANAR) but I'm pretty sure there's no doubled tiles here. it generates them when it draws the double-tracks
yeah it seems to basically draw the regular track, but it offsets it tangent to the track direction.
2 pixels in each direction
gah.
There's 4 types of stations, but only three types of stations in the tiles.
these fuckers generate one of the 4 variants at runtime
they store the type of the station outside of the tracks list, which is weird. annoying.
anyway this is where I'm leaving it for now:
stations are all rendered as Depots and the special industries/landings are all being located wrong. But that's a lot of progress!
@foone I distinctly remember the instructions telling me that there were forbidden station configurations you could not use (basically any where the tracks curved while in the station). I wonder if this is why. What's odd is I swear I remember the manual having illustrations of a couple of forbidden configurations in the game's pixel art style
THEY FUCKING THUNK INTO THE GRAPHICS DRIVER TO READ MAP CONTENTS
I suspected this was true but now I definitely have confirmation
this is done for a simple understandable reason:
Sid Meier hates me personally
in any case I have discovered that the tile ids used when the game is running are different from the tile ids in the save files
The in-memory tile IDs are:
00 Clear
01 Forest
02 Desert
03 FootHills
04 Hills
05 Mountains
06 City
07 Village
08 Farm
09 Slums
0A Food Proc
0B Ranch
0C StockYard
0D Factory
0E Grain elevator
0F Paper mill
10 Landing
11 Lumber mill
12 Coal Mine
13 Steel Mill
14 PowerPlant
15 Oil Well
16 Refinery
17 Terminal
18 River
19 Ocean
1A Harbor
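If you want that table in a script, a plain dict does the job. The IDs and names are copied straight from the list above; `TILE_NAMES` and `tile_name` are just names I picked, and remember these are the in-memory IDs, not the save-file ones.

```python
# In-memory tile IDs (NOT the IDs used in save files).
TILE_NAMES = {
    0x00: "Clear", 0x01: "Forest", 0x02: "Desert", 0x03: "FootHills",
    0x04: "Hills", 0x05: "Mountains", 0x06: "City", 0x07: "Village",
    0x08: "Farm", 0x09: "Slums", 0x0A: "Food Proc", 0x0B: "Ranch",
    0x0C: "StockYard", 0x0D: "Factory", 0x0E: "Grain elevator",
    0x0F: "Paper mill", 0x10: "Landing", 0x11: "Lumber mill",
    0x12: "Coal Mine", 0x13: "Steel Mill", 0x14: "PowerPlant",
    0x15: "Oil Well", 0x16: "Refinery", 0x17: "Terminal",
    0x18: "River", 0x19: "Ocean", 0x1A: "Harbor",
}


def tile_name(tile_id: int) -> str:
    """Look up a tile name, flagging IDs that are not in the table."""
    return TILE_NAMES.get(tile_id, f"Unknown ({tile_id:02X})")


print(tile_name(0x19), tile_name(0x1B))
```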
anyway it's proving how terrible this question is.
I want to find where the map is stored in memory, and that's a fundamentally incorrect question: it's not stored anywhere in memory. it's stored on the VRAM of the EGA card, because fuck you
and yeah, that VRAM is memory mapped, but not directly and not completely. You have to fuck with the planes to pull the data back out.
and I can't easily decode the function that does it, because it's not in the EXE. it's in the video driver!
and each one has to toggle through multiple EGA planes to fully pull out ONE single map position.
the worst part is that while there is a get_map_tile_at_position function, it gets called when the screen scrolls... for every single position on the screen
so the EGA planes get manipulated something like 768 times EVERY TIME YOU SCROLL, and that's even BEFORE YOU START DRAWING PIXELS
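Here's a toy model of why reading one position back out of planar EGA memory is so much work. Each of the four planes holds one bit of every pixel, so a single 4-bit value means four separate reads with a plane switch before each one. This is a sketch of planar memory in general (mode 0Dh layout, 40 bytes per row); how the game actually packs map tiles into those pixels is not something I'm modeling, and the function names are made up.

```python
BYTES_PER_ROW = 40   # 320 pixels / 8 pixels per byte in mode 0Dh
ROWS = 200


def make_planes() -> list[bytearray]:
    """Four bit planes, each covering the whole 320x200 screen."""
    return [bytearray(BYTES_PER_ROW * ROWS) for _ in range(4)]


def write_pixel(planes: list[bytearray], x: int, y: int, color: int) -> None:
    offset = y * BYTES_PER_ROW + (x >> 3)
    mask = 0x80 >> (x & 7)           # leftmost pixel is the top bit
    for plane_index, plane in enumerate(planes):
        if color & (1 << plane_index):
            plane[offset] |= mask
        else:
            plane[offset] &= ~mask & 0xFF


def read_pixel(planes: list[bytearray], x: int, y: int) -> int:
    offset = y * BYTES_PER_ROW + (x >> 3)
    mask = 0x80 >> (x & 7)
    color = 0
    for plane_index, plane in enumerate(planes):
        # On real hardware each pass needs the read plane selected first
        # (the graphics controller's read map select register).
        if plane[offset] & mask:
            color |= 1 << plane_index
    return color


planes = make_planes()
write_pixel(planes, 13, 7, 0xA)
value = read_pixel(planes, 13, 7)
neighbor = read_pixel(planes, 12, 7)
```

Multiply those four plane switches by every position on screen and you can see where the hundreds of plane manipulations per scroll come from.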
uVar1 = (uint)('\0' < (char)(*(byte *)(*(int *)(y * 2 + 0x241a) + (x >> 3)) & 0x80U >> ((byte)x & 7)));
this is ghidra-speak for "just read the fucking disassembly, it will make far more sense"