@Herman_Hetherington@p :dracula: They said that they needed a stealth soldier, so I put my hands on the hibachi hot-plate at Benihana and burned my fingerprints off. They will never find me. :dracula2:
> interesting amount of misinfo sticking around from two decades prior
Arthur Whitney's microsecond trading bots still run with only one core enabled, and there was another hyperthreading security problem last year or the year before.
Because the box was flaky, I abandoned plans to sell hosting. This turned out to be a good thing: the benefit of the VMs all being operated by either me or people that I know personally is that I don't have to worry so much about anyone trying to do something shady to compromise the box. So most of those problems don't apply.
But what I do have is measurement, right, like I can see that it rarely maxes out any of the cores. It is at least not slower without hyperthreading, because there are idle cores and most of the cores don't even approach 100%. And there's the statistic, right, 90% of the time is spent on SpecEx while waiting for the memory bus, and even if I only half-believe it, I can measure. (Also I have a lot of thoughts on the size of icache and mostly isolated tests around them.)
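For anyone who wants to repeat that kind of per-core measurement, here is a minimal sketch, assuming a Linux box where `/proc/stat` is readable. The function names (`parse_proc_stat`, `utilization`, `sample`) are made up for illustration and are not the tooling actually used on the box.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line
        # and everything else (intr, ctxt, btime, ...).
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal guest guest_nice
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])  # guest time is already included in user/nice
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        if name not in before:
            continue  # core appeared between samples; nothing to compare against
        busy0, total0 = before[name]
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core utilization."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return utilization(first, second)
```

Calling `sample()` in a loop and logging the highest value per core over a day or so is enough to see whether any core ever gets near 100%. Note that iowait is counted as idle here, so a disk-bound workload shows up as idle CPU, which is the point being made.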
I am interested in your thoughts on this. Don't take this the wrong way, but "misinfo sticking around" seems like you are curious whether my thoughts are wrong or not, which is an evaluation of a thing: I'm more interested in what your thoughts are, that's the thing and the evaluation of the thing is a degree removed.
Contention for the memory bus is the biggest performance hit on $current_year CPUs and aside from that, the biggest performance bottlenecks for Pleroma are, in order, disk I/O and network I/O. The CPUs on the box mostly sit idle, meaning that more hardware threads wouldn't accomplish anything but heating the machine up.
I suspect that it's faster with less NUMA-wrangling, but have not benchmarked it. I just know that on this workload, I could probably disable half the physical cores and still have no trouble.
And that's about FSE specifically rather than CPUs in general. I don't think anyone can say too much to me about FSE specifically, but CPUs are fabulously complicated nowadays so anyone that knows anything will usually know some things I do not.
> service on their lifecycle controller (Dell) and outputs the data from the iDrac.
Yeah, I already have freeipmi installed, it's how I get the hardware logs, it's where the bmc-watchdog comes from (to enable the hardware watchdog), etc. It's also where the other screenshot comes from, the one with all the fans and temperatures (CPUs and intake and exhaust and disks) and the amperage drawn per power supply and the voltage of the current coming through the power supply and the wattage consumed by the system.
I'm aware there's a closed-source version from the manufacturer. I'm saying I have access to the hardware logs, so it's not a question of how to get them, it's a question of why this didn't show up. Anything breaks, I can tell, but nothing shows up here: why? If we assume the problem was a flaky PSU, it's conceivable that it could send a spike down the line and this might cause problems that a simple interruption (where the hardware has enough battery to record the log entry) doesn't cause. I hope that's the case anyway: I don't know how to account for it otherwise.
> In linux you can even make a simple script that you can output the hardware status
Yes, I have that, it's what I used to produce the other screenshot, the third panel. It's also what I use to get the logs, and they don't have anything recorded in them. They say things like "Power interrupted" and the timestamp is after I file the ticket asking them to power-cycle the box.
This isn't a service, it's a physical machine on a rack at a datacenter. I didn't construct an email alert system or sign up for someone's email alert system or pay any money to pingability or whatever. I have the IPMI monitoring system from the other screenshot on SPC, and then I have the dashboard I built for network monitoring (and also where most of the IRC windows are). I check these things more often than I check my email. mushi_dash.png
Depending on your server manufacturer they already provide that service on their lifecycle controller (Dell) and outputs the data from the iDrac.
In linux you can even make a simple script that you can output the hardware status (of course as long as the failure isn't catastrophic for the whole system). It's not a bad idea to keep track of PSU failure (as long as you have more than one), memory stick failure, and disk degradation.
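A script along those lines could look roughly like the sketch below. It assumes freeipmi's `ipmi-sensors` is installed and run with enough privileges to reach the BMC. The column names `Name` and `State` are an assumption about what the comma-separated output looks like with sensor state enabled, so check them against the real output on your hardware. The helper names are made up for illustration.

```python
import csv
import io
import subprocess


def parse_sensors(csv_text):
    """Parse comma-separated sensor output into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def problems(rows):
    """Return the rows whose State column reports Warning or Critical."""
    flagged = []
    for row in rows:
        # Assumed column name: "State" (present when sensor state output is enabled).
        state = (row.get("State") or "").strip().lower()
        if state in ("warning", "critical"):
            flagged.append(row)
    return flagged


def read_sensors():
    """Run ipmi-sensors and return its parsed rows (needs access to the BMC)."""
    out = subprocess.run(
        ["ipmi-sensors", "--comma-separated-output", "--output-sensor-state"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_sensors(out)
```

Running `problems(read_sensors())` from cron and printing each flagged row's `Name` covers the PSU, memory, and disk cases as long as the BMC itself notices them. It won't help with a failure that never makes it into the BMC's view.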
I'm just still surprised there were never any email alerts set up for your PSU going down. Did both go down at the same time, or did one fail and the backup eventually die?