Mon 29 April 2024
tags: homelab
A year and a half ago, my lab went kaput.
It was the result of multiple failures that I never fixed before the next one popped up. Nextcloud needed updates (several, actually), and those went predictably badly. While I was trying to get Nextcloud back up, one of my ESXi hosts got corrupted and failed. The other host lost power and wouldn't come back up. In the space of a month, I went from having a decent lab to nothing at all, and that's how it stayed until the middle of last month.
I decided that I wanted to start from scratch and do things the right way. I couldn't run custom DHCP because I was reliant on my home router, so I looked into VLANs. I have a 5-port managed PoE switch, so I grabbed a PoE HAT for an old Raspberry Pi 3 I had lying around and started tinkering. After 2 months of trying over and over... I gave up. I'm sticking to a single network for now, and I'll experiment and move things later. I was never able to get traffic routing between the VLANs, and the isolation I was expecting just didn't happen; I somehow brought my home network down 3 times along the way.
Luckily for me, everything else basically just worked. I had installed the drives in my NAS several months back (during a move, I wanted to get rid of the bulk of 8 HDD boxes), but I hadn't actually done anything with the hardware. Grabbing a TrueNAS Core ISO was easy, as was the installation. The only hiccup was discovering that I had put the M.2 SSD in the wrong slot, which disabled four of the drives. Five minutes with the motherboard manual and two minutes with a screwdriver got me back in business. The initial setup was just as easy, no different from any other OS install I've done. I was astounded at how simple it was to create a drive pool and set it up as RAID-Z2. My only mistake was forgetting that I wanted to set aside one drive as a hot spare. I doubt it will actually be an issue, but this was supposed to emulate the enterprise drive setup at my old job. Still, it works, so I'm not going to sweat it.
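Skipping the hot spare does have a concrete upside: more usable space. In a RAID-Z2 vdev, roughly two drives' worth of capacity goes to parity, so usable space is about (N - 2) × drive size, while a hot spare sits idle until a drive fails. The quick sketch below compares both layouts. The eight 4 TB drives are placeholders, not the actual drives in the NAS, and the math ignores ZFS metadata and padding overhead, so real numbers will come in lower.

```python
# Rough RAID-Z2 capacity comparison: every drive in the vdev vs. one held back as a hot spare.
# Hypothetical numbers: 8 identical 4 TB drives (placeholders, not the real hardware).
# Ignores ZFS metadata, padding, and reserved space, so real usable space will be lower.

def raidz2_usable_tb(drives_in_vdev: int, drive_tb: float) -> float:
    """Approximate usable capacity: RAID-Z2 spends two drives' worth on parity."""
    if drives_in_vdev <= 2:
        raise ValueError("RAID-Z2 needs more than two drives to hold any data")
    return (drives_in_vdev - 2) * drive_tb


TOTAL_DRIVES = 8   # hypothetical drive count
DRIVE_TB = 4.0     # hypothetical per-drive size in TB

all_in_vdev = raidz2_usable_tb(TOTAL_DRIVES, DRIVE_TB)
with_spare = raidz2_usable_tb(TOTAL_DRIVES - 1, DRIVE_TB)  # one drive sits idle as a spare

print(f"{TOTAL_DRIVES} drives in RAID-Z2: ~{all_in_vdev:.0f} TB usable")
print(f"{TOTAL_DRIVES - 1} drives + 1 hot spare: ~{with_spare:.0f} TB usable")
```

With those placeholder numbers, the spare costs about 4 TB (roughly 24 TB vs. 20 TB) in exchange for a replacement drive that is already in the box when one fails.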
Next I needed hosts to run software on. The SSD in the failed ESXi host was completely trashed (nothing would recognize it, even as a block device), so I swapped it for a slightly smaller one. I also switched to Proxmox. Given that VMware just ended the free ESXi license (at least, that's how I understand it), staying with VMware was no longer an option. I tinkered with Proxmox back in 2018 but ultimately chose VMware because I thought it would help my career. Now that I'm a Linux guy at an almost all-Windows job, I'm using what I feel most comfortable with. I still haven't gotten the larger VM host to power on at all, so I think the PSU might just be fried. There are a dozen possible explanations, but honestly I'll probably sell it for parts and pick up another USFF PC like my ThinkCentre Tiny.
Proxmox is basically a web GUI for running KVM virtual machines and LXC containers on a Debian host. I can run VMs for other OSes, but I'll stick mostly with RHEL and its clones. It does offer LXC templates from TurnKey Linux, but they're all Debian-based, and since I'm aiming for a bit of homogeneity here, they aren't as appealing. I did try their Nextcloud container image, but there's a bug in the installer that they've been working on for a few months. If it had worked, it probably would have been the one exception to the rule, just to simplify the setup. Running things in containers is new for me, but for the most part I'm treating mine like VMs. Setup is faster and they seem lighter, so I'm happy with them overall. Plus, I can take a snapshot of servers like Nextcloud and roll back if an upgrade fails!
Right now I just have Nextcloud and a Docker host. My ISP is causing issues with opening ports; I can't reach 80 or 443 from the outside at all. It's possible there's some odd routing at my current apartment, but as far as I know I should be able to open those. My ISP claims they don't block them, yet even with the ports open and a web server listening just to test connectivity, nothing gets through. I might need to look into an external reverse proxy, or just run services on alternate ports.
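One way to narrow down where ports 80 and 443 are being lost is a plain TCP connect test, run from a machine outside the home network (a phone hotspot or a cheap VPS, for example). Running it from inside the LAN can be misleading, since many consumer routers don't support NAT loopback (hairpinning). The sketch below uses only Python's standard library; the address is a hypothetical placeholder from the documentation range, not a real endpoint.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection does the DNS lookup and TCP handshake; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError subclass),
        # and DNS failures (socket.gaierror is also an OSError subclass).
        return False


if __name__ == "__main__":
    PUBLIC_HOST = "203.0.113.10"  # hypothetical placeholder (TEST-NET-3 documentation address)
    for port in (80, 443):
        state = "reachable" if tcp_port_open(PUBLIC_HOST, port) else "not reachable"
        print(f"{PUBLIC_HOST}:{port} is {state}")
```

If the same test succeeds against a high, non-standard port forwarded to the same web server but fails on 80 and 443, that points toward filtering upstream of the router rather than a problem with the server itself, though it isn't conclusive on its own.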
After a year and a half of no lab, it feels good to have something up and running. My confidence took a bigger hit than I realized when I let everything go down, but I'm trying to learn and do it better each time I rebuild. Hopefully there won't be more rebuilds in the future, just additions.