Why I Self Host

I have been running my own servers, on and off, since college. They have been a great experimentation platform and have given me resources to tap into in my professional life. It wasn't always smooth sailing, but here are some of my takes, and why it's always been worth the hassle, at least for me.

A short history

The early days

The first "server" I built was during college when I took an old desktop, put Linux on it, and used it mostly as a network file store. No redundancy, but it was fun.

Later on, during grad school, I had to set up the research group server1. Using DynDNS2, it was publicly accessible. That worked fine for a while, until a vulnerable Apache module turned it into a spam bot. I only found out when I got an annoyed email from a sysadmin in Bucharest, Romania, saying it was spamming their network. I took it down for the day, cleaned it up, and made a note to apply patches more often in the future.

Today

Currently, I am running a Pi cluster with four Raspberry Pi 4 nodes, with 4 GB of RAM each, running K3s. While it is not a very powerful cluster, and only uses SD cards for storage, it's enough to serve a static website, Grafana, Prometheus, and any random experiments. I also have a single-node Framework Desktop that will run more "critical" systems, like a Postgres database for services that need it.

Why?

A lot of the services I run (or plan to) are usually available for free (or a reasonable subscription) elsewhere. You can host a website cheaply on AWS with S3 and CloudFront (assuming low traffic). You can host your 3D prints on Printables and the like, and I do. But there are a few advantages to having control of where your data ends up.

Learning

Learning has been the primary reason. The Pi cluster in particular was a great learning experience. Testing a repeatable set of Ansible playbooks, getting K3s to work, and deploying a few services using both plain Kubernetes templates and Helm is not something I get to do in my day job.
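To give a flavor of that workflow, here's a minimal sketch; the inventory, playbook, and manifest names are hypothetical, not my exact setup:

```shell
# Provision all four Pis (base config, K3s install) from one repeatable run.
# "inventory/pis.ini" and "site.yml" are placeholder names.
ansible-playbook -i inventory/pis.ini site.yml

# Deploy a service from plain Kubernetes templates...
kubectl apply -f manifests/static-site/

# ...or via Helm, e.g. Grafana from its official chart:
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana --namespace monitoring --create-namespace
```

The nice part is that the whole cluster can be rebuilt from scratch by re-running the playbooks, which makes it genuinely disposable.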

I find that I learn best when working on something practical. This is true from programming languages to infrastructure and networking. Having a "disposable" setup makes this a lot easier when learning.

Enshittification

It's an unfortunate reality of today's world. Free services get more restrictive, open source alternatives get yanked3, or prices keep increasing, with "AI" as the excuse4.

Running your own services protects you from this, and you don't need to worry as much about increased costs, lost access, and so on.

Private access

From the grad school days, I've learned to be wary of exposing anything over the internet. To this end, the cluster is currently private and I access it using Tailscale. The free tier5 is enough for me and my partner, and we can reach the services from anywhere. At home, of course, we can always use the local network directly.

Using cert-manager, I have publicly valid certificates, with DNS records that point to the Tailscale IP address of a node in the cluster. This works great, and we don't get any annoying browser warnings. Making everything public is only a matter of updating the DNS records.
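One detail worth noting: since the nodes aren't reachable from the public internet, Let's Encrypt's HTTP-01 challenge can't work, so the certificate has to be validated via DNS-01. A sketch of what the cert-manager resource might look like (all names and the domain are hypothetical):

```yaml
# Hypothetical cert-manager Certificate, validated via a DNS-01 ClusterIssuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: grafana-tls
  namespace: monitoring
spec:
  secretName: grafana-tls        # TLS secret consumed by the ingress
  issuerRef:
    name: letsencrypt-dns        # ClusterIssuer configured for DNS-01
    kind: ClusterIssuer
  dnsNames:
    - grafana.example.com        # public DNS, pointing at a Tailscale IP
```

The public DNS record for `grafana.example.com` then simply points at the node's Tailscale address, so the name only resolves to something reachable when you're on the tailnet.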

Your data, your rules

Keeping the data private means it's not going to be used to train AI models or to target ads at me. I'd rather have my data truly private, not private with an asterisk and a disclaimer, where the service can still read it but promises to play nice.

The downsides

Reliability

My setup is only as reliable as my internet and power. A UPS helps with the power, but I have no backup for the internet connection. If Comcast is being its usual self, we'll get a few periods of downtime a week. For now this is not an issue, as the total downtime is less than 10 minutes a week, and mostly at night.

Data safety

If anything were to happen to the hardware, well, the data would be gone. I solve that with automatic backups to S36; having a good 3-2-1 backup policy is critical.

Also, test your backups regularly. You don't want to find that your backups are broken when you really need them.
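As a sketch of what this can look like in practice, here's how a tool like restic handles both the nightly backup and the verification step; the bucket name and paths are placeholders, not my actual setup:

```shell
# Hypothetical repository location and credentials file.
export RESTIC_REPOSITORY=s3:s3.amazonaws.com/my-backup-bucket
export RESTIC_PASSWORD_FILE=/etc/restic/password

# Nightly backup of the data directory (e.g. triggered from cron):
restic backup /srv/data

# Periodically verify the repository, actually re-reading a sample
# of the stored data rather than just checking metadata:
restic check --read-data-subset=10%
```

The `--read-data-subset` check is the part that catches silently corrupted backups before you need them.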

Cost

In particular, the upfront cost of the hardware, which is only getting more expensive these days7. However, if you've got some old hardware lying around, or buy used, it's less of a concern.

Security

If the spambot incident taught me anything, it's that anything open to the internet is a target. Data leaks are more serious when the system holds private data, and you're the only one in charge of, and responsible for, securing it. Keeping up to date with patches becomes even more critical the more sensitive the stored data. A solution like Tailscale sidesteps the issue by keeping everything private.

Conclusion

Is this for everyone? No. But if you want to learn, or care about where your data ends up living, it's something worth looking into.

Footnotes

1

In our living room, because the university policies for a "hard-wired" server were beyond draconian, and made the hardware useless for the research we were doing.

2

It was free in those days.

5

I am aware that this could enshittify in the future. However, Headscale is an option, so I'm confident there's an offramp if I need it.

6

For data durability, I haven't found anything that beats S3, or the equivalent object storage offerings from other large cloud vendors.