So You Want to Build Your Own Data Center
Reminds me of the old Rackspace days! Boy we had some war stories:
- Some EMC guys came to install a storage device for us to test... and tripped over each other and knocked out an entire rack of servers like a comedy skit. (They uh... didn't win the contract.)
- Some poor guy driving a truck had a heart attack and the crash took our DFW datacenter offline. (There were bollards to prevent this sort of scenario, but the cement hadn't been poured in them yet.)
- At one point we temporarily laser-beamed bandwidth across the street to another building.
- There was one day we knocked out windows and purchased box fans because servers were literally catching on fire.
Data center science has... well, improved since the earlier days. We worked with Facebook on the Open Compute Project, which had some very forward-looking infra concepts at the time.
This is a pretty decent write-up. One thing that comes to mind is why would you write your own internal tooling for managing a rack when Netbox exists? Netbox is fantastic and I wish I had this back in the mid-2000s when I was managing 50+ racks.
My first colo box came courtesy of a friend of a friend who worked for one of the companies that did that (leaving out names to protect the innocent). It was a true frankenputer built out of whatever spare parts he had lying around. He let me come visit it, and it was an art project as much as a webserver. The mainboard was hung on the wall with some zip ties, the PSU was on the desktop, and the hard drive was suspended as well. Eventually, the system was upgraded to newer hardware, put in an actual case, and then racked with an upgraded 100BASE-T connection. We were screaming in 1999.
It would be nice to have a lot more detail. The WTF sections are the best part. Sounds like your gear needs a "this side towards enemy" sign and/or the right affordances so it only goes in one way.
Did you standardize on layout at the rack level? What poka-yoke processes did you put into place to prevent mistakes?
What does your metal->boot stack look like?
Having worked for two different cloud providers and built my own internal clouds with PXE booted hosts, I too find this stuff fascinating.
Also, take full advantage of a new DC while you are bringing it up: try out all the failure scenarios you can think of, and cover the ones you can't through randomized fault injection.
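To make the randomized fault injection idea a bit more concrete, here is a minimal sketch of a seeded fault plan generator. Everything in it is an assumption for illustration: the fault names, the host names, and the `plan_faults` helper are hypothetical, and it only produces a plan (a dry run) rather than touching any hardware.

```python
import random
from dataclasses import dataclass

# Hypothetical fault types, for illustration only. Real ones depend on your gear.
FAULTS = ["power_off_psu_a", "pull_uplink", "kill_pxe_server", "drop_bmc_network"]


@dataclass(frozen=True)
class FaultEvent:
    step: int
    target: str
    fault: str


def plan_faults(targets, rounds, seed):
    """Return a reproducible random sequence of faults to inject."""
    rng = random.Random(seed)  # seeded so a run that finds a bug can be replayed
    return [
        FaultEvent(step=i, target=rng.choice(targets), fault=rng.choice(FAULTS))
        for i in range(rounds)
    ]


if __name__ == "__main__":
    # Hypothetical host naming scheme.
    hosts = [f"rack1-node{n:02d}" for n in range(1, 9)]
    for event in plan_faults(hosts, rounds=5, seed=42):
        print(f"step {event.step}: inject {event.fault} on {event.target}")
```

The seed is the useful part: when a random combination breaks something, you can rerun the exact same sequence after the fix.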
This is our first post about building out data centers. If you have any questions, we're happy to answer them here :)
This is how you build a dominant company. Good for you ignoring the whiny conventional wisdom that keeps people stuck in the hyperscalers.
You’re an infrastructure company. You gotta own the metal that you sell or you’re just a middleman for the cloud, and always at risk of being undercut by a competitor on bare metal with $0 egress fees.
Colocation and peering for $0 egress is why Cloudflare has a free tier, and why new entrants could never compete with them by reselling cloud services.
In fact, for hyperscalers, bandwidth price gouging isn’t just a profit center; it’s a moat. It ensures you can’t build the next AWS on AWS, and creates an entirely new (and strategically weaker) market segment of “PaaS” on top of “IaaS.”
If you’re using 7280-SR3 switches, they’re certainly a fine choice. However, have you considered the 7280-CR3(K) range? They offer much better $/Gbps and more relevant edge interfaces.
At this scale, why did you opt for a spine-and-leaf design with 25G switches and a dedicated 32×100G spine? Did you explore just collapsing it and using 1-2 32×100G switches per rack, then employing 100G>4×25G AOC breakout cables and direct 100G links for inter-switch connections and storage servers?
Have you also thought about creating a record on PeeringDB? https://www.peeringdb.com/net/400940
By the way, I’m not convinced I’d recommend a UniFi Pro for anything, even for out-of-band management.
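For the collapsed-design question above, the port budget is easy to sanity-check. A rough sketch, assuming each 100G port breaks out to four 25G server links; the server and link counts in the example are made up, not taken from the article, and the function names are hypothetical.

```python
import math


def ports_needed_100g(servers_25g, direct_100g_links, breakout=4):
    """100G switch ports used by 25G servers (via breakout) plus direct 100G links."""
    breakout_ports = math.ceil(servers_25g / breakout)  # one 100G port feeds 4x25G
    return breakout_ports + direct_100g_links


def fits(servers_25g, direct_100g_links, switch_ports=32):
    """True if the whole rack fits on a single switch with `switch_ports` 100G ports."""
    return ports_needed_100g(servers_25g, direct_100g_links) <= switch_ports


if __name__ == "__main__":
    # Hypothetical rack: 40 servers at 25G, 4 storage links and 2 inter-switch links at 100G.
    used = ports_needed_100g(servers_25g=40, direct_100g_links=6)
    print(f"{used} of 32 ports used, fits: {fits(40, 6)}")
```

With two switches per rack for redundancy, each server would take a 25G leg from each switch, so the same arithmetic applies per switch.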
The date and time durations given seem a bit confusing to me...
"we kicked off a Railway Metal project last year. Nine months later we were live with the first site in California".
seems inconsistent with:
"From kicking off the Railway Metal project in October last-year, it took us five long months to get the first servers plugged in"
The article was posted today (Jan 2025). Was it maybe originally written last year, with the project going on for more than a year, so that the Railway Metal project actually started in 2023?
Was really hoping this was actually about building your own data center. Our town doesn't have a data center; we need to go an hour south or an hour north. The building that a past failed data center was in (which doesn't bode well for a data center in town, eh?) is up for lease and I'm tempted.
But, I'd need to start off small, probably per-cabinet UPSes and transfer switches, smaller generators. I've built up cabinets and cages before, but never built up the exterior infrastructure.
Love these kinds of posts. Tried Railway for the first time a few days ago. It was a delightful experience. Great work!
I would be super interested to know how this stuff scales physically - how much hardware ended up in that cage (maybe in Cloud-equivalent terms), and how much does it cost to run now that it's set up?
What brand of servers was used?
Awesome!! Hope to see more companies go this route. I had the pleasure of doing something similar for a company (a lot smaller scale, though).
It was my first job out of university. I will never forget the awesome experience of walking into the datacenter and starting to plug in cables and stuff.
I remember talking to Jake a couple of years ago when they were looking for someone with a storage background. Cool dude, and cool set of people. Really chuffed to see them doing what they believe in.
Can anyone recommend some engineering reading for building and running DC infrastructure?
First time checking out Railway's product. It seems like a “low code” and visual way to define and operate infrastructure?
Like, if Terraform had a nice UI?
weird to think my final internship was running on one of these things. thanks for all the free minutes! it was a nice experience
y’all really need to open source that racking modeling tool, that would save sooooo many people so much time
More to learn from the failures than the blog haha. It tells you what the risks are with a colocation facility. There really isn't any text on how to do this stuff. The last time I wanted to build out a rack, there weren't even any instructions on how to do cable management well. It's sort of learned by apprenticeship and practice.
I'm surprised you guys are building new!
Tons of colocation available nearly everywhere in the US, and in the KCMO area, there are even a few dark datacenters available for sale!
Cool project nonetheless. Bit jealous actually :P
Why would you call colocation "building your own data center"? You could call it "colocation" or "renting space in a data center". What are you building? You're racking. Can you say what you mean?