Fly Machines: An API for Fast-Booting VMs

samwillis | 245 points

When I first started using AWS a few years ago, having known generally what it was for far longer, I was flabbergasted at how slow it was to get an instance booted. I expected much less, thinking about things from first principles, even if you're literally talking about cold booting a physical machine via IPMI. But it seemed like everyone accepted that as the way it was, and now I do too. So I'm glad people are still interested in making things fast.

Right now I'm doing Postgres stuff (RDS) and dealing with taking 10+ minutes to boot a fresh instance. I'm tempted to try out fly.io and their Postgres clusters but I'm afraid I'd be spoiled and hate my life after (my job has me stuck in AWS for the interminable future).

I would be interested to know where all that time is being spent on the AWS side. To be a fly on the wall seeing their full, unfiltered logging and metrics.

boardwaalk | 2 years ago

There's something about the tone and content of fly.io blog posts that makes it impossible for me not to root for them. (It also helps that the DX is so great.) I've only had a chance to deploy toy apps to Fly.io, nothing at scale, yet, but it checks all my boxes.

chrisweekly | 2 years ago

Now they've got my attention. This is incredibly difficult to execute on. Kudos to the team there who figured it out. If fly is or can become profitable then they've got a chance at being around for a long time. I can see them as the new cloudflare.

asim | 2 years ago

> Fly Machines will help us ship apps that scale to zero sometime this year.

I think this is what will make Fly really exciting. Right now (if I understand right) you need to be paying for a VM 24/7 in every region you want your app available in, because it only scales down to 1. So it runs apps in regions close to users that you're willing to pay for 24/7. If they make scale-to-zero work in every region, then maybe you can just make every app global and if you have some occasional users in Australia then it can just spin up over there while you're getting requests. I think it's what will make many-regions feasible for every app.
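A toy model of what per-region scale-to-zero could look like from the routing side (this is my guess at the shape of it, not Fly's actual proxy logic; all names here are made up):

```python
import time

class RegionRouter:
    """Toy model: serve each request from the nearest region,
    booting a machine on demand if that region is scaled to zero."""

    def __init__(self, boot_fn):
        self.running = {}        # region -> timestamp of last request
        self.boot_fn = boot_fn   # hypothetical "start a machine here" call

    def handle(self, region):
        if region not in self.running:
            self.boot_fn(region)     # fast boot is what makes the cold path tolerable
        self.running[region] = time.time()
        return f"served from {region}"

    def reap_idle(self, idle_seconds):
        """Scale back to zero in regions with no recent traffic."""
        now = time.time()
        for region in [r for r, t in self.running.items() if now - t > idle_seconds]:
            del self.running[region]

booted = []
router = RegionRouter(boot_fn=booted.append)
router.handle("syd")                 # cold: boots a machine in Sydney first
router.handle("syd")                 # warm: reuses it
router.reap_idle(idle_seconds=600)   # later, idle regions go back to zero
```

The point is that the only extra cost of "every app in every region" becomes the occasional cold boot, which is exactly what fast-booting VMs shrink.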

mcintyre1994 | 2 years ago

> turns Docker images into running VMs

I honestly don't understand what's going on here. I thought we turned to Docker/containers because VMs were too heavy? Now we've got VMs that run Docker? (Not trying to be dense - what is the advantage?)

ryanianian | 2 years ago

What an exciting time to be a developer!

I am so excited about the future. We are seeing a bunch of announcements from multiple companies that make it possible for a single developer or small team to fairly cheaply run a global service without spending a whole lot of time on ops.

I am excited to see what people will come up with.

RcouF1uZ4gsC | 2 years ago

Really like the recent handful of smaller companies announcing more sorta serverless style building blocks.

It's one of the major pluses of the big clouds, yet their pricing isn't always awesome. Smaller players can help push that down.

See also the DO announcement today. Probably won’t use that but glad about it anyway

Havoc | 2 years ago

The post states:

>"We're not done. You need something to run, right? Firecracker needs a root filesystem. For this, we download Docker images from a repository backed by S3. This can be done in a few seconds if you're near S3 and the image is smol."

I feel like I am missing something. If an S3 bucket is a requirement and I was interested in the isolation provided by Firecracker why wouldn't I just use AWS Fargate or Lambda which are both powered by Firecracker? If low latency was the concern, I can't imagine there being any lower latency than having my workload and storage being colocated in the same AWS Availability Zone.

bogomipz | 2 years ago

I was really excited when reading this, but realized the lack of a faster "warm" start makes this less ideal for my highly latency-sensitive use case on Lambda. Lambdas start much faster than 300ms when warm IME, and I'm hoping with enough sustained traffic (be it real or artificial), most requests will be warm.

I'd love to be able to supply some kind of memory snapshot in addition to the docker image to cut down on cold starts. Probably blocked on snapshot support in Firecracker according to another thread? Eagerly awaiting this since it could make Fly Machine the best of both worlds!

Not a fan of how Lambda makes me scale memory and compute in tandem, when my use case benefits so much more from compute than memory. I basically have to pay for 2+ gigs I'm never going to use to get the compute performance I want. Makes 0 sense.

lewisl9029 | 2 years ago

really great announcement

as far as i understand this will let me run VMs with specified Docker images?

i'm thinking of using something like Fly.io to offer dedicated hosting for my upcoming product, so when customers sign up they get a new machine with an individual endpoint

the workload that needs to be running on those machines is quite intensive (like crawling web pages) and not very scalable when sharing resources

also can you give more details about your Nomad stack?

i was actually thinking of using Kubernetes or Docker swarm as API to deploy these workloads
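If I've read the post right, the per-customer-machine setup would boil down to one API call per signup. A sketch of building such a request (the URL, field names, and schema here are my guesses for illustration, not a documented API):

```python
import json

def machine_request(app_name, customer_id, image, cpus=1, memory_mb=256):
    """Build a hypothetical 'create machine' request for one customer.
    Every field name here is illustrative, not an actual schema."""
    return {
        "url": f"https://api.example/v1/apps/{app_name}/machines",
        "body": {
            "name": f"customer-{customer_id}",
            "config": {
                "image": image,                       # Docker image booted as the VM's rootfs
                "guest": {"cpus": cpus, "memory_mb": memory_mb},
                "env": {"CUSTOMER_ID": customer_id},  # per-tenant config for the endpoint
            },
        },
    }

req = machine_request("crawler", "acme", "registry.example/crawler:latest")
print(json.dumps(req, indent=2))
```

Because each customer gets a dedicated VM, an intensive workload like crawling doesn't contend with other tenants, which is the appeal over a shared Kubernetes or Swarm pool.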

ushakov | 2 years ago

Do you have recommendations for stateful workloads? Would the answer always be 'connect to an external DB/API for all state'?

E.g. if I need to run a bunch of processing, would it be A) spin up the micro-VM and pull from a queue service B) embed SQLite C) use some kind of in-memory store

TBH I've been waiting for years for someone to do 'firecracker as a service'. I must have searched that exact term about once per month.
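For options A and B combined, a minimal sketch (stdlib only, with an in-process queue standing in for a real queue service) might look like:

```python
import queue
import sqlite3

def run_worker(jobs, db_path=":memory:"):
    """Pull jobs from a queue and persist results in embedded SQLite,
    treating the micro-VM's local disk (or memory) as scratch state."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS results (job TEXT, output TEXT)")
    while True:
        try:
            job = jobs.get_nowait()   # a real setup would long-poll a queue service
        except queue.Empty:
            break                     # queue drained: a scale-to-zero VM could exit here
        output = job.upper()          # stand-in for the actual processing
        db.execute("INSERT INTO results VALUES (?, ?)", (job, output))
    db.commit()
    return db

jobs = queue.Queue()
for j in ("page-1", "page-2"):
    jobs.put(j)
db = run_worker(jobs)
print(db.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # → 2
```

The catch with any local state is that it dies with the VM, so durable results still need to be shipped somewhere external before the machine stops.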

bluelightning2k | 2 years ago

Does this mean I can spin up multiple instances of _the same_ application on the fly, each running on its own VM?

For example, we have a queue that handles video encoding. I would like to have 0-N encoders running at the same time, based on demand.

Spin up time is important as well, since I typically provide test renders triggered from the UI.
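That maps well onto a queue-depth scale controller; a toy version of the decision logic (the numbers and names are made up, not anything from the post):

```python
import math

def desired_encoders(queue_depth, jobs_per_machine=2, max_machines=8):
    """How many encoder VMs to keep running for the current backlog.
    Returns 0 when the queue is empty, so idle time costs nothing."""
    if queue_depth <= 0:
        return 0
    return min(max_machines, math.ceil(queue_depth / jobs_per_machine))

for depth in (0, 1, 5, 100):
    print(depth, "->", desired_encoders(depth))
```

A controller loop would compare this target against the currently running machines and start or stop the difference; fast boots matter because they let the test-render path tolerate a cold start.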

perk | 2 years ago

Ok... So what are the tiktok accountants? All the bad financial takes on tiktok or something else?

viraptor | 2 years ago

What's the DB / compute break-even for this use case? I assume if your app uses 90% of its CPU cycles on DB access, this is not the way to go, and if it's 90% compute, this is a nice solution.
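Back-of-the-envelope, the break-even falls out of round trips: each DB call pays roughly one cross-region RTT when compute sits far from the data. A hedged sketch (the RTT and timing numbers are illustrative only):

```python
def request_time_ms(compute_ms, db_calls, rtt_ms):
    """Total request latency: local compute plus one round trip per DB call."""
    return compute_ms + db_calls * rtt_ms

# Compute-heavy app: running at the edge, far from the DB, barely hurts.
print(request_time_ms(compute_ms=90, db_calls=1, rtt_ms=80))   # 170 ms
# DB-chatty app: cross-region round trips dominate everything else.
print(request_time_ms(compute_ms=10, db_calls=10, rtt_ms=80))  # 810 ms
```

So the rule of thumb is less about the CPU split and more about chattiness: a few DB calls per request survive distance, N+1 query patterns don't.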

funstuff007 | 2 years ago

Does Fly implement live migration under the hood?

dilyevsky | 2 years ago

Does this mean you can run a dev VM on demand like how Gitpod does?

yewenjie | 2 years ago

How does this compare to AWS Lambda's docker support

tomatowurst | 2 years ago

> We're not done. You need something to run, right? Firecracker needs a root filesystem. For this, we download Docker images from a repository backed by S3. This can be done in a few seconds if you're near S3 and the image is smol.

Lmao props to the team for getting this copy out unsanitized by (potentially) unchill bosses.

arthurcolle | 2 years ago

vm.boot(speed='fast')

bayesian_horse | 2 years ago

I have to make a reference to ointment; it is obligatory.

tag2103 | 2 years ago

I know some prominent HN users work for fly.io, and they seem to be doing some interesting work, but the absolutely glowing response that every blog post gets here on HN seems a bit nepotistic.

WatchDog | 2 years ago

This is really really exciting! I hope it enables more products built on top of full VMs with fast UX/DX.

I just wish I knew about this earlier, because from what I read, I think we at Devbook [1] built a pretty similar service for our product. We are using Docker to "describe" the VM's environment, our boot times are in similar numbers, we are using Nomad for orchestration, and we are also using Firecracker :). We basically had to build our own serverless platform for VMs. I need to compare our current pricing to Fly's.

[1] https://usedevbook.com

mlejva | 2 years ago