Fly Machines: An API for Fast-Booting VMs
There's something about the tone and content of fly.io blog posts that makes it impossible for me not to root for them. (It also helps that the DX is so great.) I've only had a chance to deploy toy apps to Fly.io, nothing at scale yet, but it checks all my boxes.
Now they've got my attention. This is incredibly difficult to execute on. Kudos to the team there who figured it out. If Fly is or can become profitable, then they've got a chance at being around for a long time. I can see them becoming the new Cloudflare.
> Fly Machines will help us ship apps that scale to zero sometime this year.
I think this is what will make Fly really exciting. Right now (if I understand correctly) you need to pay for a VM 24/7 in every region where you want your app available, because it only scales down to 1; your app runs in regions close to users only if you're willing to pay for those regions around the clock. If they make scale-to-zero work in every region, then maybe you can just make every app global: if you have some occasional users in Australia, a machine can spin up over there while you're getting requests. I think that's what will make many-regions feasible for every app.
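The mechanics I'm imagining look roughly like this (a toy sketch; all names are made up and this is not Fly's actual API or implementation):

```python
class Region:
    """Toy model of one region's scale-to-zero behavior."""

    def __init__(self, name, idle_timeout=60.0):
        self.name = name
        self.idle_timeout = idle_timeout
        self.running = False
        self.last_hit = 0.0

    def handle(self, now):
        """Serve a request at time `now`; return True if it was a cold start."""
        cold = not self.running
        self.running = True  # boot a machine here if none is running
        self.last_hit = now
        return cold

    def reap(self, now):
        """Stop the machine once it has been idle past the timeout."""
        if self.running and now - self.last_hit > self.idle_timeout:
            self.running = False  # scale back to zero

syd = Region("syd")
print(syd.handle(0.0))   # True: first Sydney request boots a machine
print(syd.handle(10.0))  # False: machine is warm
syd.reap(200.0)
print(syd.running)       # False: idle long enough, scaled to zero
```

The whole bet is that the cold-start path (`handle` with no machine running) is fast enough that users don't notice it.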
> turns Docker images into running VMs
I honestly don't understand what's going on here. I thought we turned to Docker/containers because VMs were too heavy? Now we've got VMs that run Docker? (Not trying to be dense - what is the advantage?)
What an exciting time to be a developer!
I am so excited about the future. We are seeing a bunch of announcements from multiple companies that make it possible for a single developer or small team to fairly cheaply run a global service without spending a whole lot of time on ops.
I am excited to see what people will come up with.
Really like the recent handful of smaller companies announcing more sorta serverless style building blocks.
It’s one of the major pluses of the big clouds, yet their pricing isn’t always awesome. Smaller players can help push that down.
See also the DO announcement today. I probably won’t use that, but I’m glad about it anyway.
The post states:
>"We're not done. You need something to run, right? Firecracker needs a root filesystem. For this, we download Docker images from a repository backed by S3. This can be done in a few seconds if you're near S3 and the image is smol."
I feel like I am missing something. If an S3 bucket is a requirement and I were interested in the isolation provided by Firecracker, why wouldn't I just use AWS Fargate or Lambda, which are both powered by Firecracker? If low latency were the concern, I can't imagine anything lower latency than having my workload and storage colocated in the same AWS Availability Zone.
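For what it's worth, the "Docker image as root filesystem" part of the quote is conceptually just flattening image layers into one directory tree that the microVM mounts. A toy sketch of that flattening (ignoring OCI whiteout files and metadata; nothing like Fly's actual code):

```python
import io
import os
import tarfile
import tempfile

def make_layer(files):
    """Build an in-memory tarball standing in for one image layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def flatten(layers, dest):
    """Unpack layers in order; later layers overwrite earlier ones,
    which is (roughly) how an image becomes a root filesystem."""
    for blob in layers:
        with tarfile.open(fileobj=io.BytesIO(blob)) as tf:
            tf.extractall(dest)

base = make_layer({"etc/issue": b"base image\n", "app/server": b"v1"})
patch = make_layer({"app/server": b"v2"})  # newer layer wins

rootfs = tempfile.mkdtemp()
flatten([base, patch], rootfs)
with open(os.path.join(rootfs, "app/server"), "rb") as f:
    print(f.read())  # b'v2'
```

The resulting directory would then be packed into something like an ext4 image and handed to Firecracker as the root drive.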
I was really excited when reading this, but realized the lack of a faster "warm" start makes this less ideal for my highly latency-sensitive use case on Lambda. Lambdas start much faster than 300ms when warm IME, and I'm hoping with enough sustained traffic (be it real or artificial), most requests will be warm.
I'd love to be able to supply some kind of memory snapshot in addition to the docker image to cut down on cold starts. Probably blocked on snapshot support in Firecracker according to another thread? Eagerly awaiting this since it could make Fly Machine the best of both worlds!
Not a fan of how Lambda makes me scale memory and compute in tandem, when my use case benefits so much more from compute than memory. I basically have to pay for 2+ gigs I'm never going to use to get the compute performance I want. Makes 0 sense.
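For context on why this stings: AWS documents that Lambda allocates vCPU in proportion to configured memory, reaching roughly one full vCPU at about 1,769 MB. Assuming that ratio holds linearly (an assumption; check current AWS docs), the memory you're forced to buy for a given amount of compute is:

```python
FULL_VCPU_MB = 1769  # memory at which Lambda grants ~1 full vCPU (per AWS docs)

def memory_mb_for_vcpus(vcpus):
    """Memory you must configure (and pay for) to get a given vCPU count,
    assuming the documented linear memory-to-CPU ratio."""
    return vcpus * FULL_VCPU_MB

print(memory_mb_for_vcpus(2))  # 3538 MB configured just to get ~2 vCPUs
```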
Really great announcement!
As far as I understand, this will let me run VMs from specified Docker images?
I'm thinking of using something like Fly.io to offer dedicated hosting for my upcoming product, so when customers sign up they get a new machine with an individual endpoint.
The workload that needs to run on those machines is quite intensive (like crawling web pages) and doesn't scale well when sharing resources.
Also, can you give more details about your Nomad stack? I was actually thinking of using Kubernetes or Docker Swarm as the API to deploy these workloads.
Do you have recommendations for stateful workloads? Would the answer always be 'connect to an external DB/API for all state'?
E.g. if I need to run a bunch of processing, would it be A) spin up the micro-VM and pull from a queue service B) embed SQLite C) use some kind of in-memory store
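A minimal sketch of option B, using only the stdlib `sqlite3` module (schema and names are made up for illustration):

```python
import os
import sqlite3
import tempfile

# Each micro-VM keeps its working state in an embedded SQLite file,
# so a restart mid-batch can resume instead of starting over.
db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS jobs ("
    "id INTEGER PRIMARY KEY, payload TEXT, done INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO jobs (payload) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

def process_pending(conn):
    """Work through unfinished jobs; progress lives in the SQLite file,
    not in process memory."""
    rows = conn.execute("SELECT id, payload FROM jobs WHERE done = 0").fetchall()
    for job_id, payload in rows:
        # ... real processing would happen here ...
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    conn.commit()

process_pending(conn)
print(conn.execute("SELECT COUNT(*) FROM jobs WHERE done = 1").fetchone()[0])  # 3
```

The catch is durability: if the VM's disk is ephemeral, the SQLite file vanishes with it, which is why the "external queue/DB for anything you can't afford to lose" answer keeps coming up.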
TBH I've been waiting for years for someone to do 'firecracker as a service'. I must have searched that exact term about once per month.
Does this mean I can spin up multiple instances of _the same_ application on the fly, each running in its own VM?
For example, we have a queue that handles video encoding. I would like to have 0-N encoders running at the same time, based on demand.
Spin up time is important as well, since I typically provide test renders triggered from the UI.
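The scaling policy I have in mind is roughly this (illustrative numbers, not any real API):

```python
def desired_encoders(queue_depth, jobs_per_encoder=4, max_encoders=10):
    """How many encoder VMs to run for the current backlog (0..max)."""
    if queue_depth <= 0:
        return 0  # nothing queued: scale to zero
    # Ceiling division: enough encoders to bound the per-VM backlog.
    return min(max_encoders, -(-queue_depth // jobs_per_encoder))

print(desired_encoders(0))    # 0
print(desired_encoders(9))    # 3
print(desired_encoders(100))  # 10 (capped)
```

A control loop would compare this against the number of running machines and start or stop the difference; fast boot times are what make running that loop aggressively (including down to zero) tolerable.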
What's the DB / compute break-even for this use case? I assume that if your app uses 90% of its CPU cycles on DB access, this is not the way to go; and if your app is 90% compute, this is a nice solution.
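A back-of-the-envelope model of that break-even, with made-up RTT numbers (sequential DB round trips, one-way latencies folded into a per-call cost):

```python
def edge_latency_ms(db_calls, compute_ms, user_rtt_ms=2, db_rtt_ms=80):
    """App runs at the edge: close to the user, far from the DB."""
    return user_rtt_ms + compute_ms + db_calls * db_rtt_ms

def central_latency_ms(db_calls, compute_ms, user_rtt_ms=80, db_rtt_ms=1):
    """App runs next to the DB: far from the user, DB is local."""
    return user_rtt_ms + compute_ms + db_calls * db_rtt_ms

# Compute-heavy request with one DB round trip: roughly a wash.
print(edge_latency_ms(1, 50), central_latency_ms(1, 50))    # 132 131
# Chatty request with ten DB round trips: central wins easily.
print(edge_latency_ms(10, 50), central_latency_ms(10, 50))  # 852 140
```

So the break-even is less about CPU share and more about how many sequential DB round trips a request makes: each one pays the full app-to-DB distance when the app runs far from the database.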
Does Fly implement live migration under the hood?
Does this mean you can run a dev VM on demand like how Gitpod does?
How does this compare to AWS Lambda's Docker support?
> We're not done. You need something to run, right? Firecracker needs a root filesystem. For this, we download Docker images from a repository backed by S3. This can be done in a few seconds if you're near S3 and the image is smol.
Lmao props to the team for getting this copy out unsanitized by (potentially) unchill bosses.
vm.boot(speed='fast')
I have to make a reference to ointment; it is obligatory.
I know some prominent HN users work for fly.io, and they seem to be doing some interesting work, but the absolutely glowing response that every blog post gets here on HN seems a bit nepotistic.
This is really really exciting! I hope it enables more products built on top of full VMs with fast UX/DX.
I just wish I knew about this earlier because, from what I read, I think we at Devbook [1] built a pretty similar service for our product. We are using Docker to "describe" the VM's environment, our boot times are in similar numbers, we are using Nomad for orchestration, and we are also using Firecracker :). We basically had to build our own serverless platform for VMs. I need to compare our current pricing to Fly's.
When I first started using AWS a few years ago, having known generally what it was for far longer, I was flabbergasted at how slow it was to get an instance booted. I expected it to take much less time, thinking about things from first principles, even if you're literally talking about cold-booting a physical machine via IPMI. But it seemed like everyone accepted that as the way it was, and now I do too. So I'm glad people are still interested in making things fast.
Right now I'm doing Postgres stuff (RDS) and dealing with taking 10+ minutes to boot a fresh instance. I'm tempted to try out fly.io and their Postgres clusters but I'm afraid I'd be spoiled and hate my life after (my job has me stuck in AWS for the interminable future).
I would be interested to know where all that time is being spent on the AWS side. To be a fly on the wall, seeing their full, unfiltered logging and metrics.