This is honestly the coolest thing I've seen come out of YC in years. I have a bunch of questions, basically all variations of "how does it work?", so please pardon me if they're silly or naive!
1. If I had a local disk that was 10 GB, what happens when I try to work with data in the 50 GB range (as in, more than could be cached locally)? Would I immediately see degradation, or thrashing, at the 10 GB mark?
2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?
3. I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?
4. Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?
5. I have to ask - how do you think about open source here?
6. Can I mount on multiple servers? What are the limits there? (e.g., a Lambda function.)
I haven't played with it yet, so maybe doing so would answer some of these questions. But I'm really excited about this! I have tried using EFS for small projects in the past but - and maybe I was holding it wrong - I could not for the life of me figure out what I needed to do to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.
Founder of cunoFS here, brilliant to see lots of activity in this space, and congrats on the launch! As you'll know, there's a whole galaxy of design decisions when building file storage, and as a storage geek it's fun to see what different choices people make!
I see you've made some decisions similar to ours, for similar reasons I think: storing files 1:1 exactly as objects without proprietary backend scrambling, offering strong consistency and POSIX semantics on the file interface (with eventual consistency between the S3 and POSIX views), and targeting high performance. It looks like we differ on the managed-service vs. traditional download-and-install model and on the client-first vs. server-first approach (though some of our users also run cunoFS on an NFS/SMB gateway server), and caching is a paid feature for us versus an included feature for you.
Look forward to meeting and seeing you at storage conferences!
Pretty sure we're in your target market. We [0] currently use GCP Filestore to host DuckDB. Here's the pricing and performance at 10 TiB. Can you give me an idea on the pricing and performance for Regatta?
Service Tier: Zonal
Location: us-central1
10 TiB instance at $0.35/TiB/hr
Monthly cost: $2,560.00
Performance Estimate:
Read IOPS: 92,000
Write IOPS: 26,000
Read Throughput: 2,600 MiB/s
Write Throughput: 880 MiB/s
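(For reference, that monthly figure is roughly the hourly rate over a month: 10 TiB x $0.35/TiB/hr x ~730 hours ≈ $2,560.)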
I'm very interested in this as a backing disk for SQLite/DuckDB/parquet, but I really want my cached reads to come straight from instance-local NVMe storage, and to have a way to "pin" and "unpin" certain subdirectories to/from the local cache.
Why local storage? We’re going to have multiple processes reading & writing to the files and need locking & shared memory semantics you can’t get w/ NFS. I could implement pin/unpin myself in user space by copying stuff between /mnt/magic-nfs and /mnt/instance-nvme but at that point I’d just use S3 myself.
Any thoughts about providing a custom file system or how to assemble this out of parts on top of the NFS mount?
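For reference, the DIY user-space pin/unpin I'd rather not maintain looks roughly like this (a minimal sketch; the mount paths and helper names are made up):

    import shutil
    from pathlib import Path

    NFS_ROOT = Path("/mnt/magic-nfs")       # the NFS-backed mount
    NVME_ROOT = Path("/mnt/instance-nvme")  # instance-local NVMe

    def pin(subdir: str) -> Path:
        # Copy a subdirectory onto local NVMe so hot reads hit local disk.
        src, dst = NFS_ROOT / subdir, NVME_ROOT / subdir
        shutil.copytree(src, dst, dirs_exist_ok=True)
        return dst

    def unpin(subdir: str) -> None:
        # Drop the local copy; the NFS mount stays the source of truth.
        shutil.rmtree(NVME_ROOT / subdir, ignore_errors=True)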
Founder of JuiceFS here, congrats on the launch! I'm super excited to see more people doing creative things in the using-S3-as-a-file-system space. When we started JuiceFS back in 2017, we applied to YC twice, but no luck.
We are still working hard on it, hoping we can help people with different workloads using different tech!
I love this space, and I have tried and failed to get cloud providers to work on it directly :). We could not get the Avere folks to admit that their block-based thing on object store was a mistake, but they were also the only real game in town.
That said, I feel like writeback caching is a bit ... risky? That is, you aren't treating the object store as the source of truth. If your caching layer goes down after a write is ack'ed but before it's "replicated" to S3, people lose their data, right?
I think you'll end up wanting to offer customers the ability to do strongly-consistent writes (and cache invalidation). You'll also likely end up wanting to add operator control for "oh and don't cache these, just pass through to the backing store" (e.g., some final output that isn't intended to get reused anytime soon).
Finally, don't sleep on NFSv4.1! It ticks a bunch of compliance boxes for various industries, and then they will pay you :). Supporting FUSE is great for folks who can do it, but you'd want them to start by just pointing their NFS client at you, then "upgrading" to FUSE for better performance.
Wow, looks like a great product! That's a great idea to use NFS as the protocol. I honestly hadn't thought of that.
Perfect.
At IBM, I wrote a crypto filesystem that was similar in concept, except it was a kernel filesystem. We crypto-split each block into 4 parts and stored them in a cache; a background daemon listened for events and synced blocks to S3, orchestrated via a shared journal.
It's pure magic when you mount a filesystem on a clean machine and all your data is "just there."
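The shape of that design, heavily simplified (the names, the queue-based journal, and the 4-way split below are illustrative stand-ins, not the actual IBM implementation):

    import queue
    import threading

    journal: queue.Queue = queue.Queue()   # (key, bytes) pairs awaiting upload
    cache: dict = {}                       # block_id -> full block

    def write_block(block_id: str, data: bytes) -> None:
        # Ack once the block is cached and journaled; S3 catches up later.
        parts = [data[i::4] for i in range(4)]   # stand-in for the crypto split
        cache[block_id] = data
        for i, part in enumerate(parts):
            journal.put((f"{block_id}.{i}", part))

    def start_sync_daemon(upload) -> threading.Thread:
        # Background daemon drains the journal to object storage.
        def loop():
            while True:
                key, part = journal.get()
                upload(key, part)          # e.g. an S3 put for each part
                journal.task_done()
        t = threading.Thread(target=loop, daemon=True)
        t.start()
        return t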
In (March?) 2007 (correction: 2008), two other engineers and I, in front of Bruce Chizen (Adobe's CEO) in a small conference room in Bucharest, demoed a photo taken with an iPhone automagically showing up as a file on a Mac. I implemented the local FUSE side talking to Ozzy, Adobe's distributed object store back then, using an equivalent of a Linux inode structure. It worked like a charm, and if I remember correctly it took us a few days to build. It was as much a success as Adobe's later choices around http://Photoshop.com were a huge failure. A few months later, Dropbox launched.
That kickstarted about a decade of (actual) research and development led by my team, which positioned the Bucharest center as one of the most prolific distributed-systems centers within Adobe, and Adobe within Romania.
But I didn't come up with the concept; it was Richard Jones who inspired us with GMail Drive, which used FUSE with Gmail attachments back in 2004, when I got my first while still in college (https://en.wikipedia.org/wiki/GMail_Drive). I guess I'm old, but I find it funny to now be seeing "Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS"
Love this idea! The biggest hurdle, though, has been getting predictable auth & IO across multiple Python/Scala versions and everything else (Spark, orchestrators, the CLIs of teams on varying OSes, etc., etc.), plus access logs on top of that.
S3Fs/boto/botocore versions x Scala/Spark x Parquet x Iceberg x k8s etc., each reader's own assumptions, make reading from S3 alone a maintenance and compatibility nightmare.
Will the mounted system _really_ be accessible as a local fs and seen as such by all running processes? No surprises? No need for a Python-specific filesystem layer like S3Fs?
If so, then you will win, 100%. I wouldn't even care about speed/cost if it's on par with S3.
I am not your target audience but I have been thinking of building a very minified version of this using [0] Pooch and [1] S3FS.
Right now we spend a lot of time downloading various stuff from HTTP or S3 links and then figuring out folder structures to keep them in our S3 buckets. Pooch really simplifies the caching for this by having a deterministic path on your local storage for downloaded files, but has no S3 backend.
So combining the two would mean a single call to a link that handles the caching both locally and in our S3 buckets, deterministically.
[0] https://www.fatiando.org/pooch/latest/ [1] https://s3fs.readthedocs.io/en/latest/
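Roughly what I have in mind, as an untested sketch (the bucket name, cache layout, and fetch helper here are placeholders):

    import os
    import pooch
    import s3fs

    BUCKET = "our-data-mirror"                      # placeholder bucket name
    CACHE_DIR = str(pooch.os_cache("our-project"))  # deterministic local path
    os.makedirs(CACHE_DIR, exist_ok=True)
    fs = s3fs.S3FileSystem()

    def fetch(url, known_hash=None):
        # Prefer the local cache, then our S3 mirror, then the original link.
        name = os.path.basename(url)
        local = os.path.join(CACHE_DIR, name)
        s3_key = f"{BUCKET}/{name}"
        if os.path.exists(local):
            return local
        if fs.exists(s3_key):
            fs.get(s3_key, local)                   # pull from the S3 mirror
            return local
        local = pooch.retrieve(url=url, known_hash=known_hash,
                               fname=name, path=CACHE_DIR)
        fs.put(local, s3_key)                       # mirror into S3 for next time
        return local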
Does it mean I can use Lambda + SQLite + Regatta to build a real pay-as-you-go ACID SQL storage?
Edit: a production-ready (high-durability) ACID SQL storage
This is fantastic! Interestingly, I was one of the early engineers at Maginatics [1], a company that built exactly this in 2011 - and Netflix was one of our earliest beta customers. We strove to be both SMB3 and POSIX compatible, while leaning into SMB3 semantics. We had some pretty great optimizations that gave almost local-disk performance (e.g. using file and directory leases [2], async metadata ops, data and metadata caching, etc). EFS was just coming out at that point (Azure I think also had something similar in the works).
I'll be looking closely at what you're building!
[1] https://www.dell.com/en-us/blog/welcoming-spanning-maginatic...
[2] https://www.slideshare.net/slideshow/maginatics-sdcdwl/39257...
Fascinating. If this had been around a year ago, we could have used it in our datacenter build-out. For data-source reasons, we record data in the cloud. In the past, we'd stick most of the data in S3 and only egress what we needed to run analysis on. The way we did that was with a machine with 16 * 30 TiB SSDs acting as our on-prem cache of our S3 data, using a slightly modified goofys with a more heavily modified catfs in front of it, with both the cache and the catfs view exported over NFSv4. We had application-level switching between the cache and the export since our data was effectively read-only.
When the cache got full, catfs would evict things from it pretty simply. It has a good design overall, but a few bugs you have to fix, and when you have 100 machines connecting to it, it requires some tuning to make sure it doesn't all stall. But it worked for the most part.
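The cache host boiled down to roughly this shape (a simplified sketch; the paths and bucket name are illustrative, and our goofys/catfs builds were patched):

    import os
    import subprocess

    S3_MOUNT = "/mnt/s3"           # raw goofys view of the bucket
    CACHE_DIR = "/mnt/ssd-cache"   # directory on the big local SSDs
    CACHED_VIEW = "/mnt/cached"    # catfs view that gets exported over NFS
    for d in (S3_MOUNT, CACHE_DIR, CACHED_VIEW):
        os.makedirs(d, exist_ok=True)

    # 1. Mount the bucket with goofys (S3 objects exposed as files).
    subprocess.run(["goofys", "our-data-bucket", S3_MOUNT], check=True)

    # 2. Put catfs in front of it so reads land on the local SSD cache;
    #    catfs evicts entries as the cache fills up.
    subprocess.Popen(["catfs", S3_MOUNT, CACHE_DIR, CACHED_VIEW])

    # 3. Export both the catfs view and the raw cache over NFSv4 (via
    #    /etc/exports) and switch between them at the application level.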
Anyway, I think this is cool tech. I'm currently doing some bioinformatics stuff that this might help with (each genome sequence is some 100 GiB compressed). I'll give it a shot some time in the next couple of months.
This looks quite compelling.
But it's not clear how it handles file update conflicts. For example: if User A updates File X on one computer, and User B updates File X on another computer, what does the final file look like in S3?
There are quite a few noteworthy alternatives, like s3fs, rclone, goofys, etc.
That is interesting, but I haven't read how it is implemented yet.
The hard part is a cache layer with immediate consistency. It likely requires Raft (or it otherwise works incorrectly). Integrating this cache layer with S3 (offloading cold data to S3) is the easy (and uninteresting) part.
It should not be compared to s3fs, mountpoint, geesefs, etc., because they lack consistency, are slow, don't support full filesystem semantics, and break often.
It could be compared with AWS EFS, which is also slow (though I didn't try to tune it up to its maximum numbers).
For ClickHouse, this system is unneeded, because ClickHouse is already distributed (it supports full replication or shared storage + cache) and does not require full filesystem semantics (it pairs with blob storage nicely).
Neat stuff. I think everybody with an interest in NFS has toyed with this idea at some point.
> Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.
How do you handle the cache server crashing before syncing to S3? Do the cache servers have local disk as well?
Ditto for how to handle intermittent S3 availability issues?
What are the fsync guarantees for file append operations and directories?
Noob question: when an average person buys 2 TB of storage from a cloud provider, they pay upfront for the entire thing. Could pricing for such a product be made more competitive (vs. Dropbox) using a solution like this?
It sometimes takes years to fill that up with photos, videos, and other documents. Sounds like one could build a killer low-amortized-cost, pay-as-you-fill-it-up service for people to compete with Dropbox.
I don't see any other question about it, so maybe I just missed the obvious answer, but how do you handle POSIX ACLs? If the data is stored as an object in S3, but exposed via filesystem, where are you keeping (if at all?) the filesystem ACLs and metadata?
Also, NFSv3 and not 4?
Is this like JuiceFS? https://juicefs.com/
Reminds me of https://www.lucidlink.com/ for video editors. I quite like the experience with them.
Was taking a look at the pricing features. Boiled down, paying per month doesn't seem like a bad option; still, the API features 1-hour SLA support for enterprise-tier subscribers.
S3 bucket systems for cloud hosting services are typically encrypted through AES-256. SSE-S3 or SSE-KMS are available upon request.
[1]: https://aws.amazon.com/blogs/aws/new-amazon-s3-encryption-se...
Having the API hosted on Regatta's servers while integrating POSIX-compliant bring-your-own compute would tighten up instance storage fees for the end user.
How do you handle write concurrency?
If different processes write to the same file at the same time, what do I read afterward?
I have a few qualms with this app:
1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
... I'm kidding, this is quite useful.
I really wish that NFSv3 and Linux had built-in file hashing ioctls that could delegate some of this expensive work to the backend as it would make it much easier to use something like this as a backup accelerator.
Congratulations on your launch from ObjectiveFS! There is a lot of interest in 1-to-1 filesystems for mixed workloads, hope you can capture a nice share of that.
Using NFS and being able to use an existing bucket is a nice way to make it easy to get started and try things out. For applications that need full consistency between the S3 and the filesystem view, you can even provide an S3 proxy endpoint on your durable cache that removes any synchronization delays.
Is this meaningfully different from https://github.com/s3ql/s3ql ?
S3 semantics are generally fairly terrible for file storage (no atomic move/rename is just one example) but using it as block storage a la ZFS is quite clever.
In 2024, you are better off dropping the file system abstraction entirely and just embracing object storage abstractions (and ideally, immutable write-once objects).
Source: personal experience, I've done the EFS path and the S3-like path within the same system, and the latter was much easier to develop for and troubleshoot performance. It's also far cheaper to operate.
You can have local caching, rapid "read what I wrote", etc. with very little engineering cost; no one at my company is dedicated to this, because the abstraction is ridiculously simple (rough sketch after the list):
1. It's object storage, not a file system. Embrace immutability.
2. When you write to S3, cache locally as well.
3. When you read from S3, check the cache first. Optionally cache locally on reads from S3.
4. Set cache sizes so you don't blow out local storage.
5. Tier your caches when needed to increase sharing. (Immutability makes this trivially safe.)
All that's left is to manage 'checked out files' which is pretty easy when almost all of them are immutable anyway.
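A minimal sketch of steps 2-3 above with boto3 (the bucket name and cache path are placeholders; the cache-size limits and tiering from steps 4-5 are left out):

    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"                   # placeholder
    CACHE = "/var/cache/objects"           # local cache root
    os.makedirs(CACHE, exist_ok=True)

    def _local(key: str) -> str:
        # Flatten the object key into a single cache filename.
        return os.path.join(CACHE, key.replace("/", "_"))

    def put(key: str, data: bytes) -> None:
        # Step 2: write the immutable object to S3 and cache it locally.
        s3.put_object(Bucket=BUCKET, Key=key, Body=data)
        with open(_local(key), "wb") as f:
            f.write(data)

    def get(key: str) -> bytes:
        # Step 3: check the local cache first, else read from S3 and cache it.
        path = _local(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                return f.read()
        data = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        with open(path, "wb") as f:
            f.write(data)
        return data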
Can you comment on how this is different from https://aws.amazon.com/blogs/aws/mountpoint-for-amazon-s3-ge... ?
Why are these solutions always using NFS? I'm asking out of curiosity, not judgement.
I've looked for a solution to write many small files fast (safely). Think about cloning the Linux kernel git repo. Whatever I tested, the NFS protocol was always a bottleneck.
The title says POSIX but then it talks about NFS. So, what is it? Does it guarantee all POSIX semantics or not?
Your price point is very bad. The overprovisioning statement in your post indicated that you would be a "cheap" alternative, but 100 GB for $5?
I'm also not sure that it's a good architecture to have your servers in between me and my S3. If I'm on one cloud provider, the traffic between their S3-compatible solution and my infrastructure stays within the same cloud provider most of the time. And if not, I will for sure have a local cache rcloning the stuff from left to right.
I also don't get your calculator at all.
People have been throwing out "POSIX" distributed file systems for a long time but this claim usually raises more questions than it answers. Especially since clients access it via NFSv3, which has extremely weak semantics and leaves most POSIX filesystem features unimplemented.
Wow, coincidentally I posted GlassBD (https://news.ycombinator.com/item?id=42164058) a couple of days ago. Making S3 strongly consistent is not trivial, so I'm curious about how you achieved this.
If the caching layer can return success before writing through to S3, it means you've built a strongly consistent, distributed, in-memory database.
Or the consistency guarantee is actually weaker, or the data is partitioned and cannot be quickly shared across clients.
I'm really curious to understand how this was implemented.
Oh interesting, I'd love to mount this in Finder on a Mac, load a bunch of massive bioinformatics databases onto it, and treat it like another folder.
I'm also using Cloudflare R2 (S3 compatible) and would love for that to work out of the box
That's so nice to see, because over the past few days I had been tinkering with the concept of file system + blob storage, but I had a hard time coming up with use cases other than an unlimited Dropbox where you own the storage and truly pay as you go.
Ok, that's cool, but like... you could've just given me a bash script to do the same thing instead of the pitch-deck-followup baggage of the n-th try at recreating the Dropbox lottery shot from a decade and a half ago...
I realize it isn't your target use case, but I'm tempted to move all of my personal stuff stored in Google Drive over to this.
How does this compare to Amazon's own offering in this space, the "AWS Storage Gateway"? It can also back various storage protocols with S3, using SSDs for cache, etc. (https://aws.amazon.com/storagegateway/features/)
How did you choose the name Regatta?
I wish you luck. Having looked at doing something similar years back, I don't see the market. In the case of what I was involved in, it pivoted to enterprise backup.
Pretty cool. I'm excited about databases using this. Feels like Neon's PostgreSQL storage, but generalized to an FS.
Is this like FUSE with a cache? How does cache invalidation work?
All the best!
Interesting. Reminds me of FlexFS (https://flexfs.io/). I spoke to a very knowledgeable person there when investigating what to use but we ended up using EFS instead.
An annoying feature of EFS is how it scales with the amount of storage, so when it's empty it's very slow. We also started hitting its limits, so we could not scale our compute workers. Both can be solved by paying for elastic IOPS, but that is VERY expensive.
I rejected EFS as a common caching and shared files layer, despite being technologically an excellent fit for my stack, because it is astronomically expensive. The value created didn’t match the cost.
When I got in touch about that, I was confronted with a wall of TCO papers, which tells me the product managers evidently believe their target segment to be Gartner-following corporate drones. This was a further deterrent.
We threw that idea away and used memcached instead, with common static files in a package in S3.
I guess I’m suggesting, don’t be like EFS when it comes to pricing or reaching customers.
> Currently, only the us-east-1 region is supported. Please contact support@regattastorage.com if you need to use a different region.
Bold choice, given what I know about us-east-1
If using EFS already, how would the pricing / performance compare? Or is that maybe not a use case for regatta storage?
Any plans to support on-prem object stores and not just S3?
Super interesting project. But I cannot understand why you support only EC2 instances as clients. For what it is worth, it looks strange and limiting. By default I expect to be able to use Regatta Storage from everywhere: from my local machine, from my Docker containers running elsewhere, etc.
The main reason for adopting object storage is to avoid the burden associated with POSIX file system APIs, so this renders the major motivation for using object storage pointless.
Also, using a translation layer on top of S3 will not save you costs.
How does this compare to the log structured virtual disk concept from this paper? It sounds quite similar at a glance.
Can you elaborate on a few things with regards to your pricing:
* What does "$0.05 / gigabyte transferred" mean exactly? Transferred outside of AWS, or accessed, as in read and written data?
* "$0.20/GiB-mo of high-speed cache" – how is the high-speed cache amount computed?
I have a feeling Amazon is about to throw a big bag of money at you and that this will be the fastest acquisition in HN history. Congratulations on your successful launch!
> NFSv3 (soon, our custom protocol).
Definitely the thing I want to hear more about. Also, I can't shake the "what's the catch; how is no one else doing this, or are they doing it quietly?" feeling.
Just want to say this is super cool. I'm excited to see what people build on top of it.. seems like it could enable a new category of hosted data platforms-as-a-service (platform-as-a-services?).
Do you have any relation to https://regatta.dev/ ?
If this product is successful, what prevents AWS from cloning it at a lower price (perhaps by leveraging access to their infrastructure) and putting you out of business?
I am not sure what the use case for this is.
I would love to see the following projects instead:
- exposing a transactional API for S3
- transactional filesystem
Is there any open source alternative to something like this?
This feels, intuitively, like it would be very hard to make crash consistent (given the durable caching layer in between the client and S3). How are you approaching that?
How does this compare to https://github.com/awslabs/mountpoint-s3 ?
Why are you guys hijacking the scroll bar on your website?
Congrats on the launch!
Could a Regatta filesystem offer any advantage over ClickHouse's built-in S3 and local disk caching features in terms of cost or performance?
Similar to ObjectiveFS, which we use in production for email sync between multiple Postfix servers and Dovecot. Is this a supported use case?
Congrats! What a great solution, wish you success. NIT: The forced smooth scrolling on the landing page drives me crazy! haha
How does this differ from rclone mount and its vfs/caching system, possibly combined with mergerfs or rclone union for cache tiering?
Sounds similar to https://juicefs.com/
I know that for a while FUSE was considered a security nightmare. My own org banned the use of it. Have things gotten better?
I know that Amazon in general has large ingress and egress costs; how much overhead will this application incur?
Are there any tech details/architecture of the system? Also, congrats on launching.
I'm not in storage SaaS, so a nooby question: how is this different from Snowflake or Databricks?
Feels like FSx for Lustre without the complexity. Definitely what EFS could be.
Congrats on the launch!
Hi! Would this work for an instance that uses Barman to back up Postgres servers?
That looks interesting. We spent a lot of money on FSxL and might save a lot with Weka. Unfortunately, our data access pattern is very random and will likely not benefit from caching unless we cache the entire dataset (100 TB).
Regatta Storage is a new cloud file system^W service
At first glance it’s not clear how this is unique from Nasuni.
That's pretty cool. Anybody know of something similar for the Azure cloud?
Is every file an S3 object? What if you change the middle of a large file?
How does this compare to S3 compatible CSI drivers like DirectPV?
Wondering what the difference is between this and JuiceFS?
Great product! Congratulations on the launch!!
How does it handle data append and file editing?
How does this differ from what Nasuni offers?
How does this differ from AWS Storage Gateway?
Careers link points to index page :)
Does this compete with Minio?
SeaweedFS and GarageFS?
I dunno if this is considered off-topic, since it's commentary about the website, but that's twice in the past week I've seen a launch website that must have used a template or something because almost all the links in the footer are href="#". If you don't have Careers, Privacy Policy, Terms, or an opinion about Cookies, then just nuke those links
TL;DR: is this a cloud service or an on-premise thing?
This sounds unnecessary and expensive. Why use this over similar self-managed open source offerings?
I used the same approach based on Rclone for a long time. I wondered what makes Regatta Storage different than Rclone. Here is the answer: "When performing mutating operations on the file system (including writes, renames, and directory changes), Regatta first stages this data on its high-speed caching layer to provide strong consistency to other file clients." [0].
Rclone, on the contrary, has no layer that would guarantee consistency among parallel clients.
[0] https://docs.regattastorage.com/details/architecture#overvie...