Why Metaflow?

savin-goyal | 15 points

I think that this would be a better link, as it actually explains what Metaflow is without too much marketing fluff: https://docs.metaflow.org/introduction/what-is-metaflow

mastazi | a day ago

What do people do to curate/version/transform their raw datasets these days? I am vaguely aware of the "chuck it all into S3" strategy for hanging onto raw data, and related strategies where instead of S3 it's a db of some flavor. What are folks doing for record-keeping for what today's raw data contains vs tomorrow's?

And the next step - a curated dataset has a time-bound provenance - what are folks doing to keep track of the transformations/cleaning steps that make the raw data useful at the time it's being processed? Does this bit fall under the purview of Metaflow, or is this different tooling?

Or maybe my assumptions are off base! Curious about what other teams are doing with their datasets.

thomasingalls | a day ago

I was a big fan of Metaflow a few years back. I thought it was neat how I could write some code and easily run some functions locally versus remote.

Hey Savin, it's been a while since we chatted. I hope things are going well ;)

For those unaware, onsone of the co

ghilston | 14 hours ago

Curious to hear from folks who have used both Metaflow and Kubeflow to understand some of those tradeoffs.

Seems like Metaflow is comparatively lightweight, a bit more tightly integrated with AWS, less end to end and a bit more agile.

marksimi | 16 hours ago