Polars Cloud and Distributed Polars now available

jonbaer | 142 points

Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, from the development perspective as well as for debugging, onboarding, sharing, migrating, etc.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.

drej | 9 hours ago

I recently had to create a reproducible version of incredibly complicated and messy R concoctions our data scientists came up with.

I did it with pandas without much experience with it and a lot of AI help (essentially to fill in the blanks the data scientists had left, because they only had to do the calculation once).

I then created a Polars version which uses LazyFrames. It ended up being about 20x faster than the first version. I did try some optimizations by hand to make the execution planner work even better, which I believe paid off.

If you have to do a large non-interactive analytical calculation (i.e. not in a notebook), Polars seems to be way ahead imo!

I do wish it were just as easy to use as a Rust library, though. The focus, however, seems to be mainly on being competitive in Python land.

sureglymop | 7 hours ago

Love it!

Still don't get why one of the biggest players in the space, Databricks, is overinvesting in Spark. For startups, Polars or DuckDB are completely sufficient. Other companies like Palantir already support bring-your-own-compute.

robertkoss | 10 hours ago

Been a polars fan for a loooong time. Happy to see the team ship their product and I hope it does well!

cantdutchthis | 11 hours ago

Polars is certainly better than pandas for doing things locally. But that is a low bar. I’ve not had a great experience using Polars on large enough datasets. I almost always end up using DuckDB. If I am using SQL at the end of the day, why bother starting with Polars? With AI these days, it’s ridiculously fast to put together performant SQL. Heck, you can even make your own grammar and be done with it.

lvl155 | 10 hours ago

I don't understand. Can I use distributed Polars with my own machines, or do I have to buy cloud compute to run distributed queries (I don't want that)? If not, is this planned?

boomer_joe | 9 hours ago

Polars is great, absolute best of luck with the launch!

jpcompartir | 10 hours ago

Hmm, so how does the Polars SQLContext stack up against DuckDB? And can both cope with distributed Polars?

It feels like we are on the path to reinventing BigQuery.

willvarfar | 11 hours ago

Out of curiosity and because I don't want to create a test account right now:

How does billing with "Deploy on AWS" work? Do I need to bring my own AWS account, with Polars paid for the image through AWS, or am I billed by Polars and they pass a share to AWS? In other words, do I have a contract primarily with AWS or with Polars?

weinzierl | 11 hours ago

Cool. But abstract away the infra knowledge, down to the actual instance types. Instead, I’d expect the Polars Cloud abstraction to find me the most cost-effective (spot) instance that meets my CPU, memory, and disk reqs. Why do I have to give it, looking at the example, the AWS instance type?

gigatexal | 11 hours ago

Is there any distributed Polars outside of Polars Cloud?

EDIT: never mind, see the same question in this thread. The answer is no!

raoulj | 6 hours ago

How does Polars compare to FireDucks?

blackhaz | 7 hours ago

Maybe it's just me, but for anyone else who was confused:

- Polars (Pola.rs) - the DataFrames library that now has a cloud version

- Polar (Polar.sh) - Payments and MoR service built on top of Stripe

ritzaco | 10 hours ago

Snowflake, Polars, DuckDB, Firebase, FireDucks... I guess the next product will be IceDuck.

What is wrong with you DB people :))).

dbacar | an hour ago

Can I run a distributed computation in Polars Cloud on my own AWS infra? Or do I need to run it on-prem?

cmollis | 7 hours ago

So competing with Snowflake?

anonu | 11 hours ago

Can you dive a bit deeper into the comparison with Spark RDDs?

cbb330 | 10 hours ago

[dead]

curtisszmania | 3 hours ago

[dead]

Arnechos | 9 hours ago