Show HN: Arroyo – Write SQL on streaming data

necubi | 115 points

Unbounded streams, but with watermarks (which, right now, seem to be fixed-length?):

https://doc.arroyo.dev/concepts#watermarks

It also works on fixed, pre-built pipelines. This is all very much in the style of most stream processing platforms today, but I hope we’ll continue to move closer as an industry to having our cake and eating it: ingest everything in real time, while serving any query (with joins) over the full dataset (either incrementally or ad hoc).

thom | a year ago

Looks cool. What is the difference between this tool and Benthos (https://www.benthos.dev/)?

yevpats | a year ago

This is a really exciting project! I recently learned about https://github.com/vmware/database-stream-processor which builds on a new theoretical foundation and claims to be 9x faster than Flink. It is also written in Rust, and there is a compiler from SQL to Rust executables. Can you comment on the differences?

sorenbs | a year ago

Between Flink, Spark and KSQL, streaming is very JVM centric. It is nice to see more non JVM projects emerge.

I am not sure about your premise that the operations side is difficult. In Flink or Spark, it tends to amount to submitting a job to a cluster.

The bigger barrier to entry is the functional style of the transformation code. Even though other frameworks have a SQL API, I think treating SQL as a first-class citizen is the bigger differentiator.

benjaminwootton | a year ago

In the watermarks documentation it mentions that events arriving after the watermark are dropped. Are there any plans to make this configurable (to disable dropping or trigger exception handling) and/or alertable?

I can think of quite a few use cases (particularly in finance) where we'd want late-arrivals to be recorded and possibly incorporated into later or revised results, not silently dropped on the floor.

jsty | a year ago
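A minimal Python sketch of the late-arrival question above, assuming a fixed-lag watermark (the maximum event time seen so far minus a constant allowed lateness). The `FixedLagWatermark` class and its `drop_late` flag are hypothetical illustrations, not Arroyo's API or implementation: with `drop_late=True` late events are discarded, matching the behavior the docs describe; with `drop_late=False` they are set aside so they could be recorded or folded into revised results.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    key: str
    timestamp: int  # event time, in seconds
    value: float


@dataclass
class FixedLagWatermark:
    """Hypothetical operator, not Arroyo's API.

    Watermark = max event time seen so far - allowed_lateness.
    Events strictly before the watermark are treated as late.
    """
    allowed_lateness: int
    drop_late: bool = True
    max_seen: Optional[int] = None
    late_events: List[Event] = field(default_factory=list)

    def watermark(self) -> Optional[int]:
        # No watermark until the first event has been observed.
        if self.max_seen is None:
            return None
        return self.max_seen - self.allowed_lateness

    def process(self, event: Event) -> Optional[Event]:
        wm = self.watermark()
        if wm is not None and event.timestamp < wm:
            # Late event: either drop it silently or keep it for later handling.
            if not self.drop_late:
                self.late_events.append(event)
            return None
        # On-time event: advance the max seen time (and thus the watermark).
        if self.max_seen is None or event.timestamp > self.max_seen:
            self.max_seen = event.timestamp
        return event


if __name__ == "__main__":
    events = [Event("a", 100, 1.0), Event("a", 110, 2.0),
              Event("a", 95, 3.0), Event("a", 104, 4.0)]
    op = FixedLagWatermark(allowed_lateness=10, drop_late=False)
    accepted = [e for e in (op.process(ev) for ev in events) if e is not None]
    print([e.timestamp for e in accepted], [e.timestamp for e in op.late_events])
```

With a 10-second lag, the event at t=95 arrives after the watermark has reached 100 and is diverted, while the event at t=104 is still accepted because the watermark only trails the maximum by the fixed lag.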

Very interesting project; Arroyo has been on my watch list for a while! How would you say Arroyo compares to Apache Flink, i.e., what are the pros and cons? For instance, given it's implemented in Rust, I'd assume Arroyo's resource consumption might be lower?

(Disclaimer: I work for Decodable, where we build a SaaS based on Flink)

gunnarmorling | 10 months ago

Would love to know how you see tools like Materialize in comparison.

dangoodmanUT | a year ago

Slightly off-topic.

"Arroyo" is a Spanish word meaning creek, or stream

fasteo | a year ago

Very exciting! How does feature parity with Tinybird look?

https://www.tinybird.co/

KRAKRISMOTT | a year ago

This looks great, and it’s very cool that it recommends Nomad to run it in production.

I wish more products would support (or at least document how to run on) Nomad.

httgp | a year ago

Would Arroyo be an alternative to Confluent KSQL?

laurensr | a year ago

Any interest in redoing the web console in Rust? 8)

trevyn | a year ago