Netflix Conductor: Open-source workflow orchestration engine

swyx | 253 points

I set up Conductor where I work while evaluating workflow engines, and overall wasn't too happy with it. The default datastore is this Netflix-specific thing (Dynomite) that's built on top of Redis. It's not particularly easy to integrate operationally into non-Netflix infrastructure, and Conductor itself has hard dependencies on several services.

The programming model for workflows/tasks felt a little cumbersome, and after digging into the Java SDK/Client, I wasn't impressed with the code quality.

We did have some contacts at Netflix to help us with it, but some aspects (like dynomite itself, and its sidecar, dynomite-manager) felt abandoned, with unresponsive maintainers.

We've started using Temporal[0] (née Cadence) recently, and while it's not quite production-ready, it's been great to work with, and, just as critically, very easy to deal with operationally in our infrastructure. The Temporal folks are mostly former Uber developers who worked on Cadence, and since they're building a business around Temporal, they've been much more focused and responsive.

[0] https://temporal.io/

kelnos | 5 years ago

Workflows and orchestration are my jam -- that's what we're trying to simplify over at https://refinery.io

Conductor is a cool piece of tech, and it's a well-established player in a rapidly growing space for workflow engines.

I used to work at Uber and that company had microservice-hell for a while. They built the project Cadence[0] to alleviate that. It is similar to Conductor in many ways.

One project to watch out for is Argo[1] which is a CNCF-backed project.

There are also some attempts[2] to standardize the workflow spec.

Serverless adds a whole new can of worms to what orchestration engines have to manage, and I'm very curious to see how things evolve in the future. Kubernetes adds a whole dimension of complexity to the problem space, as well.

If anybody is interested in chatting about microservice hell or complex state machines for business logic, I'd be excited to chat. I'm always looking for more real world problems to help solve (as an early stage startup founder) and more exposure to what others are struggling with is helpful!

0: https://github.com/uber/cadence

1: https://argoproj.github.io/argo/

2: https://serverlessworkflow.github.io/

freeqaz | 5 years ago

Quick notes from skimming the docs:

* Conductor implements a workflow orchestration system which seems at the highest level to be similar to Airflow, with a couple of significant details.

* There are no "workers"; instead, tasks are executed by existing microservices.

* The Orchestrator doesn't push work to workers (e.g. Airflow triggering Operators to execute a DAG); instead, the clients poll the orchestrator for tasks and execute them when they find them.
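To make the pull model in the notes above concrete, here is a minimal in-memory sketch. It is not Conductor's API: `Orchestrator`, `run_worker`, and the "resize" task type are hypothetical names made up for illustration, and a real client would poll over HTTP and sleep between empty polls.

```python
import queue


class Orchestrator:
    """Toy in-memory stand-in for the orchestrator (hypothetical, not Conductor's API)."""

    def __init__(self):
        self._queues = {}   # task type -> queue of pending task payloads
        self.results = []   # (task_type, output) tuples reported by workers

    def schedule(self, task_type, payload):
        self._queues.setdefault(task_type, queue.Queue()).put(payload)

    def poll(self, task_type):
        """Return the next pending task of this type, or None if there is none."""
        try:
            return self._queues[task_type].get_nowait()
        except (KeyError, queue.Empty):
            return None

    def report(self, task_type, output):
        self.results.append((task_type, output))


def run_worker(orchestrator, task_type, handler, max_polls):
    """Poll for work up to max_polls times, executing whatever is found."""
    done = 0
    for _ in range(max_polls):
        task = orchestrator.poll(task_type)
        if task is None:
            continue  # a real worker would sleep or back off here
        orchestrator.report(task_type, handler(task))
        done += 1
    return done
```

The point of the sketch is that the orchestrator never needs to know where the workers live; any existing service can become a worker by polling for a task type it knows how to handle.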

My hot take:

If you already have a very large mesh of collaborating microservices and want to extract an orchestration layer on top of their tasks, this system could be a good fit.

Most of what you're doing here can also be implemented in Airflow, using an HTTPOperator or GRPCOperator that triggers your services to initiate their task. You don't get things like pausing though. On the other hand, you do get the ability to run simple/one-off tasks in an Airflow operator, instead of having to build a service to run your simple Python function.

I'm unsure on whether push/pull is better; I think it largely depends on your context. I'm inclined to say that for most cases, having the orchestrator push tasks out over HTTP is a better default, since you can simply load-balance those requests and horizontally scale your worker pool, and it's easier to test a pipeline manually (e.g. for development environments) if the workers respond to simple HTTP requests, instead of having to provide a stub/test implementation of the orchestrator. (In particular I'm thinking about "running the prod env on your local machine in k8s" -- this isn't practical at Netflix scale though.)
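A rough sketch of the push side of that argument, under stated assumptions: each worker is a plain callable standing in for an HTTP endpoint, round-robin is an assumed load-balancing policy, and `PushDispatcher` and `make_worker` are hypothetical names.

```python
import itertools


class PushDispatcher:
    """Toy push-style orchestrator: it picks a worker and calls it directly.

    Each worker is a plain callable standing in for an HTTP endpoint behind
    a load balancer; round-robin is an assumed balancing policy.
    """

    def __init__(self, workers):
        self._next_worker = itertools.cycle(workers)

    def dispatch(self, payload):
        worker = next(self._next_worker)
        return worker(payload)


def make_worker(name):
    """Build a hypothetical worker that increments payload["x"]."""
    def handle(payload):
        return {"worker": name, "result": payload["x"] + 1}
    return handle
```

Note that a worker built with `make_worker` can be called on its own with a hand-written payload, which is the manual-testing convenience described above: no orchestrator stub is needed.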

theptip | 5 years ago

We've used Conductor at my workplace for about a year now. The grounding is pretty solid but the documentation is pretty pants once you dig into it. We have to resort to digging into GitHub issues to find fairly fundamental features that aren't really documented. I feel Conductor is something Netflix has open-sourced and then sort of dumped on the OS community.

For example, there aren't any examples of how to implement workers using their Java client; we had to dig up a blog post to do that. Although it is fairly simple, a very basic example of implementing the Worker interface would be nice.

They also do not make clear the exact relationship between tasks and workflows. It's hard to find any good examples of relatively complex workflows and task definitions on the internet other than Netflix's barebones documentation and the kitchen-sink workflow they provide, which is broken by default on the current API.

Also the configuration makes mention of so many fields that are pretty much undocumented, like you can swap out your persistence layer for something else but I would have no idea how that works.

tupac_speedrap | 5 years ago

Surprised to see Camunda isn't mentioned here more.

Open-Source BPMN compliant workflow processing with a history of success. Goldman Sachs supposedly runs their internal org with it.

Slightly different target use case, but Camunda has really shined in microservices orchestration and I find implementing complex workflow and managing task dependencies much easier with it.

TheColorYellow | 5 years ago

Very interesting. Looks a lot like zeebe [0], which uses BPMN for the workflow definition. This makes it easier to communicate the processes with the rest of the company. I never used it in production, just played around with it for a demo.

[0] https://zeebe.io/

_gfrc | 5 years ago

Can someone explain how and where to use a workflow orchestration engine?

shadykiller | 5 years ago

I prefer the Power Automate / Logic Apps interface. It would be cool if there were an open-source Power Automate; imagine the number of plugins that would come up for that cloud tool. It's a valuable tool and part of the O365 ecosystem, and it could be even greater: more strategy and vision put into that product would make O365 and Azure a leader in components and integrations, which is the most valuable thing in the end.

ramon | 5 years ago

Does this have the same limitations as Airflow? How does it compare to something like Prefect?[0]

[0] https://medium.com/the-prefect-blog/why-not-airflow-4cfa4232...

ForHackernews | 5 years ago

We started using it around 2016 in the company I work for. We decided to use it to automate the often manual setup of new clients for each product. It grew to use our own security and rights system, and we also added support for a different database (which we are working on open-sourcing). We also changed the JSON API to conform to our company-wide standard.

At the time, we wanted something that we could host ourselves, that was maintained and open source, and that was working!

Nowadays internal teams also use it to automate their own processes as well.

If we were choosing today, we’d probably go for a “push based” workflow engine, maybe based on events, mainly for latency and load reasons, but polling is something we’re ok with so far. (There is a way to listen to events for some tasks, but it’s not that easy.)

If I’m not mistaken, Netflix uses it to automate video encoding for shows, but that might be outdated.

Overall, we’re pleased with it. But there are some cons: we wish we could split some services out (such as the read-only ones, or the workflow definitions from the executions), but the code isn’t architected for such an easy split. For example, pushing the result of a task computation by a worker triggers the current workflow to determine the next task to schedule, but it does this internally, and not through the defined interfaces. Securing the API is also not so easy, as it’s not really modular (unlike the database implementation, which is great). That point is being worked on though, so there is some hope for the future.
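To illustrate the kind of split described above, here is a minimal sketch in which the "decide the next task" step sits behind its own interface instead of being buried in the result handler. All names (`Decider`, `SequentialDecider`, `ResultIntake`) are hypothetical and do not reflect how Conductor's code is actually organized; the sequential ordering is an assumed simplification.

```python
from typing import Optional, Protocol


class Decider(Protocol):
    """Hypothetical interface for the 'what runs next' decision."""

    def next_task(self, workflow: dict, completed: str) -> Optional[str]: ...


class SequentialDecider:
    """Picks the task that follows `completed` in the workflow's ordered task list."""

    def next_task(self, workflow, completed):
        tasks = workflow["tasks"]
        i = tasks.index(completed)
        return tasks[i + 1] if i + 1 < len(tasks) else None


class ResultIntake:
    """Accepts worker results and delegates scheduling to an injected Decider."""

    def __init__(self, decider: Decider):
        self._decider = decider
        self.scheduled = []  # names of tasks queued to run next

    def task_completed(self, workflow, task_name, output):
        # Record the output, then ask the decider (not internal logic) what is next.
        workflow.setdefault("outputs", {})[task_name] = output
        nxt = self._decider.next_task(workflow, task_name)
        if nxt is not None:
            self.scheduled.append(nxt)
        return nxt
```

With the decision behind an interface like this, the result intake and the decider could in principle be deployed or scaled separately, which is the flexibility the comment says is missing.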

jiehong | 5 years ago

For better or worse, we ended up creating our own workflow engine at my company. Unfortunately, everyone who ends up using it hates it. We've also run into the problem where the entire process of producing our end product is encoded in the workflow. Downstream steps depend on earlier steps, etc. If any part of the process changes, managing this data becomes tough.

Additionally, we have software engineers writing these workflows. Ideally we would have tooling so that those who know the process can write these things. The difficulty we have had though is making it easy to join/match up earlier parts of the process with later steps. We do this now by keeping a lot of data in the workflow and by occasionally persisting data in other places. Software engineers, not the process people, are the ones who understand the data model and how to munge everything together.

Have others dealt with this issue?

yeswecatan | 5 years ago

I have been exploring workflow orchestration for some time now, specifically Temporal. Temporal's authors don't recommend it for very high throughput (per workflow) use cases, although I haven't benchmarked it myself. Also, since I'd be using it in a SaaS environment, I would prefer some serverless deployment strategy which possibly allows scaling down to zero.

I have my eyes on Flink Stateful Functions (http://statefun.io/). The abstractions are quite low level compared to Temporal, but the overall ability to write tasks/activities as serverless functions which have access to state is quite attractive.

Would be happy to talk to someone who has explored this further.

f0rr0 | 5 years ago

Seems neat. I guess this partially solves the problem of having some workflow stuck/dropped.

I wonder how much overhead there is. How much latency does each task cause?

Is it feasible to complete workflows while a user/client is waiting for a RESTful response?

jayd16 | 5 years ago

Since this has come up several times, Airflow is an orchestration tool for ETL jobs (long running complex processes) and Netflix Conductor is an orchestration tool for micro-services (short running simple processes).

iblaine | 5 years ago

There is also https://github.com/dapr/workflows which uses Azure Logic Apps engine on https://github.com/dapr/

f0rr0 | 5 years ago

> Almost no way to systematically answer “How much are we done with process X”?

is this a typo?

dmead | 5 years ago

Any difference from Airflow?

The_rationalist | 5 years ago
dang | 5 years ago

It's great to see corporations getting more involved in open source software; giving back and empowering the developer community.

realistcake | 5 years ago

Can somebody ELI5: why would someone need such a workflow orchestration engine? What problems are best solved with workflow engines?

lewisjoe | 5 years ago

How does this compare with Cadence/Temporal?

itpragmatik | 5 years ago

I wonder what the relationship is to StackStorm[1]? StackStorm is older and lists Netflix as a sponsor / user.

[1] https://stackstorm.com/

dahfizz | 5 years ago

A comparison with Airflow would be helpful.

iamAtom | 5 years ago