So this means that if I want to use an ML model I built in Python, but don't want to write the rest of the application in Python, I can do that?
This looks interesting. I use ONNX to call my PyTorch models from .NET, but so far that has meant I can't try out JAX-based libraries (they don't have ONNX export), and it has also meant writing C# boilerplate to preprocess my input data into the form the model expects.
Potentially this could be a lot better, but I'd be curious what speed overhead the IPC layer adds. At least with ONNX you get a nice speed bump :)
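For context, the preprocessing boilerplate being described usually amounts to turning raw input into the flat, normalized tensor layout the model expects. A minimal sketch (in Python for brevity; the shape conventions and ImageNet-style normalization constants are common assumptions, not tied to any particular model):

```python
# Sketch (illustrative): convert nested [H][W][RGB] uint8 pixels into a flat,
# normalized channel-major (NCHW) float list, the layout many exported
# vision models expect. Mean/std are the usual ImageNet constants.

MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def to_nchw(pixels):
    """pixels: list of H rows, each a list of W [r, g, b] uint8 triples."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for c in range(3):                       # channel-major order
        for y in range(h):
            for x in range(w):
                v = pixels[y][x][c] / 255.0  # scale to [0, 1]
                out.append((v - MEAN[c]) / STD[c])
    return out  # length 3*h*w, ready to feed as a (1, 3, H, W) tensor

# Tiny 1x2 "image": one black pixel, one white pixel
tensor = to_nchw([[[0, 0, 0], [255, 255, 255]]])
```

The point of a tool like Carton would be that this glue lives next to the model once, instead of being reimplemented in every host language.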
Any plans to support Windows? That would make Carton the ultimate library for embedding LLMs in desktop applications.
Is this ancillary to what [these guys](https://github.com/unifyai/ivy) are trying to do?
This HN post looks really weird on mobile (no, not the website, HN itself)
Is this the same as Nvidia's Triton?
When will you release a Java client?
I'd love to see this for golang (even without GPU support).
Maybe I'm missing something here, but isn't this largely achieved by ONNX already?
[0] https://onnx.ai
> From any [*] programming language.
[*] If "any programming language" is Python or JavaScript.
Make it for Go, and I am sold. Running ML models in Go services is still an unsolved problem.
Slightly related dumb question, I saw on GitHub that TensorFlow has Java support. Does anyone actually use TensorFlow with Java?
> Carton wraps your model with some metadata and puts it in a zip file
Why a zip file?
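One plausible reason (an assumption, not from the Carton docs): a zip is a single self-contained artifact that almost every language can read with its standard library, so metadata and weights travel together without a custom loader. A sketch with Python's stdlib, with made-up file names for illustration:

```python
import io
import json
import zipfile

# Pack: metadata plus placeholder model weights into one in-memory zip.
# "carton.json" and "model/weights.bin" are illustrative names, not
# Carton's actual internal layout.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("carton.json", json.dumps({"runner": "torchscript",
                                           "model_name": "example"}))
    zf.writestr("model/weights.bin", b"\x00\x01\x02")  # stand-in bytes

# Unpack: any consumer can read the metadata first, then decide how
# to load the weights.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    meta = json.loads(zf.read("carton.json"))
```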
"...run any machine learning model from any programming language*."
*As long as that language is Python or Rust.
What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.
Just some random brain dump: Why limit to ML models?
Perhaps we can (should?) have some universal package hub, where you package and push a "thing" from any language, then pull and use it from any other language, with metadata describing the input/output schema. The underlying engine could use WASM or containers or something like that.
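A hedged sketch of what that cross-language metadata might look like: a manifest describing the packaged "thing" and its input/output schema, serialized to JSON so any language can consume it. All field names here are invented for illustration.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Port:
    """One named input or output of the packaged 'thing'."""
    name: str
    dtype: str   # e.g. "float32", "string"
    shape: list  # symbolic dims allowed, e.g. ["batch", 224, 224, 3]

@dataclass
class Manifest:
    name: str
    runtime: str  # e.g. "wasm", "container"
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

m = Manifest(
    name="sentiment-classifier",
    runtime="wasm",
    inputs=[Port("text", "string", ["batch"])],
    outputs=[Port("score", "float32", ["batch"])],
)
blob = json.dumps(asdict(m))  # language-neutral wire format
```

A consumer in any language would only need a JSON parser to discover what the package accepts and returns before invoking it.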