Looks great for automating workloads in Windows desktop applications. I'd love to understand more deeply how your application works. So the set of commands your backend sends is click, scroll, screenshot? Does it also send a command to type characters into an input field? How is it able to pinpoint a text field from a screenshot? Is an LLM reliable enough to pinpoint the x and y coordinates to click on a field?
Also, to run this at large scale, does it become prohibitively expensive to run daily across thousands of custom workflows? I assume this runs in the cloud.
This is great, but a part of me wonders if our industry isn't putting a band-aid on a problem that we ourselves created.
Consider your typical early-2000s era Windows app. It would expect a mouse, but for power users, keyboard shortcuts would be available for every action, even if clunky. For example, Alt F tab tab tab to get to some input field, enter text, tab Alt R Return.
By about 2015 these were all straightforwardly scriptable with AutoHotkey and similar tools.
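To make the "Alt F tab tab tab" idea concrete, here is a rough Python sketch of how such a shortcut string could be turned into discrete key steps that a scripting tool would then replay. The `KeyStep` and `parse_sequence` names are made up for illustration and do not come from AutoHotkey or any real tool; actually sending the keys is left out.

```python
from dataclasses import dataclass

# Hypothetical set of modifier names; real tools support more.
MODIFIERS = {"alt", "ctrl", "shift"}


@dataclass(frozen=True)
class KeyStep:
    """One key press, with any modifiers held down while pressing it."""
    key: str
    modifiers: tuple = ()


def parse_sequence(spec: str) -> list:
    """Turn a string like 'Alt F tab tab tab' into a list of KeyStep objects.

    A modifier token attaches to the next non-modifier token only.
    """
    steps = []
    pending = []  # modifiers waiting for the key they apply to
    for token in spec.split():
        name = token.lower()
        if name in MODIFIERS:
            pending.append(name)
        else:
            steps.append(KeyStep(key=name, modifiers=tuple(pending)))
            pending = []
    if pending:
        raise ValueError(f"modifier(s) {pending} not followed by a key")
    return steps


if __name__ == "__main__":
    for step in parse_sequence("Alt F tab tab tab"):
        print(step)
```

The point is only that a fixed keyboard path is a short, stable list of steps, which is why this style of app was easy to script.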
But too late: by 2015 even Windows users were using web apps, where the keyboard bindings are variable or nonexistent, where the entire UI can change overnight, etc. I see some RPA approaches desperately trying to decode the DOM or match pixel elements. It's wild, as you point out.
I guess what I'm wondering is whether going after legacy Windows apps is a small TAM that's already largely solved, whereas the SPA/webapp market is gigantic, growing every day, and woefully, miserably broken as far as automation is concerned.
Looks great. For the EMR use cases, do you sign BAAs? Which CUA models are being used? No data retention?
Congrats on the launch! I've tried building a Windows automation script myself but found existing solutions quite cumbersome for booking appointments and fetching information on a desktop SMS.
Can I monitor/manage this remotely? I'm not on site with the client and previously tried to manage through AnyDesk but the client often turned off the machine.
Also, is there any way to run this so that it won't interrupt workflows while someone is using the machine? I imagine a solution could just be having the client run an extra computer that's dedicated to this, or running after hours on the local machine.
Scheduled a demo
Great idea!
You have no social share preview image on the homepage: https://www.opengraph.xyz/url/https%3A%2F%2Fwww.cyberdesk.io...
Have you looked at using accessibility APIs, such as UI Automation on Windows, to augment screenshots and simulated mouse clicks?
Personally I think this approach is flawed because it runs in the cloud. If it were an agent I could run locally I'd be much more interested.
AutoIt must be a good 20 years old: https://www.autoitscript.com/site/
Can it do assertions? This could be useful for testing old software.
You do not need a driver to do what you're doing. It's just slightly easier with a driver.
You can accomplish this from usermode and you wouldn't give potential customers (anyone who plays modern games) a non-starter for your product.
Frankly quite insulting to call any Windows app legacy
Congrats! I think the space is very interesting. I was a founder working on similar Windows CUA infra/RPA agents but pivoted. My thoughts:
1) The funny thing about determinism is deciding how deterministic you should be and when to break from it; it's kind of a recursive problem. Agents are inherently very tough to guardrail on an action space as big as the one in CUA. The guys from Browser Use realized this as well and built workflow-use. Or you could try RL or fine-tuning per task, but that is not viable (economically or tech-wise) currently.
2) As you know, it's a very client-facing/customized solution space. You might find this interesting; it reflects my thoughts on the space as well. Tough to scale as a fresh startup unless you really niche down on some specific workflows. https://x.com/erikdunteman/status/1923140514549043413 (he is also building in the deterministic agent space now, funnily enough)
3) It actually is annoyingly expensive with Claude if you break caching, which you have to at some point if you feed in every screenshot, etc. You mentioned you use multiple models (I guess UI-TARS/OmniParser?), but in the comments you said Claude?
4) Ultimately the big bet in the RPA space, as again you know, is that the TAM won't shrink a lot as more and more SAPs, ERPs, etc. implement APIs. Of course the big money will always be in ancient apps that won't, but then again in that space, UiPath and the others have a chokehold. (And their agentic tech was actually surprisingly good when I had a look 3 months ago.)
Good luck in any case! I feel like it's one of those spaces where we are definitely still a touch too early, but it's such a big market that there is plenty of space for a lot of people.