Cool!
How are messages counted? For example, in Cursor, one request is 25 tool calls. Does 100 messages in a subscription here mean 100 tool calls or 100 requests each with 25 tool calls?
When it comes to privacy, I also have some questions. It says that requests can only be used for debugging purposes, but it later mentions a license to use the requests to improve the platform, which suggests they can be used for more than just debugging.
I currently use Cerebras for Qwen3. One of the things I like is its speed (the TPM limit is rough). I'm curious: how fast is Qwen3 on your platform, and what quantization are you running for your models?
Do you plan to offer high-quality FIM models in the bundle? Would be handy for running autocompletion locally, say via Qwen3-coder.
I was literally just wishing there was something like this, this is perfect! Do you do prompt caching?
Can this be provided as an API?
How would I point to your API to use it in a Mastra AI agent?
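For context, here's roughly what I have in mind, assuming the endpoint is OpenAI-compatible and can be plugged in through the AI SDK's OpenAI provider (the base URL, model id, and env var below are just placeholders):

```ts
// Sketch: wiring a Mastra agent to an OpenAI-compatible endpoint.
import { Agent } from "@mastra/core/agent";
import { createOpenAI } from "@ai-sdk/openai";

// Point the AI SDK's OpenAI provider at a custom base URL.
const provider = createOpenAI({
  baseURL: "https://api.example.com/v1", // placeholder endpoint
  apiKey: process.env.EXAMPLE_API_KEY,   // placeholder env var
});

export const assistant = new Agent({
  name: "assistant",
  instructions: "You are a helpful coding assistant.",
  model: provider("qwen3-coder"), // placeholder model id
});

// Usage:
// const result = await assistant.generate("Summarize this repo's README.");
// console.log(result.text);
```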
I signed up, feels like this is something that should've existed long ago.
Your privacy policy isn't good for a privacy-focused provider though. You shouldn't have the rights to use my personal information. The use of Google Tag Manager also doesn't inspire confidence, especially on LLM pages where you might "accidentally" install a user-monitoring script and the prompts get logged. I'd suggest looking at how Kagi markets to privacy-conscious customers.