This is really neat! Didn’t realize it could be this simple to run RL on models. Quick question: how would I specify the reward function for tool use? Or is this something you do for me automatically when I specify the available tools and their uses?
3s | 6 hours ago
Is there any credence to the view that these startups are basically DSPy wrappers?
nextworddev | 13 hours ago
[dead]
curtisszmania | 13 hours ago
Was excited to see something about reinforcement learning as I'm working on training an agent to play a game, but apparently all reinforcement learning nowadays is for LLMs.