You can now enable GPT-4o, DeepSeek R1, Sonnet 3.5, and other models to understand what's on your screen and take actions, turning any LLM into a computer-use agent. 100% free and open source.
These kinds of releases are coming faster and faster.
Find the source on GitHub and run the model on Hugging Face.