🚧 Limitations & Future Plans

Kaizen Agent is evolving rapidly, but it's important to understand what it can and can't do today — and where it's going.

🧱 Current Limitations

Framework support: Kaizen Agent currently works only with:
- Python agents
- Mastra (a TypeScript agent framework)
Single-agent only: Kaizen Agent supports testing only one agent at a time.
No multi-turn interactions: It's designed for single-call agent methods, not ongoing conversations.
Evaluation is limited to return values:
- You must define evaluation targets on the return value of a single function.
- It cannot evaluate intermediate variables or outputs that span multiple methods.
Input types supported:
- string
- dict
- object (with constructor args)
No browser or tool-based agent support (yet)
Not suitable for production environments — it's for development & improvement only
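
To make the single-call and return-value constraints concrete, here is a minimal sketch of the kind of Python agent that fits them. The names SupportAgent, Ticket, and run are hypothetical and are not part of Kaizen Agent; a keyword check stands in for the LLM call, and no Kaizen Agent configuration is shown.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical object input, built from constructor args."""
    subject: str
    body: str


class SupportAgent:
    """Hypothetical single agent exposing one single-call method."""

    def run(self, request):
        # Normalize the three supported input shapes: string, dict, object.
        if isinstance(request, str):
            text = request
        elif isinstance(request, dict):
            text = str(request.get("message", ""))
        elif isinstance(request, Ticket):
            text = f"{request.subject}: {request.body}"
        else:
            raise TypeError(f"unsupported input type: {type(request).__name__}")

        # Placeholder for the LLM call (assumption: a real agent would call a model here).
        label = "billing" if "invoice" in text.lower() else "general"

        # Only this return value can be evaluated. The intermediate `text`
        # is not visible, so anything worth checking must be returned.
        return {"label": label, "reply": f"Routing to {label} support."}


agent = SupportAgent()
from_string = agent.run("Where is my invoice?")
from_dict = agent.run({"message": "hello"})
from_object = agent.run(Ticket(subject="Invoice", body="missing"))
```

Under these assumptions, an agent that spreads its result across several methods or keeps it in instance state would need a thin wrapper method that gathers everything into one return value before it could be evaluated.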

🚀 Future Plans

We're building toward a general-purpose debugging agent for any LLM application. Planned features include:

🔄 Multi-turn support: Debug conversational agents that span multiple steps
👥 Multi-agent scenarios: Handle coordination, failures, and optimization across agents
🧠 Complex workflows: Support for:
- Browser agents
- Tool-using agents
- Toolformer-style LLMs
📦 Framework compatibility: Add support for popular frameworks like:
- LangChain
- CrewAI
- AutoGen
- OpenAgents
📊 API comparison testing: Automatically evaluate which LLM or endpoint performs better
🎯 Production-grade fine-tuning: Help teams go from 80% to 90%+ accuracy for real-world use cases
🛠️ Project-level automation: Let Kaizen Agent fix and refactor full agent workflows across files and modules

Our long-term vision is to create an AI development teammate that accelerates LLM development automatically, so humans can focus on building features while Kaizen Agent handles the work of reaching production-level reliability.

💬 Want to help?
Join our Discord or contribute on GitHub.
