Frequently Asked Questions
Find answers to common questions about Kaizen Agent, along with troubleshooting tips and guidance on when and how to use the tool effectively.
General Questions
What is Kaizen Agent?
Kaizen Agent is an AI-powered debugging engineer that automatically tests, analyzes, and improves your AI agents and LLM applications. Instead of manually writing test cases and debugging failures, you define your test inputs and evaluation criteria in YAML, and Kaizen handles the rest.
When should I use Kaizen Agent?
Kaizen Agent is most valuable during the development phase of your AI agents, right after you've written the initial code but before deployment.
Perfect use cases:
- 🔄 Iterative Development: Test and improve agents during development cycles
- 🚀 Pre-Deployment Validation: Ensure your agent works reliably before going live
- 🐛 Bug Detection: Catch and fix issues you might miss with manual testing
- 📈 Performance Optimization: Continuously improve prompts and code based on test results
- 🛡️ Quality Assurance: Maintain high standards as your agent evolves
When should I NOT use Kaizen Agent?
- Production environments - Kaizen is for development/testing, not live systems
- Simple, stable agents - If your agent is already working perfectly, you might not need it
- Non-AI applications - Kaizen is specifically designed for AI agents and LLM applications
Do I need to write test code?
No! Kaizen Agent uses YAML configuration instead of traditional test files:
- ❌ Traditional approach: Write test files with `unittest`, `pytest`, or `jest`
- ✅ Kaizen approach: Define tests in YAML - no test code needed! (A minimal sketch follows.)
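For orientation, here is a minimal sketch of such a configuration. The field names (`file_path`, `steps`, `evaluation`) follow the patterns used in the examples throughout this FAQ; the agent path and inputs are hypothetical, so check the configuration guide for the authoritative schema:

```yaml
# kaizen.yaml - minimal, hypothetical test configuration
name: Email Agent Tests
file_path: agents/email_agent.py  # hypothetical path to the agent under test

steps:
  - name: Basic Greeting
    input:
      input: "Hello, how are you?"

evaluation:
  evaluation_targets:
    - name: response_quality
      source: return
      criteria: "The response should be polite, grammatical, and directly address the greeting."
      weight: 1.0
```

Running `kaizen test-all --config kaizen.yaml` then executes each step and scores the result against the criteria.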
Installation & Setup
What are the system requirements?
- Python 3.8+ (Python 3.9+ recommended for best performance)
- Google API Key for Gemini models
- Basic familiarity with Python or TypeScript
How do I get a Google API key?
- Go to Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy the generated key and set it as `GOOGLE_API_KEY` in your environment
How do I set up environment variables?
Option 1: Using .env file (Recommended)
```bash
# Create .env file
cat > .env << EOF
GOOGLE_API_KEY=your_api_key_here
GITHUB_TOKEN=ghp_your_github_token_here
EOF
```
Option 2: Set directly in shell
```bash
export GOOGLE_API_KEY="your_api_key_here"
export GITHUB_TOKEN="ghp_your_github_token_here"
```
Option 3: Using Kaizen commands
```bash
# Create environment example file
kaizen setup create-env-example

# Check environment setup
kaizen setup check-env
```
Configuration
How do I write effective evaluation criteria?
⚠️ CRITICAL: The text in each `criteria` field is passed directly to the LLM evaluator. Write clear, specific criteria for best results.
✅ Good Examples:
```yaml
- name: sentiment_score
  source: variable
  criteria: "The sentiment_score must be a float between -1.0 and 1.0. Negative values indicate negative sentiment, positive values indicate positive sentiment. The score should accurately reflect the emotional tone of the input text."
  weight: 0.4

- name: response_quality
  source: return
  criteria: "The response should be professional, well-structured, and contain actionable insights. It must be free of grammatical errors and provide specific, relevant information that addresses the user's query directly."
  weight: 0.6
```
❌ Poor Examples:
```yaml
- name: result
  source: return
  criteria: "Should be good"  # Too vague
  weight: 1.0

- name: accuracy
  source: variable
  criteria: "Check if it's correct"  # Not specific enough
  weight: 1.0
```
What input types does Kaizen support?
Kaizen supports multiple input types:
String Input:
```yaml
- name: text_content
  type: string
  value: "Your text here"
```
Dictionary Input:
```yaml
- name: config
  type: dict
  value:
    key1: "value1"
    key2: "value2"
```
Object Input:
```yaml
- name: user_review
  type: object
  class_path: agents.review_processor.UserReview
  args:
    text: "This product exceeded my expectations!"
    rating: 5
    category: "electronics"
```
How do I test edge cases?
Include tests for:
- Empty or minimal inputs
- Very long inputs
- Unusual or unexpected inputs
- Boundary conditions
Example edge case tests:
```yaml
steps:
  - name: Normal Input
    input:
      input: "Hello, how are you?"

  - name: Edge Case - Empty Input
    input:
      input: ""

  - name: Edge Case - Very Long Input
    input:
      input: "This is a very long input that might cause issues..."

  - name: Edge Case - Special Characters
    input:
      input: "Test with special chars: !@#$%^&*()"
```
Troubleshooting
Common Error: "Module not found"
Problem: Kaizen can't import your agent module.
Solutions:
- Check file paths: Ensure your YAML `file_path` matches the actual file location (see the sketch below)
- Verify module structure: Make sure your Python module can be imported correctly
- Check working directory: Run Kaizen from the correct directory
- Test import manually: Try `python -c "import your_module"` to verify
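For instance, with a hypothetical layout like the one below, `file_path` must be relative to the directory you run Kaizen from:

```yaml
# Project layout (hypothetical):
#   my-project/
#   ├── kaizen.yaml
#   └── agents/
#       └── review_processor.py
#
# Run kaizen from my-project/ so this relative path resolves:
file_path: agents/review_processor.py
```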
Common Error: "API key not found"
Problem: Kaizen can't find your Google API key.
Solutions:
- Check environment variable: Ensure `GOOGLE_API_KEY` is set
- Verify .env file: Make sure it's in the correct location
- Test manually: Try `echo $GOOGLE_API_KEY` to verify it's set
- Use setup command: Run `kaizen setup check-env` to diagnose
Common Error: "Evaluation failed"
Problem: The LLM evaluator can't understand your evaluation criteria.
Solutions:
- Make criteria more specific: Include exact requirements, ranges, or formats (see the example below)
- Provide context: Explain what "good" means in your domain
- Include examples: Reference expected patterns or behaviors
- Use clear language: Avoid ambiguous terms that LLMs might misinterpret
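As a quick illustration, reusing the `evaluation_targets` shape from the Configuration section (values are hypothetical), a failing criterion can usually be rescued by spelling out the expected format and content:

```yaml
# Before: the evaluator has nothing concrete to check
- name: summary
  source: return
  criteria: "Should be good"
  weight: 1.0

# After: exact requirements the LLM can verify
- name: summary
  source: return
  criteria: "The summary must be 2-3 sentences, mention the product name, and state whether the overall sentiment is positive, negative, or neutral."
  weight: 1.0
```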
Common Error: "GitHub access denied"
Problem: Can't create pull requests or access GitHub.
Solutions:
- Check token permissions: Ensure your GitHub token has the `repo` scope
- Verify repository access: Make sure you have write access to the repository
- Test GitHub access: Run `kaizen test-github-access --repo owner/repo-name`
- Check repository format: Use `owner/repo-name` format (e.g., `myuser/myproject`)
Tests are taking too long
Problem: Tests are running slowly or timing out.
Solutions:
- Set timeout: Add `timeout: 180` to your YAML settings (see the combined sketch below)
- Reduce test complexity: Simplify evaluation criteria
- Use parallel testing: Enable `parallel: true` in settings
- Check API limits: Ensure you're not hitting rate limits
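The `timeout` and `parallel` options both live in the top-level `settings` block; a minimal sketch combining them (values are illustrative):

```yaml
settings:
  timeout: 180    # per-run timeout in seconds
  parallel: true  # run tests concurrently
  max_workers: 4  # cap on concurrent workers
```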
Generated fixes don't make sense
Problem: Kaizen's automatic fixes seem incorrect or inappropriate.
Solutions:
- Review evaluation criteria: Make sure they're clear and specific
- Check test inputs: Ensure test cases are realistic
- Review before applying: Always review fixes before merging
- Adjust weights: Modify evaluation target weights to focus on priorities (see the sketch below)
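For example, if Kaizen keeps "fixing" style at the expense of correctness, shifting weight toward the target you actually care about changes what it optimizes for. A sketch with hypothetical targets, using the format from the Configuration section:

```yaml
evaluation:
  evaluation_targets:
    - name: factual_accuracy
      source: return
      criteria: "Every claim in the response must be supported by the input text."
      weight: 0.8  # raised: correctness is the priority
    - name: formatting
      source: return
      criteria: "The response should use short paragraphs and end with a one-line summary."
      weight: 0.2  # lowered: style matters less
```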
Performance & Optimization
How can I speed up my tests?
- Use parallel testing:
  ```yaml
  settings:
    parallel: true
    max_workers: 4
  ```
- Set appropriate timeouts:
  ```yaml
  settings:
    timeout: 180  # seconds
  ```
- Optimize evaluation criteria: Make them more specific and concise
- Reduce test complexity: Focus on the most important aspects
How do I handle rate limits?
- Add delays between requests:
  ```yaml
  settings:
    request_delay: 1  # seconds between requests
  ```
- Use retry logic:
  ```yaml
  settings:
    max_retries: 3
  ```
- Monitor API usage: Check your Google AI Studio dashboard
- Consider API quotas: Upgrade your plan if needed
Best Practices
How should I structure my test configuration?
- Use descriptive names: `Professional Email Improvement` vs `Test 1` (see the sketch below)
- Balance test coverage: Mix happy path and edge case tests
- Write clear evaluation criteria: Be specific about what constitutes success
- Include realistic inputs: Use test data that reflects real usage
- Review generated fixes: Always check before applying
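To make the first two points concrete, here is a sketch of a `steps` block with descriptive names and mixed coverage (the inputs are hypothetical):

```yaml
steps:
  - name: Professional Email Improvement  # descriptive: says what is being tested
    input:
      input: "plz fix this email draft so it sounds professional..."

  - name: Edge Case - Empty Draft  # edge case alongside the happy path
    input:
      input: ""
```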
How do I maintain test quality over time?
- Regularly update evaluation criteria based on new requirements
- Add new test cases as you discover edge cases
- Review and refine prompts based on test results
- Monitor performance metrics and adjust weights accordingly
- Keep test data current with your application's evolution
How do I collaborate with team members?
- Share YAML configurations in version control
- Use consistent naming conventions for tests and evaluations
- Document evaluation criteria so others understand the standards
- Review pull requests created by Kaizen before merging
- Set up CI/CD integration for automated testing (see the GitHub Actions example under Advanced Topics)
Advanced Topics
Can I use custom evaluation functions?
Yes, you can define custom evaluation logic:
```yaml
evaluation:
  evaluation_targets:
    - name: custom_score
      source: custom_function
      function: my_evaluator.evaluate_response
      criteria: "Custom evaluation criteria"
      weight: 0.5
```
How do I integrate with CI/CD?
You can integrate Kaizen Agent with GitHub Actions:
```yaml
name: Kaizen Agent Testing

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install Kaizen Agent
        run: pip install kaizen-agent
      - name: Run Kaizen tests
        run: kaizen test-all --config kaizen.yaml --auto-fix --create-pr --repo ${{ github.repository }}
        env:
          GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
How do I debug test failures?
- Enable debug mode:
  ```bash
  kaizen test-all --config kaizen.yaml --debug
  ```
- Save detailed logs:
  ```bash
  kaizen test-all --config kaizen.yaml --save-logs
  ```
- Analyze logs:
  ```bash
  kaizen analyze-logs test-logs/latest_test.json
  ```
- Check the logs directory for detailed execution information
Getting Help
Where can I get more help?
- Discord Community: Join our Discord server for real-time support
- GitHub Issues: Report bugs and request features on GitHub
- Documentation: Check the other guides in this documentation
- Examples: Review the Examples page for working configurations
How do I report a bug?
- Check existing issues on GitHub first
- Create a new issue with:
  - Clear description of the problem
  - Steps to reproduce
  - Your YAML configuration (with sensitive data removed)
  - Error messages and logs
  - Environment details (Python version, OS, etc.)
How can I contribute?
- Star the repository on GitHub
- Share examples of your configurations
- Report bugs and request features
- Join the Discord community to help other users
- Submit pull requests for improvements
For more help, join our Discord community or check out our GitHub repository.