Examples

Explore real-world examples of AI agents and their Kaizen Agent test configurations. These examples demonstrate different use cases, input types, and evaluation strategies.

Email Improvement Agent

A simple agent that improves email drafts by making them more professional and well-structured.

Python Version

Agent Code (my_agent.py):

import google.generativeai as genai
import os

class EmailAgent:
    def __init__(self):
        genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
        self.model = genai.GenerativeModel('gemini-2.5-flash-preview-05-20')
        # Simple prompt that Kaizen can improve significantly
        self.system_prompt = "Improve this email draft."
    
    def improve_email(self, email_draft):
        full_prompt = f"{self.system_prompt}\n\nEmail draft:\n{email_draft}\n\nImproved version:"
        response = self.model.generate_content(full_prompt)
        return response.text

Test Configuration (kaizen.yaml):

name: Email Improvement Agent Test
file_path: my_agent.py
description: This agent improves email drafts by making them more professional, clear, and well-structured. It transforms casual or poorly written emails into polished, business-appropriate communications.
agent:
  module: my_agent
  class: EmailAgent
  method: improve_email

evaluation:
  evaluation_targets:
    - name: quality
      source: return
      criteria: "The email should be professional, polite, and well-structured with proper salutations and closings"
      weight: 0.5
    - name: format
      source: return
      criteria: "The response should contain only the improved email content without any explanatory text, markdown formatting, or additional commentary. It should be a clean, standalone email draft ready for use."
      weight: 0.5

files_to_fix:
  - my_agent.py

steps:
  - name: Professional Email Improvement
    input:
      input: "hey boss, i need time off next week. thanks"
  
  - name: Edge Case - Empty Email
    input:
      input: ""
  
  - name: Edge Case - Very Informal Email
    input:
      input: "yo dude, can't make it to the meeting tomorrow. got stuff to do. sorry!"
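The `agent` block names a module, a class, and a method. This page does not show how Kaizen resolves them internally. The hypothetical sketch below assumes the runner imports the module, instantiates the class with no arguments (as `EmailAgent` allows), and passes each step's `input` value to the method. `resolve_agent_method` is an illustrative name, not part of Kaizen.

```python
import importlib
import sys
import types
from typing import Any, Callable


def resolve_agent_method(module: str, class_name: str, method: str) -> Callable[..., Any]:
    """Hypothetical loader: import `module`, build `class_name()`, return the bound `method`."""
    mod = importlib.import_module(module)
    agent_cls = getattr(mod, class_name)
    instance = agent_cls()  # assumes a no-argument constructor, like EmailAgent
    return getattr(instance, method)


if __name__ == "__main__":
    # Register a stand-in module so the sketch runs without my_agent.py or an API key.
    fake = types.ModuleType("fake_agent")

    class EchoAgent:
        def improve_email(self, email_draft: str) -> str:
            return email_draft.strip().capitalize()

    fake.EchoAgent = EchoAgent
    sys.modules["fake_agent"] = fake

    call = resolve_agent_method("fake_agent", "EchoAgent", "improve_email")
    print(call("hey boss, i need time off next week. thanks"))
```

With the real example, the equivalent call would be `resolve_agent_method("my_agent", "EmailAgent", "improve_email")`, run from the directory that contains `my_agent.py`.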

TypeScript Version (Mastra)

Agent Code (src/mastra/agents/email-agent.ts):

import { google } from '@ai-sdk/google';
import { Agent } from '@mastra/core/agent';

export const emailFixAgent = new Agent({
  name: 'Email Fix Agent',
  instructions: `You are an email assistant. Improve this email draft.`,
  model: google('gemini-2.5-flash-preview-05-20'),
});

Test Configuration (kaizen.yaml):

name: Email Improvement Agent Test
file_path: src/mastra/agents/email-agent.ts
language: typescript
description: This agent improves email drafts by making them more professional, clear, and well-structured. It transforms casual or poorly written emails into polished, business-appropriate communications.
agent:
  module: email-agent  # Just the file name without extension

evaluation:
  evaluation_targets:
    - name: quality
      source: return
      criteria: "The email should be professional, polite, and well-structured with proper salutations and closings"
      weight: 0.5
    - name: format
      source: return
      criteria: "The response should contain only the improved email content without any explanatory text, markdown formatting, or additional commentary. It should be a clean, standalone email draft ready for use."
      weight: 0.5

files_to_fix:
  - src/mastra/agents/email-agent.ts

settings:
  timeout: 180

steps:
  - name: Professional Email Improvement
    input:
      input: "hey boss, i need time off next week. thanks"
  
  - name: Edge Case - Very Informal Email
    input:
      input: "yo dude, can't make it to the meeting tomorrow. got stuff to do. sorry!"

Text Analysis Agent

A more complex agent that performs sentiment analysis and extracts key information from text.

Agent Code

Agent Code (agents/text_analyzer.py):

import google.generativeai as genai
import os
import json

class TextAnalyzer:
    def __init__(self):
        genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
        self.model = genai.GenerativeModel('gemini-2.5-flash-preview-05-20')
        self.system_prompt = """
        You are a text analysis expert. Analyze the given text and provide:
        1. A sentiment score between -1.0 (very negative) and 1.0 (very positive)
        2. Key phrases that capture the main points
        3. A structured analysis summary
        
        Return your response as a JSON object with these fields:
        - sentiment_score: float
        - key_phrases: list of strings
        - analysis_summary: string
        """
    
    def analyze_text(self, text_content):
        if not text_content.strip():
            return {
                "sentiment_score": 0.0,
                "key_phrases": [],
                "analysis_summary": "No text provided for analysis."
            }
        
        full_prompt = f"{self.system_prompt}\n\nText to analyze:\n{text_content}\n\nAnalysis:"
        response = self.model.generate_content(full_prompt)
        
        try:
            # Try to parse JSON response
            result = json.loads(response.text)
            return result
        except json.JSONDecodeError:
            # Fallback to structured text response
            return {
                "sentiment_score": 0.0,
                "key_phrases": [],
                "analysis_summary": response.text
            }
    
    def analyze_review(self, user_review, analysis_settings=None):
        """Analyze a structured user review object"""
        # Note: analysis_settings is accepted (the test passes it as a dict) but is not yet used below
        if analysis_settings is None:
            analysis_settings = {
                "include_sentiment": True,
                "extract_keywords": True,
                "detect_emotions": False
            }
        
        # Combine review text with metadata
        review_text = f"Review: {user_review.text}\nRating: {user_review.rating}/5\nCategory: {user_review.category}"
        
        result = self.analyze_text(review_text)
        result["review_quality"] = "high" if user_review.rating >= 4 else "medium"
        
        return result
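`analyze_text` falls back to returning the raw text whenever `json.loads` fails, which can happen if the model wraps its JSON in a Markdown code fence or adds a sentence of prose around it. One possible hardening, sketched below under that assumption, keeps only the span from the first `{` to the last `}`, fills in missing fields, and clamps `sentiment_score` to the documented range of -1.0 to 1.0. `parse_analysis` is a hypothetical helper, not part of the original agent.

```python
import json
from typing import Any


def parse_analysis(raw: str) -> dict[str, Any]:
    """Hypothetical helper: turn raw model output into the TextAnalyzer result dict."""
    fallback = {"sentiment_score": 0.0, "key_phrases": [], "analysis_summary": raw}
    # Keep only the span from the first "{" to the last "}", dropping any
    # surrounding Markdown fence or explanatory prose.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return fallback
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    # Coerce the score to float and clamp it to the documented -1.0..1.0 range.
    try:
        score = float(data.get("sentiment_score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    data["sentiment_score"] = max(-1.0, min(1.0, score))
    data.setdefault("key_phrases", [])
    data.setdefault("analysis_summary", "")
    return data
```

Inside `analyze_text`, `return parse_analysis(response.text)` could replace the existing try/except block while keeping the same fallback shape.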

User Review Class (agents/review_processor.py):

class UserReview:
    def __init__(self, text, rating, category, helpful_votes=0, verified_purchase=False):
        self.text = text
        self.rating = rating
        self.category = category
        self.helpful_votes = helpful_votes
        self.verified_purchase = verified_purchase
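The "Object Input Analysis" step in the configuration passes a `UserReview` through `class_path` and `args`. Kaizen's actual construction logic is not documented on this page; as a hypothetical sketch, a runner could split `class_path` into a module and a class name, import the module, and call the class with `args` as keyword arguments. `build_object_input` is an illustrative name, and the demo uses the standard library's `fractions.Fraction` so it runs without the example project.

```python
import importlib
from fractions import Fraction
from typing import Any


def build_object_input(spec: dict[str, Any]) -> Any:
    """Hypothetical: instantiate spec['class_path'] with spec['args'] as keyword arguments."""
    module_name, _, class_name = spec["class_path"].rpartition(".")
    if not module_name:
        raise ValueError(f"class_path must be 'module.Class', got {spec['class_path']!r}")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**spec.get("args", {}))


if __name__ == "__main__":
    half = build_object_input(
        {"name": "ratio", "type": "object", "class_path": "fractions.Fraction",
         "args": {"numerator": 1, "denominator": 2}}
    )
    print(half == Fraction(1, 2))
```

Under this assumption, the review step would resolve to `UserReview(text=..., rating=5, category="electronics", helpful_votes=12, verified_purchase=True)`, which matches the constructor's keyword parameters.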

Test Configuration

Test Configuration (kaizen.yaml):

name: Text Analysis Agent Test Suite
agent_type: dynamic_region
file_path: agents/text_analyzer.py
description: |
  Test suite for the TextAnalyzer agent that processes and analyzes text content.
  
  This agent performs sentiment analysis, extracts key information, and provides
  structured analysis results. Tests cover various input types, edge cases, and
  expected output formats to ensure reliable performance.

agent:
  module: agents.text_analyzer
  class: TextAnalyzer
  method: analyze_text

evaluation:
  evaluation_targets:
    - name: sentiment_score
      source: variable
      criteria: "The sentiment_score must be a float between -1.0 and 1.0. Negative values indicate negative sentiment, positive values indicate positive sentiment. The score should accurately reflect the emotional tone of the input text."
      description: "Evaluates the accuracy of sentiment analysis output"
      weight: 0.4
    - name: key_phrases
      source: variable
      criteria: "The key_phrases should be a list of strings containing the most important phrases from the input text"
      description: "Checks if key phrase extraction is working correctly"
      weight: 0.3
    - name: analysis_quality
      source: return
      criteria: "The response should be well-structured, professional, and contain actionable insights"
      description: "Evaluates the overall quality and usefulness of the analysis"
      weight: 0.3

max_retries: 3

files_to_fix:
  - agents/text_analyzer.py
  - agents/prompts.py

referenced_files:
  - agents/prompts.py
  - utils/text_utils.py

steps:
  - name: Positive Review Analysis
    description: "Analyze a positive customer review"
    input:
      file_path: agents/text_analyzer.py
      method: analyze_text
      input: 
        - name: text_content
          type: string
          value: "This product exceeded my expectations! The quality is outstanding and the customer service was excellent. I would definitely recommend it to others."
          
      expected_output: 
        sentiment_score: 0.8
        key_phrases: ["exceeded expectations", "outstanding quality", "excellent customer service"]

  - name: Negative Feedback Analysis
    description: "Analyze negative customer feedback"
    input:
      file_path: agents/text_analyzer.py
      method: analyze_text
      input: 
        - name: text_content
          type: string
          value: "I'm very disappointed with this purchase. The product arrived damaged and the support team was unhelpful."
          
      expected_output: 
        sentiment_score: -0.7
        key_phrases: ["disappointed", "damaged product", "unhelpful support"]

  - name: Neutral Text Analysis
    description: "Analyze neutral or mixed sentiment text"
    input:
      file_path: agents/text_analyzer.py
      method: analyze_text
      input: 
        - name: text_content
          type: string
          value: "The product has both good and bad aspects. The design is nice but the price is high."
          
      expected_output: 
        sentiment_score: 0.0
        key_phrases: ["good aspects", "bad aspects", "nice design", "high price"]

  - name: Object Input Analysis
    description: "Analyze text using a structured user review object"
    input:
      file_path: agents/text_analyzer.py
      method: analyze_review
      input: 
        - name: user_review
          type: object
          class_path: agents.review_processor.UserReview
          args: 
            text: "This product exceeded my expectations! The quality is outstanding."
            rating: 5
            category: "electronics"
            helpful_votes: 12
            verified_purchase: true
        - name: analysis_settings
          type: dict
          value:
            include_sentiment: true
            extract_keywords: true
            detect_emotions: false
          
      expected_output: 
        sentiment_score: 0.9
        key_phrases: ["exceeded expectations", "outstanding quality"]
        review_quality: "high"

  - name: Empty Input Handling
    description: "Test how the agent handles empty or minimal input"
    input:
      file_path: agents/text_analyzer.py
      method: analyze_text
      input: 
        - name: text_content
          type: string
          value: ""
          
      expected_output: 
        sentiment_score: 0.0
        key_phrases: []
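The steps above give exact `expected_output` values such as `sentiment_score: 0.8`, but model scores vary between runs, and this page does not say how strictly Kaizen compares them. If you post-process results yourself, one hypothetical approach is to accept a score within a tolerance and require that a share of the expected key phrases appear in the output. `matches_expected`, the 0.3 tolerance, and the 50% phrase recall are arbitrary assumptions for illustration.

```python
from typing import Any


def matches_expected(actual: dict[str, Any], expected: dict[str, Any],
                     score_tolerance: float = 0.3, min_phrase_recall: float = 0.5) -> bool:
    """Hypothetical lenient comparison of an analyzer result with a step's expected_output."""
    # Sentiment: accept any score within +/- score_tolerance of the expected value.
    if "sentiment_score" in expected:
        diff = abs(float(actual.get("sentiment_score", 0.0)) - float(expected["sentiment_score"]))
        if diff > score_tolerance:
            return False
    # Key phrases: an expected phrase counts as found if it appears inside any
    # returned phrase (case-insensitive); require at least min_phrase_recall of them.
    wanted = [p.lower() for p in expected.get("key_phrases", [])]
    if wanted:
        got = [p.lower() for p in actual.get("key_phrases", [])]
        found = sum(1 for w in wanted if any(w in g for g in got))
        if found / len(wanted) < min_phrase_recall:
            return False
    # Any other expected field (e.g. review_quality) must match exactly.
    for key, value in expected.items():
        if key not in ("sentiment_score", "key_phrases") and actual.get(key) != value:
            return False
    return True
```

A substring check is deliberately loose: "outstanding quality product" still satisfies the expected phrase "outstanding quality".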

Code Review Agent

An agent that reviews code and provides feedback on quality, security, and best practices.

Agent Code

Agent Code (agents/code_reviewer.py):

import google.generativeai as genai
import os

class CodeReviewer:
    def __init__(self):
        genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
        self.model = genai.GenerativeModel('gemini-2.5-flash-preview-05-20')
        self.system_prompt = """
        You are a senior software engineer conducting code reviews. Analyze the provided code and provide:
        1. Code quality assessment (1-10 scale)
        2. Security concerns (if any)
        3. Performance considerations
        4. Specific improvement suggestions
        
        Be constructive and provide actionable feedback.
        """
    
    def review_code(self, code_content, language="python"):
        full_prompt = f"{self.system_prompt}\n\nLanguage: {language}\n\nCode to review:\n```{language}\n{code_content}\n```\n\nReview:"
        response = self.model.generate_content(full_prompt)
        return response.text
    
    def review_function(self, function_code, context=None):
        """Review a specific function with optional context"""
        context_info = f"\nContext: {context}" if context else ""
        full_prompt = f"{self.system_prompt}\n\nFunction to review:{context_info}\n\n```python\n{function_code}\n```\n\nReview:"
        response = self.model.generate_content(full_prompt)
        return response.text
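`review_code` and `review_function` wrap the submitted code in a triple-backtick fence inside the prompt. If the code itself contains a run of three backticks (for example, a docstring that embeds Markdown), the fence closes early and the rest of the code spills out of it. A common fix, following the Markdown rule that a closing fence must be at least as long as the opening one, is to make the outer fence longer than the longest backtick run in the content. `fence_code` is a hypothetical helper you could call when building `full_prompt`.

```python
import re


def fence_code(code: str, language: str = "") -> str:
    """Wrap code in a backtick fence longer than any backtick run it contains."""
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{language}\n{code}\n{fence}"
```

In `review_code`, the prompt could then embed `fence_code(code_content, language)` in place of the hand-written fence.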

Test Configuration

Test Configuration (kaizen.yaml):

name: Code Review Agent Test Suite
file_path: agents/code_reviewer.py
description: |
  Test suite for the CodeReviewer agent that analyzes code quality, security, and best practices.
  
  This agent provides comprehensive code reviews with actionable feedback for
  improving code quality, identifying security issues, and suggesting optimizations.

agent:
  module: agents.code_reviewer
  class: CodeReviewer
  method: review_code

evaluation:
  evaluation_targets:
    - name: review_quality
      source: return
      criteria: "The review should be comprehensive, constructive, and provide specific, actionable feedback. It should identify potential issues and suggest concrete improvements."
      weight: 0.4
    - name: technical_depth
      source: return
      criteria: "The review should demonstrate technical expertise and cover code quality, security, performance, and best practices relevant to the programming language."
      weight: 0.3
    - name: clarity
      source: return
      criteria: "The feedback should be clear, well-structured, and easy to understand. It should avoid overly technical jargon when possible and provide explanations for suggestions."
      weight: 0.3

files_to_fix:
  - agents/code_reviewer.py

steps:
  - name: Python Function Review
    description: "Review a Python function for quality and best practices"
    input:
      file_path: agents/code_reviewer.py
      method: review_code
      input: 
        - name: code_content
          type: string
          value: |
            def calculate_total(items):
                total = 0
                for item in items:
                    total += item.price
                return total
        - name: language
          type: string
          value: "python"

  - name: Security Vulnerability Review
    description: "Review code for potential security vulnerabilities"
    input:
      file_path: agents/code_reviewer.py
      method: review_code
      input: 
        - name: code_content
          type: string
          value: |
            def process_user_input(user_data):
                query = "SELECT * FROM users WHERE id = " + user_data
                return execute_query(query)
        - name: language
          type: string
          value: "python"

  - name: Performance Review
    description: "Review code for performance optimizations"
    input:
      file_path: agents/code_reviewer.py
      method: review_code
      input: 
        - name: code_content
          type: string
          value: |
            def find_duplicates(items):
                duplicates = []
                for i in range(len(items)):
                    for j in range(i + 1, len(items)):
                        if items[i] == items[j]:
                            duplicates.append(items[i])
                return duplicates
        - name: language
          type: string
          value: "python"
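Each evaluation target carries a `weight`, and in every example on this page the weights sum to 1.0. This page does not specify how Kaizen combines per-target results; assuming each target is scored between 0 and 1, a weighted average is the natural reading, sketched below with the Code Review weights. `overall_score` and the sample scores are hypothetical.

```python
import math


def overall_score(weights: dict[str, float], scores: dict[str, float]) -> float:
    """Hypothetical weighted average of per-target scores in [0, 1]."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"weights should sum to 1.0, got {total}")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for targets: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    weights = {"review_quality": 0.4, "technical_depth": 0.3, "clarity": 0.3}
    sample = {"review_quality": 0.9, "technical_depth": 0.6, "clarity": 0.8}  # made-up scores
    print(round(overall_score(weights, sample), 2))
```

Checking that weights sum to 1.0 catches a common editing mistake when you add or remove a target.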

Chatbot Agent

A conversational agent that handles customer support queries.

Agent Code

Agent Code (agents/customer_support.py):

import google.generativeai as genai
import os

class CustomerSupportBot:
    def __init__(self):
        genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
        self.model = genai.GenerativeModel('gemini-2.5-flash-preview-05-20')
        self.system_prompt = """
        You are a helpful customer support representative for TechCorp. 
        You help customers with product questions, troubleshooting, and general inquiries.
        
        Guidelines:
        - Be polite, professional, and helpful
        - Provide accurate information about our products
        - Escalate complex issues appropriately
        - Keep responses concise but informative
        - Always ask if there's anything else you can help with
        """
        self.conversation_history = []
    
    def respond(self, user_message, context=None):
        # Build conversation context from recent history
        conversation_context = ""
        if self.conversation_history:
            conversation_context = "\n\nPrevious conversation:\n"
            for msg in self.conversation_history[-3:]:  # Last 3 messages
                conversation_context += f"{msg['role']}: {msg['content']}\n"
        
        # Include optional caller-supplied context, as CodeReviewer.review_function does
        context_info = f"\n\nContext: {context}" if context else ""
        full_prompt = f"{self.system_prompt}{context_info}{conversation_context}\n\nCustomer: {user_message}\n\nSupport Agent:"
        response = self.model.generate_content(full_prompt)
        
        # Update conversation history
        self.conversation_history.append({"role": "customer", "content": user_message})
        self.conversation_history.append({"role": "agent", "content": response.text})
        
        return response.text
    
    def reset_conversation(self):
        """Reset the conversation history"""
        self.conversation_history = []
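`respond` stores two history entries per turn (one for the customer, one for the agent), so `self.conversation_history[-3:]` keeps only the last three messages, which is one and a half exchanges. If you want the prompt to include the last N full exchanges instead, one option is a `collections.deque` with `maxlen=2 * N`, which discards the oldest messages automatically. `HistoryWindow` and `max_exchanges` are hypothetical names for illustration.

```python
from collections import deque


class HistoryWindow:
    """Hypothetical bounded history that keeps the last `max_exchanges` customer/agent pairs."""

    def __init__(self, max_exchanges: int = 3):
        # Each exchange adds two messages, so the deque holds 2 * max_exchanges entries.
        self.messages = deque(maxlen=2 * max_exchanges)

    def add_exchange(self, user_message: str, agent_reply: str) -> None:
        self.messages.append({"role": "customer", "content": user_message})
        self.messages.append({"role": "agent", "content": agent_reply})

    def as_prompt_context(self) -> str:
        # Same layout as the string built in respond()
        if not self.messages:
            return ""
        lines = [f"{m['role']}: {m['content']}" for m in self.messages]
        return "\n\nPrevious conversation:\n" + "\n".join(lines) + "\n"

    def reset(self) -> None:
        self.messages.clear()
```

Because the deque drops old entries on append, the history can no longer grow without bound over a long support session.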

Test Configuration

Test Configuration (kaizen.yaml):

name: Customer Support Bot Test Suite
file_path: agents/customer_support.py
description: |
  Test suite for the CustomerSupportBot that handles customer inquiries and support requests.
  
  This agent provides helpful, professional responses to customer questions about
  products, troubleshooting, and general support issues.

agent:
  module: agents.customer_support
  class: CustomerSupportBot
  method: respond

evaluation:
  evaluation_targets:
    - name: helpfulness
      source: return
      criteria: "The response should be helpful and directly address the customer's question or concern. It should provide relevant information or guidance."
      weight: 0.4
    - name: professionalism
      source: return
      criteria: "The response should be polite, professional, and maintain a helpful tone. It should reflect good customer service practices."
      weight: 0.3
    - name: accuracy
      source: return
      criteria: "The information provided should be accurate and appropriate for a customer support context. It should not contain false or misleading information."
      weight: 0.3

files_to_fix:
  - agents/customer_support.py

steps:
  - name: Product Inquiry
    description: "Handle a customer asking about product features"
    input:
      file_path: agents/customer_support.py
      method: respond
      input: 
        - name: user_message
          type: string
          value: "What are the main features of your premium plan?"

  - name: Technical Support
    description: "Handle a technical troubleshooting request"
    input:
      file_path: agents/customer_support.py
      method: respond
      input: 
        - name: user_message
          type: string
          value: "I can't log into my account. It says 'invalid credentials' but I'm sure my password is correct."

  - name: Billing Question
    description: "Handle a billing-related inquiry"
    input:
      file_path: agents/customer_support.py
      method: respond
      input: 
        - name: user_message
          type: string
          value: "I was charged twice this month. Can you help me get a refund?"

  - name: Conversation Continuity
    description: "Test conversation history and context"
    input:
      file_path: agents/customer_support.py
      method: respond
      input: 
        - name: user_message
          type: string
          value: "Thanks for the help. How do I contact you if I have more questions?"

Running the Examples

To run any of these examples:

1. Create the agent file with the provided code.
2. Create the YAML configuration file (kaizen.yaml).
3. Set up your environment with the required API keys (the Python agents read GOOGLE_API_KEY).
4. Run the tests:

# Run with auto-fix
kaizen test-all --config kaizen.yaml --auto-fix --save-logs

# Run with PR creation (if GitHub is set up)
kaizen test-all --config kaizen.yaml --auto-fix --create-pr --repo your-username/your-repo-name
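The setup steps mention API keys; the Python examples read `GOOGLE_API_KEY` through `os.getenv`. Before running `kaizen test-all`, a small preflight script like the hypothetical one below can catch a missing key, a missing `kaizen.yaml`, or a `kaizen` executable that is not on `PATH`. The checks and the `preflight` name are assumptions, not part of the Kaizen CLI.

```python
import os
import shutil
from collections.abc import Mapping, Sequence


def preflight(required_vars: Sequence[str], env: Mapping[str, str] = os.environ,
              config_path: str = "kaizen.yaml") -> list[str]:
    """Return problems to fix before running `kaizen test-all` (an empty list means ready)."""
    problems = [f"environment variable {name} is not set" for name in required_vars if not env.get(name)]
    if not os.path.isfile(config_path):
        problems.append(f"config file {config_path} not found")
    if shutil.which("kaizen") is None:
        problems.append("kaizen CLI not found on PATH")
    return problems


if __name__ == "__main__":
    issues = preflight(["GOOGLE_API_KEY"])
    print("ready" if not issues else "\n".join(f"- {issue}" for issue in issues))
```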

Customizing Examples

You can customize these examples by:

- Modifying the prompts to match your specific use case
- Adjusting evaluation criteria to focus on your priorities
- Adding more test cases to cover additional scenarios
- Changing input types to match your agent's expected inputs
- Updating file paths to match your project structure

Best Practices from Examples

- Clear Evaluation Criteria: Each evaluation target states concrete criteria and a weight, so it is clear what is being judged and how much it counts.
- Edge Case Testing: The examples include empty inputs, very informal text, and code with known security or performance problems alongside typical cases.
- Descriptive Test Names: Test names clearly indicate what is being tested.
- Balanced Test Coverage: The suites mix happy-path and edge-case tests.
- Realistic Inputs: Test inputs reflect real-world usage scenarios.

For more examples and community contributions, check out our GitHub repository or join our Discord community.
