Building Production-Ready AI Guilds with Claude: A Test-Driven Approach
How I used Claude to generate the entire Iterative Studio guild and the methodology that made it work.
The Challenge
Building multi-agent AI systems is complex. The Rustic AI framework provides powerful primitives — agents, routing slips, transforms, dependency injection — but wiring them together correctly requires precision. A single misplaced JSONata expression can break an entire workflow.
When I set out to build the Iterative Studio guild (original repo: https://github.com/ryoiki-tokuiten/Iterative-Contextual-Refinements), a multi-mode AI system with 5 operational modes, 16 agents, and complex routing logic, I decided to do it entirely with Claude. The result? A fully functional guild with:
- 5 operational modes: Refine, Deepthink, Adaptive, Agentic, and Contextual
- 16 specialized agents coordinating through routing slips
- Complex JSONata transforms for content-based routing
- Comprehensive test coverage validating every transform and route
Here's the methodology that made it work.
The Core Insight: Transforms and Routes Are the Failure Points
After several failed attempts at generating guilds with AI, I noticed a pattern. The generated code would look correct, but when run:
- JSONata expressions would fail silently or return unexpected results
- Routes would send messages to the wrong topics
- Conditional logic in transforms would have subtle bugs
- Field accessors would reference non-existent paths
The problem wasn't the agent definitions or the overall architecture, but the JSONata transforms and routing logic.
JSONata is powerful but unforgiving. A missing $ prefix, an incorrect path like $.payload.message.content instead of $.payload.choices[0].message.content, or a malformed conditional can break everything.
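Before reaching for any framework machinery, it helps to see how little plain Python it takes to make the correct path explicit. This accessor is a sketch of my own, not part of Rustic AI; it simply encodes the correct `choices[0]` path and fails loudly on the wrong shape instead of silently downstream:

```python
def get_completion_content(payload: dict, default: str = "") -> str:
    """Safely extract the assistant reply from a ChatCompletionResponse-shaped dict.

    The correct path is payload["choices"][0]["message"]["content"]; a wrong
    path such as payload["message"]["content"] is caught here and mapped to
    the default instead of breaking the workflow downstream.
    """
    try:
        return payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return default


payload = {"choices": [{"message": {"content": "agentic", "role": "assistant"}}]}
print(get_completion_content(payload))               # agentic
print(get_completion_content({"message": {}}, "?"))  # ? (wrong shape caught)
```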
The Solution: Test-Driven Guild Generation
I developed a three-phase approach that dramatically improved success rates:
Phase 1: Reuse Existing Agents
Instead of asking Claude to create new agent classes, I instructed it to reuse existing agents from the Rustic AI ecosystem:
"Build the guild using existing agents from the system. We have:
- LLMAgent for simple LLM calls
- ReActAgent for tool-using agents
- AggregatingAgent for collecting multiple responses
- UserProxyAgent for user interaction
Do NOT create new agent classes unless absolutely necessary."
This constraint has multiple benefits:
- Reduced complexity: No new Python code to debug
- Proven components: Existing agents are already tested
- Faster iteration: Focus on orchestration, not implementation
- Better maintainability: Leverages framework updates automatically
For Iterative Studio, all 16 agents use just 3 agent classes (see the full guild spec):
- LLMAgent (12 agents) — Mode Controller, Feature Suggesters, Strategy Generator, etc.
- ReActAgent (2 agents) — Adaptive Deepthink and Agentic agents with tools
- AggregatingAgent (1 agent) — Refine Aggregator for parallel responses
Phase 2: Generate the Guild Spec with Explicit Transform Testing
When asking Claude to generate the guild.json, I included a critical instruction:
"For every transform and JSONata expression in the routes:
1. Write a unit test that validates the transform independently
2. Import the transform directly from the guild.json file
3. Test with realistic mock data matching actual message formats
4. Verify both the happy path and edge cases"
This resulted in tests like TestModeControllerTransformIntegration (see test_iterative_studio_e2e.py), which load the handler straight from the spec:
def test_mode_controller_transformer_extracts_user_messages(self):
    """Test that the Mode Controller transformer correctly extracts user messages."""
    # Load the actual handler from guild.json
    guild_spec = load_guild_spec()
    mode_controller_step = next(
        step for step in guild_spec.routes.steps
        if step.agent and step.agent.name == "Mode Controller"
    )

    # Create transformer from the actual handler (see FunctionalTransformer in message.py)
    transformer = FunctionalTransformer(
        style=mode_controller_step.transformer.style,
        handler=mode_controller_step.transformer.handler,
    )

    # Test with realistic payload
    payload = {
        "choices": [{"message": {"content": "agentic", "role": "assistant"}}],
        "input_messages": [
            {"content": "system prompt", "role": "system"},
            {"content": "user message", "role": "user"},
        ],
    }

    result = transformer.transform(origin=origin, ...)  # remaining arguments elided in this excerpt
    assert result.topics == "AGENTIC"
    assert result.payload["messages"][0]["role"] == "user"
The key insight: test the actual JSONata from the guild.json, not a copy. This ensures:
- No drift between tested and deployed transforms
- Immediate feedback when transforms are modified
- Tests serve as documentation for transform behavior
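A minimal loader for this pattern might look like the following. The guild.json location and the "routes"/"steps"/"transformer" field names are assumptions inferred from the test above, not the framework's documented API:

```python
import json
from pathlib import Path

GUILD_JSON = Path("showcase/apps/iterative_studio/guild.json")  # assumed location


def load_raw_transformer(agent_name: str, spec_path: Path = GUILD_JSON) -> dict:
    """Return the transformer dict (style + handler) for a named route step,
    read straight from guild.json so tests exercise the deployed expression,
    never a copy."""
    spec = json.loads(spec_path.read_text())
    for step in spec["routes"]["steps"]:
        agent = step.get("agent") or {}
        if agent.get("name") == agent_name:
            return step["transformer"]
    raise KeyError(f"No route step for agent {agent_name!r}")
```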
Phase 3: Run Tests and Fix Iteratively
With tests in place, the workflow becomes:
- Generate guild.json with Claude
- Run pytest showcase/tests/iterative_studio/
- Share failures with Claude
- Claude fixes the specific transforms
- Repeat until green
This iterative loop is fast because:
- Failures are isolated to specific transforms
- Error messages pinpoint the exact JSONata expression
- Claude can see what the transform expected vs. received
The Test Structure That Works
After iteration, I settled on this test organization (see complete tests in showcase/tests/iterative_studio/):
1. Guild Spec Validation Tests
class TestGuildSpecLoading:
    def test_guild_json_exists(self): ...
    def test_guild_json_is_valid_json(self): ...
    def test_guild_spec_can_be_parsed(self): ...
    def test_all_agents_have_required_fields(self): ...
    def test_dependency_map_is_valid(self): ...
These catch structural issues before anything runs.
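The required-fields check can be a few lines of plain Python. This is a sketch; the minimal field set is an assumption based on the import test later in this post ("agents" entries carrying "name" and "class_name"), not the framework's actual schema:

```python
REQUIRED_AGENT_FIELDS = {"name", "class_name"}  # assumed minimal set


def validate_agents(spec_data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the spec passes."""
    problems = []
    for i, agent in enumerate(spec_data.get("agents", [])):
        missing = REQUIRED_AGENT_FIELDS - agent.keys()
        if missing:
            problems.append(f"agent[{i}] missing fields: {sorted(missing)}")
    return problems


spec = {"agents": [{"name": "Mode Controller", "class_name": "LLMAgent"}, {"name": "Broken"}]}
print(validate_agents(spec))  # ["agent[1] missing fields: ['class_name']"]
```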
2. Toolset Tests
class TestArxivToolset:
    def test_arxiv_toolset_import(self): ...
    def test_arxiv_toolset_has_toolspecs(self): ...
    def test_arxiv_toolset_parameter_class(self): ...

class TestAgenticToolset:
    def test_agentic_toolset_write_and_read_file(self): ...
    def test_agentic_toolset_path_traversal_blocked(self): ...
If you create custom toolsets (extending the base Toolset class), test them in isolation first.
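The path-traversal test above guards the file tools. The check behind it can be as small as this; a sketch of my own, not the toolset's actual implementation:

```python
from pathlib import Path


def resolve_safe(workdir: str, user_path: str) -> Path:
    """Resolve user_path inside workdir, rejecting '../' escapes.

    A minimal version of the check that test_agentic_toolset_path_traversal_blocked
    would exercise; the real toolset may do more (symlink checks, allowlists).
    """
    base = Path(workdir).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes workdir: {user_path}")
    return target
```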
3. Routing Configuration Tests
class TestRoutingConfiguration:
    def test_routing_steps_cover_all_modes(self): ...
    def test_mode_controller_routes_to_modes(self): ...
    def test_refine_mode_parallel_routing(self): ...
Verify the routing topology before testing transforms.
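A coverage check like test_routing_steps_cover_all_modes boils down to set arithmetic over the spec. This sketch assumes route steps carry a "topics" field that is either a string or a list, as the transform snippets later in this post suggest:

```python
# Topics drawn from the mode routing table shown later in this post
MODES = {"REFINE_NOVELTY", "REFINE_QUALITY", "DT_STRATEGY", "ADAPTIVE_DT", "AGENTIC", "CTX_MAIN"}


def covered_topics(steps: list[dict]) -> set[str]:
    """Collect every topic any route step can emit (topics may be str or list)."""
    topics = set()
    for step in steps:
        t = step.get("topics")
        topics.update([t] if isinstance(t, str) else t or [])
    return topics


steps = [{"topics": ["REFINE_NOVELTY", "REFINE_QUALITY"]}, {"topics": "AGENTIC"}]
missing = MODES - covered_topics(steps)  # any mode topic no step can reach
```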
4. Transform Accuracy Tests (The Critical Ones)
class TestTransformAccuracy:
    def test_mode_controller_transform_maps_all_modes(self): ...
    def test_refine_aggregator_transform_combines_messages(self): ...
    def test_critique_agent_conditional_routing(self): ...
These test that JSONata expressions produce correct outputs.
5. Transform Integration Tests
class TestModeControllerTransformIntegration:
    def test_mode_controller_routes_to_correct_modes(self):
        """Test each mode routes to the correct topic."""
        mode_to_topic = {
            "refine": ["REFINE_NOVELTY", "REFINE_QUALITY"],
            "deepthink": "DT_STRATEGY",
            "adaptive": "ADAPTIVE_DT",
            "agentic": "AGENTIC",
            "contextual": "CTX_MAIN",
        }
        for mode, expected_topic in mode_to_topic.items():
            result = transformer.transform(...)
            assert result.topics == expected_topic
Test transforms with real FunctionalTransformer instances.
6. End-to-End Tests
class TestEndToEndFlow:
    def test_guild_starts_and_stops_cleanly(self): ...
    def test_all_agents_are_instantiated(self): ...
    def test_routing_slip_is_attached_to_guild(self): ...
Verify the guild actually launches and runs.
Best Practices for AI-Assisted Guild Development
1. Provide Clear Message Format Examples
Tell Claude exactly what messages look like:
"The LLMAgent produces ChatCompletionResponse with this structure:
{
    'choices': [{'message': {'content': '...', 'role': 'assistant'}}],
    'input_messages': [original messages],
    'usage': {...}
}
Your transforms must access $.payload.choices[0].message.content"
2. Be Explicit About JSONata Syntax
Common pitfalls to warn Claude about:
"JSONata notes:
- Use $variable for local bindings
- String concatenation uses & not +
- Access nested fields with $.payload.field not payload.field
- Ternary is: condition ? true_value : false_value
- Use $uppercase() for case-insensitive comparison"
3. Include Context Tracking Requirements
"When routing between agents, preserve context:
- Include 'context': {'mode': $mode} in route output
- Track iteration counts in context for loops
- Pass through session_state for downstream processors"
4. Request Defensive Transforms
"Add fallback handling to transforms:
- Default to 'CTX_MAIN' if mode is unrecognized
- Use $exists() to check for optional fields
- Handle both string and array topic formats"
5. Validate Import Paths
Ask Claude to verify all class paths are importable:
def test_agent_classes_are_importable(self):
    for agent_data in spec_data["agents"]:
        class_name = agent_data["class_name"]
        agent_class = get_class_from_name(class_name)
        assert agent_class is not None
The Iterative Studio Result
Using this methodology, Claude generated a fully functional guild with:
Complex Mode Routing
$route := {
    'refine': ['REFINE_NOVELTY', 'REFINE_QUALITY'],
    'deepthink': 'DT_STRATEGY',
    'adaptive': 'ADAPTIVE_DT',
    'agentic': 'AGENTIC',
    'contextual': 'CTX_MAIN'
};
$topics := $lookup($route, $mode);
{'topics': $topics ? $topics : 'CTX_MAIN', ...}
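In unit tests it is handy to mirror this expression in plain Python and use the mirror as an oracle against the deployed JSONata. A test-side sketch, not the deployed transform:

```python
# Python mirror of the $lookup route table above, used only as a test oracle
ROUTE = {
    "refine": ["REFINE_NOVELTY", "REFINE_QUALITY"],
    "deepthink": "DT_STRATEGY",
    "adaptive": "ADAPTIVE_DT",
    "agentic": "AGENTIC",
    "contextual": "CTX_MAIN",
}


def expected_topics(mode: str):
    """Mirror of $lookup($route, $mode) with the CTX_MAIN fallback."""
    return ROUTE.get(mode, "CTX_MAIN")


for mode in ROUTE:
    assert expected_topics(mode) == ROUTE[mode]
assert expected_topics("unknown") == "CTX_MAIN"
```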
Conditional Iteration Logic
$needs_revision and $iteration < 2
? {'topics': 'DT_REFINEMENT', 'context': {'dt_iteration': $iteration + 1}, ...}
: {'topics': 'DT_REDTEAM', ...}
Message Aggregation
$combined := $join(
    $map($msgs, function($m) { $m.data.choices[0].message.content }),
    '\n\n---\n\n'
);
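The same aggregation is easy to mirror in Python for test fixtures; a sketch assuming the message shape from the transform above ($m.data.choices[0].message.content):

```python
def combine(msgs: list[dict]) -> str:
    """Python mirror of the $map/$join aggregation above."""
    parts = [m["data"]["choices"][0]["message"]["content"] for m in msgs]
    return "\n\n---\n\n".join(parts)


msgs = [
    {"data": {"choices": [{"message": {"content": "draft A"}}]}},
    {"data": {"choices": [{"message": {"content": "draft B"}}]}},
]
print(combine(msgs))  # draft A, a '---' separator, then draft B
```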
All validated by 50+ tests that run in seconds.
Quick Setup Checklist
For your next AI-generated guild:
- Define your agent palette — List existing agents you'll use
- Map message formats — Document payload structures for each agent
- Sketch the routing topology — Draw which agents connect to which
- Write test scaffolds first — Create empty test classes for each component
- Generate with Claude — Include format docs and test requirements
- Run tests, share failures — Let Claude iterate on fixes
- Add edge case tests — Cover error paths and defaults
- Integration test — Verify the full guild launches
Conclusion
Building complex multi-agent systems with AI is not just possible; it's efficient when you have the right methodology: reuse proven agents, test every transform and route directly from the spec, and iterate on failures.
The Iterative Studio guild has 5 modes, 16 agents, and complex routing, and it was generated entirely by Claude using this approach in about 2 hours. This is another major advantage of Rustic AI: you can have a multi-agent system running end-to-end in 2 hours with production-grade infrastructure already handled. The tests caught every JSONata bug before deployment, and the guild has been running reliably ever since.
The future of AI development isn't writing every line yourself; it's knowing how to guide AI tools effectively and validate their output systematically.
The Iterative Studio guild is available in showcase/apps/iterative_studio/ with full test coverage in showcase/tests/iterative_studio/.