Building Production-Ready AI Guilds with Claude: A Test-Driven Approach
How I used Claude to generate the entire Iterative Studio guild and the methodology that made it work.
The Challenge
Building multi-agent AI systems is complex. The Rustic AI framework provides powerful primitives — agents, routing slips, transforms, dependency injection — but wiring them together correctly requires precision. A single misplaced JSONata expression can break an entire workflow.
When I set out to build the Iterative Studio guild (original repo: https://github.com/ryoiki-tokuiten/Iterative-Contextual-Refinements), a multi-mode AI system with 5 operational modes, 16 agents, and complex routing logic, I decided to do it entirely with Claude. The result? A fully functional guild with:
- 5 operational modes: Refine, Deepthink, Adaptive, Agentic, and Contextual
- 16 specialized agents coordinating through routing slips
- Complex JSONata transforms for content-based routing
- Comprehensive test coverage validating every transform and route
Here's the methodology that made it work.
The Core Insight: Transforms and Routes Are the Failure Points
After several failed attempts at generating guilds with AI, I noticed a pattern. The generated code would look correct, but when run:
- JSONata expressions would fail silently or return unexpected results
- Routes would send messages to the wrong topics
- Conditional logic in transforms would have subtle bugs
- Field accessors would reference non-existent paths
The problem wasn't the agent definitions or the overall architecture, but the JSONata transforms and routing logic.
JSONata is powerful but unforgiving. A missing $ prefix, an incorrect path like $.payload.message.content instead of $.payload.choices[0].message.content, or a malformed conditional can break everything.
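Before reaching for any framework machinery, it helps to see how little plain Python it takes to make the correct path explicit. This accessor is a sketch of my own, not part of Rustic AI; it simply encodes the correct `choices[0]` path and fails loudly on the wrong shape instead of silently downstream:

```python
def get_completion_content(payload: dict, default: str = "") -> str:
    """Safely extract the assistant reply from a ChatCompletionResponse-shaped dict.

    The correct path is payload["choices"][0]["message"]["content"]; a wrong
    path such as payload["message"]["content"] is caught here and mapped to
    the default instead of breaking the workflow downstream.
    """
    try:
        return payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return default


payload = {"choices": [{"message": {"content": "agentic", "role": "assistant"}}]}
print(get_completion_content(payload))               # agentic
print(get_completion_content({"message": {}}, "?"))  # ? (wrong shape caught)
```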
The Solution: Test-Driven Guild Generation
I developed a three-phase approach that dramatically improved success rates:
Phase 1: Reuse Existing Agents
Instead of asking Claude to create new agent classes, I instructed it to reuse existing agents from the Rustic AI ecosystem:
"Build the guild using existing agents from the system. We have:
- LLMAgent for simple LLM calls
- ReActAgent for tool-using agents
- AggregatingAgent for collecting multiple responses
- UserProxyAgent for user interaction
Do NOT create new agent classes unless absolutely necessary."
This constraint has multiple benefits:
- Reduced complexity: No new Python code to debug
- Proven components: Existing agents are already tested
- Faster iteration: Focus on orchestration, not implementation
- Better maintainability: Leverages framework updates automatically
For Iterative Studio, all 16 agents use just 3 agent classes (see the full guild spec):
- LLMAgent (12 agents) — Mode Controller, Feature Suggesters, Strategy Generator, etc.
- ReActAgent (2 agents) — Adaptive Deepthink and Agentic agents with tools
- AggregatingAgent (1 agent) — Refine Aggregator for parallel responses
Phase 2: Generate the Guild Spec with Explicit Transform Testing
When asking Claude to generate the guild.json, I included a critical instruction:
"For every transform and JSONata expression in the routes:
1. Write a unit test that validates the transform independently
2. Import the transform directly from the guild.json file
3. Test with realistic mock data matching actual message formats
4. Verify both the happy path and edge cases"
This resulted in tests like TestModeControllerTransformIntegration (see test_iterative_studio_e2e.py), which load the handler straight from the spec:
def test_mode_controller_transformer_extracts_user_messages(self):
    """Test that the Mode Controller transformer correctly extracts user messages."""
    # Load the actual handler from guild.json
    guild_spec = load_guild_spec()
    mode_controller_step = next(
        step for step in guild_spec.routes.steps
        if step.agent and step.agent.name == "Mode Controller"
    )

    # Create transformer from the actual handler (see FunctionalTransformer in message.py)
    transformer = FunctionalTransformer(
        style=mode_controller_step.transformer.style,
        handler=mode_controller_step.transformer.handler,
    )

    # Test with realistic payload
    payload = {
        "choices": [{"message": {"content": "agentic", "role": "assistant"}}],
        "input_messages": [
            {"content": "system prompt", "role": "system"},
            {"content": "user message", "role": "user"},
        ],
    }

    result = transformer.transform(origin=origin, ...)  # remaining arguments elided in this excerpt
    assert result.topics == "AGENTIC"
    assert result.payload["messages"][0]["role"] == "user"
The key insight: test the actual JSONata from the guild.json, not a copy. This ensures:
- No drift between tested and deployed transforms
- Immediate feedback when transforms are modified
- Tests serve as documentation for transform behavior
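A minimal loader for this pattern might look like the following. The guild.json location and the "routes"/"steps"/"transformer" field names are assumptions inferred from the test above, not the framework's documented API:

```python
import json
from pathlib import Path

GUILD_JSON = Path("showcase/apps/iterative_studio/guild.json")  # assumed location


def load_raw_transformer(agent_name: str, spec_path: Path = GUILD_JSON) -> dict:
    """Return the transformer dict (style + handler) for a named route step,
    read straight from guild.json so tests exercise the deployed expression,
    never a copy."""
    spec = json.loads(spec_path.read_text())
    for step in spec["routes"]["steps"]:
        agent = step.get("agent") or {}
        if agent.get("name") == agent_name:
            return step["transformer"]
    raise KeyError(f"No route step for agent {agent_name!r}")
```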
Phase 3: Run Tests and Fix Iteratively
With tests in place, the workflow becomes:
- Generate guild.json with Claude
- Run pytest showcase/tests/iterative_studio/
- Share failures with Claude
- Claude fixes the specific transforms
- Repeat until green
This iterative loop is fast because:
- Failures are isolated to specific transforms
- Error messages pinpoint the exact JSONata expression
- Claude can see what the transform expected vs. received
The Test Structure That Works
After iteration, I settled on this test organization (see complete tests in showcase/tests/iterative_studio/):
1. Guild Spec Validation Tests
class TestGuildSpecLoading:
    def test_guild_json_exists(self): ...
    def test_guild_json_is_valid_json(self): ...
    def test_guild_spec_can_be_parsed(self): ...
    def test_all_agents_have_required_fields(self): ...
    def test_dependency_map_is_valid(self): ...
These catch structural issues before anything runs.
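The required-fields check can be a few lines of plain Python. This is a sketch; the minimal field set is an assumption based on the import test later in this post ("agents" entries carrying "name" and "class_name"), not the framework's actual schema:

```python
REQUIRED_AGENT_FIELDS = {"name", "class_name"}  # assumed minimal set


def validate_agents(spec_data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the spec passes."""
    problems = []
    for i, agent in enumerate(spec_data.get("agents", [])):
        missing = REQUIRED_AGENT_FIELDS - agent.keys()
        if missing:
            problems.append(f"agent[{i}] missing fields: {sorted(missing)}")
    return problems


spec = {"agents": [{"name": "Mode Controller", "class_name": "LLMAgent"}, {"name": "Broken"}]}
print(validate_agents(spec))  # ["agent[1] missing fields: ['class_name']"]
```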
2. Toolset Tests
class TestArxivToolset:
    def test_arxiv_toolset_import(self): ...
    def test_arxiv_toolset_has_toolspecs(self): ...
    def test_arxiv_toolset_parameter_class(self): ...

class TestAgenticToolset:
    def test_agentic_toolset_write_and_read_file(self): ...
    def test_agentic_toolset_path_traversal_blocked(self): ...
If you create custom toolsets (extending the base Toolset class), test them in isolation first.
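The path-traversal test above guards the file tools. The check behind it can be as small as this; a sketch of my own, not the toolset's actual implementation:

```python
from pathlib import Path


def resolve_safe(workdir: str, user_path: str) -> Path:
    """Resolve user_path inside workdir, rejecting '../' escapes.

    A minimal version of the check that test_agentic_toolset_path_traversal_blocked
    would exercise; the real toolset may do more (symlink checks, allowlists).
    """
    base = Path(workdir).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"path escapes workdir: {user_path}")
    return target
```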
3. Routing Configuration Tests
class TestRoutingConfiguration:
    def test_routing_steps_cover_all_modes(self): ...
    def test_mode_controller_routes_to_modes(self): ...
    def test_refine_mode_parallel_routing(self): ...
Verify the routing topology before testing transforms.
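A coverage check like test_routing_steps_cover_all_modes boils down to set arithmetic over the spec. This sketch assumes route steps carry a "topics" field that is either a string or a list, as the transform snippets later in this post suggest:

```python
# Topics drawn from the mode routing table shown later in this post
MODES = {"REFINE_NOVELTY", "REFINE_QUALITY", "DT_STRATEGY", "ADAPTIVE_DT", "AGENTIC", "CTX_MAIN"}


def covered_topics(steps: list[dict]) -> set[str]:
    """Collect every topic any route step can emit (topics may be str or list)."""
    topics = set()
    for step in steps:
        t = step.get("topics")
        topics.update([t] if isinstance(t, str) else t or [])
    return topics


steps = [{"topics": ["REFINE_NOVELTY", "REFINE_QUALITY"]}, {"topics": "AGENTIC"}]
missing = MODES - covered_topics(steps)  # any mode topic no step can reach
```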
4. Transform Accuracy Tests (The Critical Ones)
class TestTransformAccuracy:
    def test_mode_controller_transform_maps_all_modes(self): ...
    def test_refine_aggregator_transform_combines_messages(self): ...
    def test_critique_agent_conditional_routing(self): ...
These test that JSONata expressions produce correct outputs.
5. Transform Integration Tests
class TestModeControllerTransformIntegration:
    def test_mode_controller_routes_to_correct_modes(self):
        """Test each mode routes to the correct topic."""
        mode_to_topic = {
            "refine": ["REFINE_NOVELTY", "REFINE_QUALITY"],
            "deepthink": "DT_STRATEGY",
            "adaptive": "ADAPTIVE_DT",
            "agentic": "AGENTIC",
            "contextual": "CTX_MAIN",
        }
        for mode, expected_topic in mode_to_topic.items():
            result = transformer.transform(...)
            assert result.topics == expected_topic
Test transforms with real FunctionalTransformer instances.
6. End-to-End Tests
class TestEndToEndFlow:
    def test_guild_starts_and_stops_cleanly(self): ...
    def test_all_agents_are_instantiated(self): ...
    def test_routing_slip_is_attached_to_guild(self): ...
Verify the guild actually launches and runs.
Best Practices for AI-Assisted Guild Development
1. Provide Clear Message Format Examples
Tell Claude exactly what messages look like:
"The LLMAgent produces ChatCompletionResponse with this structure:
{
    'choices': [{'message': {'content': '...', 'role': 'assistant'}}],
    'input_messages': [original messages],
    'usage': {...}
}
Your transforms must access $.payload.choices[0].message.content"
2. Be Explicit About JSONata Syntax
Common pitfalls to warn Claude about:
"JSONata notes:
- Use $variable for local bindings
- String concatenation uses & not +
- Access nested fields with $.payload.field not payload.field
- Ternary is: condition ? true_value : false_value
- Use $uppercase() for case-insensitive comparison"
3. Include Context Tracking Requirements
"When routing between agents, preserve context:
- Include 'context': {'mode': $mode} in route output
- Track iteration counts in context for loops
- Pass through session_state for downstream processors"
4. Request Defensive Transforms
"Add fallback handling to transforms:
- Default to 'CTX_MAIN' if mode is unrecognized
- Use $exists() to check for optional fields
- Handle both string and array topic formats"
5. Validate Import Paths
Ask Claude to verify all class paths are importable:
def test_agent_classes_are_importable(self):
    for agent_data in spec_data["agents"]:
        class_name = agent_data["class_name"]
        agent_class = get_class_from_name(class_name)
        assert agent_class is not None
The Iterative Studio Result
Using this methodology, Claude generated a fully functional guild with:
Complex Mode Routing
$route := {
    'refine': ['REFINE_NOVELTY', 'REFINE_QUALITY'],
    'deepthink': 'DT_STRATEGY',
    'adaptive': 'ADAPTIVE_DT',
    'agentic': 'AGENTIC',
    'contextual': 'CTX_MAIN'
};
$topics := $lookup($route, $mode);
{'topics': $topics ? $topics : 'CTX_MAIN', ...}
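In unit tests it is handy to mirror this expression in plain Python and use the mirror as an oracle against the deployed JSONata. A test-side sketch, not the deployed transform:

```python
# Python mirror of the $lookup route table above, used only as a test oracle
ROUTE = {
    "refine": ["REFINE_NOVELTY", "REFINE_QUALITY"],
    "deepthink": "DT_STRATEGY",
    "adaptive": "ADAPTIVE_DT",
    "agentic": "AGENTIC",
    "contextual": "CTX_MAIN",
}


def expected_topics(mode: str):
    """Mirror of $lookup($route, $mode) with the CTX_MAIN fallback."""
    return ROUTE.get(mode, "CTX_MAIN")


for mode in ROUTE:
    assert expected_topics(mode) == ROUTE[mode]
assert expected_topics("unknown") == "CTX_MAIN"
```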
Conditional Iteration Logic
$needs_revision and $iteration < 2
? {'topics': 'DT_REFINEMENT', 'context': {'dt_iteration': $iteration + 1}, ...}
: {'topics': 'DT_REDTEAM', ...}
Message Aggregation
$combined := $join(
    $map($msgs, function($m) { $m.data.choices[0].message.content }),
    '\n\n---\n\n'
);
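The same aggregation is easy to mirror in Python for test fixtures; a sketch assuming the message shape from the transform above ($m.data.choices[0].message.content):

```python
def combine(msgs: list[dict]) -> str:
    """Python mirror of the $map/$join aggregation above."""
    parts = [m["data"]["choices"][0]["message"]["content"] for m in msgs]
    return "\n\n---\n\n".join(parts)


msgs = [
    {"data": {"choices": [{"message": {"content": "draft A"}}]}},
    {"data": {"choices": [{"message": {"content": "draft B"}}]}},
]
print(combine(msgs))  # draft A, a '---' separator, then draft B
```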
All validated by 50+ tests that run in seconds.
Quick Setup Checklist
For your next AI-generated guild:
- Define your agent palette — List existing agents you'll use
- Map message formats — Document payload structures for each agent
- Sketch the routing topology — Draw which agents connect to which
- Write test scaffolds first — Create empty test classes for each component
- Generate with Claude — Include format docs and test requirements
- Run tests, share failures — Let Claude iterate on fixes
- Add edge case tests — Cover error paths and defaults
- Integration test — Verify the full guild launches
Conclusion
Building complex multi-agent systems with AI is not just possible; it's efficient when you have the right methodology: reuse proven agents, test every transform and route directly from the spec, and iterate on failures.
The Iterative Studio guild has 5 modes, 16 agents, and complex routing, and it was generated entirely by Claude using this approach in about 2 hours. This is another major advantage of Rustic AI: you can have a multi-agent system running end-to-end in 2 hours with production-grade infrastructure already handled. The tests caught every JSONata bug before deployment, and the guild has been running reliably ever since.
The future of AI development isn't writing every line yourself; it's knowing how to guide AI tools effectively and validate their output systematically.
The Iterative Studio guild is available in showcase/apps/iterative_studio/ with full test coverage in showcase/tests/iterative_studio/.