
Research Guide

This guide explains how to use SimAgents as a research platform without overstating what a run can prove.

Research Posture

SimAgents now separates runtime capability from claim strength.

  1. Deterministic baseline first: strong claims start from canonical_core plus deterministic_baseline.
  2. Explicit intervention layers: the full platform includes designed incentives and must be labeled accordingly.
  3. Replicate before inferring: inferential reporting requires at least two conditions with at least two runs per condition.
  4. Corrected statistics only: significant findings should come from replicated comparisons with multiple-comparison correction.
  5. Bundle everything: publish the config, seed schedule, claim class, and research bundle used as evidence.

Choose the Right Surface

| Surface | Typical config | What it is for | What it is not for |
| --- | --- | --- | --- |
| Lower-imposition benchmark | benchmarkWorld: canonical_core, profile: deterministic_baseline | Replicated comparative studies, baseline validation, literature-style controls | Free-form LLM novelty claims |
| Exploratory platform | profile: llm_exploratory or other full-surface runs | Prompt research, intervention studies, rich mechanic exploration | Strong minimal-imposition claims |

Strong claims belong only to the first row, and only after replication.


Designing Experiments

Validated Path

Use this path when you want the run to be eligible for the validated claim class:

name: "canonical_trade_vs_conflict"
description: "Compare lower-imposition conditions under deterministic controls"
seed: 12345
profile: deterministic_baseline
benchmarkWorld: canonical_core

agents:
- type: baseline_rule
  count: 6
- type: baseline_random
  count: 6

duration: 300

metrics:
- gini
- survival_rate
- trade_count
- conflict_count

Exploratory Path

Use this path when you want richer mechanics or live provider behavior:

name: "trust_pricing_intervention"
description: "Observe how provider-backed agents respond to trust-mediated pricing"
seed: 12345
profile: llm_exploratory
benchmarkWorld: full_surface

agents:
- type: claude
  count: 4
- type: gemini
  count: 4
- type: baseline_rule
  count: 4

duration: 300

This can still produce valuable data, but the output should be treated as exploratory unless the report says otherwise.

Running Experiments

Self-hosted — run experiments via the CLI:

cd apps/server

# Validate config
bun run src/experiments/runner.ts --config experiments/my-experiment.yaml --dry-run

# Run once
bun run src/experiments/runner.ts --config experiments/my-experiment.yaml --output results/

# Run replicated batch
bun run src/experiments/runner.ts --config experiments/my-experiment.yaml --runs 5 --output results/

Hosted — experiments can also be submitted via the API:

# Seed a built-in experiment template
curl -X POST https://api.simagents.io/api/experiments/seed/all \
-H "X-Admin-Key: your-admin-key"

# Start an experiment
curl -X POST https://api.simagents.io/api/experiments/{id}/start \
-H "X-Admin-Key: your-admin-key"

# Get results
curl https://api.simagents.io/api/experiments/{id}/results

See the API Reference for the full experiments API.

Canonical Starting Point

cd apps/server
bun run src/experiments/runner.ts --config experiments/canonical-core-benchmark.yaml --runs 2 --output results/

Start here before scaling up to full-surface or provider-backed studies.


Claim Classes

SimAgents reports every experiment with a claim posture:

  • validated: replicated comparative evidence under canonical_core + deterministic_baseline
  • exploratory: replicated comparative evidence with richer mechanics, provider stochasticity, or other non-validated controls
  • descriptive_only: single-condition runs, under-replicated comparisons, or reports that do not satisfy inferential gates

In practice:

  • one condition with many runs is still descriptive_only
  • two conditions with one run each is still descriptive_only
  • only replicated multi-condition comparisons can produce inferential findings (see the sketch below)
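
As a rough mental model, the gate behaves like the small decision function sketched below. This is an illustration under assumptions, not the runner's actual code: the function name (classifyClaim) and its input shape are hypothetical.

// Hypothetical sketch of the claim-class gate; names and input shape are illustrative.
type ClaimClass = 'validated' | 'exploratory' | 'descriptive_only';

interface RunSummary {
  profile: string;            // e.g. 'deterministic_baseline'
  benchmarkWorld: string;     // e.g. 'canonical_core'
  conditions: number;         // number of distinct experimental conditions
  runsPerCondition: number[]; // replicate count per condition
}

function classifyClaim(s: RunSummary): ClaimClass {
  const replicated = s.conditions >= 2 && s.runsPerCondition.every((r) => r >= 2);
  if (!replicated) return 'descriptive_only';
  const canonical =
    s.profile === 'deterministic_baseline' && s.benchmarkWorld === 'canonical_core';
  return canonical ? 'validated' : 'exploratory';
}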

Baseline Agents

Use baselines as controls, not as decoration.

baseline_random

Uniform random action selection. This is the null model.

baseline_rule

Hand-authored heuristics for survival and basic economy. Use it to test whether a richer policy actually beats simple procedural behavior.

baseline_qlearning

Tabular reinforcement-learning baseline. Use when you want a non-LLM adaptive comparator.

Note: Q-learning state is automatically reset between experiment runs via resetQLearningState() in the runner. This ensures observation independence across runs so that learned Q-values from one run do not carry over and bias the next.

baseline_sugarscape

The literature-baseline agent used by apps/server/experiments/sugarscape-replication.yaml. At the moment it is best treated as a controlled baseline and partial replication path, not as a fully validated literature reproduction.


Intervention Mechanics

The following mechanics belong to the full platform surface. They are disabled in canonical_core.

Cooperation Incentives

| Mechanic | Full-platform effect |
| --- | --- |
| Gather | Nearby agents can increase effective yield |
| Forage | Nearby agents can increase success rate |
| Public work | Nearby workers can increase pay |
| Solo penalties | Isolation can reduce efficiency or payout |
| Group gather | Rich spawns can require multiple agents to harvest fully |

Market and Relationship Mechanics

| Mechanic | Full-platform effect |
| --- | --- |
| Trust-based pricing | Shelter prices can shift with trust |
| Trade bonuses | Trusted or repeated partners can receive better terms |
| Spoilage | Perishable items create urgency and trade pressure |
| Puzzle system | Agents can enter cooperative puzzle loops and focus-lock into them |

These are legitimate things to study. They simply should not be described as absent when they are active.


Metric Tiers

Not every metric is the same kind of evidence.

Validated or Strong Descriptive Metrics

| Metric | Tier | Notes |
| --- | --- | --- |
| Survival rate | validated | Direct operational meaning from stored snapshots |
| Gini coefficient | validated | Standard inequality statistic over balances |
| Average wealth | descriptive | Useful state summary, but not an inferential claim by itself |
| Trade count | descriptive | Throughput measure, not a direct cooperation claim |
| Conflict count | descriptive | Incident volume, not severity or norm structure |
| Average hunger / energy | descriptive | Useful welfare summaries for condition comparisons |

Heuristic and Experimental Metrics

| Metric | Tier | Notes |
| --- | --- | --- |
| Cooperation index | heuristic | Composite proxy from the analytics registry; use carefully |
| Clustering / trust-network density | heuristic | Sensitive to radius and graph definitions |
| Emergence index / norm emergence | experimental | Exploratory diagnostics, not claim-safe endpoints |

Use heuristic metrics to generate hypotheses, not to carry the strongest part of a claim by themselves.


Reproducibility

What Is Reproducible

Whole-run determinism is enforced for the deterministic baseline path:

  • seeded execution
  • canonicalized event-trace hashing (sketched after this list)
  • final-state hashing
  • research bundle export with per-run artifacts
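
Canonicalized hashing means the trace is normalized before it is hashed, so two semantically identical traces produce the same digest. The helper below is a minimal sketch of that idea, assuming a JSON event trace; it is not the runner's implementation.

import { createHash } from 'crypto';

// Illustrative only: sort object keys recursively so that key order cannot
// change the hash of an otherwise identical trace.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, canonicalize(v)]),
    );
  }
  return value;
}

function hashTrace(events: unknown[]): string {
  return createHash('sha256').update(JSON.stringify(canonicalize(events))).digest('hex');
}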

Scientific Controls for Reproducibility

Two controls are applied automatically to prevent systematic bias between runs:

Agent processing-order shuffle. Each tick, the alive agent list is shuffled using a deterministic (seeded) Fisher-Yates algorithm before any decisions are processed. Without this, the first agent processed each tick would be able to consume resources before others, creating a systematic advantage correlated with database insertion order. See orchestrator.ts.
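
For reference, a deterministic Fisher-Yates shuffle looks roughly like the sketch below. The seeded generator (mulberry32) is illustrative; orchestrator.ts may use a different RNG.

// Illustrative seeded shuffle; not a copy of the orchestrator's code.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function seededShuffle<T>(items: T[], rand: () => number): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Same seed -> same order every run; no fixed advantage tied to insertion order.
const order = seededShuffle(['agent-1', 'agent-2', 'agent-3'], mulberry32(12345));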

Q-learning state reset. Between experiment runs, all tabular Q-learning state is cleared via resetQLearningState(). This ensures that each run begins with a clean Q-table, preserving observation independence across the run population. See runner.ts.

What Is Not Fully Reproducible

Provider-backed LLM runs can still be valuable, but they are not deterministic in the same sense:

  • provider responses can vary
  • latency and upstream behavior can change
  • identical seeds do not imply identical trajectories

Research Bundles

Each experiment directory can include:

results/<experiment>-<timestamp>/
  manifest.json
  report.json
  report.csv
  research-bundle.json
  runs/
    <condition>-run-1.json

The bundle captures the resolved profile, benchmark world, runtime config, scientific controls, final metrics, and artifact hashes.
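
A quick way to sanity-check a bundle after a run is to read it back and inspect a few fields. The path and the property names below (profile, benchmarkWorld, artifactHashes) are assumptions about the JSON shape; verify them against a real research-bundle.json before scripting around them.

import { readFileSync } from 'fs';

// Hypothetical field names and example path -- check them against an actual bundle.
const bundle = JSON.parse(
  readFileSync('results/my-experiment-2026-03-16/research-bundle.json', 'utf8'),
);
console.log('profile:', bundle.profile);
console.log('benchmark world:', bundle.benchmarkWorld);
console.log('artifact hashes:', bundle.artifactHashes);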


Statistical Guidance

Minimum Standard for Inferential Reporting

  1. Use at least two conditions.
  2. Run each condition at least twice.
  3. Prefer 5-10 runs per condition before treating null results as persuasive.
  4. Report effect size, not only p-values.
  5. Use corrected p-values when testing multiple metrics.

The runner now reports corrected findings using Holm-Bonferroni when the replication gate is satisfied.
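
For intuition, Holm-Bonferroni orders the p-values and tightens the threshold step by step. The sketch below is a generic illustration of the procedure, not the runner's implementation.

// Generic Holm-Bonferroni step-down correction (illustrative).
// Returns which hypotheses are rejected at family-wise alpha.
function holmBonferroni(pValues: number[], alpha = 0.05): boolean[] {
  const order = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const rejected: boolean[] = new Array(pValues.length).fill(false);
  for (let k = 0; k < order.length; k++) {
    const threshold = alpha / (order.length - k);
    if (order[k].p <= threshold) {
      rejected[order[k].i] = true;
    } else {
      break; // step-down: once one test fails, all larger p-values fail too
    }
  }
  return rejected;
}

// e.g. holmBonferroni([0.004, 0.02, 0.2]) -> [true, true, false]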

Practical Workflow

  1. Dry-run the config.
  2. Run canonical-core-benchmark.yaml to confirm the environment is stable.
  3. Execute your replicated study.
  4. Inspect report.claimClass before interpreting significantFindings.
  5. Archive the research bundle used for the claim.

Statistical Infrastructure

The analysis module (experiment-analysis.ts) provides a complete statistical toolkit. All functions listed below are exported and available for use in custom analysis scripts.

Normality Testing

import { normalityTest } from '../analysis/experiment-analysis';

const result = normalityTest(values);
// result: { statistic: number, pValue: number, isNormal: boolean }

The normalityTest() function uses two strategies depending on sample size:

  • n >= 20: Jarque-Bera statistic, which tests whether sample skewness and kurtosis are consistent with a normal distribution.
  • n < 20: Skewness z-test combined with an excess-kurtosis z-test (Bonferroni-corrected for both), which has better small-sample properties than Jarque-Bera.

Normality at alpha = 0.05 is reported via the isNormal boolean.
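
For reference, the large-sample branch can be sketched as a plain Jarque-Bera computation. This is a generic version for intuition, not a copy of normalityTest(); it uses the fact that under normality JB is approximately chi-squared with 2 degrees of freedom, whose upper tail is exp(-x/2).

// Generic Jarque-Bera sketch (the n >= 20 case); not the library's exact code.
function jarqueBera(values: number[]): { statistic: number; pValue: number } {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const m2 = values.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const m3 = values.reduce((a, b) => a + (b - mean) ** 3, 0) / n;
  const m4 = values.reduce((a, b) => a + (b - mean) ** 4, 0) / n;
  const skew = m3 / Math.pow(m2, 1.5);
  const excessKurtosis = m4 / (m2 * m2) - 3;
  const statistic = (n / 6) * (skew ** 2 + (excessKurtosis ** 2) / 4);
  const pValue = Math.exp(-statistic / 2); // chi-squared(2) upper tail
  return { statistic, pValue };
}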

Automatic Test Selection

import { autoSelectTest } from '../analysis/experiment-analysis';

const result = autoSelectTest(group1, group2);
// result: { test: StatisticalTest, testUsed: 'welch-t' | 'mann-whitney-u', reason: string } | null

autoSelectTest() runs normalityTest() on both groups and selects the appropriate comparison:

  • If both groups pass normality: Welch's t-test (parametric).
  • If either group fails normality: Mann-Whitney U (non-parametric).
  • If either group has fewer than 3 observations: returns null (not enough data).

The reason field provides a human-readable explanation of the selection logic, including the p-values from each normality test.

A Priori Power Analysis

import { requiredSampleSize } from '../analysis/experiment-analysis';

const n = requiredSampleSize(0.5); // medium effect, 80% power, alpha 0.05 -> 63
const n2 = requiredSampleSize(0.8, 0.90); // large effect, 90% power -> 34

requiredSampleSize(effectSize, power?, alpha?) estimates the minimum number of runs per condition needed to detect a given Cohen's d effect size. Use this before running experiments to ensure adequate statistical power. The formula is based on the two-sample z-approximation: n = 2 * ((z_alpha + z_beta) / d)^2.

| Target effect | Power 0.80 | Power 0.90 |
| --- | --- | --- |
| Small (0.2) | ~394 | ~526 |
| Medium (0.5) | ~63 | ~85 |
| Large (0.8) | ~25 | ~34 |
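
The table can be reproduced (to within a run or two) directly from the stated formula. The sketch below plugs in fixed z-values for the usual alpha and power choices; the exact rounding may differ slightly from requiredSampleSize().

// Plain two-sample z-approximation: n = 2 * ((z_alpha + z_beta) / d)^2 per condition.
// Illustrative; requiredSampleSize() may apply its own rounding or corrections.
const Z = { twoSidedAlpha005: 1.96, power080: 0.8416, power090: 1.2816 };

function approxSampleSize(d: number, zBeta: number, zAlpha = Z.twoSidedAlpha005): number {
  return Math.ceil(2 * ((zAlpha + zBeta) / d) ** 2);
}

console.log(approxSampleSize(0.5, Z.power080)); // ~63 per condition
console.log(approxSampleSize(0.8, Z.power090)); // ~33-34 per condition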

Proper t-Distribution

The following functions implement exact Student's t-distribution computation using the regularized incomplete beta function:

  • studentTCDF(t, df) -- Cumulative distribution function of the t-distribution. Replaces the previous normal CDF approximation. Accurate at all sample sizes, including small n.
  • studentTInverse(p, df) -- Inverse CDF (quantile function) using bisection. Used to derive critical values for confidence intervals.
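
For reference, the standard identity behind a t-distribution CDF built on the regularized incomplete beta function $I_x(a, b)$ is, for $t \ge 0$ and $\nu$ degrees of freedom:

F(t; \nu) = 1 - \tfrac{1}{2}\, I_{x}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right), \qquad x = \frac{\nu}{\nu + t^{2}}, \qquad F(-t; \nu) = 1 - F(t; \nu)

The exact arrangement inside studentTCDF() may differ, but any correct implementation agrees with this identity.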

studentTCDF() and studentTInverse() are used internally by tTest() and confidenceInterval():

  • tTest() now computes p-values from the Student's t-distribution CDF instead of a normal approximation. This gives accurate p-values for small samples where the normal approximation underestimates tail probabilities.
  • confidenceInterval() now uses studentTInverse() to obtain t-distribution critical values rather than fixed z-scores (1.96, 2.576, etc.). This produces properly wider intervals when sample size is small.

Mann-Whitney U Fix

The mannWhitneyU() function previously contained a bug where find() on the combined array would return the same element reference for all duplicates with equal values, causing incorrect rank assignments. The function now iterates over the combined array by index and assigns averaged ranks correctly for tied values using a Map keyed by object identity.
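
For clarity, averaged ranks for ties can be assigned as in the sketch below. This illustrates only the ranking step, not mannWhitneyU() itself.

// Assign averaged ranks to tied values (illustrative).
// A tie spanning 1-based positions i..j gives every tied value the rank (i + j) / 2.
function averagedRanks(values: number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks: number[] = new Array(values.length).fill(0);
  let pos = 0;
  while (pos < order.length) {
    let end = pos;
    while (end + 1 < order.length && order[end + 1].v === order[pos].v) end++;
    const avgRank = (pos + 1 + end + 1) / 2; // average of the 1-based positions
    for (let k = pos; k <= end; k++) ranks[order[k].i] = avgRank;
    pos = end + 1;
  }
  return ranks;
}

console.log(averagedRanks([3, 1, 3, 2])); // [3.5, 1, 3.5, 2]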

Additional Tests in metric-validator.ts

The metric validation module provides three additional non-parametric tests:

| Test | Function | Description |
| --- | --- | --- |
| Permutation test | permutationTest(group1, group2) | 10,000 permutations with seeded RNG. Makes minimal distributional assumptions. |
| Chi-squared test | chiSquaredTest(observed, expected) | Tests categorical distributions. Uses Wilson-Hilferty approximation for p-values. |
| Kolmogorov-Smirnov test | kolmogorovSmirnovTest(group1, group2) | Two-sample KS test comparing empirical distribution functions. |

These are available for custom analysis and are also used by the comprehensive comparison function comprehensiveStatisticalComparison(), which runs all tests in parallel and generates a combined recommendation.
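
To illustrate the idea behind the permutation test, here is a generic two-sample version on the difference of means. The iteration count mirrors the description above, but the code is not the metric-validator implementation, and a seeded RNG should be passed in for reproducible results.

// Generic two-sample permutation test (illustrative; not permutationTest() itself).
function permutationTestSketch(
  group1: number[],
  group2: number[],
  iterations = 10000,
  rand: () => number = Math.random, // swap in a seeded RNG for reproducibility
): number {
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const observed = Math.abs(mean(group1) - mean(group2));
  const pooled = [...group1, ...group2];
  let extreme = 0;
  for (let it = 0; it < iterations; it++) {
    // Shuffle the pooled sample and re-split it into two pseudo-groups.
    const shuffled = [...pooled];
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    const diff = Math.abs(
      mean(shuffled.slice(0, group1.length)) - mean(shuffled.slice(group1.length)),
    );
    if (diff >= observed) extreme++;
  }
  return extreme / iterations; // estimated two-sided p-value
}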


Pre-Registration

Experiment YAML configs support a preRegistration field that enables hypothesis registration and enforcement at report time.

Schema

name: "my-experiment"
hypothesis: "Agents with access to trade will achieve higher survival rates"

preRegistration:
  hypothesis: "Agents with access to trade will achieve higher survival rates"
  primaryMetrics: ["Survival Rate", "Gini Coefficient"]
  registeredAt: "2026-03-16T00:00:00Z"

agents:
# ...

What the Runner Enforces

When a preRegistration block is present, the runner performs the following checks at report generation time (a sketch follows the list):

  1. Hypothesis consistency. If the top-level hypothesis field differs from the pre-registered hypothesis, a deviation is recorded. This catches post-hoc hypothesis changes.

  2. Primary metrics presence. Each metric listed in primaryMetrics is checked against the report output. Missing metrics generate deviations.

  3. Post-hoc finding flagging. Any significant finding on a metric that is not listed in primaryMetrics is flagged as exploratory/post-hoc, not confirmatory. This is recorded in the deviations array.
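
The checks can be pictured as a small comparison over the report, as in the sketch below. The input shape and the deviation strings are illustrative assumptions, not the runner's enforcement code.

// Hypothetical sketch of the pre-registration checks; names and messages are illustrative.
interface PreRegistrationBlock { hypothesis: string; primaryMetrics: string[] }
interface Finding { metric: string; significant: boolean }

function checkPreRegistration(
  preReg: PreRegistrationBlock,
  topLevelHypothesis: string,
  reportedMetrics: string[],
  findings: Finding[],
): string[] {
  const deviations: string[] = [];
  if (topLevelHypothesis !== preReg.hypothesis) {
    deviations.push('Top-level hypothesis differs from the pre-registered hypothesis.');
  }
  for (const m of preReg.primaryMetrics) {
    if (!reportedMetrics.includes(m)) {
      deviations.push(`Pre-registered primary metric "${m}" is missing from the report.`);
    }
  }
  for (const f of findings) {
    if (f.significant && !preReg.primaryMetrics.includes(f.metric)) {
      deviations.push(
        `Significant finding on non-pre-registered metric "${f.metric}"; report as exploratory/post-hoc.`,
      );
    }
  }
  return deviations;
}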

Report Output

The report includes a preRegistration object:

{
  "preRegistration": {
    "registered": true,
    "hypothesis": "Agents with access to trade will achieve higher survival rates",
    "primaryMetrics": ["Survival Rate", "Gini Coefficient"],
    "registeredAt": "2026-03-16T00:00:00Z",
    "deviations": [
      "Significant finding on non-pre-registered metric \"Trade Count\". This should be reported as exploratory/post-hoc, not confirmatory."
    ]
  }
}

If no preRegistration block is present, the report will contain registered: false with empty arrays.


Threats to Validity

Every experiment report includes an auto-generated threatsToValidity array. This is produced by generateThreatsToValidity() in the runner and covers the following categories:

Internal Validity

  • Small sample size. Flagged when any condition has fewer than 5 runs. The warning recommends using requiredSampleSize() for a priori power analysis.
  • LLM non-determinism. Flagged when any run uses LLM-backed agents. Even with fixed seeds, provider-side stochasticity means results are not fully reproducible.
  • Cache effects. Flagged when LLM cache was enabled. Cached decisions may mask behavioral variability and reduce observation independence.
  • Cooperation/trust confounders. Flagged when cooperation incentives, trust-based pricing, or trade bonuses were active. These designed affordances can confound emergent behavior claims.

External Validity

  • Short experiment duration. Flagged when the experiment ran for fewer than 100 ticks. Emergent patterns may need longer runs to stabilize.

Construct Validity

  • Low statistical power. Flagged when any metric comparison has achieved power below 0.80. Non-significant results may reflect insufficient power rather than true null effects.

Reproducibility

  • Hash inconsistencies. Flagged when runs sharing the same seed produce different event-trace hashes, indicating non-deterministic execution paths.

Descriptive Limitations

  • Descriptive-only classification. Flagged when the experiment produced only descriptive results (single condition or insufficient replication), as a reminder that no comparative claims can be made.

These threats appear in both the JSON report (report.threatsToValidity) and the CSV export. They are informational and do not block report generation. Review them before making any scientific claims based on the experiment results.
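
As a rough picture of how such flags are derived, the sketch below applies the thresholds listed above to run metadata. The input shape and messages are assumptions; generateThreatsToValidity() covers more categories and phrases its warnings differently.

// Hypothetical sketch; thresholds mirror the categories above, names are illustrative.
interface RunMeta {
  runsPerCondition: number[];
  usesLLMAgents: boolean;
  durationTicks: number;
  minAchievedPower: number;
}

function sketchThreats(meta: RunMeta): string[] {
  const threats: string[] = [];
  if (meta.runsPerCondition.some((r) => r < 5)) {
    threats.push('Internal validity: fewer than 5 runs in at least one condition.');
  }
  if (meta.usesLLMAgents) {
    threats.push('Internal validity: provider-side stochasticity from LLM-backed agents.');
  }
  if (meta.durationTicks < 100) {
    threats.push('External validity: experiment shorter than 100 ticks.');
  }
  if (meta.minAchievedPower < 0.8) {
    threats.push('Construct validity: achieved power below 0.80 for at least one comparison.');
  }
  return threats;
}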


Shock Injection

Shocks are useful when the study is explicitly about intervention or resilience.

Economic Shock

shocks:
- tick: 500
  type: economic
  params:
    currencyChange: -0.5

Disaster Shock

shocks:
- tick: 500
  type: disaster
  params:
    type: drought
    severity: 0.7
    duration: 100
    region: [40, 40, 60, 60]

Rule Shock

shocks:
- tick: 500
  type: rule
  params:
    modify: gather_rate
    factor: 0.5

If shocks materially alter the world, the resulting claim is usually exploratory unless the intervention itself is the object of the study.


Publishing Checklist

When publishing or sharing findings, include:

  1. the exact experiment config
  2. the seed schedule
  3. the software version or commit hash
  4. the model versions used, if any
  5. the metric definitions used
  6. the report claimClass
  7. the linked research bundle

Before making a strong claim, fill out the internal claim-review template and attach the bundle that supports it.


Known Limitations

  1. Provider-backed LLM runs remain non-deterministic.
  2. cooperationIndex is a heuristic summary, not a primary validated endpoint.
  3. Full-platform mechanics can shape outcomes substantially; that is a feature, but it must be disclosed.
  4. Literature validation through the Sugarscape path is available but not yet a completed validated replication program.

Further Reading