Skip to main content

Research Guide

This guide covers scientific methodology for conducting rigorous research with SimAgents.

Research Philosophy

SimAgents is designed for studying emergent AI behavior in multi-agent environments. Key principles:

  1. Reproducibility: Deterministic baseline experiments can be replicated with seed + configuration; LLM exploratory runs must be treated as non-deterministic
  2. Observability: All state changes are logged and queryable
  3. Comparability: Standardized metrics enable cross-study comparison
  4. Minimal Imposition: System provides physics, not strategies

Designing Experiments

Experiment DSL

Define experiments in YAML and declare the execution profile explicitly:

name: "resource_scarcity_cooperation"
description: "Test cooperation emergence under resource scarcity"
seed: 12345
profile: llm_exploratory
benchmarkWorld: canonical_core

world:
size: [100, 100]
biomes:
desert: 0.7
plains: 0.2
forest: 0.1

agents:
- type: claude
count: 5
- type: gemini
count: 5
- type: baseline_random
count: 5

duration: 1000 # ticks

metrics:
- gini
- cooperation_index
- survival_rate
- clustering_coefficient

snapshots:
interval: 100 # Save state every 100 ticks

shocks:
- tick: 500
type: economic
params:
currencyChange: -0.5 # 50% currency destruction

Running Experiments

cd apps/server

# Validate configuration
bun run src/experiments/runner.ts --config experiments/my-experiment.yaml --dry-run

# Run experiment
bun run src/experiments/runner.ts --config experiments/my-experiment.yaml

# Run with custom output
bun run src/experiments/runner.ts --config experiments/my-experiment.yaml --output results/

Batch Experiments

Run multiple seeds before making inferential claims:

bun run src/experiments/runner.ts --config experiments/my-experiment.yaml --runs 5 --output results/

Baseline Agents

For valid hypothesis testing, compare LLM agents against baselines:

Random Walk (Null Hypothesis)

agents:
- type: baseline_random
count: 10

Actions chosen uniformly at random. Establishes minimum performance baseline.

Rule-Based (Classical AI)

agents:
- type: baseline_rule
count: 10

Hardcoded heuristics: eat when hungry, sleep when tired, gather when near resources. Tests if LLMs outperform simple rules.

Q-Learning (Reinforcement Learning)

agents:
- type: baseline_qlearning
count: 10

Tabular Q-learning with survival reward. Tests LLM vs traditional RL.

Sugarscape Replication

Classic ABM comparison. Configure world to match Sugarscape parameters and compare agent behavior.


Cooperation Incentives System

SimAgents implements Sugarscape-inspired cooperation mechanics that create genuine incentives for group behavior without imposing strategies.

Cooperation Bonuses

ActionBonusSolo PenaltyDescription
Gather+25%/agent (max +75%)-50%Agents at same location boost each other's efficiency
Forage+15%/agent (max +45%)-40%Nearby agents improve foraging success
Public Work+20%/worker (max +60%)-50%Working together increases pay

Group Gather (Rich Spawns)

Resource spawns with 12+ units require group cooperation:

  • Solo agents can only extract 2 units maximum
  • 2+ agents unlock full harvest with +50% bonus
  • Creates natural dependency without forcing interaction

Trust-Based Pricing

Shelter transactions use trust scores:

  • High trust (>+100): -10% discount
  • Low trust (<-100): +10% penalty
  • Rewards agents who build positive relationships

Trade Bonuses

Trading with trusted partners provides advantages:

  • +20% items received when trust >20
  • +5% per prior interaction (max +25% loyalty bonus)
  • Trust gains multiply at higher relationship levels

Item Spoilage

Perishable items create urgency for trade/consumption:

  • Food/Water: -1% per tick
  • Medicine: -0.5% per tick
  • Battery: -0.2% per tick
  • Materials/Tools: No decay

Research Implications

These mechanics enable experiments on:

  • Cooperation emergence: Does the bonus system drive grouping?
  • Trust network formation: How quickly do agents build relationships?
  • Solo vs cooperative strategies: Which LLM types favor which approach?
  • Resource pooling: Do agents develop sharing conventions?

Metrics

Economic Metrics

MetricFormulaInterpretation
Gini CoefficientStandard Gini on agent balances0 = perfect equality, 1 = one agent has everything
Wealth Varianceσ² of agent balancesHigher = more inequality
Trade VolumeSuccessful trades per tickHigher = more economic activity
Market EfficiencyPrice convergence over timeLower spread = more efficient

Social Metrics

MetricFormulaInterpretation
Cooperation Indexf(trades, trust, clustering)0-1, higher = more cooperation
Clustering CoefficientSpatial agent groupingHigher = agents form groups
Trust Network DensityEdges / possible edgesHigher = more relationships
Conflict RateHarm/steal actions per tickHigher = more conflict

Emergence Metrics

MetricFormulaInterpretation
Emergence Index(systemComplexity - Σ agentComplexity) / systemComplexityHigher = more emergent behavior
Role CrystallizationConsistency of agent roles over timeHigher = stable social roles
Norm EmergenceConsistency of agent responses to scenariosHigher = shared behavioral norms

Survival Metrics

MetricFormulaInterpretation
Survival RateAlive agents / initial agentsBy LLM type
Mean LifetimeAverage ticks survivedLonger = better strategies
Death CausesStarvation vs exhaustion vs harmStrategy failure mode

Reproducibility

Seed Management

Every random operation uses a seeded PRNG:

// In experiment config
seed: 12345

// Affects:
// - Resource spawn placement
// - Initial agent positions
// - Action timing variations
// - Biome generation

State Snapshots

Capture complete world state:

snapshots:
interval: 100
include:
- agents
- resources
- relationships
- events

Snapshots stored in results/{experiment}/snapshots/tick_{N}.json.

Event Sourcing

All state changes recorded:

SELECT * FROM events
WHERE tick BETWEEN 100 AND 200
ORDER BY timestamp;

Replay any moment:

curl http://localhost:3000/api/replay/tick/150

Statistical Analysis

  1. Multiple Seeds: Run 10+ seeds per condition
  2. Burn-in Period: Discard first 100 ticks (initialization effects)
  3. Steady-State Analysis: Focus on ticks 100-900
  4. Final State Comparison: Compare end states across conditions

Example R Analysis

library(tidyverse)

# Load results
results <- read_csv("results/experiment/metrics.csv")

# Compare Gini by LLM type
results %>%
filter(tick > 100) %>%
group_by(llm_type) %>%
summarise(
mean_gini = mean(gini),
sd_gini = sd(gini),
n = n()
) %>%
mutate(se = sd_gini / sqrt(n))

# Statistical test
t.test(gini ~ llm_type, data = results %>% filter(llm_type %in% c("claude", "gemini")))

Example Python Analysis

import pandas as pd
from scipy import stats

# Load results
results = pd.read_csv("results/experiment/metrics.csv")

# Compare cooperation by condition
claude = results[results.llm_type == "claude"].cooperation_index
gemini = results[results.llm_type == "gemini"].cooperation_index

stat, pvalue = stats.mannwhitneyu(claude, gemini)
print(f"Mann-Whitney U: {stat}, p={pvalue:.4f}")

Shock Injection

Test system resilience with controlled perturbations:

Economic Shocks

shocks:
- tick: 500
type: economic
params:
currencyChange: -0.5 # Destroy 50% of currency
# OR
inflationRate: 0.1 # 10% inflation per tick

Natural Disasters

shocks:
- tick: 500
type: disaster
params:
type: drought # Reduces food regen
severity: 0.7 # 70% reduction
duration: 100 # For 100 ticks
region: [40, 40, 60, 60] # Affected area

Rule Modifications

shocks:
- tick: 500
type: rule
params:
modify: gather_rate
factor: 0.5 # Half gathering efficiency

Publishing Research

Required Disclosures

When publishing SimAgents research, include:

  1. Experiment Configuration: Full YAML or JSON config
  2. Seeds Used: All random seeds
  3. Software Version: SimAgents commit hash or version
  4. LLM Versions: Specific model versions (e.g., "claude-3-haiku-20240307")
  5. Metrics Definitions: Any custom metrics

Data Sharing

Export complete experiment data:

# Export all events
curl http://localhost:3000/api/replay/events?from=0&to=1000 > events.json

# Export snapshots
cp results/experiment/snapshots/*.json ./data/

# Export metrics
cp results/experiment/metrics.csv ./data/

Suggested Citation

@software{simagents2026,
title = {SimAgents: A Platform for Studying Emergent AI Behavior},
author = {AgentAuri Team},
year = {2026},
url = {https://github.com/agentauri/simagents.io}
}

Scientific Assumptions

SimAgents makes explicit assumptions that should be acknowledged:

  1. Discrete Time: World updates in ticks, not continuous time
  2. Grid Space: 2D discrete grid, not continuous space
  3. Perfect Observation: Agents see all entities within visibility radius
  4. Synchronous Decisions: All agents decide simultaneously per tick
  5. No Physical Embodiment: Agents are points, not physical bodies

See Scientific Framework for detailed assumption analysis and limitations.


Known Limitations

  1. LLM Stochasticity: Even with seeds, LLM responses vary
  2. API Latency: External LLM calls add timing variability
  3. Scale Limits: Currently tested up to 50 agents
  4. No Long-term Memory: Agent memory is per-session

Acknowledge these limitations in research publications.


Further Reading