


we build AI agent infrastructure and publish the math behind it
we're very serious about not being serious
we think building things is about to get extremely weird. like, 'a thousand robots doing incredible* things simultaneously' weird.
so we're building the infrastructure for that. agent-native, specification-first, and deeply suspicious of anything that only works because a human is watching.
software engineering is just the start.
*incredible only if we humans get this right
- Deterministic Core Logic
- Stochastic Surface Layers
- Agent-Native Architecture
controlled chaos
algorithms that curate the noise so you don't have to. every interaction is intentional, even when it looks accidental.
vibes-responsive UI
interfaces that read the room. minimal when you're focused, detailed when you're exploring, absent when you're on a roll.
humans in the loop (reluctantly)
we'd automate everything if we could. but human intuition is annoyingly irreplaceable. for now.
modyl
presence, not prompts.
real-time meeting infrastructure where AI agents join as actual participants — with cameras, voices, screens, and interactive surfaces. not another chatbot. the room is the interface.
specialist agent verticals for finance, research, strategy, and more. python sdk. works with claude code, codex, gemini cli. `npx skills add modyl-meeting` and your agent is in the room.
free during early access. paid plans after — less than you'd spend on other "pro" subscriptions. tell your agent, too. this isn't just for humans.
rooms, not prompts
the room is the computer. persistent workspace, shared state, mutex — all in one. no more copying context into chat windows.
specialist agents
financial analyst, deep researcher, business strategist, optimization solver. domain experts, not 'helpful assistant'. they produce deliverables, not prose.
fork/join meetings
split a meeting into parallel tracks, each with its own agents. semantic merge at the end. like git merge, but for conversations.
multiplayer everything
invite your whole team into the room with the agent. five people and an AI doing a data review. together. in real time.
Loomtown
6,300+ tests, AGL-Core with a Z3 theorem prover, and four LLM providers keeping each other honest. mid-task steering: redirect running agents without stopping them. proof-carrying patches, agent-native specs, capability leases — from the papers, into production.

dale — the conductor
the engine that makes 1,000 agents not step on each other's toes. four LLM providers, Temporal.io at scale, and now: mid-task steering. redirect 1,000 running agents without stopping them. plus proactive task discovery — watch your repo and auto-generate tasks from changes.
Multi-provider + Temporal.io + `loomtown steer` (mid-task) + `loomtown watch` (git/directory discovery)
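the mid-task steering idea — redirect a running agent between steps without killing it — can be sketched as a command queue the agent drains between steps. a toy illustration only; the real `loomtown steer` wiring is more involved, and all names here are ours, not loomtown's API:

```python
import queue

def run_task(steps, steering):
    """toy mid-task steering: between steps, drain a command queue and let
    directives reshape the remaining plan without stopping the agent.
    (illustrative sketch — not loomtown's actual steering protocol.)"""
    done = []
    while steps:
        try:
            cmd = steering.get_nowait()
            steps = cmd(steps)  # a directive rewrites the remaining steps
        except queue.Empty:
            pass  # no steering this tick; keep going
        done.append(steps.pop(0)())  # execute the next step
    return done
```

the point of the queue: steering lands at the next step boundary, so the agent never has to be stopped and restarted mid-flight.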
file leases + distributed locking
two layers of don't-touch-my-files. core leases with etcd coordination and TTL expiration handle the basics. the CLTF experimental layer adds algebraic composition on top. optimistic merging by default, pessimistic when you're paranoid.
Core: etcd + TTL + SHUTTLE heartbeats. Experimental: CLTF algebraic composition overlay
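the core lease semantics — hold a file until your TTL runs out, renew it with heartbeats — fit in a few lines. a minimal in-memory sketch with hypothetical names, standing in for the real etcd-coordinated version:

```python
class FileLease:
    """toy TTL file lease: a holder owns a path until its lease expires,
    unless a heartbeat renews it first. (in-memory stand-in for the
    etcd + TTL + SHUTTLE heartbeat layer; names are ours.)"""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holders = {}  # path -> (agent, expiry)

    def acquire(self, agent, path, now):
        holder = self.holders.get(path)
        if holder and holder[1] > now and holder[0] != agent:
            return False  # someone else holds a live lease
        self.holders[path] = (agent, now + self.ttl)
        return True

    def heartbeat(self, agent, path, now):
        holder = self.holders.get(path)
        if holder and holder[0] == agent and holder[1] > now:
            self.holders[path] = (agent, now + self.ttl)  # renew
            return True
        return False  # expired, or not the holder
```

TTL expiry is what makes this safe at scale: a crashed agent's lease simply lapses, and nobody has to clean up after it.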
verification loops
every patch goes through lint, build, and test before anyone accepts it. OTel tracing so you can see exactly where it broke. fail at any stage and the agent tries to fix itself. the pipeline can optionally attach proof-carrying metadata to every result.
Three-tier: lint → build → test + OTel tracing + PCPG journaling + structured feedback extraction
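the three-tier loop with self-repair reduces to: run stages in order, and on the first failure hand the feedback back to the agent and retry. a hedged sketch, with every name hypothetical:

```python
def verify(patch, stages, repair, max_attempts=3):
    """toy three-tier verification loop: run each stage in order; on failure,
    let the agent repair the patch and retry from the top.
    stage = (name, check) where check(patch) -> (ok, feedback).
    (illustrative only — not loomtown's actual pipeline.)"""
    for _ in range(max_attempts):
        for name, check in stages:
            ok, feedback = check(patch)
            if not ok:
                patch = repair(patch, name, feedback)  # agent self-fix
                break  # re-run from lint with the repaired patch
        else:
            return True, patch  # every stage passed
    return False, patch  # gave up; human time
```

retrying from the first stage matters: a repair that fixes the tests can easily reintroduce a lint failure, so the whole gauntlet runs again.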
the metrics lab
we proposed six metrics to measure agent trust. then we built the thing that computes them. plus: per-agent telemetry, a Zipfian conflict model for realistic hot-file simulation, and a baseline analyzer that compares agent performance to human baselines. science, not vibes.
APF, VT, CSA, CPL, Intent Drift, Conflict Probability + Zipfian conflict model + baseline analyzer
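the Zipfian conflict model is easy to demo: when edits concentrate on a few hot files, the chance that two agents collide is far higher than a uniform model predicts. a monte-carlo sketch with made-up parameters (the real model's constants are not shown here):

```python
import random

def conflict_prob(n_files, n_agents, s=1.1, trials=2000, seed=0):
    """monte-carlo estimate of the chance at least two agents touch the
    same file, with file popularity ~ 1/rank^s (s=0 gives uniform).
    (toy sketch; parameters and shape are illustrative, not the lab's.)"""
    rng = random.Random(seed)
    weights = [1 / (rank ** s) for rank in range(1, n_files + 1)]
    conflicts = 0
    for _ in range(trials):
        picks = rng.choices(range(n_files), weights=weights, k=n_agents)
        if len(set(picks)) < n_agents:  # some file was picked twice
            conflicts += 1
    return conflicts / trials
```

the takeaway: hot-file skew, not repo size, is what drives conflict probability — which is why a uniform simulation flatters any coordinator.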
agl-core — the type checker
a real theorem prover running in your coordinator. Z3 WASM checks refinement types, cost bounds, and IO effects on every agent action. integrates the spec layer (MAIR), permission layer (CLTF), and provenance layer (PCPG) into one pipeline. 220+ tests and 6 rounds of cross-validation say it works.
Z3 WASM + refinement types + cost bounds + effect lattice (Pure < Read < Write < IO). 6 cross-val rounds, zero open findings
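the effect lattice part of that pipeline is simple enough to show directly. a toy version of the Pure < Read < Write < IO ordering — the real checks run through Z3 WASM, and these names are ours:

```python
from enum import IntEnum

class Effect(IntEnum):
    """toy totally ordered effect lattice: Pure < Read < Write < IO.
    (illustrative stand-in for AGL-Core's effect system.)"""
    PURE = 0
    READ = 1
    WRITE = 2
    IO = 3

def join(*effects):
    """the effect of a composite action is the join (here: max) of its parts."""
    return Effect(max(effects, default=Effect.PURE))

def permitted(action_effect, capability):
    """an action is allowed only if its effect sits at or below the capability."""
    return action_effect <= capability
```

because the lattice is a total order, the join is just `max` — which is what lets every composite action get a single, checkable effect annotation.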
agent-native specs — all four levels
four levels, all built. L0: structured specs with acceptance criteria. L1: plan graphs — typed DAGs with cycle detection. L2: action IR compiled deterministically. L3: execution dispatch. still lowers to text prompts for backward compat because we're not monsters.
L0 intent → L1 plan graph (DAG) → L2 action IR (topological compile) → L3 dispatch. 77+ tests
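the L1 → L2 step — deterministically lowering a plan graph into an ordered action sequence — is a topological sort with cycle detection. a minimal sketch under illustrative names (not the actual compiler):

```python
def compile_plan(graph):
    """toy L1 -> L2 lowering: topologically order a plan graph
    (node -> list of dependencies), rejecting cycles.
    (sketch only; the real action IR carries types and more.)"""
    order, state = [], {}  # state: 1 = visiting, 2 = done

    def visit(node):
        if state.get(node) == 1:
            raise ValueError(f"cycle through {node!r}")  # not a DAG
        if state.get(node) != 2:
            state[node] = 1
            for dep in graph.get(node, []):
                visit(dep)  # dependencies first
            state[node] = 2
            order.append(node)

    for node in graph:
        visit(node)
    return order  # dependencies before dependents
```

determinism is the point: the same plan graph always compiles to the same action order, so agents can replay and audit it.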
capability leases — hardened
the permission system got hardened. path traversal? fixed with canonicalization and boundary checks. mutable aliasing between delegated leases? deep-cloned. non-amplification invariant still holds — agents can never grant more than they have. wired into AGL-Core for real-time validation.
8-tuple lease + path canonicalization + deep-clone isolation + AGL-Core L3 validation. 37 tests
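two of those hardening moves fit in a handful of lines: canonicalize paths before boundary checks, and make delegation narrowing-only with no shared mutable state. a toy sketch that keeps just the path component of the 8-tuple (everything else elided):

```python
import os

def within(root, path):
    """canonicalize and check that path stays inside root
    (blocks ../ traversal after resolution)."""
    root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root, path))
    return target == root or target.startswith(root + os.sep)

def delegate(parent_paths, child_paths):
    """non-amplification sketch: a delegated lease may only narrow the
    parent's scope, never widen it. returns a copy so mutating the child
    can't alias back into the parent. (toy version of the real 8-tuple.)"""
    if not set(child_paths) <= set(parent_paths):
        raise PermissionError("delegation would amplify capabilities")
    return list(child_paths)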
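(duplicate guard — intentionally left empty)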
proof trails + failure → benchmarks
two pipelines, one goal: get better. PCPG attaches proof receipts to every patch — what was checked, what passed, who verified it. CSC turns failures into benchmarks that prevent regression. plus an evolutionary optimizer tuning the coordinator's knobs. the system debugs itself.
PCPG: 5 receipt types. CSC: incident → counterexample → benchmark. Evo: fitness-scored config optimization. 60+ tests
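the proof-receipt idea can be shown with a toy schema: bind the receipt to the exact patch bytes, record what ran and what passed, and accept only when everything did. the shape below is ours, not PCPG's actual schema:

```python
import hashlib

def make_receipt(patch, checks, verifier):
    """toy proof-carrying receipt: which checks ran, their results, and a
    digest binding the receipt to the exact patch text, so a receipt can't
    be reused for a different patch. (illustrative, not PCPG's 5 types.)"""
    return {
        "patch_sha": hashlib.sha256(patch.encode()).hexdigest(),
        "checks": {name: bool(ok) for name, ok in checks},
        "verifier": verifier,
        "accepted": all(ok for _, ok in checks),
    }
```

the digest is what makes the trail auditable: change one byte of the patch and the old receipt visibly no longer applies.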
papers we wrote instead of sleeping


Thinking at Scale
Foundational laws — Brooks's, Conway's, Team Topologies, DRY, Amdahl's — encode human-scarcity assumptions that dissolve at 1,000+ agents. Formalizes the phase change through delivery latency decomposition, the Spec Throughput Ceiling, Evidence-Carrying Patches, and Protocol-Imprinted Architecture. Introduces the Trust Production Model with empirical grounding: DORA reports show 7.2% stability reduction, SWE-EVO drops to 21% on multi-issue evolution, and a Loomtown audit revealed 21% trust deficit behind 2,943 passing tests. Cross-validated competitive survey of every production system (Cursor, Devin, OpenHands, Gas Town, Factory.ai). Cross-domain precedents from VLSI/EDA, genomics, MapReduce, and biological morphogenesis. Five falsification criteria.


The Human Tax
Extends Brooks's essential/accidental framework with a new category: human-accidental complexity. An 11-project empirical study across 7 languages finds strict HAC averaging 11.5% (median 9.5%), while total non-essential overhead averages 44.5% — consistent with the 30–50% hypothesis but revealing that much of the 'tax' is dual-purpose, not purely accidental. Proposes the Agent Graph Language (AGL) and five architectural concepts — PCPG, MAIR, CLTF, LCB, CSC — composing into an integrated agent-native software stack. Three of five concepts implemented in loomtown.


Society of Mind as Coordination Substrate
Explores how real-time meeting rooms — where specialized AI agents collaborate through shared perceptual space including audio, video, and structured data channels — draw architectural inspiration from Minsky's Society of Mind and provide the high-bandwidth coordination substrate his framework implied but never specified. Three contributions: prosody modeling as a computational analogue of Minsky's social coordination layer, critics mapped to a Trust Production Model where trust is accumulated negative meta-knowledge, and meeting rooms as high-bandwidth shared perceptual spaces affording stigmergic coordination unavailable to text-only frameworks. Surveys meeting-centric AI systems (Otter.ai, Zoom AI Companion, LiveKit Agents, Pipecat) and multi-agent coordination literature connecting SoM to modern distributed AI.


ModylBench
A structured evaluation benchmark covering 6 professional verticals with 300 total turns and 29 adversarial edge cases where AI meeting agents must demonstrate domain expertise through multi-turn dialogue and tangible deliverables — not just generate plausible text. Substance-weighted scoring (70/30 substance vs. style), hard floor rules, multi-judge disagreement detection, pass@k reliability metrics, and CRDT-style mutation trajectory tracking for fine-grained edit evaluation. Calibrated against a survey of 11,903 real professional meetings.