Changelog

All notable changes to looplet are documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • docs/faq.md: "Why not LangGraph?" honest comparison (thanks @mvanhorn, #17)

[0.1.7] - 2026-04-21

First public release of looplet.

Added (launch polish)

  • ROADMAP.md with a frozen v1.0 API contract and explicit out-of-scope list.
  • docs/ site scaffold (tutorial, evals, recipes, hooks, good-first-issues, discussions-seed, demo-script) + mkdocs-material config + GitHub Pages workflow.
  • THIRD_PARTY_USERS.md social-proof seed.
  • src/looplet/examples/ollama_hello.py — zero-API-key onboarding.
  • Codecov upload step in CI (non-blocking).
  • Leaner README (<170 lines) with the pydantic-ai-harness disambiguation moved to the top.

Added (evals — pytest-style agent evaluation)

  • Eval framework (looplet.evals). Write eval_* functions that take EvalContext and return any of float, bool, str, dict, or EvalResult. The framework normalizes all return types.
  • eval_discover(path) — auto-discovers eval functions in eval_*.py files (like pytest discovers test_*).
  • eval_run(evals, ctx) — runs evaluators, auto-detects llm parameter for LLM-as-judge, catches errors gracefully.
  • eval_run_batch(evals, contexts) — runs same evals across multiple trajectories with per-eval avg/min/max aggregation.
  • eval_mark(*tags) — decorator for categorizing evals. eval_run and eval_run_batch accept include=/exclude= to filter by marks.
  • eval_cli(args) — CLI runner with threshold-based pass/fail exit codes for CI integration.
  • EvalHook — LoopHook that builds EvalContext at on_loop_end and runs all evaluators automatically during development.
  • EvalContext.from_trajectory_dir() — loads context from saved trajectories with support for both looplet and benchmark formats.
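
For illustration, a minimal self-contained sketch of the return-type normalization described above. This is not looplet's actual implementation; the EvalResult fields and the treatment of bare strings as failure notes are assumptions.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class EvalResult:
    score: float
    detail: str = ""

def normalize_eval_return(value: Any) -> EvalResult:
    """Coerce the permitted eval_* return types to one result shape."""
    if isinstance(value, EvalResult):
        return value
    if isinstance(value, bool):  # check bool before float: bool subclasses int
        return EvalResult(1.0 if value else 0.0)
    if isinstance(value, (int, float)):
        return EvalResult(float(value))
    if isinstance(value, str):
        # assumption: a bare string reads as a zero-score failure note
        return EvalResult(0.0, detail=value)
    if isinstance(value, dict):
        return EvalResult(float(value.get("score", 0.0)), str(value.get("detail", "")))
    raise TypeError(f"unsupported eval return type: {type(value).__name__}")
```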

Added (MCP + skills)

  • MCPToolAdapter — wraps MCP server tools as ToolSpec instances via JSON-RPC over stdio. No MCP SDK required.
  • Skill — bundles tools + context + prompt fragment into one loadable unit. skill.register(registry) adds all tools.
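
A self-contained sketch of the Skill bundling idea: tools plus a prompt fragment, registered in one call. The field names mirror the changelog entry; the structure and register() signature are assumptions, not looplet's documented API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Skill:
    name: str
    tools: Dict[str, Callable] = field(default_factory=dict)
    prompt_fragment: str = ""

    def register(self, registry: Dict[str, Callable]) -> None:
        """Add every bundled tool to the target registry in one call."""
        registry.update(self.tools)

# Usage: one handle loads the whole bundle.
git_skill = Skill(
    name="git",
    tools={"git_status": lambda: "clean", "git_diff": lambda: ""},
    prompt_fragment="You can inspect the repo with git_status and git_diff.",
)
registry = {}
git_skill.register(registry)
```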

Added (approval)

  • ApprovalHook — stops the loop when a tool returns needs_approval=True. Combined with checkpoint_dir for crash-safe async human-in-the-loop approval.
  • Renamed elicit → approval uniformly: LoopConfig.approval_handler, ToolContext.request_approval, ToolContext.approve().
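
The ApprovalHook contract in miniature, as a hedged sketch (the function name is illustrative, not looplet's API): the loop halts when a tool result flags needs_approval, so the run can be checkpointed and resumed after a human decision.

```python
def handle_tool_result(result: dict) -> str:
    """Return 'stop' when a tool result requests human approval.

    In the real loop this is where ApprovalHook would checkpoint
    (via checkpoint_dir) before waiting on a human.
    """
    return "stop" if result.get("needs_approval") else "continue"
```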

Changed (naming cleanup)

  • Renamed internal names for clarity: coerce_text → to_text, DiminishingReturnsTracker → StallDetector, reactive_compact → emergency_truncate, compress_session_log → age_session_entries, enforce_result_budget → trim_results, should_compress_context → is_context_oversized, HEAVY_BLOCK_KINDS → LARGE_CONTENT_TYPES, DefaultSummarizer → default_summarizer.
  • Renamed compact services: DefaultCompactService → TruncateCompact, LLMCompactService → SummarizeCompact.
  • Renamed normalise_hook_return → normalize_hook_return.
  • Moved concurrent_dispatch and reactive_recovery from FLAGS global singleton to LoopConfig fields.
  • Trimmed __all__ from 154 → 54 symbols organized into labeled tiers.

Changed (developer experience)

  • Added preview_prompt() — shows what the LLM sees before the first call. Invaluable for debugging.
  • Added TrajectoryRecorder.summary() — one-liner run summary.
  • Added --trace DIR to coding_agent example for trajectory recording.
  • Added step-by-step tutorial to README (5 progressive steps).
  • Added LoopConfig docstring with "start here" guide listing the 4 essential fields.
  • Added FileCheckpointStore.load_latest() + auto-resume wiring in composable_loop — crash-resume is now one line: LoopConfig(checkpoint_dir="./ckpt").

Removed

  • Removed async_loop.py (feature-frozen, no consumers).
  • Removed 3 mock examples (calculator, code_review, research). Replaced with hello_world.py (real LLM) + coding_agent.py (Claude Code-equivalent tools: bash, read, write, edit, glob, grep, think, done).
  • Removed all back-compat aliases.
  • Removed all internal project references (cadence, primal_security).

Added (compaction strategies)

  • PruneToolResults — new zero-LLM-call compaction service that clears old tool-result content while keeping conversation structure intact. Configurable keep_recent (how many recent tool results to preserve) and compactable_tools (restrict to specific tools). Cheapest possible compaction — use as the first stage in a chain.
  • compact_chain(*services) — combinator that tries compaction services in order; first stage that has an effect wins. Replaces the need for a separate ChainedCompactService class. Usage: compact_chain(PruneToolResults(), SummarizeCompact(), TruncateCompact()).
  • CompactOutcome.cleanup — optional post-compact callback. When set, run_compact() invokes it after firing POST_COMPACT. Use for domain-specific state resets (clear caches, re-inject context, reset token baselines) without the loop knowing details.
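
A minimal self-contained sketch of the compact_chain semantics described above ("first stage that has an effect wins"), with a toy prune stage that clears old tool-result content while keeping structure. This mirrors the documented behavior but is not looplet's code; entry shapes and parameter names are assumptions.

```python
from typing import Callable, List

CompactService = Callable[[List[dict]], List[dict]]

def compact_chain(*services: CompactService) -> CompactService:
    """Try each service in order; the first one that changes the log wins."""
    def chained(log: List[dict]) -> List[dict]:
        for service in services:
            compacted = service(log)
            if compacted != log:
                return compacted
        return log
    return chained

def prune_tool_results(log: List[dict], keep_recent: int = 2) -> List[dict]:
    """Blank old tool-result content; keep the structure and recent results."""
    tool_idx = [i for i, e in enumerate(log) if e["kind"] == "tool_result"]
    old = set(tool_idx[:-keep_recent] if keep_recent else tool_idx)
    return [{**e, "content": "[pruned]"} if i in old else e
            for i, e in enumerate(log)]

def truncate(log: List[dict], keep: int = 3) -> List[dict]:
    return log[-keep:] if len(log) > keep else log

log = [
    {"kind": "tool_result", "content": "long grep output"},
    {"kind": "message", "content": "found it"},
    {"kind": "tool_result", "content": "recent read"},
]
chain = compact_chain(lambda l: prune_tool_results(l, keep_recent=1),
                      lambda l: truncate(l, keep=2))
compacted = chain(log)  # cheap prune succeeds, so truncate never runs
```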

Changed (renames — back-compat aliases kept)

  • DefaultCompactService → TruncateCompact — clearer name for "drop old entries, keep N recent, zero LLM calls."
  • LLMCompactService → SummarizeCompact — clearer name for "LLM summarizes middle, keeps N recent."
  • Old names (DefaultCompactService, LLMCompactService) remain as aliases and continue to work.

Added (context management pt. 2)

  • Prompt caching infrastructure (looplet.cache). New CachePolicy dataclass declares which stable prompt sections (system prompt, tool schemas, memory) should carry Anthropic-style cache_control markers, with per-section TTL (ephemeral / 1h). LoopConfig.cache_policy threads per-turn CacheBreakpoint lists (label + SHA-256 hash + TTL) to backends that expose generate_with_cache(..., cache_breakpoints=[...]). Backends without the kwarg keep working unchanged — caching is strictly additive. CacheBreakDetector ships as a drop-in observer hook that records section-hash changes across turns for cache-miss telemetry.
  • LLMCompactService — new compaction strategy that spends one LLM call to summarise the session. Produces a dense 4-section summary (task goal, findings, open questions, recent decisions) spliced into the session log as a synthetic entry after keep-recent pruning. Falls back to deterministic keep-recent on any summariser error. Trade-off vs DefaultCompactService: one LLM call per compaction for preserved reasoning chains.
  • Threshold-tier context budgeting (looplet.budget). New ContextBudget dataclass with warning_at / error_at / compact_buffer tiers. ThresholdCompactHook is a ready-to-register should_compact implementation that fires proactive compaction once estimated tokens cross the configured tier. BudgetTelemetry observer records per-step tier samples and exposes peak_tier for production dashboards.
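
A self-contained sketch of the threshold-tier budgeting described above. The field names follow the changelog (warning_at / error_at / compact_buffer); the tier-classification logic is an assumption about how the tiers might be evaluated.

```python
from dataclasses import dataclass

@dataclass
class ContextBudget:
    warning_at: int        # tokens: start watching
    error_at: int          # tokens: compact now
    compact_buffer: int = 0  # headroom reserved for the compaction pass itself

    def tier(self, estimated_tokens: int) -> str:
        if estimated_tokens + self.compact_buffer >= self.error_at:
            return "error"
        if estimated_tokens >= self.warning_at:
            return "warning"
        return "ok"

budget = ContextBudget(warning_at=60_000, error_at=90_000, compact_buffer=5_000)
```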

Added (architecture improvements)

  • Proactive compact hook slot — LoopHook.should_compact(state, session_log, conversation, step_num) -> bool. Fires at the top of each step, before prompt build. Any hook returning True triggers the configured CompactService preemptively. Complements the reactive prompt_too_long path — use for message-count or token-estimate heuristics. StreamingHook gets a no-op stub.
  • Tool-result streaming via TOOL_PROGRESS — new LifecycleEvent.TOOL_PROGRESS. When hooks are present, the loop builds a ToolContext.on_progress callback per tool-call that emits TOOL_PROGRESS (with the originating tool_call) whenever the tool invokes ctx.report_progress(stage, data). Observers can stream intermediate output from long-running tools without blocking dispatch.
  • Budget-aware turn continuation — new LoopConfig.max_turn_continuations: int = 0. When > 0 and the backend exposes last_stop_reason, llm_call_with_retry will re-prompt up to N times on stop_reason == "max_tokens" and concatenate outputs so long thoughts aren't truncated mid-message. LLMResult gains stop_reason and continuations fields.
  • build_briefing / build_prompt as hook slots — both are now optional methods on LoopHook. First hook returning a non-None string wins; the loop falls back to LoopConfig.build_briefing / config.build_prompt / the built-in default. Lets domain hooks own prompt construction without threading callables through LoopConfig separately.
  • DomainAdapter — new dataclass bundling the five domain callables (build_briefing, extract_entities, build_trace, build_prompt, extract_step_metadata) into a single object. LoopConfig.domain: DomainAdapter | None = None seeds matching flat fields when they are None. Flat fields still win over the adapter, which wins over built-in defaults — use the adapter to package a reusable agent in one handle instead of five kwargs.
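
The message-count heuristic mentioned under the should_compact slot can be sketched as a small hook. The method signature follows the changelog; the class name and threshold logic are illustrative, not part of looplet.

```python
class MessageCountCompactHook:
    """Fire proactive compaction once the conversation grows past a cap."""

    def __init__(self, max_messages: int = 40):
        self.max_messages = max_messages

    def should_compact(self, state, session_log, conversation, step_num) -> bool:
        # Runs at the top of each step, before prompt build; returning True
        # would trigger the configured CompactService preemptively.
        return len(conversation) > self.max_messages

hook = MessageCountCompactHook(max_messages=3)
```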

Removed (breaking)

  • InvestigationLog backward-compat alias is gone — use SessionLog directly.
  • HARNESS_FLAGS backward-compat alias is gone — use FLAGS.
  • Legacy CADENCE_* environment variables for feature flags are no longer read; use the LOOPLET_* prefix.
  • _clone_tools_excluding private alias is gone — use clone_tools_excluding.
  • LoopConfig.permissions is gone. Register a PermissionHook(PermissionEngine(...)) in hooks=[...] instead — it flows through the same unified HookDecision + event bus as every other hook.

Added

  • Unified hook vocabulary — HookDecision (looplet.hook_decision). All hook slots now accept a single HookDecision return type (legacy None / bool / str returns still work via normalise_hook_return). Helpers Allow(), Deny(reason), Block(reason), Stop(reason), Continue(), InjectContext(text) make intent explicit at the call site.
  • Lifecycle events — on_event(payload) (looplet.events). LoopHook gained an optional on_event(EventPayload) method. The loop now fires 10 named events: SESSION_START, PRE_LLM_CALL, POST_LLM_RESPONSE, PRE_TOOL_USE, POST_TOOL_USE, POST_TOOL_FAILURE, PRE_COMPACT, POST_COMPACT, STOP, SUBAGENT_START, SUBAGENT_STOP. Any hook can subscribe with a single method instead of implementing every slot.
  • PermissionHook (looplet.permissions) — wraps PermissionEngine and plugs it into the event bus so policy decisions flow through the same HookDecision path as custom hooks.
  • CompactService + DefaultCompactService + run_compact(...) (looplet.compact) — reactive compaction is now a swappable service with PRE_COMPACT / POST_COMPACT events.
  • LoopConfig.render_messages_override — byte-exact escape hatch. Receives (messages, default_prompt, step_num) and returns the exact prompt string sent to the LLM. Lets advanced callers take full control of prompt rendering without forking the loop.
  • First-class subagents — run_sub_loop(..., subagent_id=...) now fires SUBAGENT_START / SUBAGENT_STOP events on the parent's hooks and returns subagent_id in the result dict for correlation.
  • replay_loop(trace_dir, tools=...) — rerun a captured trace through a fresh composable_loop without calling the LLM again. Useful for golden-trajectory regression tests, hook A/Bs, and cost-free loop diffs. Raises RuntimeError if the replay loop requests more calls than were recorded or diverges in method (generate vs generate_with_tools). Falls back to call_NN_response.txt files when manifest.jsonl is missing.
  • python -m looplet show <trace-dir> — stdlib-only CLI that prints a one-page summary of a captured trace (run id, termination, per-step tool calls with durations, LLM totals). Exit code 1 when the directory is missing or malformed.
  • looplet.provenance — new module for debugging agent runs:
  • RecordingLLMBackend / AsyncRecordingLLMBackend wrap any backend and capture every prompt, system prompt, tool schema, response, duration, and error as LLMCall records. generate_with_tools is surfaced only when the wrapped backend supports it, so NativeToolBackend detection stays honest.
  • TrajectoryRecorder hook captures a structured Trajectory per run (steps, context-before, termination reason, embedded Tracer spans) and writes trajectory.json + steps/step_NN.json.
  • ProvenanceSink is a 3-line facade: wrap_llm(...), trajectory_hook(), flush().
  • On-disk layout is diff-friendly: call_NN_prompt.txt / call_NN_response.txt per LLM call plus a manifest.jsonl.
  • Both recorders accept redact= for secret scrubbing and max_chars_per_call= for bounded memory.
  • See Provenance guide for API reference, recipes, and performance notes.
  • Step.pretty() — human-readable CLI formatter complementing Step.summary() (which is tuned for LLM context assembly).
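
How the legacy None / bool / str returns might normalize into the single HookDecision type can be sketched as follows. The helper names echo the changelog; the mapping of a bare string to "inject context" is an assumption, and this is not looplet's implementation.

```python
from dataclasses import dataclass

@dataclass
class HookDecision:
    action: str   # e.g. "allow", "deny", "inject", "stop"
    reason: str = ""

def normalize_hook_return(value) -> HookDecision:
    """Map legacy hook returns onto the unified decision type."""
    if isinstance(value, HookDecision):
        return value
    if value is None or value is True:
        return HookDecision("allow")
    if value is False:
        return HookDecision("deny")
    if isinstance(value, str):
        # assumption: a legacy string return meant "inject this context"
        return HookDecision("inject", reason=value)
    raise TypeError(f"unsupported hook return: {type(value).__name__}")
```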

[0.1.6] - 2026-04-17

Added

  • looplet.testing — public test-utility module exposing MockLLMBackend and AsyncMockLLMBackend (scripted, zero-dependency) so downstream packages can unit-test hooks, tools, and backends without a real LLM provider.
  • PyPI publish workflow (.github/workflows/publish.yml) that builds + publishes on version tags via PyPI trusted publishing.
  • README positioning matrix comparing looplet to LangGraph, DSPy, and smolagents; observability/OTel wiring example; stability & versioning policy; real AnthropicBackend usage in quick-start.
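
A scripted mock backend in the spirit of looplet.testing's MockLLMBackend, as a stand-in sketch (the real class's constructor and attributes may differ): canned responses in order, prompts recorded, loud failure when the script runs out.

```python
class MockLLMBackend:
    """Zero-dependency scripted backend for unit-testing hooks and tools."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []  # prompts seen, in order

    def generate(self, prompt: str, **kwargs) -> str:
        self.calls.append(prompt)
        if not self._responses:
            raise RuntimeError("mock exhausted: more LLM calls than scripted responses")
        return self._responses.pop(0)

llm = MockLLMBackend(["step 1: read file", "done"])
```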

Fixed

  • resume_loop_state() now restores the checkpointed Conversation thread (was silently dropping multi-turn message history on resume).
  • RoutingLLMBackend.generate_with_tools is now gated dynamically via __getattr__ so hasattr(llm, "generate_with_tools") returns a truthful answer for the currently-selected backend (consistent with _FallbackLLM and CostTracker).
  • Async __llm_error__ step is now recorded through _history to match the sync loop (previously caused session-log/conversation drift on LLM failure).

Previously added in this release

  • ToolError taxonomy — structured ErrorKind enum (PERMISSION_DENIED, TIMEOUT, VALIDATION, EXECUTION, PARSE, CONTEXT_OVERFLOW, RATE_LIMIT, NETWORK, CANCELLED) plus a ToolError dataclass. ToolResult now carries both error: str (for JSON-safe display) and error_detail: ToolError (for introspection).
  • PermissionEngine — declarative ALLOW / DENY / ASK / DEFAULT rules with fail-closed arg_matcher, plug-in ask_handler for human-in-the-loop, and an append-only denial audit log.
  • CancelToken — cooperative cancellation is now threaded through LoopConfig → llm_call_with_retry / async_llm_call_with_retry → ToolContext.cancel_token, so both the next LLM call and any in-flight tool can stop cleanly.
  • ToolContext.elicit — LoopConfig.elicit_handler surfaces a generic elicit(prompt) → str protocol to tools for interactive prompts.
  • Multi-block messages — Message.content supports a list of ContentBlock(kind, data) alongside plain str. HEAVY_BLOCK_KINDS (image / audio / video / binary) are stripped before summarization.
  • Async build_trace — async_composable_loop now stashes the built trace on state.trace at exit (async generators can't return a value).
  • SyncToAsyncAdapter.generate_with_tools — router-selected sync backends keep native-tools support in the async loop.
  • Preflight context check — async loop matches sync by skipping a doomed LLM call when the prompt is already too long under FLAGS.reactive_recovery.
  • Checkpoint state counters — resume_loop_state now round-trips state.queries_used and state.budget_remaining so budget enforcement continues across resume.
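
The ErrorKind / ToolError taxonomy above can be sketched as follows. The enum members come straight from the changelog; the ToolError fields and the display() helper are assumptions about how the dual error: str / error_detail: ToolError surface might be fed.

```python
from dataclasses import dataclass
from enum import Enum

class ErrorKind(Enum):
    PERMISSION_DENIED = "permission_denied"
    TIMEOUT = "timeout"
    VALIDATION = "validation"
    EXECUTION = "execution"
    PARSE = "parse"
    CONTEXT_OVERFLOW = "context_overflow"
    RATE_LIMIT = "rate_limit"
    NETWORK = "network"
    CANCELLED = "cancelled"

@dataclass
class ToolError:
    kind: ErrorKind
    message: str
    retriable: bool = False

    def display(self) -> str:
        """JSON-safe string suitable for a ToolResult-style error field."""
        return f"{self.kind.value}: {self.message}"

err = ToolError(ErrorKind.TIMEOUT, "bash tool exceeded 30s", retriable=True)
```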

Changed

  • ToolResult.error narrowed back to str | None (JSON-safe). Use ToolResult.error_detail for structured introspection.
  • PermissionRule.matches() now fails closed per decision type: DENY rules match on matcher errors (block), ALLOW / ASK rules do not (don't accidentally grant).
  • PermissionEngine._resolve_default collapses ambiguous engine defaults (ASK / DEFAULT) to DENY so a decision never leaks into a PermissionOutcome where both .allowed and .denied are False.
  • ToolSpec._accepts_ctx is computed eagerly at register() time (and self-heals in dispatch() for specs inserted directly).
  • _backend_accepts_cancel_token cache keyed by (type, method_name) instead of id() (eliminates id-recycling hazard).
  • _classify_exception broadened to detect asyncio.CancelledError, rate-limit, context-overflow, and parse exceptions by class name / message content.
  • SyncToAsyncAdapter._adapter_cache now prefers the backend object itself as the dict key, with id() as a fallback for unhashable backends.
  • SessionLog.to_list() includes recall_key for full round-trip through checkpoints.
  • ToolError.context now round-trips through Conversation.serialize / deserialize.
  • Permission-denied results from hooks now populate error_detail with ErrorKind.PERMISSION_DENIED (parity with the PermissionEngine path) in both sync and async loops.
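
The per-decision fail-closed matching described above reduces to a small rule: a DENY rule matches when its matcher raises (block on error), while ALLOW / ASK rules do not (never grant or prompt off a broken matcher). A hedged sketch, with an illustrative signature rather than PermissionRule's real one:

```python
def rule_matches(decision: str, arg_matcher, args: dict) -> bool:
    """Fail-closed matching: matcher errors count as a match only for DENY."""
    try:
        return bool(arg_matcher(args))
    except Exception:
        return decision == "DENY"
```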

Fixed

  • _rebuild_prompt now renders memory and falls back to the structured build_prompt from looplet.prompts instead of a bare f-string, restoring parity with the first-pass build.
  • _deserialize_message now reconstructs ToolError from serialized error_kind / error_retriable / error_context fields.
  • _NullSessionLog (async) gained the attributes the async loop expects: entries, current_theory, to_list(), compact().

[0.1.5] - initial public import

  • Initial release as a standalone package. See the extraction commit history for the pre-extraction development timeline.