Last synced from GitHub: Apr 22, 2026, 01:13 AM
Mirrored from the product repository.

GeminiClaw Architecture

Runtime behavior, module boundaries, reliability controls, and extension patterns.

This document describes the current architecture of GeminiClaw as implemented in the repository today.

It is written to answer four questions:

  • what GeminiClaw is
  • how the runtime works
  • how governance, memory, skills, and handoff fit together
  • how the control plane maps to the backend

> Detailed active architecture package: see [Docs/architecture/geminiclaw-active-architecture.md](Docs/architecture/geminiclaw-active-architecture.md). That document is the diagram-first inspection of the current active code paths, including the Telegram ADK path, the CLI/WhatsApp compatibility loops, and the exclusion boundary for deprecated or duplicate files.

1. Architectural Intent

GeminiClaw is a local-first multi-agent runtime with an operator-facing control plane.

It is designed for:

  • execution, not only conversation
  • explicit runtime governance
  • inspectable context and memory
  • high autonomy with trust-first controls
  • multi-agent collaboration without ambiguous ownership

1.1 Core Principles

  • **Local-first**: the runtime, memory, policies, and control plane run in the user's environment.
  • **Execution-first**: tools, scheduling, mutation paths, and operational loops are first-class.
  • **Ownership-preserving**: the addressed agent owns the user task; delegation is explicit and scoped.
  • **Trust-first**: policy prefers assistive, explainable controls over blunt lockdowns.
  • **Operator-first Studio**: the first layer of the UI must be friendly, guided, and useful for non-technical operators.
  • **Living-spec discipline**: important architectural behavior is validated through smoke evals, not only unit tests.

1.2 Conceptual Glossary

  • **Platform**: the full GeminiClaw product.
  • **Runtime**: the orchestration engine that executes runs, tools, policies, approvals, memory retrieval, and replay.
  • **Agent**: a cognitive or operational entity hosted by the runtime.
  • **Channel**: an entry or delivery surface such as Telegram, CLI, or WhatsApp.
  • **Tool**: the executable primitive.
  • **Skill**: a first-class capability package that groups tools, instructions, dependencies, policy hints, and activation constraints.
  • **MCP**: the external capability layer for remote integrations.
  • **Control Plane**: GeminiClaw Studio and the local API used to inspect and govern the system.

1.3 Ownership Invariant

This invariant must remain true everywhere:

  • the agent that receives the request owns the task
  • handoff is allowed only for explicit subtask delegation
  • handoff must not silently reassign the main task
  • the addressed agent remains responsible for the final response

2. Runtime Topology

flowchart TD
  TG["Telegram"] --> ORCH["Telegram Orchestrator"]
  WA["WhatsApp"] --> ORCH
  CLI["CLI"] --> ORCH
  ORCH --> RUNNER["ADK Runner"]
  RUNNER --> POLICY["Policy Callback"]
  RUNNER --> TOOLS["Tool Adapter"]
  RUNNER --> MEMORY["Memory Orchestrator"]
  RUNNER --> INSTR["Instruction Builder"]
  RUNNER --> SESSION["Session Service"]
  RUNNER --> HANDOFF["Handoff Adapter"]
  POLICY --> LATTICE["Policy Lattice"]
  POLICY --> RATELIMIT["Rate Limiter"]
  POLICY --> LOOPGUARD["Loop Guard"]
  TOOLS --> LOCAL["Local Tools (99)"]
  TOOLS --> MCP["MCP Bridge"]
  MEMORY --> SQLITE["SQLite"]
  MEMORY --> LANCE["LanceDB (Surprise Metrics)"]
  SESSION --> ADKDB["adk_sessions / adk_events"]
  RUNNER --> OBS["Observability"]
  OBS --> API["Control Plane API"]
  API --> STUDIO["GeminiClaw Studio (23 views)"]

3. Main Runtime Flow

3.1 Request entry

GeminiClaw supports three channels: Telegram (primary), CLI, and an optional WhatsApp integration.

Main entry modules:

  • `src/channels/telegram.ts` — Telegram bot with Grammy
  • `src/channels/telegramOrchestrator.ts` — multi-bot orchestration
  • `src/channels/telegramHandlers.ts` — message/media/callback handlers
  • `src/channels/telegramCommands.ts` — `/start`, `/approve`, etc.
  • `src/channels/whatsapp.ts` — optional WhatsApp integration via Baileys
  • `src/channels/whatsappAuth.ts` — optional WhatsApp auth flow
  • `src/channels/cli.ts` — local REPL for development

All channels funnel through a single entry point: `runAdkAgent()` from `src/adk/runtime/runner.ts`.

Responsibilities:

  • receive user input (text, photo, voice, document)
  • resolve the addressed agent from database
  • call `runAdkAgent()` with agent record, channel context, and message
  • deliver the response back through the channel

3.1.1 Boot sequence and sub-agent hot-reload

The `telegram.ts` process is the entry point. At boot:

1. **Takeover with active wait**: if another process is already running (hot-reload via `tsx watch`), the new process sends SIGTERM and waits up to 10s for it to exit before starting any bots. Without this wait, bots 3-N receive 409 Conflict from Telegram (double polling).
2. **Sub-agents start sequentially** with a 5s delay between each one, which avoids Telegram rate limits.
3. **`dropPendingUpdates: false`** on sub-agents: messages sent during downtime are delivered and processed, not discarded. The Master uses `false` for the same reason.
4. **Staleness filter** (`SUBAGENT_STALE_GRACE_SECONDS=120`): messages whose timestamp is more than 2 minutes before boot are silently dropped with the log `[STALE_MSG]`. This protects against replaying very old messages if the bot was offline for hours.
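The staleness filter in step 4 can be sketched as a pure predicate. This is a minimal illustration, not the repository's code: the function name `isStaleMessage` and its parameters are hypothetical, and it assumes both timestamps are Unix epoch seconds (the unit Telegram uses for `message.date`). Only the 120-second grace value comes from the text above.

```typescript
// Grace window from the boot sequence description (SUBAGENT_STALE_GRACE_SECONDS=120).
const SUBAGENT_STALE_GRACE_SECONDS = 120;

/**
 * Hypothetical helper: returns true when a message was sent more than
 * `graceSeconds` before the bot booted. Both timestamps are epoch seconds.
 */
function isStaleMessage(
  messageDateSec: number,
  bootTimeSec: number,
  graceSeconds: number = SUBAGENT_STALE_GRACE_SECONDS,
): boolean {
  return bootTimeSec - messageDateSec > graceSeconds;
}
```

Messages sent during a short hot-reload fall inside the grace window and are processed; only messages older than the window are dropped.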

3.2 ADK Runtime

The central runtime is Google ADK (`@google/adk`), living in `src/adk/runtime/` (16 files, ~3,200 lines).

Main module:

  • `src/adk/runtime/runner.ts` — `runAdkAgent()`

This replaced the 3,114-line `autonomousLoop.ts` (now @deprecated, safe to delete after 2026-05-01).

#### 3.2.0 ADK Runtime files

| File | Purpose |
|------|---------|
| `runner.ts` | Core execution: inference → tool calls → response assembly → memory persistence |
| `agentFactory.ts` | Creates ADK Agent from GeminiClaw DB records with model, tools, instructions, policy |
| `instructionBuilder.ts` | Composes system instructions from agent profile, playbooks, autonomy rules, self-repair protocol |
| `toolAdapter.ts` | Bridges all 99 GeminiClaw tools as ADK FunctionTool instances. Each tool wrapped with per-tool timeout (60s default; overrides for media tools up to 5min) |
| `policyCallback.ts` | Single `beforeToolCallback`: rate limiter + loop guard + policy lattice + Google Ads playbook. All catch blocks emit `console.warn` (no silent failures) |
| `loopGuard.ts` | Adaptive turn limits per autonomy mode + tail-repetition detection + per-tool batch limits |
| `sessionService.ts` | SQLite-backed session lifecycle (`adk_sessions`, `adk_events` tables) |
| `channelAdapter.ts` | Converts channel messages to ADK Content, splits responses for channel limits |
| `channelContextRegistry.ts` | Stores Grammy/channel context per session for tools that need Telegram access |
| `observability.ts` | Logs to `adk_runs` and `adk_steps` tables + event emitter for Studio |
| `handoffAdapter.ts` | Multi-agent orchestration via ADK sub_agents |
| `mcpBridge.ts` | Loads MCP servers via `MCPToolset` from `@google/adk`. Tools merged into agent at startup via `preloadMcpTools()` |
| `memoryService.ts` | ADK MemoryService backed by existing SQLite + LanceDB |
| `types.ts` | Shared types: GCAgentRecord, ChannelContext, AdkRunResult, etc. |
| `featureFlag.ts` | @deprecated: always returns true (ADK is sole runtime) |
| `index.ts` | Barrel export organized by phase |

#### 3.2.0.1 ADK Runner execution flow

1. **Session rotation**: if session exceeds 50 events, rotate (copy last 10 events to new session)
2. **Memory retrieval**: call `memoryOrchestrator.buildContextSnapshot()` with hierarchical scopes
3. **Session creation**: inject `user_facts` and `relevant_memories` into session state
4. **Approval detection**: check for `pending_approval` in session state, match user confirmation patterns
5. **Inference loop**: `runner.runAsync()`, where the LLM decides which tools to call and ADK executes natively
6. **Policy enforcement**: `policyCallback` evaluates every tool call (rate limit → loop guard → policy lattice → playbook)
7. **Response extraction**: filter function call/response events, extract final model text
8. **Output formatting**: apply Telegram output mode (executive/operational/debug) + sanitize runtime directives
9. **Memory persistence**: call `afterTurn()` to update working memory and save to LanceDB
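Step 1 (session rotation) can be illustrated with a small pure function. This is a sketch under stated assumptions: `SessionEvent` and `rotateIfNeeded` are hypothetical names, and the real session and event types live in the ADK runtime. Only the two thresholds (rotate above 50 events, carry over the last 10) come from the flow above.

```typescript
// Assumed event shape for illustration only.
interface SessionEvent {
  id: number;
  payload: string;
}

const ROTATE_AFTER_EVENTS = 50; // rotate once a session exceeds 50 events
const CARRY_OVER_EVENTS = 10;   // copy the last 10 events into the new session

/** Returns the events the next run should start from, and whether a rotation happened. */
function rotateIfNeeded(events: SessionEvent[]): { rotated: boolean; events: SessionEvent[] } {
  if (events.length <= ROTATE_AFTER_EVENTS) {
    return { rotated: false, events };
  }
  // Keep only the tail so the new session retains recent conversational context.
  return { rotated: true, events: events.slice(-CARRY_OVER_EVENTS) };
}
```

Carrying over a fixed tail keeps the session bounded while preserving the most recent turns for the next inference.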

Responsibilities:

  • build inference context from agent profile and memory
  • delegate tool selection to the LLM natively (no intent classifier)
  • enforce policy via single callback
  • record runtime traces
  • manage session lifecycle
  • stop, continue, or wait for approval

3.2.1 Self-repair execution mode

GeminiClaw treats "change GeminiClaw itself" as a dedicated runtime posture, not a normal open-ended run.

In the ADK architecture, self-repair is implemented as a **structured instruction protocol** embedded in the system prompt by `instructionBuilder.ts` → `buildSelfRepairProtocol()`, rather than a separate state machine module.

The protocol enforces 5 mandatory phases:

| Phase | Rule |
|-------|------|
| **1. Investigate** | Use `explore_codebase`, `read_file`, `search_text` to understand the problem. No modifications allowed. |
| **2. Plan** | List affected files, describe changes, assess risk. Present plan to user. |
| **3. Execute** | Apply modifications one by one. |
| **4. Verify** | **Mandatory, cannot be skipped.** Run `npm run typecheck`, `npm --prefix dashboard run build`, or `check_docs_sync`. Never declare "done" without verification. |
| **5. Synthesize** | Summarize what was done, files changed, and verification results. If verification failed, return to Phase 3. |

The key architectural rule:

  • verification is required before declaring completion
  • if Phase 1 finds insufficient evidence, the agent must ask for more context instead of guessing

Legacy module `src/core/selfRepair.ts` still exists for Studio visibility endpoints but is no longer the primary execution engine.

Important endpoints:

  • `GET /api/v1/runtime/runs/:id/self-repair`
  • `GET /api/v1/runtime/runs/:id/self-repair-checklist`
  • `GET /api/v1/runtime/runs/:id/self-repair-recovery`
  • `GET /api/v1/analytics/overview` (includes selfRepairAttention stats for the Studio overview card)

3.2.2 ADK policy model

The ADK runtime replaces the legacy intent classifier and semantic guardrails with a single `beforeToolCallback` in `policyCallback.ts`.

Main module:

  • `src/adk/runtime/policyCallback.ts`

The callback evaluates every tool call through 4 layers in order:

| Layer | Module | Purpose |
|-------|--------|---------|
| **Rate Limiter** | `rateLimiter.ts` | Per-chat sliding window quotas. Rejects if quota exceeded. |
| **Loop Guard** | `loopGuard.ts` | Adaptive turn limits by autonomy mode (trust_first: 50, require_approval: 15, supervised: 8; hard ceiling: 60). Detects tail repetition and per-tool batch limits. |
| **Policy Lattice** | `policyEngine.ts` | Evaluates deny/require_confirmation rules from the policy database. |
| **Google Ads Playbook** | `googleAdsPlaybook.ts` | Domain-specific gates for mutable Google Ads operations. |
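The ordered evaluation can be sketched as a chain of checks. This is an illustration only: the `PolicyLayer` shape, the `evaluateLayers` function, and the stop-at-first-rejection behavior are assumptions, not the actual contents of `policyCallback.ts`.

```typescript
interface ToolCall {
  tool: string;
  action?: string;
}

// A layer returns null to pass the call on, or a reason string to reject it (assumed contract).
interface PolicyLayer {
  name: string;
  check: (call: ToolCall) => string | null;
}

type LayerVerdict =
  | { allowed: true }
  | { allowed: false; layer: string; reason: string };

/** Runs the layers in order and stops at the first one that rejects the call. */
function evaluateLayers(layers: PolicyLayer[], call: ToolCall): LayerVerdict {
  for (const layer of layers) {
    const reason = layer.check(call);
    if (reason !== null) {
      return { allowed: false, layer: layer.name, reason };
    }
  }
  return { allowed: true };
}
```

Putting the cheap checks (rate limiter, loop guard) first means a call that is already over quota never reaches the database-backed policy lattice.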

Tool safety classification:

  • **ALWAYS_SAFE_TOOLS**: 30+ read-only tools that never need approval (read_file, query_google_ads, web_search, etc.)
  • **DESTRUCTIVE_TOOLS**: operations that always require explicit approval (remove_agent, dangerous git operations)
  • **Google Ads destructive actions**: pause/enable campaign, update budget, update bidding, remove keywords — require confirmation even in trust_first mode

Approval flow:

  • When a tool is blocked, the callback stores `pending_approval` in session state with tool name and action
  • The runner detects user confirmation patterns ("sim", "ok", "aprovo", "pode fazer") and grants time-limited approval (5 min TTL)
  • Approval is persisted as `approved:{tool}:{action}` key in session state
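A minimal sketch of the approval flow follows. The key format `approved:{tool}:{action}`, the 5-minute TTL, and the confirmation phrases come from the text above. Everything else is an assumption for illustration: session state is modeled as a `Map` holding expiry timestamps, and confirmation is matched by exact comparison after trimming and lowercasing, which may differ from the real pattern matching.

```typescript
const APPROVAL_TTL_MS = 5 * 60 * 1000; // 5 min TTL
const CONFIRMATION_PATTERNS = ["sim", "ok", "aprovo", "pode fazer"];

// Assumed model: session state maps an approval key to its expiry time (epoch ms).
type ApprovalState = Map<string, number>;

function approvalKey(tool: string, action: string): string {
  return `approved:${tool}:${action}`;
}

/** Exact-match check after normalization (an assumption about how patterns are matched). */
function isConfirmation(text: string): boolean {
  return CONFIRMATION_PATTERNS.indexOf(text.trim().toLowerCase()) !== -1;
}

function grantApproval(state: ApprovalState, tool: string, action: string, nowMs: number): void {
  state.set(approvalKey(tool, action), nowMs + APPROVAL_TTL_MS);
}

function isApproved(state: ApprovalState, tool: string, action: string, nowMs: number): boolean {
  const expiresAt = state.get(approvalKey(tool, action));
  return expiresAt !== undefined && nowMs < expiresAt;
}
```

Scoping the key to both tool and action means an approval for one operation does not unlock a different action on the same tool.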

Response handling:

  • Runtime directives are stripped from user-facing text via regex sanitization
  • Function call/response events are filtered out of the response stream
  • Output modes (executive/operational/debug) applied before Telegram delivery

Conversation history isolation:

  • Scheduled task messages stored under virtual `sched:<agentId>:<ts>` chatIds
  • `messages.source` column: `'user'` | `'scheduled_task'` | `'internal_directive'` | `'agent_message'`
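The isolation scheme can be expressed with a small type and two helpers. The four `source` values and the `sched:<agentId>:<ts>` format are taken from the list above; the helper names are hypothetical and the unit of `ts` is not specified in this document, so it is treated as an opaque number.

```typescript
// The four values of the messages.source column.
type MessageSource = "user" | "scheduled_task" | "internal_directive" | "agent_message";

/** Builds the virtual chatId under which scheduled task messages are stored. */
function scheduledChatId(agentId: string, ts: number): string {
  return `sched:${agentId}:${ts}`;
}

/** True when a chatId belongs to the virtual scheduled-task namespace. */
function isScheduledChatId(chatId: string): boolean {
  return chatId.startsWith("sched:");
}

/** Example filter (an assumption): only user-authored rows count as conversation turns. */
function isConversationTurn(source: MessageSource): boolean {
  return source === "user";
}
```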

> **Legacy note:** `intentClassifier.ts` and the semantic gate/dialog state machine are @deprecated since 2026-04-03. The ADK runtime delegates tool selection to the LLM natively — no intent classification needed.

3.2.3 Agent autonomy mode

Each agent carries an `autonomyMode` field (stored in the `agents` table):

| Value | Behavior | Loop Guard Limit |
|---|---|---|
| `trust_first` | Execute tools freely except destructive ops. No approval needed for read/write. | 50 turns |
| `require_approval` (default) | Read-only tools execute freely. Write/mutation tools require approval. | 15 turns |
| `supervised` | All tools require explicit approval. | 8 turns |

Set via the Studio agent editor or via SQL: `UPDATE agents SET autonomyMode = 'trust_first' WHERE name = 'AgentName';`

Policy gates by mode:

  • `trust_first`: all tools allowed except DESTRUCTIVE_TOOLS and destructive Google Ads actions
  • `require_approval`: ALWAYS_SAFE_TOOLS allowed, everything else requires confirmation
  • `supervised`: everything requires confirmation

Not bypassed in any mode: rate limiter, MCP trust gates, memory governance gates.
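The per-mode gates and turn limits can be summarized in code. This is a simplified sketch: the tool sets below are small illustrative subsets, the function names are hypothetical, and the destructive Google Ads actions (which also require confirmation in `trust_first`) are left out for brevity.

```typescript
type AutonomyMode = "trust_first" | "require_approval" | "supervised";

// Illustrative subsets only; the real sets are defined in the runtime.
const ALWAYS_SAFE_TOOLS = new Set(["read_file", "query_google_ads", "web_search"]);
const DESTRUCTIVE_TOOLS = new Set(["remove_agent"]);

const TURN_LIMITS: Record<AutonomyMode, number> = {
  trust_first: 50,
  require_approval: 15,
  supervised: 8,
};
const HARD_TURN_CEILING = 60;

/** Decides whether a tool call needs operator confirmation under the given mode. */
function requiresConfirmation(mode: AutonomyMode, tool: string): boolean {
  switch (mode) {
    case "trust_first":
      return DESTRUCTIVE_TOOLS.has(tool);
    case "require_approval":
      return !ALWAYS_SAFE_TOOLS.has(tool);
    case "supervised":
      return true;
  }
}

/** Turn limit for a mode, never above the hard ceiling. */
function turnLimit(mode: AutonomyMode): number {
  return Math.min(TURN_LIMITS[mode], HARD_TURN_CEILING);
}
```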

3.3 Tool routing

Main module:

  • `src/adk/runtime/toolAdapter.ts`

The tool adapter bridges GeminiClaw's 99 static tools (declared in `src/tools/schema.ts`) as ADK `FunctionTool` instances. Dynamic tools created via `create_tool` are loaded from `schema_dynamic.ts`.

Responsibilities:

  • call `registerAllTools()` to initialize the legacy tool registry
  • wrap each tool handler as an ADK-compatible function
  • coerce array arguments (Gemini sometimes sends arrays as strings)
  • truncate large tool outputs (>15K chars) with file save option
  • inject channel context (Grammy, chatId, agentId) via `channelContextRegistry.ts`
  • apply trust posture through `policyCallback.ts`
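Two of these responsibilities, array coercion and output truncation, are easy to show in isolation. The helpers below are hypothetical sketches, not the code in `toolAdapter.ts`: the 15,000-character threshold comes from the list above, while the JSON-based coercion strategy and the truncation marker text are assumptions. The "file save option" is omitted.

```typescript
const MAX_TOOL_OUTPUT_CHARS = 15000; // outputs longer than this are truncated

/**
 * If the model sent an array as a JSON string (e.g. '["a","b"]'), parse it.
 * Any other value, including strings that are not valid JSON arrays, is returned unchanged.
 */
function coerceArrayArg(value: unknown): unknown {
  if (typeof value !== "string") return value;
  const trimmed = value.trim();
  if (!trimmed.startsWith("[")) return value;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    return Array.isArray(parsed) ? parsed : value;
  } catch {
    return value; // not valid JSON, keep the original string
  }
}

/** Cuts oversized output and appends a marker stating how many characters were dropped. */
function truncateToolOutput(output: string, limit: number = MAX_TOOL_OUTPUT_CHARS): string {
  if (output.length <= limit) return output;
  const omitted = output.length - limit;
  return `${output.slice(0, limit)}\n[truncated ${omitted} chars]`;
}
```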

Legacy `src/core/router.ts` still exists but is only used internally by the tool registry for handler lookup. The ADK runner never calls it directly.

4. Context Engineering and Work Context

4.1 Context engineering

Main modules:

  • `src/adk/runtime/instructionBuilder.ts` — system prompt assembly
  • `src/core/instructionComposer.ts` — agent profile composition from DB

GeminiClaw treats context as a first-class runtime layer. The ADK instruction builder composes the system prompt from:

1. **Identity block**: agent name, capabilities, composed instruction from DB profile (mission, tone of voice, binding instructions)
2. **Autonomy rules**: mode-specific behavior (trust_first / require_approval / supervised)
3. **Operator playbook**: user-defined `playbook_instruction` from DB
4. **Resourcefulness protocol**: teaches the agent to never say "I can't" and compose solutions from existing tools
5. **Self-repair protocol**: 5-phase investigation→plan→execute→verify→synthesize pipeline
6. **Tool routing preferences**: e.g., use `generate_google_ads_report` instead of manual GAQL for reports
7. **Google Ads playbook**: RSA diagnosis workflow, optimization best practices, headline rules, safety rules
8. **Dynamic context**: `user_facts` and `relevant_memories` injected via session state (empty section removed if no data)

Legacy `src/core/contextEngineering.ts` still exists but is no longer called in the ADK path.

4.2 Effective work context

GeminiClaw now exposes a richer work context model rather than relying on implicit prompt state.

Key fields include:

  • `agentScope`
  • `sessionScope`
  • `activeSkillSet`
  • `memoryPolicy`
  • `handoffContext`
  • `operatorConstraints`
  • `contextProfile`

Related endpoints:

  • `GET /api/v1/runtime/runs/:id/context`
  • `GET /api/v1/runtime/runs/:id/context-diff`
  • `GET /api/v1/runtime/session-contexts/:sessionId/effective-context`
  • `GET /api/v1/runtime/session-contexts/:sessionId/context-history`
  • `GET /api/v1/work-context/overview`
  • `GET /api/v1/work-context/runs/:runId/lineage`

4.3 Session-aware work contexts

Session contexts let the runtime think inside an explicit work boundary without changing the task owner.

Main modules:

  • `src/core/runtimeSessionContext.ts`
  • `src/core/runtimeSessionSuggestion.ts`
  • `src/core/runtimeSessionPresentation.ts`
  • `src/core/sessionLifecycleMemory.ts`

Session lifecycle states currently include:

  • `active`
  • `paused`
  • `completed`
  • `archived`

5. Skills Runtime

5.1 Why skills exist

GeminiClaw distinguishes between:

  • **tools** as executable primitives
  • **skills** as reusable capability packs

This keeps tool execution explicit while allowing richer packaging of:

  • instruction blocks
  • tool bundles
  • MCP dependencies
  • policy hints
  • memory hints
  • allowed agents
  • activation constraints

5.2 Main module

  • `src/core/skillsRuntime.ts`

5.3 Resolution model

Skill precedence:

  • bundled
  • custom
  • workspace

Agents can explicitly enable skills, and runs resolve an active skill set from:

  • agent base posture
  • explicit enablements
  • session/work context
  • handoff context where relevant

Main endpoints:

  • `GET /api/v1/skills`
  • `GET /api/v1/skills/:skillId`
  • `POST /api/v1/skills`
  • `PATCH /api/v1/skills/:skillId`
  • `POST /api/v1/skills/:skillId/validate`
  • `GET /api/v1/skills/:skillId/dependencies`
  • `GET /api/v1/skills/:skillId/effective-policy`
  • `POST /api/v1/agents/:agentId/skills/:skillId/enable`
  • `POST /api/v1/agents/:agentId/skills/:skillId/disable`

6. Policy Lattice and Runtime Governance

6.1 Main module

  • `src/core/policyEngine.ts`

6.2 Goal

GeminiClaw used to have safety and approval logic spread across the runtime. It now converges around a unified policy lattice so the system can explain and evolve decisions more cleanly.

6.3 Scope

The lattice covers:

  • planning gates
  • risky tool execution
  • MCP trust posture
  • memory governance actions
  • skill execution posture
  • handoff transitions
  • channel restrictions

6.4 Policy decision envelope

The central output shape is a policy decision envelope with:

  • `decision`
  • `reason`
  • `policySource`
  • `userImpact`
  • `operatorImpact`
  • `nextStep`
  • `requiresConfirmation`
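As a rough illustration, the envelope could be typed as follows. The field names come from the list above; the field types, the `decision` value set, and the `summarizeDecision` helper are assumptions made for this sketch.

```typescript
// Assumed value set; the document only names deny and require_confirmation rules explicitly.
type PolicyDecision = "allow" | "deny" | "require_confirmation";

interface PolicyDecisionEnvelope {
  decision: PolicyDecision;
  reason: string;
  policySource: string;
  userImpact: string;
  operatorImpact: string;
  nextStep: string;
  requiresConfirmation: boolean;
}

/** Hypothetical helper that renders an envelope as a single log-friendly line. */
function summarizeDecision(envelope: PolicyDecisionEnvelope): string {
  const suffix = envelope.requiresConfirmation ? " (needs confirmation)" : "";
  return `${envelope.decision}: ${envelope.reason} [${envelope.policySource}]${suffix}`;
}

// Example values are invented for illustration.
const exampleEnvelope: PolicyDecisionEnvelope = {
  decision: "require_confirmation",
  reason: "Budget change is a mutable Google Ads operation",
  policySource: "googleAdsPlaybook",
  userImpact: "The change waits for your approval",
  operatorImpact: "Pending approval appears in the session state",
  nextStep: "Reply with a confirmation to proceed",
  requiresConfirmation: true,
};
```

Carrying `reason`, `policySource`, and `nextStep` on every decision is what lets the runtime explain a block instead of failing silently.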

Important endpoints:

  • `GET /api/v1/policy/effective`
  • `GET /api/v1/policy/effective-trace`
  • `GET /api/v1/policy/decisions`
  • `PATCH /api/v1/policy/overrides`

6.5 Trust-first rule

The policy philosophy is intentionally trust-first:

  • autonomy remains high by default
  • broad or risky actions become assistive or approval-based
  • behavior should stay explainable rather than silently blocked

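For agents requiring maximum autonomy, set `autonomyMode = 'trust_first'` in the `agents` table (see §3.2.3). In this mode tools run without approval, while Policy Lattice gates stay active for risky tools. The SEMANTIC_GATE intent check that this mode used to bypass is deprecated along with the intent classifier (see the legacy note in §3.2.2).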

7. Memory System

7.1 Memory layers

GeminiClaw uses two main memory forms:

  • **structured memory** in SQLite
  • **semantic memory** in LanceDB

Main modules:

  • `src/core/memoryOrchestrator.ts`
  • `src/core/memoryScope.ts`
  • `src/core/memoryAccessPolicy.ts`
  • `src/core/memoryGovernancePolicy.ts`
  • `src/core/memoryGovernanceInbox.ts`
  • `src/core/memoryGovernanceNarratives.ts`

7.2 Memory boundaries

Memory is scoped and owned explicitly.

Important concepts:

  • scope
  • owner type
  • owner id
  • memory class
  • sensitivity
  • retention posture
  • lineage

#### Agent memory read model — hierarchical isolation + platform scope

Access to `memory_facts` is controlled by agent ownership and hierarchy, **not** by scope labels.

**Rule:** each agent reads:

1. Its own facts (`ownerType = 'agent' AND ownerId = agentId`)
2. Facts of its direct subordinates (`ownerType = 'agent' AND ownerId IN subordinateIds`)
3. Platform facts (`ownerType = 'platform'`): cross-agent knowledge readable by everyone (e.g. Google Ads CSV rules)

`ownerType = 'chat'` is **deprecated**. GeminiClaw is single-user; the chat-level owner concept was a multi-user artifact. New facts must never use `ownerType = 'chat'`. Existing records have been migrated.

| ownerType | Stored by | Readable by | Mechanism |
|---|---|---|---|
| `agent` | any agent | self + direct superior | `reportsTo` hierarchy |
| `platform` | operator / Studio | any agent | `getGlobalPlatformFacts()` (no chatId filter) |
| `chat` | **legacy** | none | do not use |

Hierarchy is resolved via `reportsTo` on the `agents` table. `getSubordinateAgentIds(agentId)` in `agentRepository.ts` returns all agents where `reportsTo = agentId`.
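The read rule can be sketched over in-memory data. This is an illustration under assumptions: the row shapes are simplified, and `getSubordinateIds` is an in-memory stand-in for the database-backed `getSubordinateAgentIds(agentId)` mentioned above, not its actual implementation.

```typescript
interface AgentRow {
  id: string;
  reportsTo: string | null;
}

interface MemoryFact {
  id: string;
  ownerType: "agent" | "platform" | "chat";
  ownerId: string;
}

/** In-memory stand-in: direct subordinates are agents whose reportsTo equals agentId. */
function getSubordinateIds(agents: AgentRow[], agentId: string): string[] {
  return agents.filter((a) => a.reportsTo === agentId).map((a) => a.id);
}

/** Facts an agent may read: its own, its direct subordinates', and platform facts. */
function readableFacts(facts: MemoryFact[], agents: AgentRow[], agentId: string): MemoryFact[] {
  const allowedOwners = new Set([agentId, ...getSubordinateIds(agents, agentId)]);
  return facts.filter(
    (f) =>
      f.ownerType === "platform" ||
      (f.ownerType === "agent" && allowedOwners.has(f.ownerId)),
  );
}
```

Only direct reports are included, so a fact owned by a subordinate's subordinate is not visible two levels up, and legacy `chat` facts are never returned.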

The scope values (`personal`, `team`, `project`, `org`) remain as **metadata labels** on memory records — useful for governance and auditing — but they do not control who can read what.

`sharedMemoryScopes` on the `agents` table is **deprecated** for access control. Set to `[]` on all agents. The field is kept for backward compatibility with session-scoped runtime contexts only.

RAG context labels injected into system prompt:

  • `[CONHECIMENTO DO AGENTE]` — agent-owned facts (self + subordinates)
  • `[CONHECIMENTO DE PLATAFORMA]` — platform facts

#### Memory tools available to agents

| Tool | Status | Behavior |
|---|---|---|
| `save_to_memory` | ✅ Active | Default: `ownerType='agent'`, `ownerId=agentId` (automatic). Pass `ownerType='platform'` for cross-agent knowledge (stored under `chatId='__platform__'`). **Save policy (enforced via description):** only save if (a) user provided the data directly or (b) user explicitly requested it. Never save proactively. Never save anything discoverable from project files; use `explore_codebase`/`read_file` fresh each session instead. When receiving a stable fact from the user, ask for confirmation before saving. |
| `save_shared_memory` | ⛔ Deprecated | Identical behavior to `save_to_memory`. Schema marked `[DEPRECATED]`. Agents instructed not to call it. Use `save_to_memory` with `ownerType='platform'` instead. |
| `search_memory` | ✅ Active | Semantic search over LanceDB (past conversation summaries). |
| `list_memory_facts` | ✅ Active | Lists structured facts. `ownerType='chat'` filter removed from schema guidance. |

Current memory classes include:

  • `working`
  • `durable`
  • `shared`
  • `session_recap`
  • `restricted`

7.5 Titans-inspired surprise metrics

Inspired by the *"Titans: Learning to Memorize at Test Time"* paper, the memory system uses surprise scoring to filter noise and prioritize informative memories.

Constants in `memoryOrchestrator.ts`:

  • `SURPRISE_THRESHOLD = 0.25` — chunks below this are discarded (not saved to LanceDB)
  • `SURPRISE_WEIGHT = 0.3` — boost factor for high-surprise memories during retrieval

#### Surprise-gated insertion (afterTurn)

Before saving an evicted chunk to LanceDB, the system computes a surprise score:

1. Vector search the summary embedding against the 5 nearest neighbors for the same chatId
2. `surpriseScore = 1 - maxSimilarity` (clamped to [0, 1])
3. If `surpriseScore < SURPRISE_THRESHOLD`, the chunk is discarded with log `[Memory] ⏭️ Chunk descartado`
4. Otherwise, the chunk is saved with `surpriseScore` as a persistent metadata field

This filters out routine messages ("ok", "entendido", approval confirmations) that would dilute retrieval quality.

#### Surprise-boosted retrieval (searchMemories)

After vector search, results are re-ranked:

finalScore = similarity × (1 + SURPRISE_WEIGHT × surpriseScore)

Memories with higher surprise scores are prioritized in the context injected into the ADK runner.
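Both halves of the mechanism fit in a few lines. The constants and the two formulas are taken from this section; the function names, the `ScoredMemory` shape, and the choice to treat an empty neighbor list as maximum surprise are assumptions made for this sketch.

```typescript
const SURPRISE_THRESHOLD = 0.25;
const SURPRISE_WEIGHT = 0.3;

/** surpriseScore = 1 - maxSimilarity, clamped to [0, 1]. No neighbors is treated as fully novel (assumption). */
function surpriseScore(neighborSimilarities: number[]): number {
  const maxSimilarity =
    neighborSimilarities.length > 0 ? Math.max(...neighborSimilarities) : 0;
  return Math.min(1, Math.max(0, 1 - maxSimilarity));
}

/** Insertion gate: chunks scoring below the threshold are discarded. */
function shouldStore(score: number): boolean {
  return score >= SURPRISE_THRESHOLD;
}

interface ScoredMemory {
  id: string;
  similarity: number;
  surpriseScore: number;
}

/** Retrieval re-rank: finalScore = similarity × (1 + SURPRISE_WEIGHT × surpriseScore), highest first. */
function rerank(memories: ScoredMemory[]): ScoredMemory[] {
  const finalScore = (m: ScoredMemory): number =>
    m.similarity * (1 + SURPRISE_WEIGHT * m.surpriseScore);
  return [...memories].sort((a, b) => finalScore(b) - finalScore(a));
}
```

For example, a memory with similarity 0.75 and surprise 1.0 scores 0.75 × 1.3 = 0.975 and outranks a memory with similarity 0.80 and surprise 0, which stays at 0.80.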

#### Schema

The `surpriseScore` field (Float, default 0.0) is added via `ensureVectorGovernanceSchema()` migration. Old records without the field receive 0.0 (neutral: no boost, no penalty).

7.3 Session-aware retrieval

When an active session exists, retrieval now prefers:

  • session-relevant memory
  • skill-relevant memory
  • handoff-origin memory
  • compatible memory classes

It only falls back to broader chat memory when needed.

7.4 Memory governance

GeminiClaw supports explicit governance actions:

  • review
  • redact
  • purge
  • restore
  • reclassify
  • archive
  • retention override

Important endpoints:

  • `GET /api/v1/memory/console`
  • `GET /api/v1/memory/facts`
  • `GET /api/v1/memory/semantic`
  • `GET /api/v1/memory/lineage`
  • `GET /api/v1/memory/audit`
  • `GET /api/v1/memory/policy`
  • `GET /api/v1/memory/policy/evaluate`
  • `GET /api/v1/memory/inbox`
  • `POST /api/v1/memory/archive`
  • `POST /api/v1/memory/retention-override`
  • `POST /api/v1/memory/reclassify`
  • `POST /api/v1/memory/restore`
  • `POST /api/v1/memory/reviews`
  • `POST /api/v1/memory/reviews/bulk`
  • `POST /api/v1/memory/policy/guard`
  • `PATCH /api/v1/memory/facts/:factId/sensitivity`
  • `POST /api/v1/memory/facts/:factId/redact`
  • `PATCH /api/v1/memory/semantic/:semanticId/sensitivity`
  • `POST /api/v1/memory/semantic/:semanticId/redact`

8. Structured Handoff

8.1 Main modules

  • `src/core/agentHandoff.ts`
  • `src/tools/handoffAgentTask.ts`

8.2 Why handoff exists

Handoff is GeminiClaw’s internal collaboration primitive. It exists to delegate subtasks without violating ownership of the user request.

8.3 Current flow

Handoff supports:

  • creation
  • contract inspection
  • accept
  • progress
  • return
  • cancel
  • impact tracking
  • merge back into owner context

Main endpoints:

  • `GET /api/v1/agent-handoffs`
  • `GET /api/v1/agent-handoffs/:handoffId`
  • `POST /api/v1/agent-handoffs`
  • `GET /api/v1/agent-handoffs/:handoffId/contract`
  • `POST /api/v1/agent-handoffs/:handoffId/accept`
  • `POST /api/v1/agent-handoffs/:handoffId/progress`
  • `POST /api/v1/agent-handoffs/:handoffId/cancel`
  • `POST /api/v1/agent-handoffs/:handoffId/return`
  • `GET /api/v1/agent-handoffs/:handoffId/timeline`
  • `GET /api/v1/agent-handoffs/:handoffId/impact`
  • `GET /api/v1/agent-handoffs/:handoffId/merge`

9. MCP Integration

9.1 Main modules

  • `src/core/mcpRegistry.ts`
  • `src/core/mcpClient.ts`

9.2 Model

MCP is treated as an external capability fabric, not as an agent layer.

Each capability can carry:

  • trust posture
  • mutability
  • degraded mode
  • dependency mapping
  • health snapshot

Important endpoints:

  • `GET /api/v1/mcp/registry`
  • `GET /api/v1/mcp/trust-overrides`
  • `POST /api/v1/mcp/trust-overrides`
  • `DELETE /api/v1/mcp/trust-overrides`
  • `GET /api/v1/mcp/health`
  • `GET /api/v1/mcp/dependencies`
  • `GET /api/v1/mcp/effective-policy`

10. Control Plane API

10.1 Main module

  • `src/core/server.ts`

This is the local API consumed by GeminiClaw Studio.

Major API groups:

  • agents
  • skills
  • board
  • schedules
  • analytics
  • ops
  • policy
  • mcp
  • handoffs
  • work context
  • memory
  • governance inbox
  • runtime runs
  • Instagram / LinkedIn integrations
  • `GET /api/contract-kpis?window=<hours>` — ADK delivery contract KPIs (satisfaction rate, rescue rate, enforcer p50/p95/p99, intent source breakdown)
  • `POST /api/sessions/:id/rollback` — admin-only session state rollback to a prior phase snapshot

11. GeminiClaw Studio

11.1 Main shell

  • `dashboard/src/app/App.jsx`

11.2 Main surfaces (23 views)

  • `overview-view.jsx` — system health summary
  • `board-view.jsx` — Kanban board
  • `agents-view.jsx` — agent management and profiling
  • `schedules-view.jsx` — scheduled task management
  • `runtime-runs-view.jsx` — ADK execution history (traces)
  • `session-contexts-view.jsx` — runtime session context lifecycle
  • `skills-catalog-view.jsx` — skill registry and enablement
  • `governance-inbox-view.jsx` — memory governance review
  • `memory-governance-view.jsx` — memory lifecycle and policies
  • `mcp-trust-view.jsx` — MCP server trust settings
  • `policy-console-view.jsx` — policy lattice management
  • `handoff-console-view.jsx` — multi-agent handoff orchestration
  • `ops-view.jsx` — ops alerts and daily budgets
  • `instagram-view.jsx` — Instagram account management
  • `linkedin-view.jsx` — LinkedIn account management
  • `adk-chat-view.jsx` — ADK-native chat interface with streaming SSE
  • `finops-view.jsx` — financial operations dashboard (API cost tracking)
  • `pixel-view.jsx` — 3D pixel art canvas (Arena3D, OfficeCanvas)
  • `roadmap-view.jsx` — workspace roadmap sync
  • `admin-view.jsx` — administrator panel
  • `security-view.jsx` — security and access policies
  • `network-visualizer.jsx` — agent topology visualization
  • `contract-kpis.jsx` — ADK delivery contract KPI gauges (satisfaction rate, enforcer percentiles, intent source breakdown)

11.3 UX rule

Studio friendliness is mandatory.

That means:

  • first layer must be readable by non-technical operators
  • deeper technical layers remain available
  • modules should explain what they are, when to use them, and what to do next
  • governance should start from guided queues whenever possible

12. Storage

12.1 SQLite

Main module: `src/core/database.ts` (67 tables total)

Used for:

  • runtime checkpoints and execution history
  • structured memory facts (`memory_facts`)
  • agents, profiles, and onboarding state
  • schedules and task items
  • handoffs and delegation contracts
  • governance audit and policy decisions
  • analytics snapshots and cost events (`api_cost_events`)
  • integration state (Instagram, LinkedIn, Google Workspace, Google Ads)
  • ADK sessions and events (`adk_sessions`, `adk_events` in `sessionService.ts`)
  • ADK execution traces (`adk_runs`, `adk_steps` in `observability.ts`)
  • migration audit (`schema_migrations` — tracks named migrations for idempotency, prevents destructive migrations from running twice)

12.2 LanceDB

Used for:

  • semantic long-term memory with surprise scoring (`surpriseScore` field)
  • semantic governance metadata
  • multimodal retrieval (embeddings via `gemini-embedding-2-preview`, 3072 dimensions)
  • surprise-gated insertion and surprise-boosted retrieval (Titans-inspired)

13. Validation Model

GeminiClaw uses multiple validation layers:

  • unit and integration-style tests through Vitest
  • focused architectural tests
  • smoke evals
  • explainability evals
  • dashboard build validation

Important commands:

  • `npm run typecheck`
  • `npm test`
  • `npm run evals:agents`
  • `npm run evals:explainability`
  • `npm run evals:system`
  • `npm --prefix dashboard run build`

14. Code Analysis Workflow (GitNexus Protocol)

All agents follow a mandatory 4-step protocol for any task involving code analysis, improvement, or modification. This is enforced via the `[WORKFLOW_CODIGO]` section injected by `buildRuntimeSystemInstruction()` for every agent.

| Step | Tool | When |
|---|---|---|
| 1. Map | `explore_codebase(directory)` | Always first, to locate files and structure |
| 2. Understand | `query_code_graph(operation="context", target="Symbol")` or `query_code_graph(operation="query", target="concept")` | Before reasoning about a symbol or concept |
| 3. Impact | `query_code_graph(operation="impact", target="Symbol")` or `analyze_code_impact(symbol_name)` | **Mandatory before any edit** |
| 4. Recent changes | `query_code_graph(operation="detect_changes")` | After commits, to see affected symbols |

**Fallback rule for external repos:** run `query_code_graph(operation="analyze", repo_path="/abs/path")` once to index. If "not indexed" error occurs, fall back to `explore_codebase` + `search_text` + `read_file`.

`analyze_code_impact` (grep-based, no index needed) is the lightweight fallback when GitNexus is unavailable or the repo is not indexed.

15. Extension Guidelines

Add a new tool

1. Implement it in `src/tools/`
2. Register it through the tool schema/registry
3. Decide whether policy or skill posture should constrain it
4. Add or extend tests
5. Update docs if the capability is user-visible

Add a new skill

1. Add or update a skill manifest
2. Validate dependencies and activation constraints
3. Ensure tools and MCP dependencies are explicit
4. Surface it through the skills API and Studio if appropriate
5. Add tests for resolution and agent enablement

Add a new Studio module

1. Add the view component
2. Wire it into `App.jsx`
3. Keep the first layer operator-friendly
4. Use clear empty states and action copy
5. Update README and ARCHITECT if it changes product understanding

15. Instagram Publishing Capabilities

15.1 Tool

`src/tools/instagramProfile.ts`

15.2 Supported publishing actions (patch_instagram_profile)

| Action | Format | Notes |
|---|---|---|
| `publish_local_image` | Feed image | Original flow; auto-detects video |
| `publish_local_image` | Feed video / Reels | Triggered when `filePath` is `.mp4`/`.mov`; `mediaType=REELS` (default) or `VIDEO` |
| `publish_carousel` | Carousel, 2–10 items | IMAGE items accept `filePath`/`imageUrl`; VIDEO items require a public `videoUrl` |
| `publish_story` | Story image | Via `filePath` or `imageUrl`; caption is ignored by Meta |
| `publish_story` | Story video | Via `filePath` or `videoUrl`; polling is required before publishing |
| `create_media_container` | Generic | Step 1 of the manual flow (IMAGE/VIDEO/REELS with a public URL) |
| `publish_media` | Generic | Step 2 of the manual flow, starting from `creationId` |
| `schedule_post` | Scheduled | Requires a public `imageUrl`/`videoUrl` (the relay expires before publication) |

15.3 Local relay infrastructure

`src/core/instagramMediaRelay.ts`

Lets local files be published as if they were public URLs:

  • TTL of 15 min for images, 30 min for videos
  • Signed with HMAC-SHA256; validates the content-type before sending to Meta
  • Requires `INSTAGRAM_MEDIA_PUBLIC_BASE_URL` pointing to the instance's public tunnel
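
As an illustration of the signed, expiring URLs described above, here is a minimal sketch. The URL layout (`/media/<id>?expires=…&sig=…`), the function names, and the signed payload format are assumptions made for this example and are not taken from `instagramMediaRelay.ts`. Content-type validation is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// TTLs from the relay description: 15 min for images, 30 min for videos.
const TTL_MS = { image: 15 * 60_000, video: 30 * 60_000 } as const;
type MediaKind = keyof typeof TTL_MS;

// HMAC-SHA256 over a payload string, hex-encoded.
function sign(secret: string, payload: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Hypothetical URL layout: the signature covers both the file id and the expiry.
function buildRelayUrl(
  baseUrl: string,
  secret: string,
  fileId: string,
  kind: MediaKind,
  now: number = Date.now(),
): string {
  const expires = now + TTL_MS[kind];
  const sig = sign(secret, `${fileId}:${expires}`);
  return `${baseUrl}/media/${encodeURIComponent(fileId)}?expires=${expires}&sig=${sig}`;
}

// Rejects expired links and any link whose id or expiry was altered.
function verifyRelayRequest(
  secret: string,
  fileId: string,
  expires: number,
  sig: string,
  now: number = Date.now(),
): boolean {
  if (!Number.isFinite(expires) || now > expires) return false;
  const expected = Buffer.from(sign(secret, `${fileId}:${expires}`), "hex");
  const given = Buffer.from(sig, "hex");
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

Signing the expiry together with the file id means a client cannot extend a link's lifetime by editing the query string.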

15.4 Asynchronous container processing

`pollContainerReady()` in `instagramProfile.ts` (legacy alias: `pollVideoContainerReady`)

Reused by `publish_local_image` (video), `publish_story` (video), `publish_carousel` (VIDEO items **and** the parent CAROUSEL container), and `publish_media`:

  • Polls every 5s, up to 24 attempts (2 min)
  • Distinguishable errors: `ERROR` (codec/duration) vs. processing timeout
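
The polling contract above (5s interval, 24 attempts, `ERROR` distinguished from timeout) can be sketched as follows. This is not the body of `pollContainerReady()`; the function name, the result shape, and the `IN_PROGRESS`/`FINISHED` status names are assumptions for illustration. The status fetcher and the sleep function are injected so the loop can be exercised without network calls.

```typescript
type ContainerStatus = "IN_PROGRESS" | "FINISHED" | "ERROR";

type PollResult =
  | { ok: true; attempts: number }
  | { ok: false; reason: "error" | "timeout"; attempts: number };

interface PollOptions {
  intervalMs?: number;
  maxAttempts?: number;
  sleep?: (ms: number) => Promise<void>;
}

async function pollUntilReady(
  fetchStatus: () => Promise<ContainerStatus>,
  opts: PollOptions = {},
): Promise<PollResult> {
  const intervalMs = opts.intervalMs ?? 5_000; // 5s between checks
  const maxAttempts = opts.maxAttempts ?? 24; // 24 x 5s = 2 min
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (status === "FINISHED") return { ok: true, attempts: attempt };
    // A processing failure (e.g. codec or duration) is reported immediately,
    // so callers can tell it apart from a slow container.
    if (status === "ERROR") return { ok: false, reason: "error", attempts: attempt };
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return { ok: false, reason: "timeout", attempts: maxAttempts };
}
```

Returning a tagged result instead of throwing keeps the two failure modes explicit at the call site.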

15.5 Full pipeline: generate_image / generate_video → publishing

The local generation → publishing flow works end to end via the Cloudflare relay:

1. `generate_image` → saves `tmp/img_<timestamp>.jpg`
2. `generate_video` (Veo 3.1) → saves `tmp/video_<timestamp>.mp4`
   - pure text→video mode
   - image→video mode using the last photo in the chat (`use_last_image=true`)
   - consistency mode with up to 3 local references (`reference_image_paths`)
   - sidecar JSON persisted at `tmp/video_<timestamp>.mp4.json` with `veo_source` for future extension
3. Optionally, `generate_music` → stereo 48 kHz WAV (Lyria RealTime) that can be synced to the video during editing.
4. Optionally, `narrate_with_music` → mono 24 kHz WAV, already mixed (narration + soundtrack).

15.5.1 Audio pipeline: narrate_with_music

Generates expressive narration plus a music track in a single WAV:

1. **Gemini TTS** (`gemini-2.5-flash-preview-tts`) processes the script with a style instruction. It supports natural-language emotional markers in the text: `[pausa]`, `[sussurrando]`, `[com entusiasmo]`, etc. Output: PCM mono 24 kHz 16-bit.
2. **Lyria** (`models/lyria-realtime-exp`) generates a background track with duration = `narration + 10s`. Output: PCM stereo 48 kHz 16-bit.
3. **ffmpeg amix** (fallback: pure PCM mixing in Node.js) combines the two streams with a configurable `music_volume` (default 0.25). The output has the narration's duration (ffmpeg `duration=first`).
4. Final file: WAV in `tmp/`, ready for upload via `send_file_to_chat` or for publishing.
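
To make the pure-PCM fallback in step 3 concrete, here is a simplified sketch of additive mixing. It assumes both inputs have already been converted to the same sample rate and channel layout, which the real pipeline would have to do first because the narration is mono 24 kHz and the music is stereo 48 kHz. The function name `mixPcm16` is hypothetical.

```typescript
// Clamp to the signed 16-bit range so loud passages saturate instead of wrapping around.
function clampInt16(value: number): number {
  return Math.max(-32768, Math.min(32767, value));
}

// Assumption: `narration` and `music` share sample rate and channel layout.
function mixPcm16(narration: Int16Array, music: Int16Array, musicVolume = 0.25): Int16Array {
  // The output length follows the narration, mirroring ffmpeg's `duration=first`.
  const out = new Int16Array(narration.length);
  narration.forEach((sample, i) => {
    // If the music is shorter than the narration, the remainder is narration only.
    const bed = (music[i] ?? 0) * musicVolume;
    out[i] = clampInt16(Math.round(sample + bed));
  });
  return out;
}
```

Because the music track is generated 10s longer than the narration, truncating to the narration's length drops the unused tail.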

15.5.2 Video pipeline: create_reel

Assembles a vertical Reel (1080×1920, H.264/AAC, max 90s) from a WAV plus images:

1. **ffprobe** → exact audio duration in seconds.
2. **Per-scene zoompan** (`generate_scene_clip`) → applies a Ken Burns effect (progressive zoom-in, 5 pan variants rotated by scene index). Output: one silent libx264 clip per scene.
3. **xfade + merge** (`merge_clips_with_audio`) → chains all clips with smooth transitions and adds the WAV audio. Primary encoder: `h264_videotoolbox 3500k` (Apple Silicon); automatic fallback: `libx264 -crf 28 -preset fast`.
4. **burnSubtitles** (optional) → generates an `.ass` file with per-scene captions and burns it in via the libass filter (Helvetica 52px, bottom-center, PlayRes 1080×1920).
5. Returns JSON `{ ok, file, filename, duration_seconds, scene_count, size_mb, has_subtitles }` plus a system instruction for `send_file_to_chat`.

Main parameters: `audio_path` (WAV), `images` (array with `path` + `caption` per scene), `scene_duration` (default 5s), `transition_duration` (default 1s), `ken_burns_intensity`, `subtitle_style`.
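
The timing arithmetic behind chained `xfade` transitions is easy to get wrong, so a small sketch may help. It assumes equal-length scenes and reads "rotated by scene index" as `index % 5`. Both are assumptions, and `planReelTimeline` is a hypothetical helper, not a function from the repository. With overlapping crossfades, each transition starts at `index * (scene_duration - transition_duration)` and every transition shortens the total by one `transition_duration`.

```typescript
interface ScenePlan {
  index: number;
  panVariant: number; // which of the 5 pan variants this scene uses
  xfadeOffset: number | null; // seconds into the merged timeline where this scene fades in
}

interface ReelTimeline {
  scenes: ScenePlan[];
  totalDuration: number;
}

function planReelTimeline(
  sceneCount: number,
  sceneDuration = 5,
  transitionDuration = 1,
): ReelTimeline {
  if (!Number.isInteger(sceneCount) || sceneCount < 1) {
    throw new RangeError("sceneCount must be a positive integer");
  }
  if (transitionDuration >= sceneDuration) {
    throw new RangeError("transitionDuration must be shorter than sceneDuration");
  }
  const step = sceneDuration - transitionDuration;
  const scenes: ScenePlan[] = Array.from({ length: sceneCount }, (_, index) => ({
    index,
    panVariant: index % 5,
    xfadeOffset: index === 0 ? null : index * step, // the first scene has no incoming fade
  }));
  const totalDuration = sceneCount * sceneDuration - (sceneCount - 1) * transitionDuration;
  return { scenes, totalDuration };
}
```

With the defaults, four scenes yield a 17s video rather than 20s, which matters when matching the video length to the narration.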

The `skip_auto_send` field on `generate_image` was added to suppress auto-send during multi-image pipelines. The `generate_image`, `edit_image`, and `generate_google_ads_report` tools now return a `ToolArtifactResult` JSON object instead of a plain string with an embedded instruction. The runner detects the shape `{ ok, artifact: { key, path, kind, ... }, delivery_required }` and handles delivery automatically.
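
A shape check of this kind can be sketched as a small parser. The field types below (`ok` and `delivery_required` as booleans, `key`/`path`/`kind` as strings) are assumptions inferred from the field names, and `parseToolArtifactResult` is a hypothetical helper rather than the runner's actual detection code.

```typescript
interface ToolArtifact {
  key: string;
  path: string;
  kind: string;
}

interface ToolArtifactResult {
  ok: boolean;
  artifact: ToolArtifact;
  delivery_required: boolean;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Returns the parsed result, or null when the tool output is a legacy plain
// string or JSON of a different shape.
function parseToolArtifactResult(raw: string): ToolArtifactResult | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(value)) return null;
  const { ok, artifact, delivery_required } = value;
  if (typeof ok !== "boolean" || typeof delivery_required !== "boolean") return null;
  if (!isRecord(artifact)) return null;
  const { key, path, kind } = artifact;
  if (typeof key !== "string" || typeof path !== "string" || typeof kind !== "string") return null;
  // Extra artifact fields are tolerated but not copied into the typed result.
  return { ok, artifact: { key, path, kind }, delivery_required };
}
```

Returning `null` for anything that does not match lets a caller fall back to treating the output as ordinary text.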

15.5.3 Long-video pipeline: create_post(type="veo_reel")

`veo_reel` uses the same `generate_video`, but with orchestration for longer duration and consistency:

1. Optionally builds a visual reference pack from:
   - the last photo in the chat
   - up to 3 local images
   - a fictional character pack generated internally
2. Generates narration with `narrate_with_music`.
3. Computes `scene_count` from `target_duration_seconds` when no explicit value is given.
4. Generates Veo clips in batches of 4, saving a checkpoint per batch.
5. Reuses the same reference pack plus deterministic per-scene seeds.
6. Concatenates the clips and replaces the final audio via `ffmpeg`.
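
Steps 3 and 4 can be sketched as plain planning functions. The per-clip length is not stated in this document, so `clipSeconds` is an explicit parameter here, and all three helper names are hypothetical. The sketch only shows how scenes split into batches of 4 and how a checkpoint lets a run resume.

```typescript
// Assumption: every Veo clip has the same length, supplied by the caller.
function sceneCountFor(targetDurationSeconds: number, clipSeconds: number): number {
  if (clipSeconds <= 0) throw new RangeError("clipSeconds must be positive");
  return Math.max(1, Math.ceil(targetDurationSeconds / clipSeconds));
}

// Splits scene indexes 0..sceneCount-1 into consecutive batches.
function planVeoBatches(sceneCount: number, batchSize = 4): number[][] {
  const batches: number[][] = [];
  for (let start = 0; start < sceneCount; start += batchSize) {
    const end = Math.min(start + batchSize, sceneCount);
    batches.push(Array.from({ length: end - start }, (_, i) => start + i));
  }
  return batches;
}

// Given the batch indexes already checkpointed, returns the batches still to run.
function pendingBatches(batches: number[][], completed: ReadonlySet<number>): number[][] {
  return batches.filter((_, batchIndex) => !completed.has(batchIndex));
}
```

For example, `planVeoBatches(10)` yields three batches of 4, 4, and 2 scenes. If the first batch is already checkpointed, only the last two are regenerated.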

Current release scope:

  • `target_duration_seconds` from 15 to 90
  • best practical results at 60-90s
  • `consistency_mode`: `strict` or `balanced`
  • groundwork in place for future Veo video extension, with no UX exposed in this phase

Available voices: Aoede, Kore, Puck, Charon, Fenrir, Leda, Orus, Zephyr, Achernar, Iapetus, Umbriel, Algieba, Despina, Erinome, Gacrux, Pulcherrima, Schedar, Sulafat, Vindemiatrix, Zubenelgenubi.

Any publishing action that takes `filePath` automatically detects the most recently generated file in the chat. The relay serves the file via `INSTAGRAM_MEDIA_PUBLIC_BASE_URL` (e.g. `https://media.example.com`) with a TTL of 15 min (images) or 30 min (videos), which is enough for Meta to download it.

Supported formats in the full pipeline:

| Format | generate_image | generate_video |
|---|---|---|
| Feed post (image) | ✅ | - |
| Feed Reels | - | ✅ |
| Story image | ✅ | - |
| Story video | - | ✅ |
| Carousel (IMAGE items) | ✅ | - |
| Carousel (VIDEO items) | - | ✅ via filePath |

15.6 Known limitations

  • Stories do not support captions, do not appear in `media_list`, and expire after 24h
  • `schedule_post` does not support a local `filePath` (the relay expires; publication happens in the future)
  • `media_list` returns only feed posts (not Stories, and not separately archived Reels)

---

16. Current Architectural Status

GeminiClaw is now strong in:

  • ontology and boundaries
  • work context orchestration
  • skills as first-class capability packs
  • trust-first runtime governance
  • memory governance and lineage
  • structured multi-agent handoff
  • operator-facing control plane maturity
  • Instagram multi-format publishing (feed, Reels, carousel, Stories)

The remaining work should be incremental refinement, not foundational repair.

---

17. Smart Model Routing and Thinking Mode

17.1 Motivation

High-capacity models (gemini-3.1-pro-preview, gemini-2.5-pro) have high latency even for simple tasks such as greetings (~64s with thinkingLevel 'low' hardcoded). The solution is hybrid intent-based routing: a fast model for simple tasks, a powerful model for complex ones.

17.2 Smart Model Router (autonomousLoop.ts)

  • After intent classification, if `intent === 'quick_answer'` and the agent's model is high-capacity → automatic downgrade to `gemini-2.5-flash`
  • Expected latency: ~2s (vs ~64s on the original model)
  • Log: `[MODEL_ROUTER] quick_answer + <model> → downgrade gemini-2.5-flash`
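
The routing rule can be sketched in a few lines. The set of high-capacity models is taken from the two examples named in 17.1 and may not be exhaustive, and `routeModel` is a hypothetical name, not the function in `autonomousLoop.ts`.

```typescript
type Intent = "quick_answer" | "read_only" | "execute" | "plan_first";

// Assumption: only the two models named in 17.1 count as high-capacity.
const HIGH_CAPACITY_MODELS: ReadonlySet<string> = new Set([
  "gemini-3.1-pro-preview",
  "gemini-2.5-pro",
]);
const FAST_MODEL = "gemini-2.5-flash";

function routeModel(
  intent: Intent,
  agentModel: string,
  log: (line: string) => void = console.log,
): string {
  if (intent === "quick_answer" && HIGH_CAPACITY_MODELS.has(agentModel)) {
    log(`[MODEL_ROUTER] quick_answer + ${agentModel} → downgrade ${FAST_MODEL}`);
    return FAST_MODEL;
  }
  // Every other combination keeps the agent's configured model.
  return agentModel;
}
```

The downgrade applies per request, so the agent's configured model is left untouched for later, more complex turns.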

17.3 Per-agent thinkingPreference

Column `thinkingPreference TEXT DEFAULT 'auto'` on the `agents` table.

Values: `'auto'` | `'low'` | `'medium'` | `'high'`

Default mapping (intent → thinkingLevel):

| Intent | thinkingLevel |
|---|---|
| quick_answer | low |
| read_only | low |
| execute | medium |
| plan_first | high |

If `thinkingPreference !== 'auto'`, the agent's value overrides the intent-based mapping.
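
The resolution rule amounts to a lookup with an override. The sketch below encodes the mapping table as data; `resolveThinkingLevel` is a hypothetical name, and the actual resolution in `geminiApi.ts` may be structured differently.

```typescript
type Intent = "quick_answer" | "read_only" | "execute" | "plan_first";
type ThinkingLevel = "low" | "medium" | "high";
type ThinkingPreference = "auto" | ThinkingLevel;

// Default intent → thinkingLevel mapping from the table in 17.3.
const INTENT_THINKING: Record<Intent, ThinkingLevel> = {
  quick_answer: "low",
  read_only: "low",
  execute: "medium",
  plan_first: "high",
};

function resolveThinkingLevel(
  intent: Intent,
  preference: ThinkingPreference = "auto",
): ThinkingLevel {
  // Any explicit per-agent preference wins over the intent-based default.
  return preference === "auto" ? INTENT_THINKING[intent] : preference;
}
```

Typing the table as `Record<Intent, ThinkingLevel>` makes the compiler flag any intent added later without a mapping.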

17.4 Full wiring

  • `AgentProfile.thinkingPreference`: type and persistence
  • `agentRepository.ts`: saveAgent + getAllAgents + getAgentByToken
  • `geminiApi.ts`: `GenerateOptions.thinkingPreference`; resolves the final `thinkingLevel`
  • `autonomousLoop.ts`: `LoopContext.thinkingPreference`; passed to generateTextStream and generateText
  • `telegram.ts`: `createBotInstance` accepts `thinkingPreference`; wired into the main loopCtx and the self-repair replay
  • `telegramOrchestrator.ts`: `AgentRuntime.thinkingPreference`; wired into every loopCtx (scheduler, lead_notifier, instagram_notifier, runtime_replay)
  • `server.ts`: GET exposes it; POST accepts and persists it
  • `agents-view.jsx`: "Thinking mode" selector in the runtime tab; emptyAgentForm, formFromAgent, and the POST payload

17.5 Configuration via conversation

Agents (or Master) can change an agent's `thinkingPreference` with an UPDATE on the agents table (using database tools, or spawn_agent with the new value). The setting is also exposed in Studio, in each agent's "Runtime" tab.

---

18. Google Workspace Integration

GeminiClaw supports connecting to Google Workspace (Gmail, Calendar, Drive, Chat) via OAuth 2.0.

18.1 Architecture

| Module | File | Role |
|---|---|---|
| Auth flow | `src/core/googleWorkspaceAuth.ts` | Exchanges the code for tokens, refresh, revoke |
| HTTP client | `src/core/googleWorkspaceClient.ts` | Authenticated wrapper for Google APIs |
| Repository | `src/core/repositories/workspaceRepository.ts` | CRUD for Workspace sessions and accounts |
| Encryption | `src/core/workspaceCrypto.ts` | AES-256-GCM for tokens at rest; auto-generated key |
| Notifier | `src/core/workspaceNotifier.ts` + `workspaceNotifierRuntime.ts` | Push notifications via Gmail watch |
| Tools | `src/tools/workspaceMail.ts`, `workspaceCalendar.ts`, `workspaceDrive.ts`, `workspaceChat.ts` | Read/write tools per product |
| Admin | `src/tools/workspaceAccountsAdmin.ts` | Account listing and revocation |

18.2 Onboarding via Telegram

/workspace_connect           → starts an OAuth session, returns the Google link + the exact redirect URI
/workspace_oauth <url|code>  → manual fallback if the callback does not work
/workspace_realtime          → enables Gmail watch (push notifications)
/workspace_status            → lists connected accounts

The `/workspace_connect` reply shows the **exact Redirect URI** the server uses, which eliminates `redirect_uri_mismatch` errors caused by guesswork.

18.3 Token security

  • OAuth tokens are encrypted with AES-256-GCM before being saved to SQLite.
  • The encryption key (`WORKSPACE_TOKEN_ENCRYPTION_KEY`) is **auto-generated** on first use and persisted to `.env`, so the user does not need to configure it.
  • Each token uses a unique 12-byte IV; authenticity is verified via the GCM tag.
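
For readers unfamiliar with AES-256-GCM, the scheme above can be sketched with Node's built-in `crypto` module. The envelope format (`iv.tag.ciphertext`, each part base64) and the function names are assumptions for this example; the storage format used by `workspaceCrypto.ts` is not described here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// `key` must be 32 bytes for AES-256.
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 12-byte IV per token
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16-byte authentication tag
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

function decryptToken(envelope: string, key: Buffer): string {
  const parts = envelope.split(".").map((part) => Buffer.from(part, "base64"));
  if (parts.length !== 3) throw new Error("malformed envelope");
  const [iv, tag, ciphertext] = parts as [Buffer, Buffer, Buffer];
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match, i.e. the data or key is wrong.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because the IV is random per call, encrypting the same token twice produces different envelopes, and any modification of the stored value fails authentication on decrypt.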

18.4 Environment variables

| Variable | Required | Default |
|---|---|---|
| `GOOGLE_WORKSPACE_OAUTH_CLIENT_ID` | ✅ | - |
| `GOOGLE_WORKSPACE_OAUTH_CLIENT_SECRET` | ✅ | - |
| `GOOGLE_WORKSPACE_OAUTH_REDIRECT_URI` | No | auto-detected from the request |
| `WORKSPACE_TOKEN_ENCRYPTION_KEY` | No | auto-generated |
| `WORKSPACE_ONBOARDING_PUBLIC_BASE_URL` | No | useful with a proxy/ngrok |

Full guide: `Docs/setup/workspace.md`

19. Google Search Console Integration

GeminiClaw supports official integration with the Google Search Console API via user-delegated OAuth 2.0, exposing 7 tools to agents.

19.1 Architecture

| Module | File | Role |
|---|---|---|
| Auth flow | `src/core/googleSearchConsoleAuth.ts` | Authorization URL (PKCE), callback, refresh, revoke |
| HTTP client | `src/core/googleSearchConsoleClient.ts` | Retry/backoff, timeout, camelCase→snake_case normalization |
| Error taxonomy | `src/core/gscErrors.ts` | `GscError`, `mapHttpToGscError`, 10 error codes |
| Repository | `src/core/repositories/gscRepository.ts` | CRUD for accounts, encrypted tokens, immutable audit log, OAuth states |
| Encryption | `src/core/workspaceCrypto.ts` | Shared with Workspace: AES-256-GCM |
| OAuth routes | `src/core/server.ts` | `/api/v1/integrations/gsc/oauth/start` and `/callback` |
| Confirmation | `src/tools/gsc_confirmation.ts` | Single-use tokens with a 5 min TTL for destructive operations |
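
The PKCE part of the auth flow follows RFC 7636: the client keeps a random `code_verifier` and sends only its SHA-256 digest, base64url-encoded, as the `code_challenge` (method `S256`). A minimal sketch with hypothetical function names, not the code in `googleSearchConsoleAuth.ts`:

```typescript
import { createHash, randomBytes } from "node:crypto";

// S256 challenge: base64url(SHA-256(verifier)), without padding.
function pkceChallengeFor(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// 32 random bytes encode to a 43-character verifier, the minimum length RFC 7636 allows.
function createPkcePair(): { verifier: string; challenge: string } {
  const verifier = randomBytes(32).toString("base64url");
  return { verifier, challenge: pkceChallengeFor(verifier) };
}
```

The verifier is stored server-side alongside the OAuth `state` and sent only in the token exchange, so an intercepted authorization code cannot be redeemed on its own.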

19.2 Architectural decision: delegated OAuth vs Service Account

Google Workspace uses Service Account JWT (DWD). GSC uses **OAuth 2.0 authorization_code** because:

  • The Search Console API requires the user to approve access to the properties on their Google profile.
  • Service accounts have no access to GSC properties without domain-wide delegation, which does not apply to this use case.

Consequence: tokens are stored per `accountId` (not per `subject`), and explicit re-consent is required to upgrade the scope from read → write.

19.3 Tool surface

| Tool | Scope | Operation |
|---|---|---|
| `gsc_sites_list` | readonly | Lists available properties |
| `gsc_search_analytics_query` | readonly | Organic metrics with automatic pagination |
| `gsc_sitemaps_list` | readonly | Lists a property's sitemaps |
| `gsc_sitemaps_get` | readonly | Returns details for a specific sitemap |
| `gsc_url_inspection_inspect` | readonly | Inspects indexing status |
| `gsc_sitemaps_submit` | **write** | Submits a sitemap (audit log required) |
| `gsc_sitemaps_delete` | **write** | Removes a sitemap (two-step confirmation + audit log) |

19.4 Security

  • Tokens are stored encrypted with AES-256-GCM (same infrastructure as Workspace).
  • `access_token`, `refresh_token`, and `client_secret` never appear in logs.
  • OAuth CSRF is mitigated via a `state` value with a 10 min TTL plus PKCE S256.
  • `gsc_sitemaps_delete` requires a single-use `confirmation_token` valid for 5 min, even in autonomous mode.
  • Immutable (append-only) audit log for all write operations.
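
The two-step confirmation for destructive operations can be sketched as an in-memory token store. The class name, the binding of a token to an action string, and the in-memory `Map` are assumptions for illustration; `gsc_confirmation.ts` may persist tokens differently.

```typescript
import { randomBytes } from "node:crypto";

interface PendingConfirmation {
  action: string;
  expiresAt: number;
}

class ConfirmationTokens {
  private readonly pending = new Map<string, PendingConfirmation>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 5 * 60_000) {
    this.ttlMs = ttlMs; // 5 min by default
  }

  // Step 1: issue a token bound to one specific action.
  issue(action: string, now: number = Date.now()): string {
    const token = randomBytes(16).toString("hex");
    this.pending.set(token, { action, expiresAt: now + this.ttlMs });
    return token;
  }

  // Step 2: redeem the token. Any lookup burns it, so it cannot be replayed.
  consume(token: string, action: string, now: number = Date.now()): boolean {
    const entry = this.pending.get(token);
    if (!entry) return false;
    this.pending.delete(token);
    return entry.action === action && now <= entry.expiresAt;
  }
}
```

Binding the token to the action means a token issued for deleting one sitemap cannot be reused to confirm a different deletion.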

19.5 Environment variables

| Variable | Required | Default |
|---|---|---|
| `GSC_CLIENT_ID` | ✅ | - |
| `GSC_CLIENT_SECRET` | ✅ | - |
| `WORKSPACE_TOKEN_ENCRYPTION_KEY` | No | auto-generated |

19.6 Onboarding

GET /api/v1/integrations/gsc/oauth/start?accountId=gsc_main&scope=read   → redirects to Google consent
GET /api/v1/integrations/gsc/oauth/callback                               → exchanges the code, saves tokens
GET /api/v1/integrations/gsc/accounts                                     → lists connected accounts
DELETE /api/v1/integrations/gsc/accounts/:id/revoke                       → revokes tokens
GET /api/v1/integrations/gsc/accounts/:id/audit                           → audit log of writes

20. FinOps Module

20.1 Purpose

Tracks and visualizes Gemini API spending across all agents, tools, and models.

20.2 Main modules

  • `src/core/geminiPricing.ts` — centralized pricing table (language models per token, Imagen per image, Veo per second)
  • `src/core/repositories/costRepository.ts` — aggregations by period, tool, model, agent
  • `api_cost_events` table in SQLite — granular cost event log

20.3 Instrumented tools

  • `generateImageTool`: $0.04/image (Imagen 4)
  • `generateVideoTool`: $0.40/second (Veo 3.1)
  • `generateMusicTool`: $0.00/generation (Lyria — free/experimental)
  • ADK runner: token usage cost for every LLM interaction
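
As a worked example of the per-unit prices listed above, the sketch below computes media costs in integer cents to avoid floating-point drift when summing many events. The event shape and function names are hypothetical and do not reflect the `api_cost_events` schema. Token-based LLM costs are omitted because their rates are not given here.

```typescript
// Prices from the list above, in US cents.
const PRICE_CENTS = {
  imagePerImage: 4, // $0.04 per image (Imagen 4)
  videoPerSecond: 40, // $0.40 per second (Veo 3.1)
  musicPerGeneration: 0, // Lyria: free/experimental
} as const;

type MediaCostEvent =
  | { tool: "generate_image"; images: number }
  | { tool: "generate_video"; seconds: number }
  | { tool: "generate_music" };

function costCents(event: MediaCostEvent): number {
  switch (event.tool) {
    case "generate_image":
      return event.images * PRICE_CENTS.imagePerImage;
    case "generate_video":
      // Fractional seconds are rounded to the nearest cent.
      return Math.round(event.seconds * PRICE_CENTS.videoPerSecond);
    case "generate_music":
      return PRICE_CENTS.musicPerGeneration;
  }
}

function totalCents(events: readonly MediaCostEvent[]): number {
  return events.reduce((sum, event) => sum + costCents(event), 0);
}
```

For instance, three images plus one 8-second video come to 12 + 320 = 332 cents, or $3.32.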

20.4 API

  • `GET /api/v1/finops/summary` — total spending by period
  • `GET /api/v1/finops/breakdown` — drill-down by tool/model/agent
  • `GET /api/v1/finops/audit` — raw cost event trail
  • `GET /api/v1/finops/pricing` — current pricing metadata

20.5 Studio

`finops-view.jsx` — interactive spending KPIs, date filtering, sparkline chart, drill-down tables, and collapsible pricing reference.

20. Google Ads Capabilities

20.1 Tools (20 tools)

Google Ads is the largest tool family, covering campaign management, optimization, and reporting:

| Tool | Purpose |
|------|---------|
| `query_google_ads` | Raw GAQL queries for specific data points |
| `patch_google_ads_campaign` | Campaign mutations (pause, enable, update budget, update RSA, etc.) |
| `publish_google_ads_campaign` | Create new campaigns |
| `generate_google_ads_report` | Comprehensive 18-section performance report |
| `generate_google_ads_csv` | Export data as CSV/spreadsheet |
| `google_ads_health` | Campaign health analyzer (RSA contamination, auto-generated headlines, QS issues, missing sitelinks, budget pacing); returns a score of 0-100 |
| `google_ads_assets` | Asset management including `create_image_asset` (validates aspect ratio, base64 upload, 5MB limit) |
| `google_ads_recommendations` | Google's recommendation engine integration |
| `google_ads_audiences` | Audience management |
| `google_ads_bidding` | Bidding strategy management |
| `google_ads_pmax` | Performance Max campaign management |
| `google_ads_experiments` | A/B test management |
| `google_ads_shared_sets` | Shared negative keyword lists |
| `google_ads_account` | Account-level settings |
| `keyword_planner` | Keyword research |
| `upload_conversions` | Offline conversion upload |
| `list_conversion_actions` / `create_conversion_action` | Conversion tracking management |

20.2 Strategic playbook

Embedded in agent instructions via `instructionBuilder.ts` → `buildGoogleAdsPlaybookBlock()`:

  • RSA diagnosis workflow (contamination detection, auto-generated headline patterns)
  • Headline best practices (≤30 chars, keyword inclusion, CTA, social proof)
  • Multi-step orchestration protocol (batch without per-item confirmation)
  • Product context injection from landing pages

20.3 Safety gates

  • Budget 3x gate: rejects budget changes >3x current without confirmation
  • Destructive action confirmation: pause/enable campaign, update budget, update bidding, remove keywords
  • Google Ads playbook assessment via `assessGoogleAdsPlaybook()` in policyCallback
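
The budget 3x gate reduces to a single comparison. The sketch below covers only that gate, not the broader destructive-action confirmation, and the function name and result shape are hypothetical.

```typescript
interface GateDecision {
  allowed: boolean;
  reason?: string;
}

// Rejects increases above 3x the current budget unless explicitly confirmed.
function checkBudgetChange(
  currentBudget: number,
  newBudget: number,
  confirmed = false,
): GateDecision {
  const limit = 3 * currentBudget;
  if (currentBudget > 0 && newBudget > limit && !confirmed) {
    return {
      allowed: false,
      reason: `new budget ${newBudget} exceeds 3x the current budget (${limit}); confirmation required`,
    };
  }
  return { allowed: true };
}
```

An increase to exactly 3x passes, since the rule rejects only changes strictly greater than 3x.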

---

21. Grounding Modules — Traffic, Weather, Maps

These three modules share a common architecture: a shared service layer in `src/core/` with in-memory cache and graceful fallback, plus co-located tools in `src/tools/`. All tools are `READ_ONLY` / `ALWAYS_SAFE`.

21.1 Traffic module (`src/core/trafficService.ts`, 4 tools)

  • `traffic_route` — optimal route with live traffic (Google Routes API, TRAFFIC_ON_POLYLINE)
  • `traffic_conditions` — current conditions in a region/corridor (Waze → Google fallback)
  • `traffic_incidents` — active alerts (Waze → Google fallback, filters generic warnings)
  • `traffic_commute` — multi-time comparison with per-slot congestion breakdown

**Key capabilities:**

  • `geocodeAddress` via Google Geocoding API (24h cache); reused by weather and maps.
  • `travelAdvisory` parsing yields NORMAL/SLOW/TRAFFIC_JAM segments and toll info.
  • `probeAreaTraffic`: corridor route for known highways (1 API call), 4-cardinal probes for unknown areas.
  • Waze backoff on 403/429 (30 min cooldown); automatic fallback to Google Routes.
  • **Waze UGC enrichment**: each alert carries `reportDescription` (author's free-text, mapped from feed field `additionalInfo`), `comments[]` (follow-ups from other drivers with age), `nThumbsUp`, `reliability` (0–10), `ageMinutes`, `nearBy`, `roadType`. Exposed in `traffic_incidents` and in `traffic_route` via the route-corridor filter below.
  • **Waze anti-bot bypass (`src/core/wazeBrowserClient.ts`)**: since 2026 Waze has locked down `/live-map/api/*` with TLS/JA3 fingerprinting (Node clients receive 403 even with browser UA/headers). The fallback runs headless Chromium via `puppeteer-core`: a lazily initialized browser singleton, where each call navigates to the live map itself and intercepts the `user-feed`/`georss` XHR. The first request takes ~7s and subsequent ones 2-4s; a 2 min cache reduces real calls. `getWazeAlerts()` tries direct HTTP first; the browser fallback only starts when all HTTP endpoints fail. A shutdown hook in `telegram.ts` closes the browser on SIGINT/SIGTERM.
  • **Polyline utilities**: `decodePolyline`, `polylineToBoundingBox`, `haversineMeters`, `distanceToPolylineMeters`, `filterAlertsOnRoute` — enable intersecting Waze alerts with a Google Routes polyline (default buffer 250m) so `traffic_route` reports only alerts on the actual corridor, not the whole city.
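
To illustrate the polyline utilities, here is a sketch of the standard Google encoded-polyline decoding (precision 5) and the haversine distance, followed by a deliberately simplified corridor filter. The filter measures distance to polyline vertices only, whereas a function like `distanceToPolylineMeters` would normally measure to the segments between them. The name `filterAlertsNearVertices` is hypothetical.

```typescript
type LatLng = { lat: number; lng: number };

// Decodes Google's encoded polyline format (5 decimal places).
function decodePolyline(encoded: string): LatLng[] {
  const points: LatLng[] = [];
  let index = 0;
  let lat = 0;
  let lng = 0;
  const nextDelta = (): number => {
    let result = 0;
    let shift = 0;
    let byte: number;
    do {
      byte = encoded.charCodeAt(index++) - 63;
      result |= (byte & 0x1f) << shift;
      shift += 5;
    } while (byte >= 0x20); // the continuation bit marks more chunks
    return result & 1 ? ~(result >> 1) : result >> 1; // undo zig-zag sign encoding
  };
  while (index < encoded.length) {
    lat += nextDelta();
    lng += nextDelta();
    points.push({ lat: lat / 1e5, lng: lng / 1e5 });
  }
  return points;
}

// Great-circle distance in meters on a sphere of radius 6,371 km.
function haversineMeters(a: LatLng, b: LatLng): number {
  const R = 6_371_000;
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Simplification: keeps alerts within `bufferMeters` of any route vertex.
function filterAlertsNearVertices<T extends LatLng>(
  alerts: readonly T[],
  route: readonly LatLng[],
  bufferMeters = 250,
): T[] {
  return alerts.filter((alert) =>
    route.some((vertex) => haversineMeters(alert, vertex) <= bufferMeters),
  );
}
```

The vertex-only check can miss alerts that sit beside a long straight segment, which is why a segment-based distance is the better fit for a real corridor filter.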

**Config:** `GOOGLE_MAPS_API_KEY` with Routes + Geocoding APIs enabled.

21.2 Weather module (`src/core/weatherService.ts`, 4 tools)

  • `weather_current` — temperature, humidity, wind, UV, pressure, visibility
  • `weather_forecast` — 1-16 days ahead, min/max, precipitation, UV, sunrise/sunset
  • `weather_hourly` — 1-48 hours ahead, hour-by-hour table
  • `weather_alerts` — official INMET alerts filtered by UF

**Providers:**

  • Primary: **Open-Meteo** (free, no key, ECMWF/GFS models)
  • Fallback: **OpenWeather** (requires `OPENWEATHER_API_KEY`, free tier 1k/day)
  • Alerts: **INMET** (Brazilian National Meteorology Institute)

**Key details:**

  • WMO weather codes (0-99) translated to PT-BR.
  • INMET state filter uses dictionary of state-name → UF code to prevent substring collisions (e.g. "ESP" in "Espírito Santo").
  • Reuses `resolveLocationToLatLng` from trafficService.

21.3 Maps grounding module (`src/core/mapsService.ts`, 6 tools)

  • `maps_place_search` — text search (name, rating, price, open/closed, distance)
  • `maps_place_details` — full details: hours, reviews, photos, services (delivery/takeout/dineIn/reservable)
  • `maps_nearby_search` — POIs in a radius by type (pharmacy, hospital, atm, etc)
  • `maps_geocode` — forward + reverse geocoding with full components (street, neighborhood, CEP, etc)
  • `maps_timezone` — timezone ID, offset, local time
  • `maps_elevation` — altitude in meters with resolution

**Providers:** Google Places API (New), Geocoding API, Time Zone API, Elevation API — all on the same `GOOGLE_MAPS_API_KEY`.

**Key capabilities:**

  • Places API New requires field masks (`X-Goog-FieldMask`) — selective masks reduce cost.
  • Distance calculated client-side via haversine for results anchored near a location.
  • Cache TTLs sized by data volatility: place search 10 min, place details 24h, reverse geocode 24h, timezone 7d, elevation 30d.
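
The volatility-based TTLs can be expressed as a small table plus a generic in-memory cache. The TTL values come from the bullet above; the `TtlCache` class and the key names are assumptions for illustration and do not mirror `mapsService.ts`.

```typescript
const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;
const DAY_MS = 24 * HOUR_MS;

// TTLs sized by how quickly each kind of data goes stale.
const MAPS_TTL_MS = {
  placeSearch: 10 * MINUTE_MS,
  placeDetails: DAY_MS,
  reverseGeocode: DAY_MS,
  timezone: 7 * DAY_MS,
  elevation: 30 * DAY_MS,
} as const;

class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  set(key: string, value: V, ttlMs: number, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }
}
```

Elevation barely changes, so a 30-day TTL saves API calls, while place search results (open/closed status, ratings) are refreshed every 10 minutes.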

**GCP APIs to enable:** `places.googleapis.com`, `timezone-backend.googleapis.com`, `elevation-backend.googleapis.com`.