
Tandemu captures metrics from two sources: your ticket system (tasks, status, assignments) and Claude Code telemetry (sessions, code changes, friction events). Everything is derived from real activity — nothing is estimated or self-reported.

Task metrics

These come from the /morning → /finish task lifecycle.

Cycle time

The wall-clock time between starting a task (/morning) and completing it (/finish). This is the real lead time for a unit of work — no estimation, no story points.

What it tells you: How long tasks actually take. Compare across task types (bug fix vs feature), across team members, or over time to spot trends.

What it doesn’t tell you: Whether the time was spent efficiently. A 4-hour cycle time could be 3 hours of productive work and 1 hour of meetings, or 4 hours of deep focus. Tandemu measures elapsed time, not quality of attention.
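The derivation is simple arithmetic: cycle time is just the difference between the two lifecycle timestamps. A minimal sketch (the function name and timestamp handling here are illustrative, not Tandemu's internal API):

```python
from datetime import datetime, timedelta

def cycle_time(started_at: datetime, finished_at: datetime) -> timedelta:
    """Wall-clock elapsed time between /morning and /finish."""
    return finished_at - started_at

# A task started at 9:00 and finished at 13:30 the same day.
started = datetime(2024, 5, 6, 9, 0)
finished = datetime(2024, 5, 6, 13, 30)
print(cycle_time(started, finished))  # 4:30:00
```

Note that this is pure wall-clock time: pausing a task overnight still counts toward cycle time unless the lifecycle records a /pause span.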

Tasks completed

Count of /finish calls with status “completed” per developer, per day or week.

What it tells you: Team throughput. When combined with cycle time, it shows whether the team is delivering quickly or just starting a lot of tasks.

AI-to-manual code ratio

When a task is finished, Tandemu diffs the branch against main. Commits with a Co-Authored-By: Claude tag are classified as AI-generated. The rest are manual.
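The classification step can be sketched as a pure function over commit messages (for example, the output of `git log main..HEAD --format=%B`). This is an illustrative commit-count version; Tandemu's actual ratio may weight by lines changed rather than commits:

```python
def is_ai_commit(message: str) -> bool:
    """A commit is classified as AI-generated if its message carries
    the Co-Authored-By: Claude trailer (case-insensitive)."""
    return "co-authored-by: claude" in message.lower()

def ai_ratio(messages: list[str]) -> float:
    """Fraction of commits on the branch classified as AI-generated."""
    if not messages:
        return 0.0
    return sum(map(is_ai_commit, messages)) / len(messages)
```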

| Ratio | Interpretation |
| --- | --- |
| 0-20% | AI is barely being used. Developers may need training or better prompts. |
| 20-60% | Healthy mix. Developers are using AI for implementation and writing critical code manually. |
| 60-90% | Heavy AI usage. Worth checking that code quality and test coverage are keeping up. |
| 90%+ | Almost entirely AI-generated. Review processes should be extra rigorous. |

What it tells you: How much your team is leveraging AI as a tool.

What it doesn’t tell you: Whether the AI-generated code is good. A high ratio with low friction is a positive signal. A high ratio with high friction means the AI is generating code that doesn’t work well.

Session metrics

These come from Claude Code’s native OpenTelemetry output and from task session spans.

Active time

Total time spent in task sessions per developer per day. Derived from the duration between /morning and /finish (or /pause).

What it tells you: How many hours of actual development work happened. This is the passive timesheet — no manual entry required.
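The aggregation behind the passive timesheet can be sketched as a fold over session spans. The tuple shape `(developer, start, end)` is an assumption for illustration, where `end` is the /finish or /pause timestamp:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def active_time(sessions: list[tuple[str, datetime, datetime]]) -> dict:
    """Sum session durations per (developer, date).

    Sessions are (developer, start, end) spans; the date bucket is
    taken from the session start.
    """
    totals: dict = defaultdict(timedelta)
    for dev, start, end in sessions:
        totals[(dev, start.date())] += end - start
    return dict(totals)
```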

Session count

Number of task sessions (completed or paused) per developer per day.

What it tells you: Whether developers are working in focused blocks (few long sessions) or switching frequently (many short sessions). Neither is inherently better — it depends on the nature of the work.

DORA metrics

Tandemu derives DORA metrics from task completion data.

Deployment frequency

Number of completed tasks per day. In Tandemu’s model, a finished task equals a unit of shipped work.

| Rate | DORA Classification |
| --- | --- |
| 1+ per day | Elite |
| 1+ per week | High |
| 1+ per month | Medium |
| Less | Low |

Lead time for changes

Average cycle time (from /morning to /finish) across completed tasks.

| Lead Time | DORA Classification |
| --- | --- |
| < 1 hour | Elite |
| < 1 day | High |
| < 1 week | Medium |
| More | Low |
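The lead-time bucketing above reduces to a threshold cascade. A minimal sketch (the function name is illustrative, not part of Tandemu):

```python
from datetime import timedelta

def classify_lead_time(avg_cycle_time: timedelta) -> str:
    """Map average cycle time onto the DORA lead-time buckets."""
    if avg_cycle_time < timedelta(hours=1):
        return "Elite"
    if avg_cycle_time < timedelta(days=1):
        return "High"
    if avg_cycle_time < timedelta(weeks=1):
        return "Medium"
    return "Low"
```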

Change failure rate and time to restore

These metrics require integration with CI/CD pipelines and are not yet derived from Tandemu’s task lifecycle. They will show as zero until a CI/CD integration is connected.

Friction metrics

These come from Claude Code’s telemetry events.

Prompt loops

When a developer repeatedly prompts Claude to fix the same file or error, that’s a prompt loop. High prompt loop counts on specific files indicate problematic code — complex logic, poor abstractions, or undocumented behavior that confuses the AI.
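A simple detection heuristic counts how often the same file is targeted within one session. The event shape here (a list of file paths, one per prompt, in session order) and the threshold are assumptions for illustration; Tandemu's real detector may also match repeated error text:

```python
from collections import Counter

def prompt_loops(prompted_files: list[str], threshold: int = 3) -> dict[str, int]:
    """Flag files prompted `threshold` or more times in one session.

    Returns {path: prompt_count} for each flagged file.
    """
    counts = Counter(prompted_files)
    return {path: n for path, n in counts.items() if n >= threshold}
```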

Tool errors

Failed tool executions (file writes that error, bash commands that fail) aggregated by repository path. High error counts in a specific area suggest fragile infrastructure or missing prerequisites.

Friction severity

Tandemu classifies repository paths by friction severity:

| Severity | Criteria |
| --- | --- |
| High | 10+ prompt loops or 5+ errors across multiple sessions |
| Medium | 5-10 prompt loops or 2-5 errors |
| Low | Below medium thresholds |
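The thresholds above can be sketched as a cascade, checking the higher severity first so that boundary values (e.g. exactly 10 prompt loops) resolve upward. This sketch omits the multi-session requirement for High:

```python
def friction_severity(prompt_loops: int, tool_errors: int) -> str:
    """Classify a repository path by friction severity."""
    if prompt_loops >= 10 or tool_errors >= 5:
        return "High"
    if prompt_loops >= 5 or tool_errors >= 2:
        return "Medium"
    return "Low"
```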

What friction tells you: Where your codebase needs attention. High-friction files are candidates for refactoring, better documentation, or dedicated test coverage. This is more actionable than a retrospective complaint — it’s backed by data from actual development sessions.

What Tandemu does NOT measure

  • Keystrokes or typing speed — not captured
  • Screen activity or idle time — not captured
  • Individual productivity rankings — not calculated. Metrics are shown per-person for context, not for comparison.
  • Code quality scores — Tandemu measures friction (a proxy), not quality directly
  • Meeting time or non-coding activities — only Claude Code sessions are tracked
  • Estimate accuracy — there are no estimates to compare against. Actual cycle time is the only number.

Using the data

The dashboard shows these metrics to engineering leads. But the most important audience is the team itself.

Developers can see their own cycle times and AI ratios. If they notice their cycle time creeping up, they can ask: am I picking up harder tasks, or am I getting stuck? The friction data helps answer that.

Leads can spot systemic issues: a file that causes friction for every developer who touches it, a team member whose cycle times are much longer than peers (which might indicate they need help, not that they’re slow), or an AI ratio that’s dropping (which might mean the tooling needs attention).

The goal is not to optimize every number. It’s to make the invisible visible — to replace gut feelings about team productivity with real signals from real work.
