The Metering Project Usually Looks Clean Until Finance Asks Why One Incident Created Two Days of Billable Usage
The launch was supposed to be quiet because the feature sounded administrative.
A B2B operations platform wanted to add usage metering to a set of high-cost APIs used for document analysis, workflow execution, and downstream sync. Product wanted better visibility into who consumed the most expensive operations. Finance wanted a path toward usage-based packaging for larger enterprise accounts. Engineering wanted one shared ledger instead of a week of guesswork every time someone asked where platform cost went.
None of that sounded controversial.
The team defined a billable unit, added request logging, attached tenant IDs, and wrote a pipeline that turned successful API calls into usage events. The rollout looked responsible. The initial dashboards lined up with what everyone expected. A few noisy accounts stood out. A few internal estimates were validated. Leadership finally felt like the system was becoming measurable.
Then an upstream outage hit a partner integration.
The client retried aggressively for forty minutes. Internal repair jobs reran partially failed operations. Support replayed stuck workflow steps for a handful of enterprise accounts. A reconciliation worker reissued a narrow batch to close the state gap before finance exports ran overnight. The platform recovered.
The next morning, the billing dashboard said those tenants had dramatically increased usage.
They had not.
What increased was traffic that existed because the system was recovering from failure:
- duplicate deliveries after timeouts
- client retries against work already partly completed
- support-triggered replays
- internal repair traffic
- reconciliation calls needed to restore correctness
The metering layer had counted all of it as if it were fresh customer intent.
That is the real trap in usage metering. The hard part is not counting requests. The hard part is deciding which requests are allowed to become commercial truth.
Most teams start with a dangerously simple assumption:
if the request reached the product and returned success, it probably belongs in usage.
That assumption fails exactly where serious systems stop being tidy:
- retries after ambiguous failure
- delayed redelivery
- replay after partial completion
- internal repair traffic
- operator-initiated recovery actions
- backfills or reconciliation runs needed to restore source-of-truth agreement
The point of this article is to help you avoid that failure mode.
The reader is usually an engineer, platform owner, staff-level IC, or technically minded operator introducing usage metering to APIs, workflow engines, AI-backed endpoints, or internal automation surfaces that can retry, replay, or repair work after something goes wrong. The goal is not to design a perfect billing platform in one shot. The goal is to define the billable event honestly, separate customer intent from recovery traffic, roll out metering without turning operational cleanup into invoiceable activity, and keep finance, engineering, and support aligned on what the numbers actually mean.
Consider a B2B workflow automation platform serving logistics and operations teams. Customers call APIs to submit jobs, upload documents, trigger enrichment, and sync state into downstream systems. Under the hood, the product includes retries, internal repair jobs, support replay tools, and batch reconciliation. That makes it a good metering example because the expensive work is real, but not every unit of technical effort deserves to become billable usage.
That distinction is where metering systems either become trustworthy or become arguments.
Metering Is Not About Requests. It Is About Interpreting Commercial Intent
Teams often talk about usage metering like a measurement problem.
How many requests happened?
How many tokens were consumed?
How many files were processed?
How many workflow runs completed?
Those are useful questions. They are not enough.
A metering system is not just a counter. It is an interpretation layer that decides which system activity represents customer-consumable value strongly enough to become a commercial record.
That means metering has at least three layers, and weak rollouts usually over-design the first while under-designing the other two.
Activity capture
What technically happened in the system? A request arrived, a job was retried, a replay was triggered, a batch was reprocessed, or a worker called a downstream model.
Usage qualification
Which captured activities count as legitimate usage under the commercial contract? This is the real policy layer. It decides whether the event represents fresh requested value, duplicate technical effort, or internal recovery work.
Financial representation
How should qualified usage later appear in invoices, dashboards, plan enforcement, or customer conversations? This layer matters, but it depends on the previous two being honest first.
Most metering incidents come from collapsing all three layers into one shortcut:
successful request = billable event
That shortcut is appealing because it is easy to implement and easy to explain. It is also wrong in most systems that have real failure handling.
Take a document-analysis API.
A customer submits one document.
The request times out client-side but finishes server-side.
The client retries.
An internal deduplication layer prevents duplicate storage.
A downstream enrichment worker still reruns because the original completion signal was missing.
Support later replays one step to restore a stuck audit trail.
Several real system activities happened.
The commercial question is not "how many things ran?"
The commercial question is "how many units of customer-consumable work should count as usage?"
That is why metering belongs closer to business semantics than raw request accounting.
If you meter too low in the stack, you are mostly capturing effort.
If you meter too high without enough evidence, you are mostly guessing.
The right boundary usually sits where the system can answer something like this:
Did a distinct unit of customer-requested value become accepted, completed, or meaningfully delivered, independent of the retries, replays, and repair traffic used to get there?
That is a much stronger question than "did a request finish with 200?"
It is also why finance and engineering often disagree during early metering rollouts. Finance wants consistent usage records. Engineering sees a noisy distributed system full of retries and side effects. Both are right from their vantage point. The metering design fails when nobody takes responsibility for converting system behavior into a stable commercial interpretation.
Metering is therefore less like logging and more like state definition.
You are deciding:
- what counts as original work
- what counts as duplicated transport
- what counts as recovery effort
- what counts as internal cleanup
- what counts as customer-visible value
If those answers are fuzzy, the invoice will still become precise. That is the dangerous part.
Scenario: Metering a Workflow Platform Without Billing for Recovery Traffic
The platform serves logistics and operations teams that process freight documents, exceptions, and downstream updates across several systems. Customers use the platform in four main ways:
- they submit jobs through an API
- they upload documents for extraction and validation
- they trigger enrichment and classification on workflow steps
- they sync state outward into ERP or warehouse systems
The platform also includes the kind of machinery most serious products accumulate over time:
- client retries after timeout or transient failure
- webhook retries from partners
- internal queue-based retries
- support replay tools for stuck workflows
- reconciliation jobs that repair drift between systems
- limited backfills after outages or vendor issues
The pricing team wants better usage visibility before rolling out larger enterprise plans. The initial idea is straightforward:
- every successful POST /jobs
- every completed document extraction
- every successful enrichment call
- every outbound sync operation
becomes a usage event tied to tenant_id.
The implementation looks solid on paper.
Each request gets:
- a tenant ID
- an authenticated actor
- a request ID
- an endpoint name
- a status code
- a completion timestamp
The metering service consumes these records and writes one usage ledger row for every successful operation.
Then the platform experiences the kind of week every real platform eventually has.
A partner API times out intermittently. Customers retry. The server occasionally completes work after the client already gave up. Internal workers replay downstream steps to recover partial completion. Support uses a replay action to repair several enterprise workflows before the end of the day. A nightly reconciliation job catches mismatches left behind by the incident.
By Thursday, the systems are mostly healthy again.
By Friday, billing looks wrong.
One tenant appears to have doubled its document-analysis usage.
Another appears to have consumed a surprising number of sync operations.
Support insists much of that traffic came from replay and repair.
Engineering says the requests were real and the service did do work.
Finance says the meter cannot become subjective every time something goes wrong.
The tension here is useful because it exposes the actual design failure.
The platform does not lack data. It lacks classification.
Its metering system knows activity happened.
It does not yet know whether that activity represented:
- fresh customer intent
- duplicate delivery of the same intent
- operator recovery after partial failure
- internal repair work required to restore correctness
That is the classification gap the rest of the article is designed to close.
Start by Separating Customer Intent From System Effort
One of the biggest metering mistakes is treating all expensive work as equally billable simply because the platform had to perform it.
That logic sounds intuitive at first. If the system spent compute, storage, queue time, or model cost, surely that should appear in usage somehow.
But customer billing and internal cost accounting are not the same thing.
The platform may need separate views for:
- what the infrastructure cost
- what the customer requested
- what the business has chosen to bill
If you flatten those into one stream, you make every recovery path commercially explosive.
In this system, one submitted job can produce multiple kinds of effort:
Original requested effort
The customer intentionally asked the product to do a unit of work, such as analyzing a document or executing a workflow step.
Duplicate transport effort
The same request or delivery arrived again because of timeout, retry, redelivery, or concurrency ambiguity.
Recovery effort
Support or the platform reissued work because the first attempt left the system in a partial or uncertain state.
Corrective effort
A reconciliation or backfill action reran work to restore system agreement after something else failed.
All of these can consume resources.
They should not all be treated as the same commercial event.
A useful first principle is this:
billable usage should usually track accepted customer intent, not every unit of technical effort required to repair that intent.
That does not mean every recovery action must be free under every business model. Some platforms intentionally price for reprocessing, premium reliability, or bulk replay. Fine. The key word is intentionally. If recovery or replay is billable, that should be a product decision with explicit semantics, not an accidental side effect of logging successful retries.
This distinction becomes clearer when you compare examples.
If a customer intentionally submits three separate documents, that may be three billable units.
If the customer submits one document, experiences a timeout, retries twice, and the backend eventually finishes once, that is usually one billable unit plus duplicated technical effort.
If support later replays the enrichment step because a downstream audit write failed, that is not obviously new customer usage. It is often platform recovery.
If a reconciliation job reruns an export so downstream finance state matches the source-of-truth system again, that is definitely real work. It is not necessarily customer-requested value.
This is why metering design should begin with intent classes rather than endpoint lists.
At minimum, classify events into:
- original customer intent
- duplicate delivery of original intent
- operator-initiated replay
- automated recovery or repair
- internal maintenance or reconciliation
Only after that should you decide which classes enter the billable ledger.
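To make the order of operations concrete, here is a minimal sketch of classifying first and deciding billability second. The enum values mirror the classes listed above; the names IntentClass, BILLABLE_CLASSES, and billable_units are illustrative assumptions, not part of any existing system.

```python
from enum import Enum


class IntentClass(Enum):
    ORIGINAL_CUSTOMER_INTENT = "original_customer_intent"
    DUPLICATE_DELIVERY = "duplicate_delivery"
    OPERATOR_REPLAY = "operator_replay"
    AUTOMATED_RECOVERY = "automated_recovery"
    INTERNAL_MAINTENANCE = "internal_maintenance"


# Policy choice made after classification (assumed here):
# only original customer intent enters the billable ledger.
BILLABLE_CLASSES = frozenset({IntentClass.ORIGINAL_CUSTOMER_INTENT})


def billable_units(events):
    """Count events whose class is allowed into the billable ledger."""
    return sum(1 for intent_class in events if intent_class in BILLABLE_CLASSES)


# One document, a timeout, two client retries, then a support replay.
one_document_with_retries = [
    IntentClass.ORIGINAL_CUSTOMER_INTENT,
    IntentClass.DUPLICATE_DELIVERY,
    IntentClass.DUPLICATE_DELIVERY,
    IntentClass.OPERATOR_REPLAY,
]

# Three separately submitted documents.
three_documents = [IntentClass.ORIGINAL_CUSTOMER_INTENT] * 3
```

Under this assumed policy, the first scenario yields one billable unit and the second yields three, which matches the examples earlier in this section. Changing what is billable means editing BILLABLE_CLASSES, not the classification itself.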
This classification also helps keep customer communication honest. If a tenant disputes a usage spike, you need a way to explain whether the meter reflects fresh demand, system recovery, or an artifact of repeated delivery. "The API returned success seven times" is not a good commercial explanation if six of those were technical attempts to achieve one intended result.
The metering system should not force your support team into philosophical arguments about what a request "really meant." It should have the evidence and policy boundary already encoded.
Choose the Billable Unit Before You Build the Ledger
Teams often start metering by designing the ledger schema:
- tenant ID
- request ID
- endpoint
- timestamp
- status
- quantity
That is understandable. Schemas feel concrete.
The trouble is that you can build a beautiful ledger that records the wrong thing very consistently.
Before you define the ledger row, define the billable unit.
A billable unit is not "a request" by default. It may be:
- one accepted workflow run
- one successfully processed document
- one completed sync operation with a distinct business target
- one analysis job tied to one customer-submitted artifact
- one batch accepted as a commercial unit, regardless of internal chunking
The right unit depends on what the product is promising.
In this system, the team originally wants to meter per successful API call. That seems simple because the request boundary is easy to observe.
But the actual product promise is closer to:
- one customer job submission
- one document extraction result
- one workflow execution accepted by the platform
Those are not always identical to one request.
Why not?
Because one request can fail and be retried.
One accepted job can fan out into several internal calls.
One support replay may rerun a downstream step without representing new requested value.
One reconciliation batch may re-emit sync work after a mismatch.
If the billable unit lives above the request layer, the ledger must be keyed accordingly.
A useful design exercise is to force every candidate billable unit through three tests.
Distinctness test
Can the system tell when two technical actions are actually the same commercial unit of work?
Intent test
Does this unit represent something the customer or product explicitly chose to consume, rather than something the platform had to do because delivery was messy?
Explainability test
If the customer disputes a charge, can a support or finance team explain why this unit exists without reconstructing logs across five services?
If a candidate unit fails those tests, it is probably too low-level or too ambiguous.
For this system, the billable unit becomes:
accepted_customer_work_id
This is an internal canonical identifier, referred to in later sections as the canonical work ID (canonical_work_id), that represents one customer-requested unit of work, even if the system sees multiple requests, retries, replay actions, or downstream repair attempts around it.
The ledger still stores technical evidence:
- request IDs
- replay markers
- source lane
- attempt counts
- timestamps
But those are attached to the billable unit rather than replacing it.
This approach gives the platform a stable answer to the question:
what exactly are we charging for?
Without that answer, the ledger quickly becomes an expensive activity archive masquerading as commercial truth.
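One way to picture a ledger keyed by the billable unit rather than by the request is the sketch below. It is an in-memory illustration under stated assumptions: the class names LedgerEntry and UsageLedger, and the choice of one unit of quantity per work ID, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    work_id: str
    tenant_id: str
    quantity: int = 1  # one commercial unit per accepted work ID (assumed)
    request_ids: list = field(default_factory=list)
    replay_markers: list = field(default_factory=list)

    @property
    def attempt_count(self) -> int:
        # Every request seen for this work ID counts as one attempt.
        return len(self.request_ids)


class UsageLedger:
    def __init__(self):
        self._entries = {}

    def record(self, work_id, tenant_id, request_id, replay_marker=None):
        """Attach technical evidence to the billable unit; never add a second unit."""
        entry = self._entries.setdefault(work_id, LedgerEntry(work_id, tenant_id))
        entry.request_ids.append(request_id)
        if replay_marker is not None:
            entry.replay_markers.append(replay_marker)
        return entry

    def billable_quantity(self, tenant_id):
        return sum(
            e.quantity for e in self._entries.values() if e.tenant_id == tenant_id
        )
```

Three requests that resolve to the same work ID leave the billable quantity at one while the attempt count rises to three, so the evidence stays visible without changing the charge.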
Design a Qualification Layer Between Raw Activity and Billable Usage
Once the billable unit is defined, do not let raw traffic write directly into the billable ledger.
That shortcut is the source of most metering pain.
A better architecture has a qualification layer.
The qualification layer receives captured technical activity and decides whether the event is:
- billable as original intent
- non-billable duplicate activity
- replay pending review
- internal repair or reconciliation
- suspicious or ambiguous and therefore not ready for commercial use
In this system, the team moves from:
successful operation -> usage ledger row
to:
captured activity -> usage qualification -> billable ledger or non-billable evidence store
That extra step sounds like complexity. It is actually where the system becomes explainable.
A qualification record might contain:
activity_id
tenant_id
canonical_work_id
source_class
attempt_number
request_id
replay_origin
qualification_result
billable_reason
non_billable_reason
qualified_at
The important field is not only qualification_result. It is the reason.
Reasons matter because metering disputes are usually not purely technical. People want to know why a unit counted or did not count. Without reason codes, the system may technically separate billable and non-billable traffic while still leaving support, finance, and engineering to improvise explanations.
Useful result classes include:
- billable_original_intent
- non_billable_duplicate_retry
- non_billable_operator_replay
- non_billable_internal_repair
- non_billable_reconciliation
- review_required_ambiguous_origin
This structure has several benefits.
First, it lets the product keep rich operational evidence without prematurely monetizing it.
Second, it gives finance a controlled dataset instead of an argument with log lines.
Third, it creates room for future policy changes. Maybe today operator replay is non-billable. Tomorrow a premium plan introduces explicit paid reprocessing under certain conditions. That change becomes a policy update in the qualification layer, not a wholesale rewrite of the ledger model.
This is also where source classes matter. A request should arrive with more than tenant and endpoint if you expect metering to stay honest. The system should know whether the activity came from:
- a customer-initiated API call
- a webhook redelivery
- a retry lane
- a support replay tool
- an internal reconciliation worker
- a backfill or repair job
If source classification is missing, qualification becomes guesswork. Guesswork in billing systems becomes politics very quickly.
You do not need a giant policy engine on day one. But you do need a deliberate place in the architecture where "technical success" and "billable usage" are allowed to diverge.
That is one of the healthiest divergences a serious product can have.
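The qualification step described in this section could look roughly like the following. This is a sketch, not a reference implementation: the source-class labels, the DEFAULT_RESULT mapping, and the qualify function are assumptions chosen to match the field names and result classes listed above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from source class to default qualification result.
DEFAULT_RESULT = {
    "customer_api_call": "billable_original_intent",
    "webhook_redelivery": "non_billable_duplicate_retry",
    "retry_lane": "non_billable_duplicate_retry",
    "support_replay_tool": "non_billable_operator_replay",
    "reconciliation_worker": "non_billable_reconciliation",
    "backfill_or_repair_job": "non_billable_internal_repair",
}


@dataclass(frozen=True)
class QualificationRecord:
    activity_id: str
    tenant_id: str
    canonical_work_id: Optional[str]
    source_class: Optional[str]
    qualification_result: str
    billable_reason: Optional[str] = None
    non_billable_reason: Optional[str] = None


def qualify(activity_id, tenant_id, canonical_work_id, source_class, billed_work_ids):
    """Classify one captured activity. billed_work_ids is updated on a billable result."""
    result = DEFAULT_RESULT.get(source_class)
    if result is None or canonical_work_id is None:
        # Missing evidence never becomes billable automatically.
        result = "review_required_ambiguous_origin"
        reason = "missing source class or work ID"
    elif result != "billable_original_intent":
        reason = f"origin is {source_class}"
    elif canonical_work_id in billed_work_ids:
        result = "non_billable_duplicate_retry"
        reason = "work ID already has a billable record"
    else:
        billed_work_ids.add(canonical_work_id)
        reason = "fresh accepted customer intent"
    billable = result == "billable_original_intent"
    return QualificationRecord(
        activity_id, tenant_id, canonical_work_id, source_class, result,
        billable_reason=reason if billable else None,
        non_billable_reason=None if billable else reason,
    )
```

Every record carries a reason whether or not it is billable, so excluded traffic can still be explained later. A policy change, such as paid reprocessing, would alter the mapping and the branch logic rather than the ledger model.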
Retries, Replays, and Recovery Work Need Distinct Metering Rules
This is the section many teams try to compress into one line like "dedupe requests."
That is not enough because retries, replays, and recovery are not the same phenomenon.
Retries
Retries usually happen because delivery certainty is missing. The client timed out. The worker lost a lease. A partner redelivered after a transient failure. Retries are not automatically non-billable, but they often should not create new billable units when tied to the same accepted intent.
Replays
Replays are deliberate reintroductions of work after something went wrong or became ambiguous. They are usually initiated by support, operators, or controlled tooling. Replays may re-consume expensive resources. They still do not automatically represent new customer-requested value.
Recovery work
Recovery work includes reconciliation, repair jobs, backfills, compensating actions, and partial state correction. This traffic often exists because the platform is cleaning up after failure or inconsistency. Treating it as billable by default is one of the fastest ways to turn reliability issues into customer distrust.
In this system, the team eventually writes separate rules for all three.
For retries:
- if they map to the same canonical_work_id, they do not generate a second billable unit
- they still produce activity evidence for cost and reliability analysis
- repeated retries beyond a threshold raise an operational flag, not a revenue event
For replays:
- support-initiated replays are recorded distinctly
- replay traffic never writes directly to the billable ledger
- if a replay is part of an explicit productized reprocessing feature, it must carry a separate commercial marker rather than piggyback on raw success
For recovery work:
- reconciliation and repair lanes are non-billable by default
- their volume is visible internally because it matters for platform cost
- any attempt to make them billable later requires explicit product policy, not incidental reuse of the request meter
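The three rule sets above can be sketched as one small decision function. The lane names, the RETRY_FLAG_THRESHOLD value, and the commercial_marker parameter are hypothetical; the article does not specify a threshold, so the number here is only a placeholder.

```python
from dataclasses import dataclass

RETRY_FLAG_THRESHOLD = 5  # placeholder value, not taken from the article


@dataclass(frozen=True)
class Decision:
    billable_candidate: bool
    operational_flag: bool
    reason: str


def apply_lane_rule(lane, attempt_number=1, commercial_marker=False):
    """Apply the per-lane metering rule to one activity already linked to a work ID."""
    if lane == "retry":
        # Retries never add a billable unit; too many of them raise an ops flag.
        flagged = attempt_number > RETRY_FLAG_THRESHOLD
        return Decision(False, flagged, "retry for an already accepted work ID")
    if lane == "replay":
        if commercial_marker:
            # Only an explicit productized reprocessing marker can make a replay billable.
            return Decision(True, False, "explicit productized reprocessing")
        return Decision(False, False, "support-initiated replay")
    if lane == "recovery":
        return Decision(False, False, "reconciliation or repair, non-billable by default")
    raise ValueError(f"unknown lane: {lane}")
```

Note that a replay with a commercial marker is only a billable candidate in this sketch. It would still pass through qualification rather than writing to the billable ledger directly.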
This is also where age and origin matter.
A retry that arrives seconds after an original timeout is different from a replay initiated two days later during support recovery.
A replay initiated by a trusted operator is different from a background repair worker recomputing stale state.
If your metering system stores only "request succeeded" and "tenant ID," all of these become commercially indistinguishable. That is not a small observability gap. It is a policy failure.
One simple rule helps a lot:
every billable candidate should carry an origin class and a causal link to the customer intent it claims to represent.
Without causal linkage, the meter cannot prove that an event is original rather than derivative.
This is where the architecture begins to resemble incident-safe workflow design more than traditional analytics logging. You are not just counting. You are preserving enough lineage that the platform can later say:
- this was the original accepted work
- this was a transport retry
- this was a support replay
- this was internal recovery to restore correctness
That lineage is what keeps metering from being parasitic on reliability work.
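A minimal way to check causal linkage is to walk each activity back to the event it claims to derive from. The sketch below assumes a hypothetical in-memory map from activity ID to an (origin class, caused-by ID) pair; a real system would read this from its evidence store.

```python
from typing import Optional

# activity_id -> (origin_class, caused_by_activity_id or None)
Events = dict[str, tuple[str, Optional[str]]]


def trace_to_original(activity_id: str, events: Events) -> Optional[list[str]]:
    """Return the chain from this activity back to original customer intent.

    Returns None when the chain is broken, cyclic, or does not end in an
    original customer intent, because such an event cannot prove its lineage.
    """
    chain = []
    seen = set()
    current = activity_id
    while True:
        if current in seen or current not in events:
            return None
        seen.add(current)
        chain.append(current)
        origin_class, caused_by = events[current]
        if caused_by is None:
            if origin_class == "original_customer_intent":
                return chain
            return None
        current = caused_by
```

An activity whose trace comes back as None is a natural candidate for the ambiguous class: it may be legitimate, but the meter cannot show what it derives from.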
Roll Out Metering in Shadow Mode Before You Let It Touch Billing Truth
One of the highest-value habits in metering design is delaying commercial consequences until the classification model has survived real system mess.
That means shadow mode.
In shadow mode, the platform captures and qualifies usage as if the meter were real, but the output does not yet affect invoices, plan enforcement, or external customer reporting. Instead, the team compares:
- raw technical activity
- qualified billable candidates
- non-billable duplicates and recovery traffic
- known operational incidents and support replays
This period is where bad assumptions get exposed cheaply.
In this system, the first shadow week reveals three major surprises.
Surprise 1: some tenants look noisier only because their clients retry aggressively after short timeouts.
Raw request counts exaggerated apparent usage by a meaningful margin.
Surprise 2: support replay actions are rare most days but highly concentrated during incident windows.
If they had entered billing directly, usage spikes would have correlated with platform instability in exactly the worst way.
Surprise 3: one reconciliation path called the public workflow API and therefore looked like normal tenant consumption even though it existed only to repair internal drift.
None of those issues would have been obvious from the ledger schema alone.
Shadow mode should answer five questions before the system graduates into billing truth:
- how often does technical activity differ from qualified billable usage
- which source classes create the largest disagreement
- are replay and repair paths being classified correctly
- can support explain sample records without log archaeology
- do finance and engineering agree on why disputed samples counted or did not count
That last point is more important than it sounds. Metering is one of the places where cross-functional trust either strengthens or erodes. If finance thinks engineering is hiding usage and engineering thinks finance is charging for recovery noise, the problem is rarely personal. It is usually that the qualification boundary never became explicit enough for both sides to reason from the same evidence.
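The first two shadow-mode questions, how often raw activity differs from qualified usage and which source classes drive the gap, lend themselves to a simple report. This sketch assumes shadow records are available as (source class, qualification result) pairs; shadow_report is an illustrative name.

```python
from collections import Counter


def shadow_report(records):
    """Compare raw activity with qualified billable usage per source class.

    records: iterable of (source_class, qualification_result) pairs.
    """
    raw = Counter()
    billable = Counter()
    for source_class, result in records:
        raw[source_class] += 1
        # Only results in the billable_* family count as qualified usage.
        if result.startswith("billable"):
            billable[source_class] += 1
    return {
        source: {
            "raw": raw[source],
            "billable": billable[source],
            "gap": raw[source] - billable[source],
        }
        for source in raw
    }
```

Source classes with a large gap are where request-level counting would have overstated usage the most, which makes them the first samples worth reviewing with finance and support.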
Shadow mode also gives the team a chance to validate customer-facing semantics before they become commitments. Suppose the product wants to say "you are billed per successfully processed document." Shadow data may reveal edge cases:
- what if the document was processed once but the completion callback was replayed
- what if support reran extraction to repair audit state only
- what if one customer batch creates five internal processing attempts for one accepted unit
Those are not reasons to abandon the pricing idea. They are reasons to let real operational ambiguity teach the policy before the invoice teaches the customer.
Do not rush this phase. A metering model that survives one week of normal traffic but has never seen incident recovery, replay, partial failure, and repair work is not ready. It has only survived the clean part of the system.
Treat Metering Disputes as a Design Surface, Not a Support Escalation
If your metering system is successful, usage disputes will not disappear. But their shape should improve.
Bad metering disputes sound like this:
- "the logs say you called the endpoint"
- "we were only replaying work because your system got stuck"
- "support told us to retry"
- "we are being billed for your outage cleanup"
Those disputes are expensive because they reveal that the platform cannot distinguish customer demand from system turbulence.
Good metering disputes sound more like:
- "here is the original accepted work ID"
- "these five retries were linked to the same non-billable transport event"
- "this support replay was classified as recovery and excluded from billable usage"
- "this second event counted because it represented a new accepted work unit under your reprocessing feature"
That kind of dispute may still involve policy disagreement, but it is at least arguing over explicit rules instead of interpretive chaos.
For that reason, metering design should include a dispute packet from the beginning.
For each billable record, the system should be able to surface:
- the canonical work ID
- the original source class
- the qualification result and reason
- linked retries or replays
- whether operator or internal recovery activity occurred around it
- enough timestamps to reconstruct the sequence without combing through raw logs
If support cannot retrieve that packet quickly, the system is not truly metered. It is just counted.
This is also why I prefer reason-coded non-billable records over silent exclusion. Finance teams understandably worry when engineering says some traffic "doesn't count" but cannot show where it went. A non-billable evidence store solves that problem:
- the activity still exists
- the cost still exists
- the classification still exists
- the commercial consequence is different
That is a much more defensible model than making recovery traffic invisible just to keep invoices clean.
The platform eventually gives support a simple internal view:
Work ID: wk_8821
Customer intent: original document analysis
Billable status: billable_original_intent
Linked retries: 3
Linked support replay: 1
Linked reconciliation events: 0
Commercial quantity charged: 1
That one screen does more for trust than another month of policy debate.
Good dispute handling is not the end of metering design. It is proof that the architecture remembers why the ledger row exists.
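The dispute packet described in this section could be assembled along these lines. The row shape (dictionaries with canonical_work_id, at, source_class, result, and reason keys) and the function name are assumptions for illustration; timestamps are assumed to be ISO 8601 strings in one timezone so they sort correctly as text.

```python
from collections import Counter


def build_dispute_packet(work_id, activities):
    """Assemble the evidence for one canonical work ID from qualified activity rows."""
    linked = sorted(
        (a for a in activities if a["canonical_work_id"] == work_id),
        key=lambda a: a["at"],
    )
    original = next(
        (a for a in linked if a["result"] == "billable_original_intent"), None
    )
    by_result = Counter(a["result"] for a in linked)
    return {
        "work_id": work_id,
        "original_source_class": original["source_class"] if original else None,
        "billable_status": (
            original["result"] if original else "review_required_ambiguous_origin"
        ),
        "reason": original["reason"] if original else "no original accepted work found",
        "linked_retries": by_result["non_billable_duplicate_retry"],
        "linked_support_replays": by_result["non_billable_operator_replay"],
        "linked_reconciliation_events": by_result["non_billable_reconciliation"],
        # Ordered sequence so support does not need raw logs to see what happened.
        "timeline": [(a["at"], a["source_class"], a["result"]) for a in linked],
        "quantity_charged": 1 if original else 0,
    }
```

Fed one original request, three retries, and one support replay for the same work ID, this returns the same counts as the support view shown above, with a charged quantity of one.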
Asset: The Billable Event Classification Table
The most useful reusable artifact for this problem is not a pricing spreadsheet. It is a classification table that forces the team to say what kinds of activity can become billable and why.
Use something like this:
Billable Event Classification Table
Event class: original customer request
Canonical work ID present: yes
Origin: customer or productized automation
Billable by default: yes
Reason: represents fresh accepted customer intent
Event class: duplicate retry
Canonical work ID present: yes
Origin: client retry or transport retry
Billable by default: no
Reason: repeated attempt for same accepted work
Event class: webhook redelivery
Canonical work ID present: yes
Origin: sender retry after ambiguous delivery
Billable by default: no
Reason: delivery ambiguity, not new commercial unit
Event class: support replay
Canonical work ID present: yes
Origin: operator recovery tool
Billable by default: no
Reason: recovery action after failure or uncertainty
Event class: internal reconciliation
Canonical work ID present: maybe
Origin: platform repair or state correction
Billable by default: no
Reason: internal correctness restoration
Event class: productized reprocessing feature
Canonical work ID present: yes
Origin: explicit customer action to rerun work
Billable by default: depends on plan policy
Reason: may represent new purchased value if intentionally sold that way
Event class: ambiguous or unclassified activity
Canonical work ID present: no or uncertain
Origin: unknown
Billable by default: never automatically
Reason: missing evidence for commercial truth
This table is powerful because it forces the team to separate technical possibility from commercial policy.
If a row is hard to classify, you usually found one of the real problems:
- no stable canonical work ID
- no reliable source classification
- no agreement on whether a replay is part of the product or part of recovery
- no evidence good enough to support billing consequences
That is not a paperwork issue. It is an architectural gap.
The table also creates a shared language across teams. Finance can see why reconciliation exists without entering the billable ledger. Support can understand why retries remain visible but non-billable. Engineering can implement policies without guessing where the commercial line should be.
Most importantly, it gives you one place to say something that should never be implicit:
not every expensive unit of system effort deserves to become invoiceable usage.
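If the classification table lives in code as well as in a document, it can be encoded as plain data with a lookup that refuses to bill anything it cannot classify. This is a sketch under assumptions: the key names, the "plan" marker, and the is_billable function are hypothetical, while the defaults and reasons follow the table above.

```python
# Each row: (billable_by_default, reason). "plan" means the plan policy decides.
CLASSIFICATION_TABLE = {
    "original_customer_request": ("yes", "represents fresh accepted customer intent"),
    "duplicate_retry": ("no", "repeated attempt for same accepted work"),
    "webhook_redelivery": ("no", "delivery ambiguity, not new commercial unit"),
    "support_replay": ("no", "recovery action after failure or uncertainty"),
    "internal_reconciliation": ("no", "internal correctness restoration"),
    "productized_reprocessing": (
        "plan", "may represent new purchased value if intentionally sold that way"
    ),
}
UNCLASSIFIED = ("never", "missing evidence for commercial truth")


def is_billable(event_class, has_canonical_work_id, plan_bills_reprocessing=False):
    """Return (billable, reason) for one event class under the table's defaults."""
    default, reason = CLASSIFICATION_TABLE.get(event_class, UNCLASSIFIED)
    if default in ("yes", "plan") and not has_canonical_work_id:
        # A potentially billable class without a work ID is treated as unclassified.
        return False, UNCLASSIFIED[1]
    if default == "yes":
        return True, reason
    if default == "plan":
        return plan_bills_reprocessing, reason
    return False, reason
```

Unknown event classes fall through to the unclassified row, so a new traffic source stays non-billable until someone adds it to the table on purpose.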
Asset: The Metering Rollout Review Card
The second asset worth keeping is a short rollout review card. This prevents the team from shipping a technically complete meter that is commercially naive.
Metering Rollout Review Card
Change name:
Owning team:
Primary product surface:
Proposed billable unit:
Questions
1. What exact customer intent does one billable unit represent?
2. Which technical events can surround that same intent without becoming new billable units?
3. Which replay or recovery paths exist today?
4. Which internal jobs can generate traffic that looks customer-originated?
5. What evidence proves an event is original rather than derivative?
Required inputs
- canonical work ID defined: yes/no
- source classification defined: yes/no
- qualification layer exists: yes/no
- shadow mode completed: yes/no
- support dispute packet available: yes/no
Stop conditions before invoice use
- retries are still entering the billable ledger directly
- support replay traffic is commercially indistinguishable from customer intent
- internal reconciliation uses customer-facing paths without classification
- finance and engineering cannot explain sample records consistently
- ambiguous events are counted by default
What makes this review card useful is not the formatting. It is that it blocks the most common bad rollout story:
"we collected enough request data, so we assumed the rest would sort itself out."
The rest does not sort itself out.
It becomes customer communication debt, support conflict, finance rework, and pricing mistrust.
In this system, the review card caught two issues before billing exposure began:
- a support replay tool still called the same public path with no recovery marker
- one backfill utility wrote usage candidate events exactly like original customer work
Both were easy to miss in a pure engineering review because the requests were technically valid. Both would have been painful to explain in a metering dispute.
That is why I like short forcing functions more than giant governance decks. Either the product knows what it is charging for or it does not. The card makes that visible fast.
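For teams that want the review card enforced rather than only discussed, the required inputs and stop conditions can be expressed as a small gate. The identifiers below are hypothetical spellings of the card's items, and ready_for_invoice_use is an illustrative name.

```python
REQUIRED_INPUTS = (
    "canonical_work_id_defined",
    "source_classification_defined",
    "qualification_layer_exists",
    "shadow_mode_completed",
    "support_dispute_packet_available",
)


def ready_for_invoice_use(inputs, stop_conditions):
    """Return (ready, blockers) for a metering rollout.

    inputs: dict of required-input name -> bool (a missing key counts as "no").
    stop_conditions: dict of stop-condition description -> bool (True = still present).
    """
    blockers = [
        f"missing input: {name}"
        for name in REQUIRED_INPUTS
        if not inputs.get(name, False)
    ]
    blockers += [
        f"stop condition: {name}"
        for name, present in stop_conditions.items()
        if present
    ]
    return (not blockers, blockers)
```

The gate returns the list of blockers rather than a bare boolean, so the review conversation starts from the specific item that failed.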
A Good Meter Tells the Truth About Value. A Bad One Monetizes Turbulence.
The real test of a usage meter is not whether it counts accurately at the request layer.
The real test is whether it tells the truth about value strongly enough that finance, engineering, support, and customers can all reason from the same events.
A good meter says:
- this was original accepted work
- these were retries for the same work
- this was operator recovery
- this was internal repair
- this is why exactly one unit was charged
A bad meter says something weaker:
- many successful things happened
- some of them were expensive
- we decided that probably meant billable usage
That second version is how products end up charging for instability, charging for ambiguity, or charging for the platform's own cleanup.
The danger is not only customer anger.
It is also internal distortion.
If the meter monetizes turbulence, then:
- reliability incidents inflate revenue-looking signals
- finance dashboards become harder to trust
- support avoids replay tools because they may trigger billing conflict
- engineering hesitates to run repair workflows because the commercial side effects are unclear
That is an ugly place for a platform to be.
The healthier outcome is very achievable.
Define the billable unit above raw requests.
Separate customer intent from technical effort.
Add a qualification layer between activity and billing truth.
Treat retries, replays, and recovery work as distinct classes.
Run shadow mode until the system survives messy reality.
Give support and finance evidence they can actually use.
If you remember one rule, make it this:
successful system activity is not automatically billable usage.
Usage becomes billable when the platform can prove it represents a distinct unit of customer-consumable value, not just another attempt to recover it.
That is the difference between a meter people can trust and a counter that turns operational mess into commercial noise.