Issue #019 - Service account tokens: the expiry that breaks your CI on weekends

projected volume tokens, 1h expiry, kubelet rotation, legacy operator cache, CI breakage timeline

May 26, 2026

Pager goes off at 3:07 on a Saturday morning. The alert is CIBuildFailureRateAbove50pct. It's been firing for nine minutes by the time anyone looks at it, because the only on-call awake enough to read Slack is in Berlin and the rest of the team is in two American time zones that are still asleep. Every CI build that started in the last hour has died with 401 Unauthorized from the Kubernetes API. Nothing in the cluster has been deployed since Thursday. Nobody pushed a config change. The cluster, by every dashboard, is green.

This is the story of the next four hours, what the on-call kept missing, and why a token nobody had touched in two years finally expired.

Debug Story: the Saturday the cache won

03:07 - the page

The first thing the on-call did was the thing everyone does. kubectl get nodes. All Ready. kubectl get pods -A | grep -v Running returned nothing scary. The control plane was healthy. Etcd metrics were boring. The alerting rule was right that CI was broken, but the cluster wasn't.

Second instinct was the CI namespace. The build pods were crash-looping on a roughly twelve-second cadence and the kubelet was already past its second restart-backoff bump on most of them. Logs from the failed pods all ended the same way:

error: failed to retrieve secrets from kubernetes api:
  Unauthorized

So the failure wasn't scheduling or container start. Something inside the build pod was talking to the Kubernetes API and getting told no.

By 03:14 the on-call had a working theory: the API server was rejecting their auth, probably an RBAC change someone pushed late on Friday. They paged the SRE lead. Half-asleep, the lead asked them to dump events sorted by lastTimestamp. What came back was a wall of restart warnings about the CI pods and, aside from that, nothing useful - no RBAC denial, no admission webhook complaint pointing the on-call at anything to actually go fix.

03:42 - the false lead

By 03:42 they were in the audit logs. The cluster had API audit logging on, dumped into Loki, and the on-call typed in a query for GET requests against secrets in the ci namespace, filtered by URL path:

{cluster="prod-1"} | json
  | verb = "get"
  | objectRef_resource = "secrets"
  | requestURI =~ "/api/v1/namespaces/ci/secrets.*"

The hits came back as 401s with the message "Unauthorized: token expired".

That should have been the moment. It wasn't — and the reason it wasn't is the kind of thing you only see in hindsight.

The on-call read "token expired" and went to look at the Secret that holds the CI service account's token. They ran kubectl get sa builder -n ci -o yaml and saw a secrets: field pointing at a Secret called builder-token-xxxxx. They ran kubectl get secret builder-token-xxxxx -n ci -o yaml, base64-decoded the token, pasted it into jwt.io, and saw a token from 2024 with no exp claim. The token was eighteen months old, and it was never going to expire.

So the cluster was rejecting a token that, on its face, was still valid. That sent the investigation in a different wrong direction for about forty minutes. Was someone rotating the cluster's signing key mid-incident? Was a mutating webhook eating the auth header? Both were dead ends.

04:31 - the operator nobody owned

Around 04:31 the SRE lead joined the call and asked a question the on-call hadn't asked: which container inside the build pod was actually making the API call? The build job itself didn't talk to the K8s API directly. It called a helper service in the same namespace, an internal operator called ci-secret-resolver that fetched secrets from various places (Vault, AWS Secrets Manager, K8s Secrets) and exposed them to the build through a unix socket.

The on-call had never thought about ci-secret-resolver because it was old. It had been deployed by someone who left in 2023, it was a single Deployment with one replica, it didn't have an owner team, and it never broke. The Helm chart that managed it pinned the image to a digest from two years ago.

They kubectl exec-ed into the resolver pod and ran curl -k -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://kubernetes.default/api/v1/namespaces/ci/secrets. It came back with the secret list. The token in the file worked.

Then they looked at the resolver's own logs. The resolver was logging the Authorization header it was sending (poor hygiene, but lucky for this debug). The token it was sending was different from the one in the file. Different iat. Different exp. The one in the file had been written ten minutes ago. The one the resolver was sending had been issued at 04:11 Friday morning, expiring at 05:11 Friday morning.

The resolver had read the token file once, at pod start, a little over twenty-four hours earlier. It had cached the bytes in memory and was still sending them. The token's exp had passed at 05:11 Friday. From that point on, every call the resolver made to the API server returned 401. CI didn't notice until the next batch of builds started Saturday morning, because the only Friday builds had been before 05:11 and the cluster was quiet over Friday night.

04:47 - the fix and the realization

Killing the resolver pod resolved everything in about ninety seconds. The new pod read the current token from the file, started using it, CI started passing. The fix was that small.

The realization was bigger. The resolver had been written for the era of long-lived service account tokens, issued without expiry, and first deployed against a Kubernetes 1.21 cluster. By the time the cluster was upgraded to 1.24 in 2024, bound tokens (BoundServiceAccountTokenVolume) were the default, and pods were silently getting projected tokens with a one-hour TTL. The resolver kept working for a year and a half because every time the resolver Pod restarted (deploys, evictions, node rotations), it picked up a fresh token. Nothing had restarted the resolver in twenty-four hours, which was the first time that had ever happened. Stability had become a bug.

The on-call wrote it up. Three things in the postmortem:

The resolver, and probably others like it, was reading the SA token once at startup. That contract was wrong on any modern Kubernetes.
The cluster had no detection for "pods using a token older than its own expiry." That should be a dashboard.
There was no inventory of which operators in the cluster were old enough to predate bound tokens.

The third one was the scary one. They didn't know how many other resolvers were out there. Item three was the actual root cause. Items one and two were symptoms of nobody asking the question.

Trace: how kubelet, the API server, and the operator each see the token

What KEP-1205 changed

The mechanics that bit the resolver come from KEP-1205, Bound Service Account Tokens, which went beta and on by default in 1.21 and GA in 1.22. Before KEP-1205, every ServiceAccount had a Secret of type kubernetes.io/service-account-token with a JWT inside that had no exp claim. The token was eternal. The kubelet mounted that Secret into pods at /var/run/secrets/kubernetes.io/serviceaccount/token. Anyone who exfiltrated that token kept it forever.

KEP-1205 replaced that mount with a projected volume containing a serviceAccountToken source. The projected volume isn't a Secret. It's a virtual mount that the kubelet writes to directly, with a token requested from the API server's TokenRequest endpoint. The token has an aud (audience), an exp (default one hour), and an iat. The kubelet refreshes the file before expiry. The pod sees the same path it always did, but the bytes change.

A modern pod spec, even one you didn't write, has this hidden in it. Run kubectl get pod some-pod -o yaml and look under spec.volumes:

- name: kube-api-access-abc12
  projected:
    defaultMode: 420
    sources:
    - serviceAccountToken:
        expirationSeconds: 3607
        path: token
    - configMap:
        items:
        - key: ca.crt
          path: ca.crt
        name: kube-root-ca.crt
    - downwardAPI:
        items:
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.namespace
          path: namespace

The ServiceAccount admission controller adds this projection automatically to every pod that runs under a ServiceAccount and hasn't opted out of token automounting, which is nearly every pod. The expirationSeconds: 3607 is hardcoded by the controller (3600 plus a fixed seven-second offset, not random jitter). You don't set it. The pod author doesn't see it. It's just there.

What the kubelet actually does

The kubelet has a goroutine per projected token that watches the token's expiry and refreshes when the token has 20% of its TTL left, or whenever the kubelet itself restarts. The refresh path is straightforward: kubelet calls TokenRequest against the API server, gets back a new JWT with a fresh exp, atomically rewrites the file in the projected volume. The path the pod mounts at startup keeps pointing at the same file, but the bytes inside that file rotate roughly hourly - anything that reads the file once and caches the bytes will quietly fall behind.

With --v=4 on the kubelet, the refresh leaves these breadcrumbs:

I0517 04:11:24.317894 1 reconciler.go:268]
  operationExecutor.MountVolume started for volume "kube-api-access-abc12"
  (UniqueName: "kubernetes.io/projected/pod-uid-1234-kube-api-access-abc12")
  pod "resolver-7c9d-xyz12" (UID: "pod-uid-1234")
I0517 05:01:12.401829 1 projected.go:241]
  ServiceAccountToken refreshed for pod resolver-7c9d-xyz12,
  new expiration 2026-05-17 06:01:12 +0000 UTC

Refresh fired on schedule at 05:01, and the file on disk was good through 06:01. The resolver, sitting in user space inside the container, hadn't reopened that file since pod start. Its in-memory copy was still the 04:11-issued token that had already expired at 05:11. From then on every call it made to the API server came back 401 - against a current token sitting unread two file descriptors away.

How the API server sees it

When the resolver made a call to kubernetes.default, the API server ran the TokenAuthenticator chain - bootstrap tokens first, service-account tokens second. The ServiceAccountToken authenticator parses the JWT, checks the signature against the cluster's signing keys, and then validates exp. The relevant code path in the apiserver is pkg/serviceaccount/jwt.go, which calls claims.ExpiresAt and rejects anything in the past.

The audit log entry from the on-call's investigation, decoded, had user.username: system:anonymous, responseStatus.code: 401, and the annotation authentication.k8s.io/legacy-token-expired: "true". The system:anonymous was the giveaway. The token failed validation, so the request fell through to the anonymous authenticator, which the API server still runs by default for /healthz and a few other paths. Anonymous can't get secrets, so the response is 401. The user record on the audit entry is anonymous, not the resolver's service account. That's why the on-call's first LogQL query had to filter by URL path and not by user.

Why the file watch wasn't there

Client-go has had a helper for this since 2020. transport.NewCachedFileTokenSource reads the token file on each call (with a small cache to avoid hitting the filesystem every time) and produces a fresh Bearer for each request. The standard rest.InClusterConfig() uses it. Any client built on the standard helper would have been fine.

The resolver was older than the helper. It was written against client-go 0.18 and hand-rolled its auth:

token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
if err != nil { return nil, err }
return &http.Client{
    Transport: &authedTransport{token: string(token)},
}, nil

That's the whole bug, in five lines. os.ReadFile once, hold the bytes, never look again. A fix is the same five lines, with the read moved inside authedTransport.RoundTrip. Or, less invasively, swap the constructor for transport.NewCachedFileTokenSource("/var/run/secrets/kubernetes.io/serviceaccount/token") and let client-go do the right thing. External clients (CI, ArgoCD, anything outside the cluster) typically get their tokens from the TokenRequest API instead. Those tokens are short-lived too, so the same rule applies there: request a new one before exp instead of holding the first one forever.

Policy: detecting cached-token operators before they bite

The first move after the incident was an inventory pass. Three layers, cheap to run.

Layer one: find pods using projected SA tokens at all

Easy filter, gives you the universe of candidates. Every pod that has a ServiceAccount has one of these mounts unless automountServiceAccountToken: false is set explicitly.

# List each pod once if any of its volumes projects a service account token.
kubectl get pods -A -o json | \
  jq -r '.items[]
    | select(any(.spec.volumes[]?; .projected.sources[]?.serviceAccountToken))
    | "\(.metadata.namespace)/\(.metadata.name)"'

In a normal cluster this is almost every pod. That's not interesting on its own. What's interesting is which of those pods are old or unmaintained.

Layer two: find pods running long enough to have refreshed

A pod that's been running longer than the token TTL has been through at least one kubelet-driven refresh, because the kubelet rotates the file before expiry. So the cluster gives you a free signal: long-running pod plus 401s against the API equals suspect. Pods that re-read the file on each call are long-running too, but they never produce the 401s, so they drop out of the intersection.

kubectl get pods -A -o json | jq -r '
  .items[] | select(.status.phase == "Running") |
  select((now - (.status.startTime | fromdate)) > 3600) |
  "\(.metadata.namespace)/\(.metadata.name) age=\((now - (.status.startTime | fromdate)) | floor)s"' | \
  sort -k2 -t= -n -r | head -30

When we ran this on our cluster, the head-of-list was a pod that had been Running for 437 days. Pair the long-running list with audit logs filtered for 401s recorded against system:anonymous (the anonymous fallback) - anything in both lists is suspect. The other tool worth running in the same pass is kube-no-trouble (kubent). It is built to catch deprecated APIs, not token handling, but workloads still calling deprecated APIs tend to be the same old, unowned ones this inventory is hunting for.

Layer three: detection at runtime

Falco has a rule pattern that catches authentication failures at the apiserver, but the better place we landed on was the apiserver's audit policy itself. We added a rule that records anonymous requests at Metadata level, then alerted on a non-zero rate of entries carrying the legacy-token-expired annotation:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Match the anonymous fallback caused by expired tokens.
# users and userGroups must both match; verbs is omitted, which matches every verb.
- level: Metadata
  users: ["system:anonymous"]
  userGroups: ["system:unauthenticated"]
  resources:
  - group: ""
    resources: ["*"]
  omitStages: ["RequestReceived"]
Promtail or Vector ships the audit log, and the alert fires on system:anonymous plus a 401 against a path matching /api/v1/.+/secrets. The alert routes to the team that owns the namespace where the call originated. Attribution is hard because the user is anonymous on the audit side, but the source IP on the audit entry usually falls inside the pod CIDR, so you can correlate it back to a pod.

Remediation lives in the operator code. For in-cluster operators, swap to client-go's transport.NewCachedFileTokenSource - that's the five-line change. External clients use the TokenRequest API path mentioned earlier. Either way the change is small. Finding which operators need it is the slow part. The bar to fix it is low. The bar to find it is everything. We also wrote a Kyverno (Issue #16) admission rule that flags Pods mounting SA tokens with digest-pinned images for manual review - noisy but the right kind of noisy, surfaces five to ten genuinely old operators per cluster.

The bound-token transition is one of those Kubernetes changes that ages out of memory. The KEP was promoted to default in 2021. Every Kubernetes engineer hired since 2023 has only ever seen the new world. The bugs that remain are in code older than the change, owned by teams that turned over, running in clusters whose upgrade history nobody remembers. Auditing for them is a one-time pass that prevents a Saturday call that nobody on the current team has the context to debug.

What's next

Issue #20 stays on the theme of invisible failure modes that wake the on-call. The topic is image pull, specifically what happens when a pod's image registry quietly goes read-only mid-deploy and the kubelet's pull backoff never converges, while every other dashboard tells you the cluster is fine. Same general shape as this one: the system did exactly what it was designed to do, the assumptions baked in years ago no longer hold, and the alert that catches it is the one nobody thought to write.

- Ilia

Podo Stack
