Resolving the Mystery of Zombie Node Services in Kubernetes

Oct 24, 2023

Some of our Node services were spontaneously restarting, while others, confoundingly, kept running but were dead to any incoming requests. Let's unravel this enigma, from its discovery to its resolution, in the hope that our experience can assist others in a similar bind.

Mystery Unfolds

Our Node services began displaying some eyebrow-raising behaviour:

A sporadic yet notable number of services started restarting without discernible cause.
Certain others, while ostensibly operational and listening on their assigned ports, were starkly unresponsive to any requests.

Such erratic conduct directly jeopardized our system's stability and efficiency.

Intertwined Clues and Haphazard Diagnosis

Unraveling this situation wasn't straightforward. Even as we delved deep into logs, configurations, and Kubernetes event streams, the solution remained elusive. It was during this quest that a coworker, somewhat serendipitously, spotted the anomaly in the Node application: despite being unresponsive, it was still actively listening on its designated port.

File permissions seemed to be at the heart of this issue. Piecing together disparate clues from documentation and combining them with our observations, we followed the trail to the npm update notifier. Its role? To periodically check for new npm versions and notify users of any updates. Yet herein lay the twist: to keep a record of its last update check, the notifier attempted to write to a file. Given our Docker setup, where strict write permissions were enforced and containers ran as non-root, this operation invariably failed.

Instead of failing gracefully or throwing a conspicuous error, the application did something rather unexpected: it clung onto its port, listening but coldly ignoring any incoming requests. In many instances this behaviour misled Kubernetes into presuming the pod was operational.
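A check that only tests whether the port is open cannot see this failure mode, because the port is open. As a minimal sketch, assuming a runtime with a global `fetch` (Node 18 or later), a request-level check with a deadline might look like the following. The function name `respondsInTime`, the two-second default, and the `/healthz` path in the note below are all hypothetical and not taken from our services.

```javascript
// Sketch: treat a service as healthy only if it answers an HTTP request
// before a deadline. A process that merely holds its port open fails this.
// `respondsInTime` and the default deadline are illustrative assumptions.
async function respondsInTime(url, timeoutMs = 2000, fetchImpl = fetch) {
  const controller = new AbortController();
  // Abort the request if no response arrives within the deadline.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return res.ok; // any 2xx status counts as responsive
  } catch {
    return false; // timeout, refused connection, or other network error
  } finally {
    clearTimeout(timer);
  }
}
```

Called as `respondsInTime('http://localhost:4000/healthz')`, this resolves to `false` for a service that accepts the connection but never replies. In Kubernetes, an `httpGet` probe with `timeoutSeconds` plays a comparable role.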

To validate our growing suspicions, we tinkered with the notifier's status file. By artificially adjusting the file's creation date, a pattern emerged: the system faltered whenever this status file aged past seven days.
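The pattern we observed can be sketched as a simple age comparison. This is a model of the behaviour we saw, not npm's actual implementation; the name `isCheckDue` is hypothetical, and the seven-day interval is simply the threshold our experiment revealed.

```javascript
// Sketch of the observed behaviour, not npm's real code: a periodic check
// is considered due once its status file is older than a fixed interval.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function isCheckDue(fileTimestampMs, nowMs = Date.now(), intervalMs = SEVEN_DAYS_MS) {
  return nowMs - fileTimestampMs > intervalMs;
}

// Example: a status file stamped eight days ago makes the check due,
// which is the point at which the write attempt (and our failure) occurred.
const now = Date.parse('2023-10-24T00:00:00Z');
const eightDaysAgo = now - 8 * 24 * 60 * 60 * 1000;
const sixDaysAgo = now - 6 * 24 * 60 * 60 * 1000;
console.log(isCheckDue(eightDaysAgo, now)); // true
console.log(isCheckDue(sixDaysAgo, now));   // false
```

This also suggests why the problem looked sporadic: each service would only hit the failing write once its own status file crossed the threshold.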

Original Dockerfile

FROM node:16-alpine AS base
 
ARG NPM_TOKEN
ENV HOME=/home/node
ENV NO_UPDATE_NOTIFIER=true
ENV NPM_CONFIG_PREFIX=$HOME/.npm-global
ENV NODE_PATH=$NPM_CONFIG_PREFIX/lib/node_modules
 
WORKDIR $HOME/app
 
RUN adduser node root
RUN chgrp -R 0 $HOME/app && chmod -R g=u $HOME/app
 
COPY package.json package-lock.json $HOME/app/
 
FROM base AS dependencies
RUN apk add --update --no-cache \
  g++ make python3 \
  nss
USER node
RUN touch $HOME/app/.npmrc && \
  mkdir $HOME/.npm-global $HOME/app/node_modules
RUN npm -v
RUN npm set progress=false && npm config set depth 0
RUN npm ci
 
FROM base AS release
USER node
RUN mkdir $HOME/.npm-global && \
  mkdir -p $HOME/.npm && \
  chmod -R g+rwx $HOME/.npm && \
  chown -R node:root $HOME/.npm $HOME/.npm-global
 
COPY --from=dependencies $HOME/app/node_modules ./node_modules
COPY . .
 
EXPOSE 4000
CMD npm run start

Decoding Deprecated Flags

With our problem cornered, the solution seemed imminent. In past iterations, we'd quelled the update notifier by setting the NO_UPDATE_NOTIFIER environment variable. Yet npm 7 threw a wrench in our plans by retiring this flag, a change subtly tucked away in the release notes.

Grand Solution

Our diligence led us to a lifeline: NPM_CONFIG_UPDATE_NOTIFIER. Setting this environment variable to false disarmed the update notifier and its contentious disk write attempts.

Recognizing the scale of our challenge, we took two steps:

We armed our teams with guidance and context, catering to their varied Docker proficiency, so they could fix the issue in their services.
We tracked adoption of the new flag across our services.
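Tracking adoption of this kind lends itself to a small script. The following is a rough sketch rather than the tooling we actually used: `classifyDockerfile` and its three labels are hypothetical, and it only recognizes single-variable `ENV` lines, so multi-variable `ENV` instructions or values set elsewhere (for example in a Kubernetes manifest) would need extra handling.

```javascript
// Sketch: classify a Dockerfile by which update-notifier setting it carries.
// `classifyDockerfile` and its return labels are hypothetical illustrations.
function classifyDockerfile(text) {
  // Match single-variable `ENV NAME=value` and `ENV NAME value` lines,
  // case-insensitively, with optional double quotes around the value.
  const hasEnv = (name, value) =>
    new RegExp(`^\\s*ENV\\s+${name}(=|\\s+)"?${value}"?\\s*$`, 'im').test(text);

  if (hasEnv('NPM_CONFIG_UPDATE_NOTIFIER', 'false')) return 'fixed';
  if (hasEnv('NO_UPDATE_NOTIFIER', '\\S+')) return 'legacy-only';
  return 'unset';
}

console.log(
  classifyDockerfile('FROM node:20\nENV NPM_CONFIG_UPDATE_NOTIFIER=false\n')
); // fixed
```

Run over each service's Dockerfile, the `legacy-only` and `unset` results give a list of services still to migrate.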

Modified Dockerfile

FROM node:20 AS base
 
ARG NPM_TOKEN
ENV HOME=/home/node
ENV NO_UPDATE_NOTIFIER=true
ENV NPM_CONFIG_UPDATE_NOTIFIER=false
ENV NPM_CONFIG_PREFIX=$HOME/.npm-global
ENV NPM_CONFIG_SCRIPT_SHELL=/bin/bash
ENV NPM_CONFIG_DEPTH=0
ENV NODE_PATH=$NPM_CONFIG_PREFIX/lib/node_modules
 
WORKDIR $HOME/app
 
RUN adduser node root
RUN chgrp -R 0 $HOME/app && chmod -R g=u $HOME/app
 
COPY package.json package-lock.json $HOME/app/
 
FROM base AS dependencies
USER node
RUN npm set progress=false
RUN npm ci
 
FROM base AS release
USER node
COPY --from=dependencies $HOME/app/node_modules ./node_modules
COPY --chown=node:root . .
 
EXPOSE 4000
CMD ["node", "src/index.js"]

Reflective Conclusion

Our takeaways?

Even subtle changes in third-party utilities can spiral into monumental challenges.
Release notes, no matter how mundane, deserve meticulous scrutiny.
Ensuring robust security, as we did with our Docker configurations, can sometimes spotlight lurking issues that might otherwise remain camouflaged.

To our peers navigating the vast seas of Node, Docker, and Kubernetes, remember the little savior: NPM_CONFIG_UPDATE_NOTIFIER. It may be diminutive, but it’s powerful enough to ward off an avalanche of issues!

Here's to fewer mysteries and more seamless coding!
