
Resolving the Mystery of Zombie Node Services in Kubernetes


Some of our Node services were spontaneously restarting, while others, confoundingly, kept running but were dead to any incoming requests. Let's unravel this enigma, from its discovery to its resolution, in the hope that our experience can help others in a similar bind.

Mystery Unfolds

Our Node services began displaying some eyebrow-raising behaviour: some restarted without warning, while others stayed up but stopped answering requests.

Such erratic conduct directly jeopardized our system's stability and efficiency.

Intertwined Clues and Haphazard Diagnosis

Unraveling this situation wasn't straightforward. Even as we delved deep into logs, configurations, and Kubernetes event streams, the solution remained elusive. It was during this quest that a coworker, somewhat serendipitously, spotted the anomaly: despite being unresponsive, the Node application was still listening on its designated port.

File permissions seemed to be at the heart of this issue. Piecing together disparate clues from the documentation and combining them with our observations, the trail led us to the npm update notifier. Its role? To periodically check for new npm versions and notify users of any updates. Yet herein lay the twist: to keep a record of when it last checked, the notifier attempted to write to a file. But given our Docker setup, where strict write permissions were enforced and images ran as non-root, this operation invariably failed.
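A quick way to confirm this kind of permission problem from inside a running container is to ask Node whether the current user may write to the directory in question. The sketch below is only an illustration: `canWrite` is a hypothetical helper, and the `~/.npm` location is our assumption about where npm keeps its state by default, not something we verified for every npm version.

```javascript
// Sketch: report whether the current user can write to a directory.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Returns true if `dir` exists and the process is allowed to write to it.
function canWrite(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch {
    return false; // missing directory or insufficient permissions
  }
}

// Assumption: npm keeps its state under ~/.npm unless configured otherwise.
const npmStateDir = path.join(os.homedir(), ".npm");
console.log(`${npmStateDir} writable: ${canWrite(npmStateDir)}`);
```

Running this as the same non-root user the service runs as should show quickly whether a write to that directory can succeed at all.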

Instead of failing gracefully or throwing a conspicuous error, the application did something rather unexpected: it clung to its port, listening but coldly ignoring any incoming requests. In many instances, this behaviour misled Kubernetes into presuming the pod was operational.
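The difference between "the port accepts connections" and "the service answers requests" is what makes a zombie hard to spot. Kubernetes offers both `tcpSocket` and `httpGet` probes, and only the latter would notice a process that listens but never replies. The sketch below illustrates that distinction in Node. It is not our actual probe configuration, and the helper names (`tcpAccepts`, `httpAnswers`, `classify`, `diagnose`) are hypothetical.

```javascript
const http = require("node:http");
const net = require("node:net");

// Resolves true if something accepts a TCP connection on the port.
function tcpAccepts(port, host = "127.0.0.1", timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

// Resolves true only if an HTTP response arrives before the timeout.
function httpAnswers(port, path = "/", host = "127.0.0.1", timeoutMs = 1000) {
  return new Promise((resolve) => {
    const req = http.get({ port, host, path, timeout: timeoutMs }, (res) => {
      res.resume(); // drain the body so the socket is released
      resolve(res.statusCode < 500);
    });
    req.once("timeout", () => {
      req.destroy(); // give up on a server that never replies
      resolve(false);
    });
    req.once("error", () => resolve(false));
  });
}

// Combine both observations into a verdict.
function classify(tcpOk, httpOk) {
  if (!tcpOk) return "down";
  return httpOk ? "healthy" : "zombie";
}

async function diagnose(port) {
  const [tcpOk, httpOk] = await Promise.all([tcpAccepts(port), httpAnswers(port)]);
  return classify(tcpOk, httpOk);
}

// Example (port 4000 matches the EXPOSE line in our Dockerfile):
// diagnose(4000).then(console.log);
```

A service in the state we observed would come back as "zombie": the connection is accepted, but no HTTP response ever arrives.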

To validate our growing suspicions, we tinkered with the notifier's status file. By artificially adjusting the file's creation date, a pattern emerged: the system faltered whenever this status file aged past seven days.
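For anyone wanting to reproduce this kind of experiment, the sketch below backdates a file and checks whether it has crossed the seven-day threshold. It works on a throwaway file with a made-up name rather than npm's real status file, and it assumes the age check is based on the file's modification time, which is what `fs.utimesSync` lets us change. The helpers `isStale` and `backdate` are hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const WEEK_MS = 7 * 24 * 60 * 60 * 1000; // the threshold we observed

// True when the file's modification time is more than `maxAgeMs` in the past.
function isStale(file, maxAgeMs = WEEK_MS, now = Date.now()) {
  return now - fs.statSync(file).mtimeMs > maxAgeMs;
}

// Move a file's access and modification times `days` days into the past.
function backdate(file, days) {
  const when = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
  fs.utimesSync(file, when, when);
}

// Demo on a throwaway file (hypothetical name, not npm's real status file).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "notifier-demo-"));
const statusFile = path.join(dir, "last-checked");
fs.writeFileSync(statusFile, "");
console.log(isStale(statusFile)); // false: the file was just created
backdate(statusFile, 8);
console.log(isStale(statusFile)); // true: now older than seven days
```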

Original Dockerfile

```dockerfile
FROM node:16-alpine AS base
ARG NPM_TOKEN
ENV HOME=/home/node
ENV NO_UPDATE_NOTIFIER=true
ENV NPM_CONFIG_PREFIX=$HOME/.npm-global
ENV NODE_PATH=$NPM_CONFIG_PREFIX/lib/node_modules

WORKDIR $HOME/app
RUN adduser node root
RUN chgrp -R 0 $HOME/app && chmod -R g=u $HOME/app
COPY package.json package-lock.json $HOME/app/

FROM base AS dependencies
RUN apk add --update --no-cache \
    g++ make python3 \
    nss
USER node
RUN touch $HOME/app/.npmrc && \
    mkdir $HOME/.npm-global $HOME/app/node_modules
RUN npm -v
RUN npm set progress=false && npm config set depth 0
RUN npm ci

FROM base AS release
USER node
RUN mkdir $HOME/.npm-global && \
    mkdir -p $HOME/.npm && \
    chmod -R g+rwx $HOME/.npm && \
    chown -R node:root $HOME/.npm $HOME/.npm-global
COPY --from=dependencies $HOME/app/node_modules ./node_modules
COPY . .
EXPOSE 4000
CMD npm run start
```

Decoding Deprecated Flags

With our problem cornered, the solution seemed imminent. In past iterations, we'd quelled the update notifier using the NO_UPDATE_NOTIFIER environment variable. Yet npm 7 threw a wrench in our plans by retiring it, a change subtly tucked away in the release notes.

Grand Solution

Our diligence led us to a lifeline: NPM_CONFIG_UPDATE_NOTIFIER. Setting this environment variable to false disarmed the update notifier and its contentious disk write attempts.
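Since a missing environment variable is easy to overlook, a small startup check can flag it early. The sketch below is one possible guard, not something npm provides: `updateNotifierDisabled` is a hypothetical helper, and the case-insensitive lookup reflects npm's documented handling of `npm_config_*` environment variables.

```javascript
// Returns true when the environment disables npm's update notifier.
function updateNotifierDisabled(env = process.env) {
  // npm treats npm_config_* variable names case-insensitively,
  // so accept any casing of the variable name.
  const key = Object.keys(env).find(
    (k) => k.toLowerCase() === "npm_config_update_notifier"
  );
  return key !== undefined && env[key].trim().toLowerCase() === "false";
}

if (!updateNotifierDisabled()) {
  console.warn(
    "NPM_CONFIG_UPDATE_NOTIFIER is not set to false; " +
      "npm may try to write its update status file."
  );
}
```

Note that the older NO_UPDATE_NOTIFIER variable deliberately does not satisfy this check, since it was the one that stopped working for us.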

Recognizing the scale of our challenge:

Modified Dockerfile

```dockerfile
FROM node:20 AS base
ARG NPM_TOKEN
ENV HOME=/home/node
ENV NO_UPDATE_NOTIFIER=true
ENV NPM_CONFIG_UPDATE_NOTIFIER=false
ENV NPM_CONFIG_PREFIX=$HOME/.npm-global
ENV NPM_CONFIG_SCRIPT_SHELL=/bin/bash
ENV NPM_CONFIG_DEPTH=0
ENV NODE_PATH=$NPM_CONFIG_PREFIX/lib/node_modules

WORKDIR $HOME/app
RUN adduser node root
RUN chgrp -R 0 $HOME/app && chmod -R g=u $HOME/app
COPY package.json package-lock.json $HOME/app/

FROM base AS dependencies
USER node
RUN npm set progress=false
RUN npm ci

FROM base AS release
USER node
COPY --from=dependencies $HOME/app/node_modules ./node_modules
COPY --chown=node:root . .
EXPOSE 4000
CMD ["node", "src/index.js"]
```

Reflective Conclusion

Our takeaways?

To our peers navigating the vast seas of Node, Docker, and Kubernetes, remember the little savior: NPM_CONFIG_UPDATE_NOTIFIER. It may be diminutive, but it’s powerful enough to ward off an avalanche of issues!

Here's to fewer mysteries and more seamless coding!

