Validate env at module load with Zod, and throw
A misconfigured app should not boot. Here is the 25-line Zod validator I copy into every project, why throwing at module load is the right call, and the bug that taught me to do it.
the bug that taught me to throw
A few years ago I shipped a Node service that read process.env.STRIPE_SECRET_KEY lazily, the first time the app needed to talk to Stripe. The env var was misconfigured in production. The app booted, served /health checks for nine minutes, then took its first checkout request and crashed.
Nine minutes of green health checks while the app was fundamentally broken. The post-mortem was instructive: nobody had thought to test env validation, because nothing told us env validation was a thing.
This post is about why your env validator should throw at module load, why Zod is the right tool, and the file pattern I now copy into every project.
the principle
A misconfigured app should not boot. That's it. That's the whole principle.
If STRIPE_SECRET_KEY is missing, the process should crash at boot, log a clear error, and let the orchestrator (Docker, k8s, Vercel) report failure. The bad case where the app boots, serves a few requests, and then crashes deep in user code is the worst possible outcome — health checks pass, monitoring is happy, the bug is invisible until a customer hits it.
This means env validation must be:
- Synchronous. At module load. Not on first use.
- Loud. Throws a real error, not a logged warning.
- Specific. Says exactly which var is missing or invalid.
- Type-safe. The validated env is a typed object the rest of the app reads from.
Zod is a perfect fit.
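Before the Zod version, the four properties can be shown in a dependency-free sketch. `requireEnv`, `RawEnv`, and the sample values are hypothetical names made up for illustration; this is not the validator from this post, just the smallest thing that is synchronous, loud, specific, and typed.

```typescript
type RawEnv = Record<string, string | undefined>;

// Hypothetical helper: checks every required name and reports all failures at once.
function requireEnv<K extends string>(raw: RawEnv, names: readonly K[]): Record<K, string> {
  const missing: K[] = [];
  const out = {} as Record<K, string>;
  for (const name of names) {
    const value = raw[name];
    if (value === undefined || value === '') {
      missing.push(name); // Specific: remember exactly which var failed.
    } else {
      out[name] = value;
    }
  }
  if (missing.length > 0) {
    // Loud: a thrown error, not a logged warning.
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return out; // Type-safe: every listed key is a string, never undefined.
}

// Synchronous, at module load: in a real module this call would sit at top level
// and receive process.env. A literal object keeps the sketch self-contained.
const env = requireEnv(
  { DATABASE_URL: 'postgres://localhost/app', PORT: '3000' },
  ['DATABASE_URL', 'PORT'] as const,
);
```

A hand-rolled check like this only covers presence. Formats, defaults, and coercion are where a schema library earns its place.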
the file
In every project I touch, this file lives at src/lib/env.ts:
import { z } from 'zod';

const schema = z.object({
  // Public: exposed to the client. Prefix with NEXT_PUBLIC_ for Next.js.
  NEXT_PUBLIC_APP_URL: z.string().url(),

  // Server-only: must never leak to the client.
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),
  STRIPE_SECRET_KEY: z.string().startsWith('sk_'),

  // Optional with default.
  LOG_LEVEL: z.enum(['debug', 'info', 'warn', 'error']).default('info'),
  PORT: z.coerce.number().int().min(1).max(65535).default(3000),

  // Feature flags. Parsed explicitly: z.coerce.boolean() would turn "false" into true.
  ENABLE_NEW_CHECKOUT: z.enum(['true', 'false']).default('false').transform((v) => v === 'true'),
});

const parsed = schema.safeParse(process.env);

if (!parsed.success) {
  console.error('❌ Invalid environment variables:');
  console.error(parsed.error.flatten().fieldErrors);
  throw new Error('Invalid environment variables. See errors above.');
}

export const env = parsed.data;

That's the entire validator. About 25 lines. Every app I've shipped in the last two years has a version of this.
why safeParse and not parse
schema.parse(process.env) throws on validation failure. That would satisfy our "throw at module load" principle, but the thrown ZodError's message is a JSON dump of every issue, with codes and paths, that the average ops engineer can't scan at 3am.
On failure, safeParse returns a { success: false, error: ZodError } object that we can flatten into per-field errors and print clearly. The thrown error is then ours, with a clean per-field report printed above it.
The output for a bad env looks like this (Zod 3 message wording; exact messages vary by version):
❌ Invalid environment variables:
{
  REDIS_URL: [ 'Invalid url' ],
  STRIPE_SECRET_KEY: [ 'Invalid input: must start with "sk_"' ]
}
Error: Invalid environment variables. See errors above.

That tells whoever's debugging exactly what to fix.
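If the raw object dump is still too noisy for a log pipeline, the per-field errors can be turned into one line per variable. This is a minimal sketch, assuming the shape of `flatten().fieldErrors` (a map from field name to an array of messages). `formatFieldErrors` is a hypothetical helper, not part of Zod, and the sample messages are illustrative.

```typescript
// Shape of ZodError#flatten().fieldErrors: field name -> list of messages.
type FieldErrors = Record<string, string[] | undefined>;

// Hypothetical helper: one indented line per failing variable.
function formatFieldErrors(fieldErrors: FieldErrors): string {
  return Object.entries(fieldErrors)
    .filter(([, messages]) => messages !== undefined && messages.length > 0)
    .map(([name, messages]) => `  ${name}: ${(messages ?? []).join('; ')}`)
    .join('\n');
}

// Illustrative input; in the validator this would be parsed.error.flatten().fieldErrors.
const report = formatFieldErrors({
  REDIS_URL: ['Invalid url'],
  DATABASE_URL: ['Required'],
});
console.error(`Invalid environment variables:\n${report}`);
```

Plain lines like these are easier to grep and survive log shippers that mangle multi-line objects.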
the import pattern
Every place that needs an env var imports from @/lib/env, never from process.env directly:
// ❌ wrong
const url = process.env.DATABASE_URL;

// ✅ right
import { env } from '@/lib/env';
const url = env.DATABASE_URL;

Two reasons:
Type safety. env.DATABASE_URL is typed as string (because the schema validated it), so TypeScript knows it isn't undefined. process.env.DATABASE_URL is string | undefined, and forces a check at every use site.
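Module-load enforcement. The first import { env } in the process runs the validator; later imports reuse the cached, already-validated object. As long as the entry point's import graph reaches env.ts, validation happens at boot, and there's no path where a module reads an env var that wasn't validated.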
I add a lint rule to ban direct process.env access in source code (Biome's noProcessEnv, the analog of ESLint's no-process-env), with the validator file as the only exception.
the gotchas
A few I've hit:
Next.js client-side bundling. Vars used on the client must be prefixed NEXT_PUBLIC_. The schema should mirror this — NEXT_PUBLIC_APP_URL rather than APP_URL. Next.js inlines these at build time, and the client bundle gets the literal value.
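Server-only secrets that leak. Next.js does not fail the build by default when a client component imports env. Vars without the NEXT_PUBLIC_ prefix are simply absent from the browser bundle, so the secret itself stays on the server, but the validator then runs in the browser against an environment that lacks it and throws there. Adding import 'server-only' at the top of env.ts turns a client-side import into a build error; client code that needs NEXT_PUBLIC_APP_URL then has to read it from somewhere other than this module. Be vigilant about which files are 'use client'. The schema can mark server-only vars and a small lint rule can flag client-side imports of them.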
Edge runtime restrictions. Code on the Edge runtime (middleware, edge route handlers) can have env values inlined at build time instead of read at runtime, depending on how the app is hosted. The validator runs in Node during the build, so this works for static config, but it means changing an env var can require a redeploy, not a restart. Some teams expect "edit env, restart pod" semantics that the Edge runtime doesn't provide.
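Coercion vs. parsing. z.coerce.boolean() is more permissive than it looks: it calls Boolean(value), so every non-empty string becomes true, including "false" and "0". z.boolean() is strict and rejects strings entirely. Env vars are always strings, so neither fits a flag; parse the string explicitly, for example with z.enum(['true', 'false']) followed by a transform. z.coerce.number() is fine for numeric vars like PORT, because a non-numeric string becomes NaN and fails validation.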
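The boolean case deserves a concrete look, because `z.coerce.boolean()` coerces with JavaScript's `Boolean()`, and `Boolean("false")` is `true`. The sketch below reproduces that behavior with plain `Boolean` and shows a stricter alternative. `parseBooleanFlag` and the spellings it accepts are assumptions made for illustration, not a Zod API.

```typescript
// What Boolean-based coercion does to env strings: every non-empty string is truthy.
const coerced = ['true', 'false', '0', ''].map((raw) => Boolean(raw));
// coerced is [true, true, true, false]

// Hypothetical strict parser: accepts a fixed set of spellings, rejects anything else.
function parseBooleanFlag(name: string, raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw === '') return fallback; // unset: use the default
  const value = raw.trim().toLowerCase();
  if (value === 'true' || value === '1') return true;
  if (value === 'false' || value === '0') return false;
  // Loud and specific, in the spirit of the validator.
  throw new Error(`${name} must be "true", "false", "1" or "0", got "${raw}"`);
}

// ENABLE_NEW_CHECKOUT=false now really means false.
const enableNewCheckout = parseBooleanFlag('ENABLE_NEW_CHECKOUT', 'false', false);
```

Rejecting unknown spellings such as "yes" is a deliberate choice here: a typo in a feature flag fails the boot instead of silently flipping the flag.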
the boot test
Add a test that imports env with a known-bad environment and asserts it throws:
import { afterEach, describe, expect, it, vi } from 'vitest';

describe('env validation', () => {
  const original = process.env.DATABASE_URL;

  afterEach(() => {
    // Restore the variable even when the assertion fails.
    if (original === undefined) delete process.env.DATABASE_URL;
    else process.env.DATABASE_URL = original;
  });

  it('throws on missing required vars', async () => {
    delete process.env.DATABASE_URL;
    // Clear the module cache so the dynamic import re-runs module-level validation.
    vi.resetModules();
    await expect(import('@/lib/env')).rejects.toThrow('Invalid environment variables');
  });
});

This test pins down the behavior that would have caught the Stripe bug at the start of this post. It's worth one or two ugly tests like this in the suite: they're the only thing that ensures the validator stays load-bearing as the schema evolves.
the operational story
Once the validator is in place, deployment becomes simpler:
- Deploy the new code.
- Pod boots, validator runs at load.
- If env is wrong → process exits → orchestrator marks deploy as failed → previous version stays serving.
- If env is right → process serves traffic.
There's no path where a misconfigured pod sneaks into the live pool. The orchestrator's existing failure-handling does the work.
Compare to the lazy-validate world: the pod boots, passes /health, joins the live pool, then crashes when a real request hits a misconfigured code path. The orchestrator marks the pod unhealthy, but only after it has served real users a 500.
The first failure mode is a deploy that didn't happen. The second is an outage. They are not the same thing.
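The two failure modes can be modeled in a few lines. This is a toy sketch, not a real server: `createLazyApp`, `createEagerApp`, and the `App` interface are hypothetical names, and "booting" is reduced to constructing an object.

```typescript
type RawEnv = Record<string, string | undefined>;

interface App {
  health(): string;
  checkout(): string;
}

// Lazy: the secret is read on first use, so construction always succeeds.
function createLazyApp(raw: RawEnv): App {
  return {
    health: () => 'ok',
    checkout: () => {
      const key = raw.STRIPE_SECRET_KEY;
      if (!key) throw new Error('STRIPE_SECRET_KEY is not set');
      return `charged with ${key.slice(0, 3)}...`;
    },
  };
}

// Eager: the secret is checked before the app object exists.
function createEagerApp(raw: RawEnv): App {
  const key = raw.STRIPE_SECRET_KEY;
  if (!key) throw new Error('STRIPE_SECRET_KEY is not set');
  return {
    health: () => 'ok',
    checkout: () => `charged with ${key.slice(0, 3)}...`,
  };
}

const badEnv: RawEnv = {};
const lazy = createLazyApp(badEnv); // "boots" fine with a broken env
const lazyHealth = lazy.health(); // 'ok', even though checkout() will throw
```

With `badEnv`, `createEagerApp` throws before anything can answer a health check, which is the deploy that didn't happen. The lazy version reports healthy and fails only when `checkout()` runs.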
the meta-point
Most "loud failures vs. silent degradation" arguments come down to this: do you want bugs to be small and visible, or large and hidden?
Throwing at module load makes env bugs small and visible. They are pre-deploy or first-second-of-pod-life issues. They never become production outages because they never reach production.
That is the whole trade: about twenty-five lines of Zod. It's the highest-leverage validator in any backend.
import { z } from 'zod';
const schema = z.object({ /* … */ });
const parsed = schema.safeParse(process.env);
if (!parsed.success) throw new Error('Invalid env');
export const env = parsed.data;

Copy it, adapt the schema, never read process.env again.