Patch management has a gap problem. The average enterprise takes 60+ days to patch a critical vulnerability after disclosure. Attackers are exploiting many of those same vulnerabilities within days. The WannaCry ransomware attack in 2017 exploited EternalBlue — a vulnerability patched by Microsoft two months earlier. The target organisations hadn’t patched.
The solution isn’t a manual patching schedule — it’s an automated, risk-tiered patching pipeline that applies patches to production faster than attackers can reliably exploit disclosure timelines.
This guide covers the architecture, tools, and operations of a modern patch management programme for organisations running a mix of cloud, on-premises, and containerised infrastructure. Our managed vulnerability management service handles this process end-to-end.
Why Patch Management Fails
Before building a programme, understand the failure modes:
No asset inventory: You can’t patch systems you don’t know about. Shadow IT, cloud sprawl, and acquired companies all contribute to unmanaged assets that never get patched.
Manual processes don’t scale: A team manually patching 500 servers on a monthly schedule can’t keep up with the velocity of CVE disclosures. Monthly patch cycles leave weeks of exposure.
No risk prioritisation: Treating all patches equally means teams are overwhelmed by volume and patch the wrong things first. A non-critical cosmetic update gets the same treatment as a CISA KEV critical.
Fear of breaking production: Patching breaks things sometimes. Without a testing pipeline (dev → staging → production), patching becomes a manual, risky event that people avoid.
No measurement: Without metrics on patch compliance rates, mean time to patch, and outstanding vulnerabilities, there’s no accountability and no way to demonstrate improvement.
The Patching Hierarchy: What Gets Patched First
Not all patches are equal. Tier your response:
Tier 0: Emergency (24-hour response)
- CISA KEV (Known Exploited Vulnerabilities): Being actively exploited in the wild. Patch immediately — the disruption of an out-of-cycle change is small compared to the risk of active exploitation.
- Zero-day in widely deployed, internet-facing technology: Log4Shell-class vulnerabilities.
- Vendor emergency patches for critical, internet-facing services (RCE in your web server, authentication bypass in your VPN)
Process: Out-of-cycle emergency patching, war-room coordination, compensating controls (WAF rules, network blocks) until patches are applied.
Tier 1: Critical (7 days)
- CVSS 9.0+ vulnerabilities in internet-facing or production systems
- Vulnerabilities with public PoC exploit in internet-facing systems
Process: Expedited patching through dev → staging → production on a compressed timeline.
Tier 2: High (30 days)
- CVSS 7.0–8.9, internet-facing or production
- CVSS 9.0+ on internal systems with no public exploit
Process: Normal patch cycle with higher priority queue.
Tier 3: Medium (90 days)
- CVSS 4.0–6.9, production systems
Process: Monthly patch cycles.
Tier 4: Low (next cycle, or accept)
- CVSS < 4.0
- Vulnerabilities with no practical exploitation path
Process: Patch when convenient, or formally accept risk.
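The tiering rules above can be encoded so scanner output is triaged automatically rather than debated case by case. A minimal sketch in Python — the field names and `Finding` type are illustrative, not tied to any particular scanner's output format:

```python
from dataclasses import dataclass

# SLA in days per tier (Tier 0's 24-hour response expressed as 1 day;
# None means next cycle or formal risk acceptance)
TIER_SLA_DAYS = {0: 1, 1: 7, 2: 30, 3: 90, 4: None}

@dataclass
class Finding:
    cvss: float
    on_kev: bool          # listed in the CISA KEV catalogue
    internet_facing: bool
    production: bool
    public_poc: bool      # public proof-of-concept exploit exists

def tier(f: Finding) -> int:
    """Map a vulnerability finding to a response tier per the hierarchy above."""
    if f.on_kev:
        return 0  # actively exploited: emergency
    if (f.cvss >= 9.0 and (f.internet_facing or f.production)) or \
       (f.public_poc and f.internet_facing):
        return 1
    if (7.0 <= f.cvss < 9.0 and (f.internet_facing or f.production)) or \
       (f.cvss >= 9.0 and not f.public_poc):
        return 2  # includes CVSS 9.0+ on internal systems with no public exploit
    if 4.0 <= f.cvss < 7.0 and f.production:
        return 3
    return 4

print(tier(Finding(cvss=9.8, on_kev=False, internet_facing=True,
                   production=True, public_poc=False)))  # 1
```

Feeding every new finding through a function like this turns the hierarchy into a queue with deadlines instead of a judgment call per CVE.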
Asset Inventory: The Foundation
Discovery tools:
| Tool | Best For |
|---|---|
| Nmap | Network host and service discovery |
| Shodan + Censys | Internet-exposed assets |
| AWS Config, Azure Resource Graph | Cloud asset inventory |
| Microsoft Intune / Jamf | Managed endpoints |
| Kubernetes API | Container workloads |
| Qualys / Tenable | Combined discovery + scanning |
Cloud asset inventory (AWS):
# List all EC2 instances
aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[InstanceId,State.Name,Tags[?Key==`Name`].Value|[0],Platform]' \
    --output table
# All RDS instances
aws rds describe-db-instances \
    --query 'DBInstances[].[DBInstanceIdentifier,EngineVersion,Engine]' \
    --output table
# All Lambda functions and their runtimes (often neglected for patching)
aws lambda list-functions \
    --query 'Functions[].[FunctionName,Runtime]' \
    --output table
Maintain a CMDB (Configuration Management Database) — ServiceNow, Jira Assets, or even a well-maintained spreadsheet for smaller organisations. Every asset in your environment should be recorded in it with:
- Owner (team responsible for patching)
- Asset tier (production, staging, dev)
- OS and version
- Last patched date
- Outstanding vulnerabilities
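Those fields are enough to drive a basic hygiene report. A sketch that flags assets whose last patch date has drifted past a per-tier threshold — the thresholds, field names, and sample rows are all illustrative:

```python
from datetime import date

# Staleness thresholds (days since last patch) per asset tier.
# Illustrative values, not a standard — tune to your patch cycles.
MAX_AGE_DAYS = {"production": 30, "staging": 60, "dev": 90}

assets = [  # rows as they might come out of a CMDB export
    {"name": "web-01", "owner": "platform", "tier": "production",
     "last_patched": date(2026, 1, 2)},
    {"name": "build-07", "owner": "ci", "tier": "dev",
     "last_patched": date(2025, 9, 1)},
]

def stale_assets(assets, today):
    """Return assets whose last patch date exceeds their tier's threshold."""
    return [a for a in assets
            if (today - a["last_patched"]).days > MAX_AGE_DAYS[a["tier"]]]

for a in stale_assets(assets, date(2026, 2, 20)):
    print(f"{a['name']} ({a['tier']}, owner {a['owner']}) is overdue for patching")
```

Because every row carries an owner, the report routes itself: overdue assets become tickets for the responsible team rather than an anonymous backlog.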
Windows Patching
WSUS / SCCM (On-Premises)
For Windows endpoints and servers, WSUS (Windows Server Update Services) or SCCM/MECM (Microsoft Configuration Manager) remain the standard for on-premises environments:
- WSUS downloads patches from Microsoft and distributes to managed systems
- SCCM adds deployment targeting, compliance reporting, and software distribution
- Deploy patches first to a test ring (non-production), validate for 3–5 days, then deploy to production
WSUS deployment rings:
Ring 1 (Pilot) → 5% of systems (volunteer/IT systems)
Ring 2 (Early) → 15% of systems
Ring 3 (Standard) → 80% of systems
Emergency → All systems (CISA KEV)
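Ring membership works best when it is deterministic, so the same machines absorb first-ring risk every cycle. One way to get that without central state is to hash the hostname into a percentage bucket — a hypothetical sketch, with the 5/15/80 split taken from the rings above:

```python
import hashlib

# (ring name, percent of fleet) — matches the deployment rings above
RINGS = [("Pilot", 5), ("Early", 15), ("Standard", 80)]

def ring_for(hostname: str) -> str:
    """Deterministically assign a host to a deployment ring.

    Hashing the hostname means the same host lands in the same ring on
    every patch cycle, with no ring-membership database to maintain.
    """
    bucket = int(hashlib.sha256(hostname.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for name, percent in RINGS:
        cumulative += percent
        if bucket < cumulative:
            return name
    return RINGS[-1][0]
```

In practice you would override the hash for known-sensitive systems (domain controllers, payment services) so they never land in the pilot ring by accident.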
Automate approval using WSUS scripting:
# Auto-approve Critical and Security updates for the Production group (run on the WSUS server)
$wsus = Get-WsusServer
$rule = $wsus.CreateInstallApprovalRule("AutoApprove-Critical")
$classifications = New-Object Microsoft.UpdateServices.Administration.UpdateClassificationCollection
$classifications.AddRange(($wsus.GetUpdateClassifications() | Where-Object {$_.Title -in "Critical Updates","Security Updates"}))
$rule.SetUpdateClassifications($classifications)
$deadline = New-Object Microsoft.UpdateServices.Administration.AutomaticUpdateApprovalDeadline
$deadline.DayOffset = 7; $deadline.MinutesAfterMidnight = 180  # install deadline: 7 days out, 03:00
$rule.Deadline = $deadline
$groups = New-Object Microsoft.UpdateServices.Administration.ComputerTargetGroupCollection
$groups.AddRange(($wsus.GetComputerTargetGroups() | Where-Object {$_.Name -eq "Production"}))
$rule.SetComputerTargetGroups($groups)
$rule.Enabled = $true
$rule.Save()
Microsoft Intune (Cloud/Hybrid)
For cloud-managed Windows endpoints (Azure AD joined), use Intune Update Rings:
- Configure update rings with deferral periods per group
- Automatic approval of quality updates (security) with a shorter deferral than feature updates
- Compliance policies report which devices are patched
Linux Patching
On-Premises Linux
Ansible is the most widely adopted tool for Linux patch automation:
# Ansible playbook — patch all Linux systems
- name: Patch all Linux servers
  hosts: all_linux
  become: yes
  tasks:
    - name: Update package cache (Debian/Ubuntu)
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Upgrade all packages (Debian/Ubuntu)
      apt:
        upgrade: dist
        autoremove: yes
      when: ansible_os_family == "Debian"
      register: apt_upgrade_result

    - name: Update all packages (RHEL/CentOS/Amazon Linux)
      yum:
        name: "*"
        state: latest
        security: yes  # security patches only; drop this line to apply all updates
      when: ansible_os_family == "RedHat"
      register: yum_upgrade_result

    - name: Check if reboot is required (Debian; on RHEL use `needs-restarting -r` instead)
      stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: Reboot if required
      reboot:
        msg: "Rebooting for patch application"
        connect_timeout: 5
        reboot_timeout: 300
      when: reboot_required.stat.exists
Schedule with AWX/Ansible Tower for centralised management, role-based targeting, and audit trails.
AWS EC2 Linux Patching
AWS Systems Manager Patch Manager is the native tool for EC2 instances:
# Create a patch baseline (AWS console or CLI)
aws ssm create-patch-baseline \
    --name "AmazonLinux2-Security-Baseline" \
    --operating-system "AMAZON_LINUX_2" \
    --approval-rules '{"PatchRules":[{"PatchFilterGroup":{"PatchFilters":[{"Key":"SEVERITY","Values":["Critical","Important"]}]},"ApproveAfterDays":3}]}' \
    --description "Auto-approve Critical and Important patches after 3 days"
# Associate baseline to instances via maintenance window
aws ssm create-maintenance-window \
    --name "Weekly-Patching-Sunday-2AM" \
    --schedule "cron(0 2 ? * SUN *)" \
    --duration 4 \
    --cutoff 1 \
    --no-allow-unassociated-targets
# Run patch immediately (emergency)
aws ssm send-command \
    --document-name "AWS-RunPatchBaseline" \
    --targets "Key=tag:Environment,Values=production" \
    --parameters "Operation=Install" \
    --timeout-seconds 600
SSM Patch Manager integrates with AWS Security Hub to surface compliance status and with CloudWatch for patching operation logging.
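Compliance data from SSM can also be post-processed for your own reporting. The sketch below filters for instances with missing or failed patches; the payload mirrors the shape of `aws ssm describe-instance-patch-states` output, but the instance IDs and counts are invented:

```python
import json

# Sample payload shaped like `aws ssm describe-instance-patch-states`
# output (instance IDs and counts are made up for illustration)
payload = json.loads("""
{"InstancePatchStates": [
  {"InstanceId": "i-0aaa111", "PatchGroup": "production",
   "MissingCount": 0, "FailedCount": 0},
  {"InstanceId": "i-0bbb222", "PatchGroup": "production",
   "MissingCount": 4, "FailedCount": 1}
]}
""")

def non_compliant(states):
    """Instances with missing or failed patches — the ones to chase first."""
    return [s["InstanceId"] for s in states
            if s["MissingCount"] > 0 or s["FailedCount"] > 0]

print(non_compliant(payload["InstancePatchStates"]))  # ['i-0bbb222']
```

A nightly job that feeds this list into your ticketing system closes the loop between "patch job ran" and "fleet is actually patched".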
Container and Kubernetes Patching
Containers require a different patching model — you patch the image, not the running container:
Image Scanning and Rebuild Pipeline
# GitHub Actions — automated image rebuild on CVE detection
name: Weekly Image Rebuild
on:
  schedule:
    - cron: '0 2 * * 1'  # Every Monday at 2 AM
jobs:
  scan-and-rebuild:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: CRITICAL,HIGH
          format: json   # the jq step below expects JSON output
          exit-code: 0   # Don't fail — report findings
          output: trivy-results.json
      - name: Update base image if CVEs found
        run: |
          # Count critical CVEs in the Trivy results
          CVE_COUNT=$(jq '[.Results[].Vulnerabilities // [] | .[] | select(.Severity == "CRITICAL")] | length' trivy-results.json)
          if [ "$CVE_COUNT" -gt "0" ]; then
            echo "Found $CVE_COUNT critical CVEs — triggering rebuild with latest base image"
            # Normalise the FROM line (drop any digest pin), then --pull
            # forces a fresh node:20-alpine base rather than the cached layer
            sed -i 's|FROM node:20-alpine.*|FROM node:20-alpine|' Dockerfile
            docker build --pull -t myapp:patched .
            docker push myapp:patched
            # Trigger deployment update
          fi
Kubernetes Node Patching
For managed Kubernetes (EKS, AKS, GKE):
EKS:
# Update node group AMI (rolling update — cordons old nodes, drains, replaces)
aws eks update-nodegroup-version \
    --cluster-name production \
    --nodegroup-name standard-workers \
    --release-version "1.31.x-20260115"
# Check update status (--name is the cluster; the ID comes from the previous command)
aws eks describe-update \
    --name production \
    --update-id <update-id>
Enable EKS Auto Mode or managed node group auto-updates to automate node AMI updates on a defined schedule.
For self-managed Kubernetes, use kured (KUbernetes REboot Daemon) to automatically drain and reboot nodes requiring kernel updates:
# kured DaemonSet — auto-reboots nodes when /var/run/reboot-required exists
helm repo add kubereboot https://kubereboot.github.io/charts
helm upgrade --install kured kubereboot/kured \
    --namespace kube-system \
    --set configuration.rebootSentinel=/var/run/reboot-required \
    --set configuration.startTime=02:00 \
    --set configuration.endTime=05:00  # only reboot inside the 2-5 AM window
Third-Party Application Patching
OS patches are straightforward. Third-party applications are harder:
Browsers (critical attack surface):
- Deploy Chrome/Edge/Firefox updates via group policy or MDM
- Target: within 24 hours of browser security patch release
Java, Python, Node.js runtimes:
- Container images: update base image, rebuild
- On-prem: Ansible playbook targeting runtime versions
- Lambda: update runtime version in function configuration
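The Lambda inventory from the asset section can be cross-checked against runtimes you consider end-of-life. A sketch — the deprecated set and function rows below are illustrative; AWS's runtime deprecation schedule is the authoritative source:

```python
# Flag functions on runtimes past (or approaching) end of support.
# EOL_RUNTIMES and the sample data are illustrative — check AWS's
# Lambda runtime deprecation schedule for the current list.
EOL_RUNTIMES = {"python3.7", "nodejs14.x", "dotnetcore2.1", "ruby2.5"}

functions = [  # shaped like filtered `aws lambda list-functions` output
    {"FunctionName": "report-gen", "Runtime": "python3.7"},
    {"FunctionName": "webhook", "Runtime": "nodejs20.x"},
]

needs_upgrade = [f["FunctionName"] for f in functions
                 if f["Runtime"] in EOL_RUNTIMES]
print(needs_upgrade)  # ['report-gen']
```

Running this as a scheduled check catches the serverless estate that OS-level patching tools never see.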
Database engines (PostgreSQL, MySQL, SQL Server):
- Cloud managed (RDS): apply minor version updates automatically, major version upgrades as project
- On-prem: database change management process with tested rollback
Network appliances (firewalls, routers, VPN concentrators):
- Often neglected, yet edge appliances are among attackers' favourite entry points, and audits routinely find firewall OS versions years out of date
- Monthly review of vendor security bulletins
- Dedicated patching schedule with maintenance windows
Metrics and Reporting
Track patching effectiveness with these metrics:
| Metric | Target | Measure |
|---|---|---|
| Critical patch SLA compliance | 95% within 24 hours | % CISA KEV patched within SLA |
| High patch SLA compliance | 90% within 30 days | % High CVEs patched within SLA |
| Mean time to patch (MTTP) | < 7 days for Critical | Average days from disclosure to patch applied |
| Patch coverage | 95% of known assets | % of inventory with active patching |
| Vulnerability backlog | Trending down QoQ | Open vulnerabilities by severity |
| Unpatched CISA KEV | Zero | Any CISA KEV older than 7 days = incident |
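Two of these metrics, MTTP and SLA compliance, fall straight out of disclosure and patch dates. A sketch of the arithmetic — the record format and sample dates are invented, and note that open findings count as SLA misses rather than being excluded:

```python
from datetime import date

# Illustrative vulnerability records: disclosure date, patch date (None = still open)
records = [
    {"cve": "CVE-2026-0001", "severity": "critical",
     "disclosed": date(2026, 1, 5), "patched": date(2026, 1, 9)},
    {"cve": "CVE-2026-0002", "severity": "critical",
     "disclosed": date(2026, 1, 10), "patched": date(2026, 1, 12)},
    {"cve": "CVE-2026-0003", "severity": "high",
     "disclosed": date(2026, 1, 1), "patched": None},
]

def mttp_days(records, severity):
    """Mean time to patch, in days, over closed findings of a severity."""
    closed = [(r["patched"] - r["disclosed"]).days
              for r in records if r["severity"] == severity and r["patched"]]
    return sum(closed) / len(closed) if closed else None

def sla_compliance(records, severity, sla_days):
    """Fraction patched within the SLA; open findings count as misses."""
    relevant = [r for r in records if r["severity"] == severity]
    within = [r for r in relevant
              if r["patched"] and (r["patched"] - r["disclosed"]).days <= sla_days]
    return len(within) / len(relevant) if relevant else None

print(mttp_days(records, "critical"))          # (4 + 2) / 2 = 3.0
print(sla_compliance(records, "critical", 7))  # 1.0
```

Counting open findings as misses is deliberate: excluding them lets a backlog of never-patched vulnerabilities inflate the compliance number.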
Report monthly to the security team, quarterly to leadership with trend data.
Patch Testing and Rollback
Pre-production testing:
- Every patch should be tested in dev/staging before production (Tier 1+ patches at minimum)
- Automated regression tests should run after patch application in staging
- Canary deployment for high-risk patches: deploy to 5% of production, monitor for 24 hours, then full rollout
Rollback plan:
- Snapshot/backup before patching production systems
- Documented rollback procedure for each patch type
- Rollback decision criteria defined in advance (what error rate or incident triggers rollback?)
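Rollback criteria work best as numbers agreed before the change, not a debate during the incident. A hypothetical sketch comparing canary error rates against pre-agreed thresholds (the thresholds themselves are illustrative):

```python
# Pre-agreed rollback criteria for a canary patch rollout.
# Both thresholds are illustrative — set them per service.
MAX_ERROR_RATE = 0.02        # roll back if canary error rate exceeds 2%
MAX_RELATIVE_INCREASE = 2.0  # ...or exceeds 2x the unpatched baseline

def should_roll_back(canary_error_rate, baseline_error_rate):
    """Decide rollback from metrics agreed in advance of the change."""
    if canary_error_rate > MAX_ERROR_RATE:
        return True
    if (baseline_error_rate > 0
            and canary_error_rate / baseline_error_rate > MAX_RELATIVE_INCREASE):
        return True
    return False

print(should_roll_back(0.031, 0.004))  # True: both thresholds breached
print(should_roll_back(0.005, 0.004))  # False: within tolerance
```

Wiring a check like this into the canary's monitoring turns the 24-hour observation window into an automated gate rather than someone watching dashboards.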
Compensating Controls When Patching Isn’t Immediate
When you can’t patch immediately (legacy systems, change management requirements, business-critical windows), apply compensating controls:
- WAF rules targeting the vulnerability’s attack vector
- Network segmentation — isolate the vulnerable system from the internet or from internal systems
- Disable the vulnerable feature or service if not essential
- Enhanced monitoring — SIEM rules targeting exploitation attempts of the specific CVE
- Access restriction — require VPN + MFA to reach the vulnerable system
- Document the accepted risk — formal risk acceptance with CISO sign-off, time-limited
Compensating controls are temporary. They reduce exposure but don’t eliminate the vulnerability. Patch as soon as the window allows.
CyberneticsPlus designs and implements patch management programmes for enterprises across cloud, on-premises, and hybrid environments. We also conduct vulnerability assessments to identify your highest-priority patching gaps. Contact us to build your patch management strategy.