Replace Your Monitoring Tools with PowerShell
Task Manager is fine for a live CPU and memory snapshot. What it can’t tell you: why the machine rebooted at 3am, which SSD is quietly wearing out, which auto-start service died last week, or what’s connecting to an unfamiliar IP right now. PowerShell answers all of that: built in, free, scriptable, and often more precise than the GUI equivalents. No third-party tools, no dashboard.
These are diagnostics I run in production, grouped by what they actually solve, so you can go from “something feels wrong” to a root cause in minutes. The live-performance snippets come last on purpose — Task Manager covers that ground well, so the things it can’t do are front-loaded.
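[i] Requires: local administrator rights in an elevated PowerShell session (some cmdlets, such as Get-StorageReliabilityCounter, need elevation). No extra modules, no installs.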
Crash and Stability Analysis
System errors in the last 24 hours
Get-WinEvent -FilterHashtable @{
LogName = 'System'
Level = 1,2
StartTime = (Get-Date).AddHours(-24)
} -ErrorAction SilentlyContinue | Select-Object TimeCreated, Id, LevelDisplayName, Message | Format-List
Queries the System log for Critical (Level 1) and Error (Level 2) events from the past 24 hours, with full message text.
The System log is where Windows first records driver failures, hardware errors, and OS-level faults. Filtering to Level 1 and 2 cuts the noise hard — you read only what broke, not a wall of Informational events. The 24-hour window is tight enough to stay relevant but wide enough to catch intermittent issues.
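Even filtered to Level 1 and 2, a single failing driver can log the same error hundreds of times. The sketch below is a hypothetical helper (`Get-EventSummary` is not a built-in cmdlet) that collapses events into one row per provider and event ID, with a count and first/last timestamps. It assumes only that each input object has ProviderName, Id and TimeCreated properties, which Get-WinEvent output does.

```powershell
# Hypothetical helper: one row per (provider, event ID) pair instead of a wall of repeats.
function Get-EventSummary {
    [CmdletBinding()]
    param(
        # Anything with ProviderName, Id and TimeCreated, e.g. Get-WinEvent output.
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$InputObject
    )
    begin   { $all = [System.Collections.Generic.List[object]]::new() }
    process { foreach ($e in $InputObject) { $all.Add($e) } }
    end {
        $all |
            Group-Object ProviderName, Id |
            ForEach-Object {
                # @() keeps this an array even when the group has a single event.
                $times = @($_.Group | ForEach-Object TimeCreated | Sort-Object)
                [PSCustomObject]@{
                    Count        = $_.Count
                    ProviderName = $_.Group[0].ProviderName
                    Id           = $_.Group[0].Id
                    FirstSeen    = $times[0]
                    LastSeen     = $times[-1]
                }
            } |
            Sort-Object Count -Descending
    }
}
```

Usage: pipe the query above into it, without the Select-Object/Format-List stage, e.g. `Get-WinEvent -FilterHashtable @{ LogName = 'System'; Level = 1,2; StartTime = (Get-Date).AddHours(-24) } -ErrorAction SilentlyContinue | Get-EventSummary`.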
Unexpected shutdowns and reboots
$os = Get-CimInstance Win32_OperatingSystem
[PSCustomObject]@{ LastBoot = $os.LastBootUpTime; Uptime = (Get-Date) - $os.LastBootUpTime }
# 41 = dirty/kernel-power, 6008 = unexpected, 1074 = who initiated a restart
Get-WinEvent -FilterHashtable @{LogName='System'; Id=41,6008,1074; StartTime=(Get-Date).AddDays(-7)} -EA SilentlyContinue |
Select-Object TimeCreated, Id, Message | Format-List
Two things in one. First, when the machine last booted and how long it’s been up. Second, the past 7 days for three event IDs:
- 41 — Kernel-Power: the system did not shut down cleanly (BSOD or hard power cut)
- 6008 — the previous shutdown was unexpected
- 1074 — a process or user initiated a restart/shutdown
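If a machine rebooted overnight, these events tell you which kind of reboot it was, with timestamps. A 1074 means the restart was planned, and its message names the process and user that initiated it (Windows Update’s restarts show up here). A 41, usually paired with a 6008, means the shutdown was unclean: a bugcheck (BSOD), a hard power cut, or a hang that ended in a forced reset. To tell those apart, check the BugcheckCode value in the event 41 data: nonzero means a bugcheck, zero points to power loss, a hang, or a forced reset.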
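Reading BugcheckCode by hand means opening each event’s XML. The hypothetical helper below (`Get-ShutdownCause`, not a built-in) labels each event with a likely cause. It assumes Kernel-Power event 41 carries a `<Data Name='BugcheckCode'>` element, stored in decimal, so it converts the value to the hex stop code people usually quote (159 becomes 0x9F).

```powershell
# Hypothetical helper: label shutdown-related events (41, 6008, 1074) with a likely cause.
function Get-ShutdownCause {
    [CmdletBinding()]
    param(
        # EventLogRecord objects from Get-WinEvent (needs Id, TimeCreated and ToXml()).
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$InputObject
    )
    process {
        foreach ($evt in $InputObject) {
            $cause = switch ($evt.Id) {
                1074 { 'Planned restart/shutdown (message names the process and user)' }
                6008 { 'Previous shutdown was unexpected' }
                41 {
                    # Find the <Data Name='BugcheckCode'> element in the event XML.
                    $doc  = [xml]$evt.ToXml()
                    $node = $doc.GetElementsByTagName('Data') |
                        Where-Object { $_.GetAttribute('Name') -eq 'BugcheckCode' } |
                        Select-Object -First 1
                    if ($null -eq $node) {
                        'Unclean shutdown (no BugcheckCode field found)'
                    }
                    elseif ([uint32]$node.InnerText -ne 0) {
                        # Stored in decimal; stop codes are usually quoted in hex.
                        'Bugcheck (BSOD), stop code 0x{0:X}' -f [uint32]$node.InnerText
                    }
                    else {
                        'Unclean shutdown: power loss, hang, or forced reset'
                    }
                }
                default { 'Other' }
            }
            [PSCustomObject]@{ TimeCreated = $evt.TimeCreated; Id = $evt.Id; Cause = $cause }
        }
    }
}
```

Usage: replace the final Select-Object/Format-List stage of the query above with `| Get-ShutdownCause`.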
Application crashes and hangs
# 1000 = App Error, 1002 = App Hang, 1001 = WER/BugCheck
Get-WinEvent -FilterHashtable @{LogName='Application'; Id=1000,1002,1001; StartTime=(Get-Date).AddDays(-7)} -EA SilentlyContinue |
Select-Object TimeCreated, Id, ProviderName, Message | Format-List
Queries the Application log for the past week, targeting three IDs:
- 1000 — Application Error (crash with exception)
- 1002 — Application Hang (process stopped responding)
- 1001 — Windows Error Reporting / BugCheck (BSOD summary)
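Users say “Excel just closes” or “the screen goes blue sometimes.” These events are the machine’s own record of exactly that. Event 1000 names the faulting module and exception code for each crash. Application-log 1001 entries are Windows Error Reporting summaries, including “BlueScreen” reports after a BSOD; for the stop code itself, check the System log, where event 1001 from Microsoft-Windows-WER-SystemErrorReporting states the bugcheck code directly, which saves you decoding a minidump just to find it.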
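A week of crash events is easier to act on as a count per application. This hypothetical helper (`Get-AppFailureCount`, not a built-in) tallies crashes (1000) and hangs (1002) per executable. It assumes the application name is the first data field of both events, which matches the Application Error and Application Hang layouts; it skips 1001 entirely.

```powershell
# Hypothetical helper: crash and hang counts per application, busiest first.
function Get-AppFailureCount {
    [CmdletBinding()]
    param(
        # Events from Get-WinEvent (needs Id and Properties).
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$InputObject
    )
    begin { $rows = [System.Collections.Generic.List[object]]::new() }
    process {
        foreach ($evt in $InputObject) {
            if ($evt.Id -notin 1000, 1002) { continue }   # ignore WER 1001 summaries
            $kind = 'Hang'
            if ($evt.Id -eq 1000) { $kind = 'Crash' }
            # Assumption: data field 0 holds the application name in both events.
            $app = ([string]$evt.Properties[0].Value).ToLowerInvariant()
            $rows.Add([PSCustomObject]@{ App = $app; Kind = $kind })
        }
    }
    end {
        $rows | Group-Object App | ForEach-Object {
            [PSCustomObject]@{
                App     = $_.Name
                Crashes = @($_.Group | Where-Object Kind -eq 'Crash').Count
                Hangs   = @($_.Group | Where-Object Kind -eq 'Hang').Count
            }
        } | Sort-Object { $_.Crashes + $_.Hangs } -Descending
    }
}
```

Usage: filtering by provider keeps other sources that reuse IDs 1000/1002 out of the count, e.g. `Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'Application Error', 'Application Hang'; StartTime = (Get-Date).AddDays(-7) } -ErrorAction SilentlyContinue | Get-AppFailureCount`.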
Storage Health
Disk health and wear
# Full report for all physical disks
Get-PhysicalDisk | Get-StorageReliabilityCounter | Select-Object DeviceId, Temperature, Wear, ReadErrorsTotal, PowerOnHours
# Flag disks over 80% wear
Get-PhysicalDisk | Get-StorageReliabilityCounter | Where-Object {$_.Wear -gt 80} | Select-Object DeviceId, Wear
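Pulls SMART-style reliability data from every physical disk. Wear matters most for SSDs: it’s the percentage of rated write endurance consumed (0 = new, 100 = fully worn). The second line flags any disk reporting more than 80% wear.
Hard drives often fail with little warning; SSD wear, by contrast, climbs measurably, so you can plan a replacement. PowerOnHours tells you how long a drive has run: 40,000 hours is about 4.5 years of continuous operation, and failure rates tend to rise with age. ReadErrorsTotal creeping up on a spinning disk is an early sign of surface degradation. Not every drive or controller reports every counter, so a blank value means “not reported,” not “healthy.” Run this monthly and you get a trend that a one-off check in a GUI SMART tool can’t give you.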
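To turn those rules of thumb into a single report, here is a sketch of a hypothetical helper (`Get-DiskWarning`, not a built-in). The thresholds (80% wear, 40,000 hours, any read errors) are the heuristics from the text, exposed as parameters to tune, not vendor limits. Disks with nothing to report are flagged separately rather than silently passed.

```powershell
# Hypothetical helper: plain-language warnings from reliability counters.
function Get-DiskWarning {
    [CmdletBinding()]
    param(
        # Output of Get-StorageReliabilityCounter (or any object with the same properties).
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$InputObject,
        [int]$MaxWear = 80,
        [int]$MaxPowerOnHours = 40000
    )
    process {
        foreach ($c in $InputObject) {
            $issues = @()
            # $null means the drive or controller didn't report the counter.
            if ($null -eq $c.Wear -and $null -eq $c.PowerOnHours) { $issues += 'No reliability data reported' }
            if ($null -ne $c.Wear -and $c.Wear -gt $MaxWear) { $issues += "Wear at $($c.Wear)%" }
            if ($null -ne $c.PowerOnHours -and $c.PowerOnHours -gt $MaxPowerOnHours) { $issues += "$($c.PowerOnHours) power-on hours" }
            if ($c.ReadErrorsTotal -gt 0) { $issues += "$($c.ReadErrorsTotal) read errors" }
            if ($issues) {
                [PSCustomObject]@{ DeviceId = $c.DeviceId; Issues = $issues -join '; ' }
            }
        }
    }
}
```

Usage: `Get-PhysicalDisk | Get-StorageReliabilityCounter | Get-DiskWarning` prints nothing when every disk is within the thresholds, which makes it easy to schedule and alert on.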
Service Health
Auto-start services that aren’t running
Get-CimInstance Win32_Service -Filter "StartMode='Auto' AND State!='Running'" |
Select-Object Name, DisplayName, State, StartMode
Lists every service set to start automatically that currently isn’t, with display name and state.
An Auto service that isn’t running either failed to start, crashed, or was stopped by something that shouldn’t have touched it. This surfaces all of them at once, with no clicking through services.msc row by row. Expect a few benign entries: trigger-start and delayed-start services are set to Auto but can stop on their own once their work is done. It’s especially useful after a Windows Update, when service states can shift unexpectedly.
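Win32_Service also exposes an ExitCode property, which helps separate real failures from services that simply finished. The hypothetical helper below (`Get-StoppedServiceTriage`, not a built-in) treats exit code 0 as a clean stop and 1077 (ERROR_SERVICE_NEVER_STARTED) as “never started since boot,” which is typical of trigger-start services that haven’t been triggered yet; anything else is labelled a likely failure. Treat the verdicts as a first sort, not a diagnosis.

```powershell
# Hypothetical helper: triage stopped auto-start services by their Win32 exit code.
function Get-StoppedServiceTriage {
    param(
        # Win32_Service instances (needs Name, State, ExitCode).
        [Parameter(Mandatory, ValueFromPipeline)]
        [object[]]$InputObject
    )
    process {
        foreach ($svc in $InputObject) {
            $verdict = switch ($svc.ExitCode) {
                0       { 'Stopped cleanly' }
                1077    { 'Never started since boot' }   # ERROR_SERVICE_NEVER_STARTED
                default { "Failed (exit code $_)" }
            }
            [PSCustomObject]@{
                Name     = $svc.Name
                State    = $svc.State
                ExitCode = $svc.ExitCode
                Verdict  = $verdict
            }
        }
    }
}
```

Usage: `Get-CimInstance Win32_Service -Filter "StartMode='Auto' AND State!='Running'" | Get-StoppedServiceTriage | Sort-Object Verdict`.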
Network Monitoring
Active connections with process ownership
Get-NetTCPConnection -State Established |
Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort,
@{N='Process';E={(Get-Process -Id $_.OwningProcess -EA SilentlyContinue).Name}} |
Sort-Object RemoteAddress | Format-Table -AutoSize
Lists every established TCP connection, adds the owning process name to each row, and sorts by remote address so connections to the same destination group together.
This is the same view as netstat -b, usually faster, and returned as objects you can filter, sort, and export. If something is phoning home (a process connecting to an unfamiliar IP), it surfaces immediately. It’s also handy for app connectivity issues: you can confirm whether a service actually established its connection. Drop the -State Established filter to also see connections stuck mid-handshake in the SynSent state.
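The snippet above calls Get-Process once per connection and lists every socket separately. A sketch of an alternative, using a hypothetical helper (`Get-ConnectionSummary`, not a built-in): build a PID-to-name table once, skip loopback traffic, and collapse the result to one row per process and remote address with a connection count.

```powershell
# Hypothetical helper: connections per (process, remote address), loopback excluded.
function Get-ConnectionSummary {
    param(
        [Parameter(Mandatory)] [object[]]$Connection,    # from Get-NetTCPConnection
        [Parameter(Mandatory)] [hashtable]$ProcessName   # PID -> process name
    )
    $Connection |
        Where-Object { $_.RemoteAddress -notin '127.0.0.1', '::1' } |
        ForEach-Object {
            $name = $ProcessName[[int]$_.OwningProcess]
            if (-not $name) { $name = "PID $($_.OwningProcess)" }   # process already exited
            [PSCustomObject]@{ Process = $name; RemoteAddress = $_.RemoteAddress }
        } |
        Group-Object Process, RemoteAddress |
        ForEach-Object {
            [PSCustomObject]@{
                Process       = $_.Group[0].Process
                RemoteAddress = $_.Group[0].RemoteAddress
                Connections   = $_.Count
            }
        } |
        Sort-Object Connections -Descending
}
```

Usage: `$names = @{}; Get-Process | ForEach-Object { $names[$_.Id] = $_.Name }` then `Get-ConnectionSummary -Connection (Get-NetTCPConnection -State Established) -ProcessName $names | Format-Table -AutoSize`.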
Live Performance Monitoring
Task Manager handles most of this fine. Reach for these when you need a trend over time, a scriptable snapshot, or output you can drop into a report.
CPU and memory at a glance
Get-Counter '\Processor(_Total)\% Processor Time','\Memory\Available MBytes' -SampleInterval 2 -MaxSamples 30 |
ForEach-Object {
[PSCustomObject]@{
Time = $_.Timestamp
CPUPercent = [math]::Round(($_.CounterSamples | Where-Object Path -like '*processor*').CookedValue, 1)
FreeMB = [int](($_.CounterSamples | Where-Object Path -like '*available*').CookedValue)
}
} | Format-Table Time, CPUPercent, FreeMB -AutoSize:$false
Samples CPU utilisation and free RAM every 2 seconds for 30 readings (1 minute), then prints a clean timestamped table.
Task Manager shows you right now. This shows you a trend. If CPU spikes every 10 seconds like clockwork, that’s likely a scheduled task or a polling loop, not a runaway process. If free memory steadily declines and never recovers, you probably have a leak. The timestamps make it trivial to correlate against other logs.
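“Steadily declines” can be made concrete by fitting a straight line through the samples. This sketch uses a hypothetical helper (`Get-MemoryTrend`, not a built-in) that computes an ordinary least-squares slope of FreeMB against elapsed minutes; a consistently negative slope over a long window supports the leak theory, while a short window can mislead.

```powershell
# Hypothetical helper: least-squares slope of free memory, in MB per minute.
function Get-MemoryTrend {
    param(
        [Parameter(Mandatory)] [object[]]$Sample   # objects with Time and FreeMB
    )
    if ($Sample.Count -lt 2) { throw 'Need at least two samples to fit a trend.' }
    $t0 = [datetime]$Sample[0].Time
    $x  = @($Sample | ForEach-Object { ([datetime]$_.Time - $t0).TotalMinutes })
    $y  = @($Sample | ForEach-Object { [double]$_.FreeMB })
    $mx = ($x | Measure-Object -Average).Average
    $my = ($y | Measure-Object -Average).Average
    $num = 0.0; $den = 0.0
    for ($i = 0; $i -lt $x.Count; $i++) {
        $num += ($x[$i] - $mx) * ($y[$i] - $my)   # covariance term
        $den += ($x[$i] - $mx) * ($x[$i] - $mx)   # variance term
    }
    if ($den -eq 0) { throw 'All samples share one timestamp.' }
    [PSCustomObject]@{
        Samples       = $Sample.Count
        SlopeMBPerMin = [math]::Round($num / $den, 2)
    }
}
```

Usage: Format-Table turns objects into display records, so drop the final Format-Table from the snippet above, assign the output to `$samples`, and run `Get-MemoryTrend -Sample $samples`. It also accepts rows from the CSV logger later in this article, since it casts Time and FreeMB itself.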
Top 10 processes by RAM, refreshing live
while ($true) {
Clear-Host
Get-Process |
Sort-Object WorkingSet64 -Descending |
Select-Object -First 10 Name, Id, @{Name='RAM_MB';Expression={[math]::Round($_.WorkingSet64/1MB,1)}} |
Format-Table -AutoSize
Start-Sleep -Seconds 1
}
Every second, clears the screen and reprints the top 10 RAM consumers — a poor man’s htop for Windows.
When memory pressure is high, you need to know which process is eating it and whether that’s growing. This loop makes the answer obvious within seconds. Press Ctrl+C when done.
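The live loop shows who is big; to see who is growing, compare two snapshots. This is a sketch with a hypothetical helper (`Compare-ProcessMemory`, not a built-in) that diffs PrivateMemorySize64, which counts memory committed to the process alone and usually tracks leaks more directly than the working set. It matches processes on PID plus name so a reused PID isn’t mistaken for the same process.

```powershell
# Hypothetical helper: private-memory growth between two Get-Process snapshots.
function Compare-ProcessMemory {
    param(
        [Parameter(Mandatory)] [object[]]$Before,
        [Parameter(Mandatory)] [object[]]$After,
        [int]$Top = 10
    )
    $old = @{}
    foreach ($p in $Before) { $old["$($p.Id):$($p.Name)"] = $p.PrivateMemorySize64 }
    $After |
        Where-Object { $old.ContainsKey("$($_.Id):$($_.Name)") } |   # skip newly started processes
        ForEach-Object {
            [PSCustomObject]@{
                Name     = $_.Name
                Id       = $_.Id
                GrowthMB = [math]::Round(($_.PrivateMemorySize64 - $old["$($_.Id):$($_.Name)"]) / 1MB, 1)
            }
        } |
        Sort-Object GrowthMB -Descending |
        Select-Object -First $Top
}
```

Usage: `$before = Get-Process; Start-Sleep -Seconds 300; Compare-ProcessMemory -Before $before -After (Get-Process)`.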
Per-process CPU breakdown
Get-Counter '\Process(*)\% Processor Time' -SampleInterval 1 -MaxSamples 2 |
Select-Object -Last 1 -ExpandProperty CounterSamples |
Where-Object { $_.InstanceName -notin '_total','idle' } |
Sort-Object CookedValue -Descending | Select-Object -First 10 InstanceName,
@{N='CPU%';E={[math]::Round($_.CookedValue/$env:NUMBER_OF_PROCESSORS,1)}}
Samples CPU time per process across two intervals to get a proper delta, drops the _total and idle pseudoprocesses, normalises the value per logical core, and shows the top 10 offenders.
Total CPU% tells you there’s a problem. Per-process CPU tells you who’s responsible. The normalisation step matters: the raw counter measures time across all cores, so a process saturating a 16-core machine reports 1600%, which doesn’t line up with the 0–100% scale Task Manager uses. Dividing by the logical core count gives a clean 0–100% figure per process.
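Browsers and database engines often run many instances of one executable, which the snippet above lists separately. A sketch of a hypothetical helper (`Get-CpuByProcessName`, not a built-in) that applies the same per-core normalisation but sums instances per executable. Repeat instances appear as chrome, chrome#1, chrome#2 in counter paths, so the helper strips any trailing “#n” before grouping; if the instance names carry no suffix, the grouping still merges them.

```powershell
# Hypothetical helper: normalised CPU% per executable, multi-instance processes merged.
function Get-CpuByProcessName {
    param(
        [Parameter(Mandatory)] [object[]]$CounterSample,   # CounterSamples from Get-Counter
        [int]$LogicalProcessors = [Environment]::ProcessorCount,
        [int]$Top = 10
    )
    $CounterSample |
        Where-Object { $_.InstanceName -notin '_total', 'idle' } |
        Group-Object { $_.InstanceName -replace '#\d+$', '' } |
        ForEach-Object {
            $sum = ($_.Group | Measure-Object CookedValue -Sum).Sum
            [PSCustomObject]@{
                Process   = $_.Name
                Instances = $_.Count
                'CPU%'    = [math]::Round($sum / $LogicalProcessors, 1)
            }
        } |
        Sort-Object 'CPU%' -Descending |
        Select-Object -First $Top
}
```

Usage: `$s = Get-Counter '\Process(*)\% Processor Time' -SampleInterval 1 -MaxSamples 2 | Select-Object -Last 1 -ExpandProperty CounterSamples` then `Get-CpuByProcessName -CounterSample $s`.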
Long-Term Logging and On-Demand Auditing
Everything above is a point-in-time check. The bigger wins come from logging continuously and auditing later — the 3am reboot, the leak that takes two days to surface, the CPU spike that only happens during the nightly backup. Log to CSV now, answer the question whenever it comes up.
Log performance to CSV, indefinitely
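$LogPath = 'C:\Logs\perf-log.csv'
New-Item -ItemType Directory -Path (Split-Path $LogPath) -Force | Out-Null   # create C:\Logs if it doesn't exist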
$Counters = '\Processor(_Total)\% Processor Time',
'\Memory\Available MBytes',
'\LogicalDisk(_Total)\% Free Space'
# Sample every 60s and append one timestamped row per sample — runs until you stop it
Get-Counter -Counter $Counters -SampleInterval 60 -Continuous |
ForEach-Object {
[PSCustomObject]@{
Time = $_.Timestamp.ToString('yyyy-MM-dd HH:mm:ss')   # culture-neutral, so [datetime] casts parse it on import
CPU = [math]::Round(($_.CounterSamples | Where-Object Path -like '*processor*').CookedValue, 1)
FreeMB = [int](($_.CounterSamples | Where-Object Path -like '*available*').CookedValue)
DiskFree = [math]::Round(($_.CounterSamples | Where-Object Path -like '*free space*').CookedValue, 1)
} | Export-Csv -Path $LogPath -Append -NoTypeInformation
}
Samples CPU, free memory, and free disk every 60 seconds and appends one timestamped row to a CSV — until you press Ctrl+C. One row a minute is ~1,440 rows a day; a month is still a small file.
To make it survive reboots and run unattended, register it as a scheduled task that triggers At startup and runs as SYSTEM. Start-Job also works for a quick background run, but the job ends when the PowerShell session that started it closes.
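A minimal sketch of that registration, assuming the logging snippet has been saved as a script file. The path `C:\Scripts\perf-logger.ps1` and task name `PerfLogger` are hypothetical placeholders. One detail matters for a continuous logger: scheduled tasks have a default execution time limit of 72 hours, so the settings below pass a zero TimeSpan to remove it.

```powershell
# Sketch: run the CSV logger at startup as SYSTEM, with no execution time limit.
function Register-PerfLoggerTask {
    param(
        [string]$ScriptPath = 'C:\Scripts\perf-logger.ps1',   # hypothetical path
        [string]$TaskName   = 'PerfLogger'                     # hypothetical task name
    )
    $action = New-ScheduledTaskAction -Execute 'powershell.exe' `
        -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$ScriptPath`""
    $trigger   = New-ScheduledTaskTrigger -AtStartup
    $principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' -LogonType ServiceAccount -RunLevel Highest
    # Default limit is 72 hours; a zero TimeSpan removes it so the logger keeps running.
    $settings  = New-ScheduledTaskSettingsSet -ExecutionTimeLimit ([TimeSpan]::Zero)
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger `
        -Principal $principal -Settings $settings
}
```

Run `Register-PerfLoggerTask` once from an elevated session, then `Start-ScheduledTask -TaskName PerfLogger` to start logging without waiting for a reboot.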
[!] WARNING: Make sure the log folder exists and has free space. Left running for months, archive or rotate the CSV so it doesn’t grow unbounded.
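For the rotation part, here is a sketch of a hypothetical helper (`Invoke-LogRotation`, not a built-in). Once the CSV passes a size limit, it renames the file with a timestamp and keeps only the newest few archives. Because the logger appends with Export-Csv on every sample, the next write simply starts a fresh file with a new header row. The 50 MB and six-archive defaults are arbitrary assumptions.

```powershell
# Hypothetical helper: size-based rotation for the performance CSV.
function Invoke-LogRotation {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [long]$MaxBytes = 50MB,
        [int]$Keep = 6
    )
    if (-not (Test-Path -LiteralPath $Path)) { return }
    $file = Get-Item -LiteralPath $Path
    if ($file.Length -lt $MaxBytes) { return }   # still under the limit

    # perf-log.csv -> perf-log-20260609-030000.csv
    $stamp   = Get-Date -Format 'yyyyMMdd-HHmmss'
    $archive = Join-Path $file.DirectoryName ('{0}-{1}{2}' -f $file.BaseName, $stamp, $file.Extension)
    Move-Item -LiteralPath $Path -Destination $archive

    # Timestamped names sort chronologically, so drop everything past the newest $Keep.
    Get-ChildItem -LiteralPath $file.DirectoryName -Filter "$($file.BaseName)-*$($file.Extension)" |
        Sort-Object Name -Descending |
        Select-Object -Skip $Keep |
        Remove-Item
}
```

Usage: call `Invoke-LogRotation -Path 'C:\Logs\perf-log.csv'` from a separate daily scheduled task.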
Audit the log when something happened
$data = Import-Csv 'C:\Logs\perf-log.csv'
# Overall CPU summary across the whole period
$data | Measure-Object CPU -Average -Maximum -Minimum | Format-List Average, Maximum, Minimum
# Every window where free memory dropped below 1 GB
$data | Where-Object { [int]$_.FreeMB -lt 1024 } | Select-Object Time, FreeMB, CPU
# Zoom into a specific incident — e.g. the 30 minutes around a 3am reboot
$data | Where-Object {
[datetime]$_.Time -gt '2026-06-09 02:45' -and
[datetime]$_.Time -lt '2026-06-09 03:15'
} | Format-Table -AutoSize
Import the CSV and ask it questions. Get the average and peak CPU over the whole period, list every moment memory ran low, or zoom into the minutes around an incident to see what the machine was doing when it fell over. Because it’s just objects, you filter and sort it like any other PowerShell data.
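At one row a minute, a day of data is about 1,440 rows, which is a lot to scan. This sketch rolls the log up into hourly rows with a hypothetical helper (`Get-HourlySummary`, not a built-in). It assumes the column names written by the logger above (Time, CPU, FreeMB) and reports average and peak CPU plus the lowest free memory per hour.

```powershell
# Hypothetical helper: hourly roll-up of the per-minute performance log.
function Get-HourlySummary {
    param([Parameter(Mandatory)] [object[]]$Data)   # rows from Import-Csv
    $Data |
        Group-Object { ([datetime]$_.Time).ToString('yyyy-MM-dd HH') } |
        Sort-Object Name |
        ForEach-Object {
            $g   = $_   # capture the group before the nested pipelines rebind $_
            $cpu = $g.Group | ForEach-Object { [double]$_.CPU } | Measure-Object -Average -Maximum
            $mem = $g.Group | ForEach-Object { [int]$_.FreeMB } | Measure-Object -Minimum
            [PSCustomObject]@{
                Hour      = "$($g.Name):00"
                AvgCPU    = [math]::Round($cpu.Average, 1)
                PeakCPU   = $cpu.Maximum
                MinFreeMB = $mem.Minimum
            }
        }
}
```

Usage: `Get-HourlySummary -Data (Import-Csv 'C:\Logs\perf-log.csv') | Format-Table -AutoSize`.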
Putting It Together
These snippets cover the full diagnostic loop:
| Category | Snippet | First question it answers |
|---|---|---|
| Stability | System error events | What broke in the last 24 hours? |
| Stability | Shutdown/reboot events | Why did the machine restart? |
| Stability | App crashes & hangs | Which apps are crashing? |
| Storage | Disk reliability counter | Is a disk about to fail? |
| Services | Stopped auto-start services | What service silently died? |
| Network | Active TCP connections | What is connecting to what? |
| Performance | CPU/Memory counter | Is the machine under load over time? |
| Performance | Top RAM processes | Which process is eating memory? |
| Performance | Per-process CPU | Which process is burning CPU? |
| Logging | Continuous CSV logger | What’s the trend over hours or days? |
| Logging | CSV audit queries | What was happening at 3am? |
None of these need more than local administrator rights, and none need an install. And because they’re all scriptable, you can fold them into one diagnostic report, run them on boot, or wrap them in a function that emails you when something looks wrong.
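One way to fold the checks into a single report, sketched with a hypothetical helper (`Invoke-DiagnosticReport`, not a built-in): run each named check, capture its result or its error, and return one object per check, so a single failing query doesn’t abort the rest. It sets `$ErrorActionPreference = 'Stop'` locally so non-terminating errors are caught too; checks that pass `-ErrorAction SilentlyContinue` explicitly keep that behaviour.

```powershell
# Hypothetical helper: run named checks, never letting one failure stop the report.
function Invoke-DiagnosticReport {
    param(
        [Parameter(Mandatory)] [System.Collections.IDictionary]$Check   # name -> scriptblock
    )
    $ErrorActionPreference = 'Stop'   # local to this function; turns soft errors into catchable ones
    foreach ($name in $Check.Keys) {
        $block = $Check[$name]
        try {
            $result = & $block
            [PSCustomObject]@{ Check = $name; Status = 'OK'; Result = $result; Error = $null }
        }
        catch {
            [PSCustomObject]@{ Check = $name; Status = 'Failed'; Result = $null; Error = $_.Exception.Message }
        }
    }
}
```

Usage: pass an ordered dictionary of snippets, e.g. `Invoke-DiagnosticReport -Check ([ordered]@{ Uptime = { (Get-Date) - (Get-CimInstance Win32_OperatingSystem).LastBootUpTime }; StoppedServices = { Get-CimInstance Win32_Service -Filter "StartMode='Auto' AND State!='Running'" } })`, then export or email the results however you prefer.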
The monitoring software was never the point. The data was.
References
- Get-Counter — Microsoft Learn
- Get-WinEvent — Microsoft Learn
- Get-StorageReliabilityCounter — Microsoft Learn
- Inspired by “I stopped using monitoring software once I learned these PowerShell diagnostics” (MakeUseOf)
