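Process Monitoring
Basic Concepts of Process Monitoring
Process monitoring is an essential part of Node.js application development. It lets developers track the running state of an application in real time and detect and handle anomalies promptly. Built-in Node.js modules and third-party tools cover performance metric collection, memory leak detection, CPU usage monitoring, and more.
Core Metrics for Process Monitoring
CPU Usage
CPU usage is one of the most direct performance metrics. The Node.js os module exposes system-wide CPU times: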
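const os = require('os');
// Sum idle and total CPU ticks across all cores.
function sampleCPU() {
  let idle = 0, total = 0;
  for (const { times } of os.cpus()) {
    idle += times.idle;
    total += Object.values(times).reduce((a, b) => a + b, 0);
  }
  return { idle, total };
}
let previous = sampleCPU();
// os.cpus() reports cumulative ticks since boot, so measure the change since the last call.
function getCPUUsage() {
  const current = sampleCPU();
  const idle = current.idle - previous.idle;
  const total = current.total - previous.total;
  previous = current;
  return total > 0 ? 100 - (100 * idle) / total : 0;
}
setInterval(() => {
  console.log(`CPU Usage: ${getCPUUsage().toFixed(2)}%`);
}, 1000);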
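The os module reports CPU time for the whole machine, not for the Node.js process alone. As a minimal sketch of a per-process reading, the helper below combines process.cpuUsage() with process.hrtime.bigint(). The name createProcessCpuMeter is a hypothetical choice for illustration and is not part of Node.js.

```javascript
// Sketch: CPU usage of the current Node.js process only.
// `createProcessCpuMeter` is a hypothetical helper name, not a Node.js API.
function createProcessCpuMeter() {
  let lastUsage = process.cpuUsage();      // cumulative user/system CPU time in microseconds
  let lastTime = process.hrtime.bigint();  // monotonic clock in nanoseconds

  return function readPercent() {
    const usage = process.cpuUsage();
    const now = process.hrtime.bigint();
    const cpuMicros =
      (usage.user - lastUsage.user) + (usage.system - lastUsage.system);
    const elapsedMicros = Number(now - lastTime) / 1000;
    lastUsage = usage;
    lastTime = now;
    // 100 means one fully busy core; work on several threads can exceed 100.
    return elapsedMicros > 0 ? (100 * cpuMicros) / elapsedMicros : 0;
  };
}
```

Each call to the returned function reports the usage since the previous call, so it can be invoked from a timer in the same way as the system-wide example.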
Memory Monitoring
The memory usage of a Node.js process is available through process.memoryUsage():
setInterval(() => {
  const memory = process.memoryUsage();
  // Convert bytes to megabytes with two decimal places.
  const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(2);
  console.log(
    `RSS: ${toMB(memory.rss)} MB\n` +
    `Heap Total: ${toMB(memory.heapTotal)} MB\n` +
    `Heap Used: ${toMB(memory.heapUsed)} MB\n` +
    `External: ${toMB(memory.external)} MB`
  );
}, 5000);
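Raw byte counts are easier to act on when compared against the V8 heap limit. The following is a rough sketch that reports heap usage as a share of heap_size_limit from v8.getHeapStatistics(). The helper names bytesToMB and summarizeMemory are assumptions made for this example.

```javascript
const v8 = require('v8');

const bytesToMB = (bytes) => Number((bytes / 1024 / 1024).toFixed(2));

// `summarizeMemory` is a hypothetical helper; both inputs default to live process values.
function summarizeMemory(
  usage = process.memoryUsage(),
  heapLimit = v8.getHeapStatistics().heap_size_limit
) {
  return {
    rssMB: bytesToMB(usage.rss),
    heapUsedMB: bytesToMB(usage.heapUsed),
    heapLimitMB: bytesToMB(heapLimit),
    // Share of the V8 heap limit currently in use, as a percentage.
    heapUsedPercent: Number(((100 * usage.heapUsed) / heapLimit).toFixed(2)),
  };
}

console.log(summarizeMemory());
```

A heapUsedPercent that keeps climbing across garbage collections is usually a stronger leak signal than a single large RSS reading.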
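Event Loop Monitoring
Event Loop Delay
Event loop delay is a key indicator of Node.js performance and can be measured with the perf_hooks module:
const { monitorEventLoopDelay } = require('perf_hooks');
// Sample the delay every 20 ms; the histogram reports values in nanoseconds.
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();
setInterval(() => {
  console.log(
    'Event Loop Delay (ms):\n' +
    `Min: ${h.min / 1e6}\n` +
    `Max: ${h.max / 1e6}\n` +
    `Mean: ${h.mean / 1e6}\n` +
    `Stddev: ${h.stddev / 1e6}`
  );
  // Start a fresh measurement window.
  h.reset();
}, 10000);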
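Mean and maximum can hide how often the loop is slow, so percentiles are often more informative. The sketch below reads them with the histogram's percentile() method. The helper summarizeDelay and the variable delayHistogram are hypothetical names used only for this example.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Convert nanoseconds to milliseconds with three decimal places.
const nsToMs = (ns) => Number((ns / 1e6).toFixed(3));

// `summarizeDelay` is a hypothetical helper; it accepts any object that
// exposes the histogram fields used below.
function summarizeDelay(histogram) {
  return {
    meanMs: nsToMs(histogram.mean),
    p50Ms: nsToMs(histogram.percentile(50)),
    p99Ms: nsToMs(histogram.percentile(99)),
    maxMs: nsToMs(histogram.max),
  };
}

// Usage: collect samples for one second, print the summary, then stop.
const delayHistogram = monitorEventLoopDelay({ resolution: 20 });
delayHistogram.enable();
setTimeout(() => {
  console.log(summarizeDelay(delayHistogram));
  delayHistogram.disable();
}, 1000);
```

A p99 far above the p50 suggests occasional long blocking tasks rather than uniformly slow handlers.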
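Process Exception Monitoring
Handling Uncaught Exceptions
Global exception handlers are an important part of process monitoring:
process.on('uncaughtException', (err) => {
  console.error('Uncaught Exception:', err.stack);
  // Perform any necessary cleanup, then exit: the process state is no longer reliable.
  process.exit(1);
});
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled Rejection at:', promise, 'reason:', reason);
});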
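Calling process.exit(1) immediately can cut off requests that are still in flight. One possible approach, sketched here under stated assumptions, is to give cleanup a bounded amount of time before forcing the exit. The helper createShutdown and its close, exit, and timeoutMs options are hypothetical names, and server in the usage comment stands for an http.Server that the application already has.

```javascript
// `createShutdown` is a hypothetical helper. `close` should stop accepting
// new work and call back when finished; `exit` defaults to process.exit.
function createShutdown({ close, exit = process.exit, timeoutMs = 5000 }) {
  let shuttingDown = false;

  return function shutdown(code) {
    if (shuttingDown) return; // ignore repeated signals or errors
    shuttingDown = true;

    // Force the exit if cleanup hangs. unref() keeps this timer from
    // holding the process open by itself.
    const timer = setTimeout(() => exit(code), timeoutMs);
    timer.unref();

    close(() => {
      clearTimeout(timer);
      exit(code);
    });
  };
}

// Usage (assumes an existing http.Server named `server`):
// const shutdown = createShutdown({ close: (done) => server.close(done) });
// process.on('SIGTERM', () => shutdown(0));
// process.on('uncaughtException', (err) => {
//   console.error('Uncaught Exception:', err.stack);
//   shutdown(1);
// });
```

The guard flag matters because a second error during cleanup would otherwise start a second shutdown sequence.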
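Advanced Monitoring Solutions
Process Management with PM2
PM2 provides process management with built-in monitoring:
# Install PM2 globally
npm install pm2 -g
# Start the app; --watch restarts it when files change
pm2 start app.js --name "my-app" --watch
# Open the terminal monitoring dashboard
pm2 monit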
Custom Health Check Endpoint
Add a health check route to an Express application:
const express = require('express');
const app = express();
// h is the monitorEventLoopDelay histogram created in the event loop delay example.
app.get('/health', (req, res) => {
  const memory = process.memoryUsage();
  const uptime = process.uptime();
  res.json({
    status: 'healthy',
    uptime: `${uptime.toFixed(2)} seconds`,
    memory: {
      rss: `${(memory.rss / 1024 / 1024).toFixed(2)} MB`,
      heapTotal: `${(memory.heapTotal / 1024 / 1024).toFixed(2)} MB`,
      heapUsed: `${(memory.heapUsed / 1024 / 1024).toFixed(2)} MB`
    },
    eventLoopDelay: `${h.mean / 1e6} ms`
  });
});
app.listen(3000);
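A route that always answers "healthy" with status 200 cannot signal a problem to a load balancer or orchestrator. As a minimal sketch, the function below turns a metrics snapshot into a status code and body. The name evaluateHealth, the snapshot fields, and the limit values are all assumptions chosen for illustration.

```javascript
// `evaluateHealth` and its limit names are hypothetical; tune the limits per service.
function evaluateHealth(
  snapshot,
  limits = { maxHeapUsedRatio: 0.9, maxLoopDelayMs: 200 }
) {
  const problems = [];
  if (snapshot.heapUsed / snapshot.heapLimit > limits.maxHeapUsedRatio) {
    problems.push('heap usage above limit');
  }
  if (snapshot.loopDelayMs > limits.maxLoopDelayMs) {
    problems.push('event loop delay above limit');
  }
  const healthy = problems.length === 0;
  return {
    statusCode: healthy ? 200 : 503,
    body: { status: healthy ? 'healthy' : 'unhealthy', problems },
  };
}
```

Inside an Express handler the result could be sent with res.status(result.statusCode).json(result.body), so that probes relying on the HTTP status see failures.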
Logging and Alerting Integration
Winston Logging Integration
Use Winston to produce structured logs:
const winston = require('winston');
const { combine, timestamp, json } = winston.format;
const logger = winston.createLogger({
  level: 'info',
  format: combine(timestamp(), json()),
  transports: [
    new winston.transports.File({ filename: 'process-monitor.log' })
  ]
});
// Record memory usage once a minute
setInterval(() => {
  const memory = process.memoryUsage();
  logger.info('Memory usage', {
    rss: memory.rss,
    heapTotal: memory.heapTotal,
    heapUsed: memory.heapUsed
  });
}, 60000);
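Setting Alert Thresholds
A simple alerting mechanism:
// Reuses getCPUUsage() and logger from the earlier examples.
const ALERT_THRESHOLD = 80; // 80% CPU usage
setInterval(() => {
  const cpuUsage = getCPUUsage();
  if (cpuUsage > ALERT_THRESHOLD) {
    logger.error('High CPU usage alert', {
      cpuUsage,
      threshold: ALERT_THRESHOLD
    });
    // Email or SMS notifications could be integrated here
  }
}, 5000);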
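A single reading above the threshold is often just a short spike, and alerting on every reading can flood the notification channel. One way to reduce the noise, sketched here, is to require several consecutive breaches and enforce a cooldown between alerts. The helper createAlerter and its option names are hypothetical.

```javascript
// `createAlerter` is a hypothetical helper. `notify` is called at most once
// per cooldown window, and only after `consecutive` readings above `threshold`.
function createAlerter({ threshold, consecutive = 3, cooldownMs = 60000, notify }) {
  let breaches = 0;
  let lastAlertAt = -Infinity;

  return function observe(value, now = Date.now()) {
    // Any reading at or below the threshold resets the streak.
    breaches = value > threshold ? breaches + 1 : 0;
    if (breaches < consecutive) return false;
    if (now - lastAlertAt < cooldownMs) return false;
    lastAlertAt = now;
    notify({ value, threshold, breaches });
    return true;
  };
}
```

The returned function can be fed each CPU reading from the timer, with notify wrapping the logger call or an email or SMS integration.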
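Process Monitoring in Distributed Environments
Using OpenTelemetry
Distributed tracing is an important part of modern monitoring. The package names below come from older OpenTelemetry JS releases. Later releases publish these packages as @opentelemetry/sdk-trace-node and @opentelemetry/sdk-trace-base, and the exporter options have changed, so check the documentation for the installed version:
const { NodeTracerProvider } = require('@opentelemetry/node');
const { SimpleSpanProcessor } = require('@opentelemetry/tracing');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const provider = new NodeTracerProvider();
provider.register();
const exporter = new JaegerExporter({
  serviceName: 'node-process-monitor'
});
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));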
Performance Optimization Practices
Memory Leak Detection
Capture heap snapshots with the heapdump module:
const heapdump = require('heapdump');
// Write a heap snapshot when RSS exceeds 1 GB
setInterval(() => {
  if (process.memoryUsage().rss > 1024 * 1024 * 1024) {
    const filename = `heapdump-${Date.now()}.heapsnapshot`;
    heapdump.writeSnapshot(filename, (err) => {
      if (err) console.error(err);
      else console.log(`Heap snapshot written to ${filename}`);
    });
  }
}, 60000);
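Once memory stays above the limit, a check like the one above writes a new snapshot on every tick, and each snapshot is large and pauses the process while it is written. The sketch below swaps in the built-in v8.writeHeapSnapshot() and adds a minimum interval between snapshots. The helper createSnapshotTrigger and its options are hypothetical names.

```javascript
const v8 = require('v8');

// `createSnapshotTrigger` is a hypothetical helper. `write` defaults to the
// built-in v8.writeHeapSnapshot, which is synchronous and returns the file name.
function createSnapshotTrigger({
  limitBytes,
  minIntervalMs = 10 * 60 * 1000,
  write = v8.writeHeapSnapshot,
}) {
  let lastWrittenAt = -Infinity;

  return function check(rss = process.memoryUsage().rss, now = Date.now()) {
    if (rss <= limitBytes) return null;
    // Skip if a snapshot was written recently.
    if (now - lastWrittenAt < minIntervalMs) return null;
    lastWrittenAt = now;
    return write(`heap-${now}.heapsnapshot`);
  };
}
```

Calling the returned function from a timer yields the snapshot file name when one was written and null otherwise.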
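Detecting Event Loop Blocking
Detect long-running synchronous code:
const { performance } = require('perf_hooks');
// Reuses logger from the Winston example.
const interval = 1000; // how often the check timer fires, in ms
const threshold = 200; // report lag above 200 ms
let lastLoopTime = performance.now();
setInterval(() => {
  const now = performance.now();
  // Lag is how much later than scheduled the timer fired.
  const lag = now - lastLoopTime - interval;
  if (lag > threshold) {
    logger.warn('Event loop blocked', {
      duration: lag,
      threshold
    });
  }
  lastLoopTime = now;
}, interval);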
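Timer lag shows that the loop was blocked but not how busy it is overall. As a complementary sketch, performance.eventLoopUtilization() reports the fraction of time the loop spent doing work rather than idling. The helper name createUtilizationMeter is an assumption for this example.

```javascript
const { performance } = require('perf_hooks');

// `createUtilizationMeter` is a hypothetical helper. Each call to the returned
// function reports utilization (0 to 1) since the previous call.
function createUtilizationMeter() {
  let last = performance.eventLoopUtilization();

  return function read() {
    const current = performance.eventLoopUtilization();
    // Passing two readings returns the difference between them.
    const delta = performance.eventLoopUtilization(current, last);
    last = current;
    // Guard against a zero-length window, which has no defined ratio.
    return Number.isFinite(delta.utilization) ? delta.utilization : 0;
  };
}
```

A value that stays close to 1 means the loop has almost no idle time, even when no individual task is long enough to trip a lag threshold.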
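Monitoring in Containerized Environments
Docker Health Check
Add a HEALTHCHECK instruction to the Dockerfile (curl must be installed in the image):
HEALTHCHECK --interval=30s --timeout=3s \
  CMD curl -f http://localhost:3000/health || exit 1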
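Slim Node.js images often ship without curl. One alternative, sketched below, is a small Node.js script that performs the same request with the built-in http module. The function name checkHealth and the file name healthcheck.js mentioned afterwards are hypothetical.

```javascript
const http = require('http');

// `checkHealth` is a hypothetical helper. It resolves to true only for a
// 2xx response, and to false on any other status, error, or timeout.
function checkHealth(url, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    // The 'timeout' event does not abort the request by itself.
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', () => resolve(false));
  });
}
```

A healthcheck.js file could call checkHealth('http://localhost:3000/health') and exit with code 0 or 1 depending on the result, and the Dockerfile instruction would then become CMD node healthcheck.js.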
Kubernetes Probe Configuration
Liveness and readiness probes in a Kubernetes deployment:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
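Kubernetes reacts differently to the two probes: a failing liveness probe restarts the container, while a failing readiness probe only removes the pod from service endpoints. Pointing both at the same path means a temporary "not ready" state can trigger a restart. The following is a minimal sketch of keeping the two answers separate. The helper createProbes and the paths /livez and /readyz are hypothetical names, not part of the configuration above.

```javascript
// `createProbes`, `/livez` and `/readyz` are hypothetical names for illustration.
function createProbes() {
  let ready = false;

  return {
    // Call setReady(true) once startup work (connections, caches) has finished,
    // and setReady(false) when shutdown begins.
    setReady(value) {
      ready = Boolean(value);
    },
    // Liveness: the process is still able to answer requests at all.
    liveness() {
      return { statusCode: 200, body: { status: 'alive' } };
    },
    // Readiness: the process is willing to receive traffic right now.
    readiness() {
      return ready
        ? { statusCode: 200, body: { status: 'ready' } }
        : { statusCode: 503, body: { status: 'not ready' } };
    },
  };
}

// Usage with an existing Express `app` (assumed):
// const probes = createProbes();
// app.get('/livez', (req, res) => {
//   const r = probes.liveness();
//   res.status(r.statusCode).json(r.body);
// });
// app.get('/readyz', (req, res) => {
//   const r = probes.readiness();
//   res.status(r.statusCode).json(r.body);
// });
```

With routes like these, the livenessProbe and readinessProbe paths would each point at their own endpoint.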
Visual Monitoring Dashboards
Displaying Metrics with Grafana
Build a monitoring dashboard with Prometheus and Grafana:
const client = require('prom-client');
// Reuses getCPUUsage() and the Express app from the earlier examples.
// Define the metrics
const cpuGauge = new client.Gauge({
  name: 'node_cpu_usage_percent',
  help: 'Current CPU usage in percent'
});
const memoryGauge = new client.Gauge({
  name: 'node_memory_usage_bytes',
  help: 'Current memory usage in bytes',
  labelNames: ['type']
});
// Update the metrics every five seconds
setInterval(() => {
  cpuGauge.set(getCPUUsage());
  const memory = process.memoryUsage();
  memoryGauge.set({ type: 'rss' }, memory.rss);
  memoryGauge.set({ type: 'heapTotal' }, memory.heapTotal);
  memoryGauge.set({ type: 'heapUsed' }, memory.heapUsed);
}, 5000);
// Expose the metrics endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});