Monitoring CPU over SNMP
1. CPU load average is the average number of processes that are ready to run and waiting in the queue for the CPU. For historical reasons, this graph is built with MRTG:
# 5 minute load average
Target[192.168.1.1_cpu]: .1.3.6.1.4.1.2021.10.1.5.2&.1.3.6.1.4.1.2021.10.1.5.2:myrocomm@192.168.1.1:
Directory[192.168.1.1_cpu]: 192.168.1.1
AbsMax[192.168.1.1_cpu]: 500000
MaxBytes[192.168.1.1_cpu]: 1000
Options[192.168.1.1_cpu]: gauge,growright,nopercent,noo,unknaszero,nobanner
YLegend[192.168.1.1_cpu]: CPU Load average X 100
ShortLegend[192.168.1.1_cpu]:
Legend2[192.168.1.1_cpu]: CPU load
Legend4[192.168.1.1_cpu]: Max value over the interval
LegendI[192.168.1.1_cpu]: CPU load (load average)
Title[192.168.1.1_cpu]: CPU Load average X 100 -- 192.168.1.1
PageTop[192.168.1.1_cpu]: <H1>CPU Load average X 100 for 192.168.1.1 inet</H1>
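The OID used in Target above is laLoadInt.2 from UCD-SNMP-MIB, which reports the 5-minute load average multiplied by 100 as an integer (hence "X 100" in the legends). A minimal sketch of converting such a reading back to the usual form follows; the function names load_from_int and fetch_load5 are made up for illustration, and fetch_load5 assumes the Net-SNMP snmpget utility is installed.

```shell
#!/bin/sh
# laLoadInt.2 (.1.3.6.1.4.1.2021.10.1.5.2) is the 5-minute load average
# multiplied by 100 and reported as an integer.

# Convert a laLoadInt reading back to a conventional load average.
load_from_int() {
    LC_ALL=C awk -v v="$1" 'BEGIN { printf "%.2f\n", v / 100 }'
}

# Hypothetical wrapper: fetch the value from a host and convert it.
# It is only defined here, not called; host and community are placeholders.
fetch_load5() {
    host=$1
    community=$2
    raw=$(snmpget -Oqv -v2c -c "$community" "$host" .1.3.6.1.4.1.2021.10.1.5.2) || return 1
    load_from_int "$raw"
}

load_from_int 150   # a reading of 150 corresponds to a load of 1.50
```

With this scale, MaxBytes of 1000 in the config corresponds to a load average of 10.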
2. CPU time graphs are built with RRD.
a) Create the RRD database
create-cpu.sh:
#!/bin/sh
RRDDIR="/home/citrin/rrd/db"
HOST=$1
[ -z "$HOST" ] && echo "please specify host name" && exit 1
shift 1
[ -e "$RRDDIR/$HOST" ] || mkdir "$RRDDIR/$HOST"
if [ ! -e "$RRDDIR/$HOST/cpu.rrd" ]; then
echo creating cpu.rrd
rrdtool create $RRDDIR/$HOST/cpu.rrd -s 300 \
DS:user:COUNTER:600:0:U \
DS:nice:COUNTER:600:0:U \
DS:sys:COUNTER:600:0:U \
DS:idle:COUNTER:600:0:U \
DS:intr:COUNTER:600:0:U \
RRA:AVERAGE:0.5:1:600 \
RRA:AVERAGE:0.5:6:700 \
RRA:AVERAGE:0.5:24:775 \
RRA:AVERAGE:0.5:288:797 \
RRA:MAX:0.5:1:600 \
RRA:MAX:0.5:6:700 \
RRA:MAX:0.5:24:775 \
RRA:MAX:0.5:288:797
else
echo $RRDDIR/$HOST/cpu.rrd already exists
fi
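How much history each RRA line keeps follows from the step, the number of primary data points per row, and the number of rows. A rough sketch of that arithmetic for the values used in create-cpu.sh is shown here; rra_days is a helper name invented for this example.

```shell
#!/bin/sh
# Retention of one RRA in days: step (s) * PDPs per row * rows / 86400.
rra_days() {
    LC_ALL=C awk -v s="$1" -v p="$2" -v r="$3" \
        'BEGIN { printf "%.1f\n", s * p * r / 86400 }'
}

STEP=300   # matches "-s 300" in create-cpu.sh
for rra in "1 600" "6 700" "24 775" "288 797"; do
    set -- $rra   # split "pdp rows" into $1 and $2
    echo "$1 PDP/row x $2 rows: $(rra_days "$STEP" "$1" "$2") days"
done
```

So the four archives cover roughly two days, two weeks, two months, and a bit over two years, which lines up with the day, week, month, and year graphs.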
b) Every 5 minutes, cron runs the update.sh script:
#!/bin/sh
RRDDIR="/home/citrin/rrd/db"
CMD="/usr/local/bin/snmpget -r 9 -t 3 -Oqv -v2c"
RRDCMD="/usr/local/rrdtool-1.0.46/bin/rrdtool"
##################################################
## Begin of CPU Times (States) ##
##################################################
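# user, nice, system, idle, interrupt: one OID per data source in cpu.rrd, in the same order
OID="1.3.6.1.4.1.2021.11.50.0 1.3.6.1.4.1.2021.11.51.0 1.3.6.1.4.1.2021.11.52.0 1.3.6.1.4.1.2021.11.53.0 1.3.6.1.4.1.2021.11.56.0"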
HOST=192.168.1.1
COMNAME=secret
STRING=$($CMD -c "$COMNAME" "$HOST" $OID | awk '{printf ":%s", $1}')
if [ -w "$RRDDIR/$HOST/cpu.rrd" ]; then
$RRDCMD update $RRDDIR/$HOST/cpu.rrd N$STRING
else
echo $HOST/cpu.rrd is not writable
fi
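rrdtool update expects exactly one value per data source, in the order the data sources were created (five for cpu.rrd). One possible way to guard against a short or garbled SNMP reply is to count the values before calling rrdtool. This is only a sketch: build_update is a hypothetical helper, and the example feeds it canned numbers rather than live snmpget output.

```shell
#!/bin/sh
# Build an rrdtool update argument ("N:v1:v2:...") from one value per line
# on stdin, and fail unless exactly $1 numeric values were read.
build_update() {
    LC_ALL=C awk -v want="$1" '
        /^[0-9]+$/ { s = s ":" $1; n++ }   # keep only plain integer lines
        END {
            if (n + 0 != want + 0) exit 1  # wrong count: refuse to build the string
            print "N" s
        }'
}

# Example with canned values instead of a live snmpget call.
printf '%s\n' 101 5 42 9000 7 | build_update 5
```

In update.sh the snmpget output could be piped through such a function, and rrdtool update called only when it succeeds.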
c) The graphs are viewed through a CGI script that generates the images on the fly:
#!/usr/local/rrdtool-1.0.46/bin/rrdcgi
<HTML>
<HEAD><TITLE>CPU Times at INET 192.168.1.1</TITLE></HEAD>
<BODY>
<CENTER>
<H2>CPU Times at INET 192.168.1.1</H2>
<P>
<H3>last 24 hours</H3>
<RRD::GRAPH img/192.168.1.1-cpu.png
--imginfo '<IMG SRC=img/%s WIDTH=%lu HEIGHT=%lu >'
--lazy
--width 850 --height 250 --vertical-label "cpu time, %" -a PNG --start -24hours --upper-limit 100 --lower-limit 0 --rigid
DEF:CP_USER=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:user:AVERAGE
DEF:CP_NICE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:nice:AVERAGE
DEF:CP_SYS=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:sys:AVERAGE
DEF:CP_IDLE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:idle:AVERAGE
DEF:CP_INTR=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:intr:AVERAGE
CDEF:SUM="CP_USER,CP_NICE,+,CP_SYS,+,CP_IDLE,+"
CDEF:USER="CP_USER,100,*,SUM,/"
CDEF:INTR="CP_INTR,100,*,SUM,/"
CDEF:NICE="CP_NICE,100,*,SUM,/"
CDEF:SYS="CP_SYS,CP_INTR,-,100,*,SUM,/"
CDEF:IDLE="CP_IDLE,100,*,SUM,/"
AREA:SYS#FF0000:" SYS"
STACK:INTR#FFFF00:" INTR"
STACK:USER#000080:" USER"
STACK:NICE#EE799F:" NICE"
STACK:IDLE#90EE90:" IDLE"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:LAST: NOW - %.2lf %% user,"
GPRINT:"NICE:LAST: %.2lf %% nice,"
GPRINT:"SYS:LAST: %.2lf %% system,"
GPRINT:"IDLE:LAST: %.2lf %% idle"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:AVERAGE: AVERAGE - %.2lf %% user,"
GPRINT:"NICE:AVERAGE: %.2lf %% nice,"
GPRINT:"SYS:AVERAGE: %.2lf %% system,"
GPRINT:"IDLE:AVERAGE: %.2lf %% idle"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:MAX: MAXIMUM - %.2lf %% user,"
GPRINT:"NICE:MAX: %.2lf %% nice,"
GPRINT:"SYS:MAX: %.2lf %% system,"
GPRINT:"IDLE:MAX: %.2lf %% idle"
></BR>
Last updated at <RRD::TIME::LAST /home/citrin/rrd/db/192.168.1.1/cpu.rrd "%H:%M, %d %b %Y">
</P>
<H3>last week</H3>
<RRD::GRAPH img/192.168.1.1-cpu-w.png
--imginfo '<IMG SRC=img/%s WIDTH=%lu HEIGHT=%lu >'
--lazy
--width 850 --vertical-label "cpu time, %" -a PNG --start -1weeks --upper-limit 100 --lower-limit 0 --rigid
DEF:CP_USER=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:user:AVERAGE
DEF:CP_NICE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:nice:AVERAGE
DEF:CP_SYS=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:sys:AVERAGE
DEF:CP_IDLE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:idle:AVERAGE
DEF:CP_INTR=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:intr:AVERAGE
CDEF:SUM="CP_USER,CP_NICE,+,CP_SYS,+,CP_IDLE,+"
CDEF:USER="CP_USER,100,*,SUM,/"
CDEF:NICE="CP_NICE,100,*,SUM,/"
CDEF:SYS="CP_SYS,CP_INTR,-,100,*,SUM,/"
CDEF:IDLE="CP_IDLE,100,*,SUM,/"
CDEF:INTR="CP_INTR,100,*,SUM,/"
AREA:SYS#FF0000:" SYS"
STACK:INTR#FFFF00:" INTR"
STACK:USER#000080:" USER"
STACK:NICE#EE799F:" NICE"
STACK:IDLE#90EE90:" IDLE"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:AVERAGE: AVERAGE - %.2lf %% user,"
GPRINT:"NICE:AVERAGE: %.2lf %% nice,"
GPRINT:"SYS:AVERAGE: %.2lf %% system,"
GPRINT:"IDLE:AVERAGE: %.2lf %% idle"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:MAX: MAXIMUM - %.2lf %% user,"
GPRINT:"NICE:MAX: %.2lf %% nice,"
GPRINT:"SYS:MAX: %.2lf %% system,"
GPRINT:"IDLE:MAX: %.2lf %% idle"
>
</P>
<H3>last month</H3>
<RRD::GRAPH img/192.168.1.1-cpu-m.png
--imginfo '<IMG SRC=img/%s WIDTH=%lu HEIGHT=%lu >'
--lazy
--width 850 --vertical-label "cpu time, %" -a PNG --start -1months --upper-limit 100 --lower-limit 0 --rigid
DEF:CP_USER=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:user:AVERAGE
DEF:CP_NICE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:nice:AVERAGE
DEF:CP_SYS=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:sys:AVERAGE
DEF:CP_IDLE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:idle:AVERAGE
DEF:CP_INTR=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:intr:AVERAGE
CDEF:SUM="CP_USER,CP_NICE,+,CP_SYS,+,CP_IDLE,+"
CDEF:USER="CP_USER,100,*,SUM,/"
CDEF:NICE="CP_NICE,100,*,SUM,/"
CDEF:SYS="CP_SYS,CP_INTR,-,100,*,SUM,/"
CDEF:IDLE="CP_IDLE,100,*,SUM,/"
CDEF:INTR="CP_INTR,100,*,SUM,/"
AREA:SYS#FF0000:" SYS"
STACK:INTR#FFFF00:" INTR"
STACK:USER#000080:" USER"
STACK:NICE#EE799F:" NICE"
STACK:IDLE#90EE90:" IDLE"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:AVERAGE: AVERAGE - %.2lf %% user,"
GPRINT:"NICE:AVERAGE: %.2lf %% nice,"
GPRINT:"SYS:AVERAGE: %.2lf %% system,"
GPRINT:"IDLE:AVERAGE: %.2lf %% idle"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:MAX: MAXIMUM - %.2lf %% user,"
GPRINT:"NICE:MAX: %.2lf %% nice,"
GPRINT:"SYS:MAX: %.2lf %% system,"
GPRINT:"IDLE:MAX: %.2lf %% idle"
>
</P>
<H3>last year</H3>
<RRD::GRAPH img/192.168.1.1-cpu-y.png
--imginfo '<IMG SRC=img/%s WIDTH=%lu HEIGHT=%lu >'
--lazy
--width 850 --vertical-label "cpu time, %" -a PNG --start -1years --upper-limit 100 --lower-limit 0 --rigid
DEF:CP_USER=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:user:AVERAGE
DEF:CP_NICE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:nice:AVERAGE
DEF:CP_SYS=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:sys:AVERAGE
DEF:CP_IDLE=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:idle:AVERAGE
DEF:CP_INTR=/home/citrin/rrd/db/192.168.1.1/cpu.rrd:intr:AVERAGE
CDEF:SUM="CP_USER,CP_NICE,+,CP_SYS,+,CP_IDLE,+"
CDEF:USER="CP_USER,100,*,SUM,/"
CDEF:NICE="CP_NICE,100,*,SUM,/"
CDEF:SYS="CP_SYS,CP_INTR,-,100,*,SUM,/"
CDEF:IDLE="CP_IDLE,100,*,SUM,/"
CDEF:INTR="CP_INTR,100,*,SUM,/"
AREA:SYS#FF0000:" SYS"
STACK:INTR#FFFF00:" INTR"
STACK:USER#000080:" USER"
STACK:NICE#EE799F:" NICE"
STACK:IDLE#90EE90:" IDLE"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:AVERAGE: AVERAGE - %.2lf %% user,"
GPRINT:"NICE:AVERAGE: %.2lf %% nice,"
GPRINT:"SYS:AVERAGE: %.2lf %% system,"
GPRINT:"IDLE:AVERAGE: %.2lf %% idle"
COMMENT:\s
COMMENT:\s
GPRINT:"USER:MAX: MAXIMUM - %.2lf %% user,"
GPRINT:"NICE:MAX: %.2lf %% nice,"
GPRINT:"SYS:MAX: %.2lf %% system,"
GPRINT:"IDLE:MAX: %.2lf %% idle"
>
</P>
</CENTER></BODY></HTML>
A short note. FreeBSD has 5 raw CPU time counters (user, nice, system, interrupt, idle) that show how much time the processor spends in each state. From what I have observed, system includes interrupt. To get the load in percent, divide the increment of each counter by the sum of the increments of user, nice, system, and idle; interrupt is subtracted from system before that. If you monitor Linux, there is no interrupt counter there.
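The same arithmetic as in the CDEF lines of the CGI script can be tried by hand. Here is a small sketch that takes the increments of the five counters over one interval and prints the percentages; cpu_percent and the sample numbers are made up for illustration, and it assumes, as noted above, that system already includes interrupt.

```shell
#!/bin/sh
# Arguments: increments of user, nice, system, idle, interrupt over one interval.
cpu_percent() {
    LC_ALL=C awk -v u="$1" -v n="$2" -v s="$3" -v i="$4" -v t="$5" 'BEGIN {
        sum = u + n + s + i          # interrupt is already counted inside system
        if (sum <= 0) exit 1         # no ticks elapsed, nothing to compute
        printf "user=%.2f nice=%.2f sys=%.2f intr=%.2f idle=%.2f\n",
            100 * u / sum, 100 * n / sum, 100 * (s - t) / sum,
            100 * t / sum, 100 * i / sum
    }'
}

cpu_percent 30 0 20 250 5
```

For these sample increments the sum is 300 ticks, so user comes out at 10%, system without interrupts at 5%, and the five values add up to 100%.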