
Monitoring an SNMP-Enabled UPS

Using the Check a Network Response task, it's possible to poll an SNMP-enabled device. This guide describes monitoring an Uninterruptible Power Supply (UPS) that supports the UPS Management Information Base (IETF RFC 1628).

Receiving an online alert for such a task depends on two things: the equipment that maintains the site's Internet connection must itself have power protection, and the outage must be sufficiently localized that it doesn't affect the connection provider. (Any connection failure should instead yield a heartbeat alert.)

DISCLAIMER: The information provided herein is intended as an example only. No representation is made as to its accuracy, completeness or suitability for a particular purpose or platform (see also the Winserver Wingman EULA). Please test your own implementation thoroughly in the specific environment within which it need function and ensure that it operates as required.

Tested On: Emerson Liebert GXT3 with IntelliSlot Network Card

Alarms Present

The RFC defines many values of interest, but the simplest mechanism for monitoring a notifiable event is to poll the upsAlarmsPresent scalar, which holds the number of active alarm conditions and is therefore non-zero whenever an alarm is present:

Check for UPS alarms each minute.

Task Parameters

Task Type: Check a Network Response

Frequency: 1 Minute

Address: snmp://LiebertEM@192.168.1.1/1.3.6.1.2.1.33.1.6.1.0

Port: 161

CAUTION On: (response) ...is not returned or; is numerically not equal to; 0; 1 time.

FAILURE On: (response) ...is returned and; is numerically not equal to; 0; 2 times.
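
The parameters above can be read as a small decision procedure. The following Python is a minimal sketch of that reading, not the product's actual implementation. It assumes that the text before the '@' in the address is the SNMP community string, and that "2 times" means two consecutive evaluations. The function names parse_snmp_address and alarm_level are hypothetical.

```python
from urllib.parse import urlsplit


def parse_snmp_address(address):
    """Split an snmp:// task address into (community, host, oid)."""
    parts = urlsplit(address)
    if parts.scheme != "snmp":
        raise ValueError("expected an snmp:// address")
    # Assumption: the user part of the address is the community string.
    return parts.username, parts.hostname, parts.path.lstrip("/")


def alarm_level(responses):
    """Classify a history of upsAlarmsPresent polls, oldest first.

    Each entry is an int (the value returned) or None (no response).
    """
    if not responses:
        return "OK"
    # FAILURE: a value was returned and was non-zero on the last 2 polls
    # (assumed here to mean 2 consecutive evaluations).
    last_two = responses[-2:]
    if len(last_two) == 2 and all(r is not None and r != 0 for r in last_two):
        return "FAILURE"
    # CAUTION: no response, or a non-zero value, on the latest poll.
    latest = responses[-1]
    if latest is None or latest != 0:
        return "CAUTION"
    return "OK"
```

Under these assumptions, a single non-zero reading gives CAUTION, a second consecutive non-zero reading gives FAILURE, and a missing response on its own never rises above CAUTION.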


Charge Remaining

If instead (or in combination, using two separate tasks) you'd prefer to monitor the remaining battery charge, the upsEstimatedChargeRemaining object is available. It returns the estimated charge as a percentage of full charge, an integer in the range 0-100.

Check each minute that the UPS is not discharging.

Task Parameters

Task Type: Check a Network Response

Frequency: 1 Minute

Address: snmp://LiebertEM@192.168.1.1/1.3.6.1.2.1.33.1.2.4.0

Port: 161

CAUTION On: (response) ...is not returned or; is numerically less than; 90; 1 time.

FAILURE On: (response) ...is returned and; is numerically less than; 75; 1 time.
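
As a minimal sketch of how the two thresholds above combine for a single reading, the following Python classifies one upsEstimatedChargeRemaining value. It is an illustration only, and the function name charge_level is hypothetical.

```python
def charge_level(percent):
    """Classify one charge reading (0-100), or None if no response."""
    # FAILURE: a value was returned and is below 75.
    if percent is not None and percent < 75:
        return "FAILURE"
    # CAUTION: no response, or a value below 90.
    if percent is None or percent < 90:
        return "CAUTION"
    return "OK"
```

For example, charge_level(89) gives "CAUTION" and charge_level(74) gives "FAILURE", while a reading of exactly 90 is still "OK" because the comparison is "less than".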


The principal practical difference between the two tasks is that the first alerts as soon as mains power is unavailable, whereas the second won't notify until the remaining charge has fallen below 90% (and generally won't notify at all for very short outages).

Report Frequency

Where battery backup is concerned, minutes count, which is why we've recommended the minimum 1-minute test frequency. It's worth explaining how online reporting notifications work with higher task frequencies.

Escalating alerts are always reported immediately. In the Alarms Present example above, a CAUTION notification is posted to the web service on the first evaluation where an alarm is present (and an e-mail notification is sent, if required by your account settings, in conjunction with any other current alerts). If the alarm is still present on the next evaluation one minute later, a FAILURE is notified.

De-escalating alert levels are withheld until at least 5 minutes after the previous notification. So if the power comes back on and the alarm clears at any time in the next 4 test cycles, the clearance will not be notified until the 5th (7 minutes after the original CAUTION was posted).

This is intended to minimize repetitive e-mail notifications and unnecessary traffic for high-frequency tasks that are on the cusp of an alert condition (and might otherwise notify constantly), while still ensuring problems are reported as soon as they're detected.

13 August 2016