Re: Power and temperature

From: Peter Phaal <peter.phaal@inmon.com>
Date: 02/20/10
Message-Id: <AA692094-CAB3-43FB-9D46-6EC485359747@inmon.com>

Sujay,

Thanks for all the information. It would make sense to look for a common set of measurements that can be obtained from the broadest range of devices.

sFlow provides a fairly high-frequency push of counters, so you can probably get by with fairly simple measurements that are aggregated at the collector. For example, if the device can just measure volts and amps, the collector can calculate power and accumulate kWh. Off-peak and on-peak accounting can also happen at the collector. By having a simple, widely supported set of measurements, it should be possible to trace power consumption throughout the data center.
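
As a rough illustration of the collector-side calculation described above, here is a minimal sketch. The class name EnergyAccumulator and its fields are hypothetical, and it assumes DC or unity power factor so that power is simply volts times amps. Each sample's power is assumed to hold over the interval since the previous sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnergyAccumulator:
    """Running energy total for one data source (hypothetical collector helper)."""
    kwh: float = 0.0
    last_time_s: Optional[float] = None  # timestamp of the previous sample, in seconds

    def update(self, time_s: float, volts: float, amps: float) -> float:
        """Fold in one volts/amps sample and return the instantaneous power in watts."""
        watts = volts * amps  # assumption: DC or unity power factor
        if self.last_time_s is not None:
            # Treat the new reading as constant over the interval since the last sample.
            hours = (time_s - self.last_time_s) / 3600.0
            self.kwh += watts * hours / 1000.0
        self.last_time_s = time_s
        return watts
```

The first sample only establishes a starting timestamp; energy starts accumulating from the second sample onward. On-peak and off-peak totals could be kept the same way by maintaining one accumulator per tariff period.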

Measurements that look like good candidates are:

1. volts gauge
2. amps gauge
3. power gauge

The following accumulators would require the agent to monitor power internally and maintain these values:

4. internal polling frequency
5. time of use
6. over voltage count
7. under voltage count
8. over current count
9. total energy (e.g. kWh, although we would want to support a dynamic range that would allow low-power devices to be monitored)

Clearly not all devices will support the full set, but we can define an "unknown" value for each counter so that each device can fill out the structure on a best-effort basis.
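
To make the best-effort idea concrete, here is a small sketch. The field names follow the candidate list above, but the names themselves, the helper functions, and the sentinel value 0xFFFFFFFF are assumptions made for illustration; nothing here is a proposed wire format.

```python
UNKNOWN = 0xFFFFFFFF  # hypothetical sentinel; the actual "unknown" value is not decided here

# Hypothetical field names mirroring the nine candidate measurements.
FIELDS = (
    "volts", "amps", "power",
    "polling_frequency", "time_of_use",
    "over_voltage_count", "under_voltage_count", "over_current_count",
    "total_energy",
)


def fill_record(available):
    """Agent side: build a full record, marking unsupported fields as UNKNOWN."""
    return {name: available.get(name, UNKNOWN) for name in FIELDS}


def known_values(record):
    """Collector side: keep only the fields the device actually reported."""
    return {name: value for name, value in record.items() if value != UNKNOWN}
```

A device that can only measure volts and amps would call fill_record({"volts": 12, "amps": 3}); the collector would then see exactly those two fields from known_values() and could derive power itself.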

HP Labs has done some interesting work on power measurement for switches:
http://www.hpl.hp.com/personal/Priya_Mahadevan/pubs/FinalVersion_Networking2009.pdf
http://www.hpl.hp.com/personal/Priya_Mahadevan/pubs/GI2009_paper.pdf

Based on the papers, switch vendors could calibrate power models of their switches and export fairly accurate power estimates based on the number of switch ports enabled and the port speeds. None of the switches in our network seems to have power metering for the power supply, but some of the PoE switches do provide per-port power metering for the PoE ports.
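
A calibrated model of that kind might look roughly like the sketch below. This is a simplified linear form (a fixed base plus a per-port cost by speed), not the model from the papers, and the function name and every coefficient are placeholders rather than measured values.

```python
# Placeholder per-port costs in watts, keyed by port speed in Mb/s.
# These numbers are invented for illustration; a vendor would calibrate them.
PORT_WATTS = {10: 0.1, 100: 0.2, 1000: 0.5}


def estimate_switch_power(base_watts, enabled_port_speeds, port_watts=PORT_WATTS):
    """Estimate total power: fixed chassis cost plus a cost per enabled port."""
    return base_watts + sum(port_watts[speed] for speed in enabled_port_speeds)
```

For example, estimate_switch_power(50.0, [1000, 1000, 100]) adds the cost of two gigabit ports and one 100 Mb/s port to a 50 W base. An agent could export the result as its power gauge when no hardware metering is available.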

It looks like server vendors include power monitoring capabilities (at least as an option):
http://apcmag.com/server-tour-server-management.htm
http://support.dell.com/support/edocs/software/smitasst/8.2/en/ug/perf_mon.htm

Most switches and servers seem to have thermal monitoring. Temperature thresholding and over-temperature alerts can be generated by the sFlow collector, since it continually receives temperature values.
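
A collector-side threshold check could be as simple as the following sketch. The function name and the example threshold are hypothetical; the input is assumed to be the array of per-thermometer readings in degrees Celsius from one temperature sample.

```python
def over_temperature_sensors(temps_c, threshold_c):
    """Return the indices of thermometers reading above the threshold."""
    return [index for index, temp in enumerate(temps_c) if temp > threshold_c]
```

Calling over_temperature_sensors([35, 72, 40], 70) flags only the second thermometer (index 1). A real collector would probably add hysteresis so that a reading hovering near the threshold does not raise repeated alerts.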

Peter

On Feb 15, 2010, at 8:57 PM, sujay gupta wrote:

> Hi,
>
> Agree, it does make sense to add counters carrying this energy/power
> related information.
>
> Assuming the target networks to monitor for power/energy measurements
> are data centers, a little digging reveals that there is more
> info besides the basic counters of kWh & temp which sFlow needs to
> carry.
> A typical data center consisting of UPS, switches, routers, storage,
> servers, and base power meters has much more information to be
> assimilated.
> "Smart meters" carry at least the following information:
> (i) on-peak kWhr
> (ii) off-peak kWhr
> (iii) Time of Use(TOU)
> (iv) Outage counts
> (v) Voltage
> (vi) Current
> (vii) Polling frequency - (how frequently the volt/current is polled and stored)
> ( This is not a complete list but more or less sufficient, there are
> quite a few IEC & ANSI specifications
> which talk about accuracy, measurement and testing for the above)
>
> On the device level, modern day Digital Power Controllers used in UPS,
> power supply units in
> routers, switches, storage measure and provide the following information;
> ( these are also often present in proprietary MIB's)
> (i) overvoltage/undervoltage info - fault/warnings
> (ii) overcurrent/undercurrent info -fault/warnings
> (iii) overtemperature/undertemperature info -fault/warnings
> (iv) fan fault
> (v) manufacturer specific device error.
> (vi) realtime(or almost) voltage
> (vii) realtime (or almost) current
> (viii) realtime( or almost) temperature- sub-unit wise
> (Refer to the PMBus specifications, http://pmbus.org/ )
>
> Having said that, neither of the above lists is complete, but both are
> mostly sufficient. IMO there are also no strict standards followed
> across the board; practice seems to vary with country, vendor & their partners.
>
> So, if sFlow were to carry power/energy information, it should be a
> subset or a smart aggregation of them.
>
> Another interesting point is that the applications which monitor the above
> not only give gross data, sometimes they also provide fault
> prediction (that gives a perspective into the type of data sFlow
> should mine).
>
> BR,
> -Sujay
>
>
> On Tue, Jan 5, 2010 at 5:03 PM, Peter Phaal <peter.phaal@inmon.com> wrote:
>> Power management is an increasingly important consideration in managing networks.
>>
>> Adding sFlow counters to allow agents to report on power and temperature would provide useful information for power optimization.
>>
>> /* Energy consumption */
>> /* opaque = counter_data; enterprise = 0; format = 3000 */
>> struct energy {
>> unsigned hyper mJ; /* energy in millijoules */
>> unsigned int pf; /* power factor (expressed as a percent), 0 for DC power */
>> }
>>
>> /* Temperature */
>> /* opaque = counter_data; enterprise = 0; format = 3001 */
>> struct temperature {
>> int oC<>; /* array of temperatures (1 for each thermometer) expressed in degrees Celsius */
>> }
>>
>> Each measurement is scoped by the data source reporting it. In the case of energy, a switch might report total energy for the whole box (as measured by its power supply) and may also report energy for each of its PoE ports.
>>
>> Adding these counters to sFlow (currently contained in proprietary SNMP MIBs if available at all) would provide an efficient, multi-vendor way to track power usage and temperature across all the devices and links in the network. sFlow counter polling is very efficient, providing a scalable way to monitor the large numbers of devices in a data center.
>>
>> Peter
Received on Sat Feb 20 20:42:00 2010
