NVIDIA Corp.                                         Robert Alexander
http://www.nvidia.com/                                   NVIDIA Corp.
                                                          Peter Phaal
                                                          InMon Corp.
                                                          August 2012

                      sFlow NVML GPU Structures

Copyright Notice

   Copyright (C) NVIDIA Corp. (2012). All Rights Reserved.

Abstract

   This memo describes sFlow version 5 structures used to report
   NVIDIA GPU related data.

Table of Contents

   1. Overview
   2. sFlow Datagram Extension
   3. References
   4. Authors' Addresses

1. Overview

   This document describes additional structures that allow an sFlow
   agent to export information from NVIDIA GPUs via the NVIDIA
   Management Library (NVML) [1].

   sFlow version 5 is an extensible protocol that allows the addition
   of new data without impacting existing collectors. This document
   does not change the sFlow version 5 protocol [2]; it simply
   defines additional, optional data structures through which NVIDIA
   GPUs can report monitoring metrics.

2. sFlow Datagram Extension

   Graphics Processing Units (GPUs) are a type of computer hardware
   commonly used to render graphics or to accelerate High Performance
   Computing (HPC) jobs. Defining standard sFlow structures
   simplifies management of GPU-enabled clusters by providing metrics
   that describe GPU performance, status and health. The sFlow Host
   Structures [3] specification defines performance metrics for
   hosts. The nvidia_gpu structure extends the set of host metrics to
   include GPU performance.
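   The structure below is carried as standard sFlow opaque counter
   data (enterprise = 5703, format = 1) and encoded using XDR:
   big-endian 32-bit values for the unsigned ints and 64-bit values
   for the two unsigned hypers. As a non-normative illustration, the
   following Python sketch encodes the record body and prepends the
   sFlow counter record header; the sample field values are made up,
   and the helper name is an assumption, not part of this
   specification.

   ```python
   import struct

   def encode_nvidia_gpu(device_count, processes, gpu_time, mem_time,
                         mem_total, mem_free, ecc_errors, energy,
                         temperature, fan_speed):
       """Encode the nvidia_gpu counter record body as XDR (big-endian)."""
       return struct.pack(
           ">IIII"   # device_count, processes, gpu_time, mem_time
           "QQ"      # mem_total, mem_free (unsigned hyper, 64-bit)
           "IIII",   # ecc_errors, energy, temperature, fan_speed
           device_count, processes, gpu_time, mem_time,
           mem_total, mem_free, ecc_errors, energy,
           temperature, fan_speed)

   # Per sFlow version 5, a counter record starts with a data_format
   # word combining the enterprise number (upper 20 bits) and the
   # format (lower 12 bits), followed by the opaque body length.
   body = encode_nvidia_gpu(2, 4, 1200, 800,
                            12 * 2**30, 3 * 2**30, 0, 150000, 71, 40)
   record = struct.pack(">II", (5703 << 12) | 1, len(body)) + body

   print(len(body))  # 8 * 4 + 2 * 8 = 48 bytes
   ```

   The fixed 48-byte body makes the record cheap to decode; a
   collector that does not recognize enterprise 5703 simply skips it
   using the length field.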
   /* NVIDIA GPU statistics */
   /* opaque = counter_data; enterprise = 5703, format = 1 */
   struct nvidia_gpu {
      unsigned int device_count; /* see nvmlDeviceGetCount */
      unsigned int processes;    /* see
                                    nvmlDeviceGetComputeRunningProcesses */
      unsigned int gpu_time;     /* total milliseconds during which one
                                    or more kernels was executing on the
                                    GPU, summed across all devices */
      unsigned int mem_time;     /* total milliseconds during which
                                    global device memory was being read
                                    or written, summed across all
                                    devices */
      unsigned hyper mem_total;  /* sum of framebuffer memory across
                                    devices;
                                    see nvmlDeviceGetMemoryInfo */
      unsigned hyper mem_free;   /* sum of free framebuffer memory
                                    across devices;
                                    see nvmlDeviceGetMemoryInfo */
      unsigned int ecc_errors;   /* sum of volatile ECC errors across
                                    devices;
                                    see nvmlDeviceGetTotalEccErrors */
      unsigned int energy;       /* sum of millijoules across devices;
                                    see nvmlDeviceGetPowerUsage */
      unsigned int temperature;  /* maximum temperature in degrees
                                    Celsius across devices;
                                    see nvmlDeviceGetTemperature */
      unsigned int fan_speed;    /* maximum fan speed in percent across
                                    devices;
                                    see nvmlDeviceGetFanSpeed */
   }

3. References

   [1] "NVIDIA Management Library",
       http://developer.nvidia.com/cuda/nvidia-management-library-nvml

   [2] Phaal, P. and Lavine, M., "sFlow Version 5",
       http://www.sflow.org/sflow_version_5.txt, July 2006

   [3] Phaal, P. and Jordan, R., "sFlow Host Structures",
       http://www.sflow.org/sflow_host.txt, July 2010

4. Authors' Addresses

   Robert Alexander
   NVIDIA Corp.
   2701 San Tomas Expressway
   Santa Clara, CA 95050

   EMail: ralexander@nvidia.com

   Peter Phaal
   InMon Corp.
   580 California Street, 5th Floor
   San Francisco, CA 94104

   Phone: (415) 283-3263
   EMail: peter.phaal@inmon.com