sFlow.org                                                    Peter Phaal
http://www.sFlow.org/                                        InMon Corp.
info@sflow.org                                              Brian Bogard
                                                                  PayPal
                                                      Josep M. Ferrandiz
                                                                  PayPal
                                                          September 2012


                     sFlow Application Structures

Copyright Notice

   Copyright (C) sFlow.org (2012).  All Rights Reserved.

Abstract

   This memo describes sFlow version 5 structures for exporting data
   relating to generic applications.

Table of Contents

   1.  Overview
   2.  Discussion
   3.  sFlow Datagram Extensions
   4.  References
   5.  Authors' Addresses

1.  Overview

   This document describes additional structures that allow an sFlow
   agent to export information relating to generic application
   entities.

   sFlow version 5 is an extensible protocol that allows the addition
   of new data structures without impacting existing collectors.  This
   document does not change the sFlow version 5 protocol [1]; it simply
   defines additional, optional data structures that an enterprise
   application-aware entity can use to report performance.

2.  Discussion

   Many organisations have applications that are specific to their
   business.  Managing the performance of these applications currently
   involves developing custom metrics, monitoring solutions and
   reporting tools.  The definition of standard sFlow metrics for
   monitoring in-house applications simplifies management by
   centralizing monitoring of performance across the diverse range of
   entities and applications, allowing a common set of performance
   analysis tools to process the measurements.

FINAL                          sFlow.org                        [Page 1]

FINAL                sFlow Application Structures         September 2012

   Generic application metrics are only intended to be used in cases
   where application specific structures do not exist and where the
   application is organisation specific.
   For widely used standard applications such as HTTP [3] and Memcache
   [4], application specific metrics provide greater detail.

   The sFlow Host Structures [2] specification defines a framework for
   implementing sFlow in application layer protocols and linking the
   data to sFlow monitoring of the computational and network services
   required to deliver the service.  This document builds on that
   framework by defining structures for reporting generic application
   performance.

3.  sFlow Datagram Extensions

   Each counter_sample must include an app_operations structure
   reporting counts of operations by status code.  The counter_sample
   may also include an app_resources structure reporting resources
   allocated to and consumed by the application, and/or an app_workers
   structure reporting on workers within the application and on
   dropped/delayed requests.

   The app_operation structure is used to export attributes of randomly
   sampled application operations.  Each application flow_sample
   includes an app_operation structure as well as an
   extended_socket_ipv4 or extended_socket_ipv6 structure [2].

   The flow_sample input and output interface fields [1] are used to
   indicate service direction.  If the sFlow agent is running on a
   server, then the input interface must be set to the ifIndex
   corresponding to the interface the request was received on (0 if
   unknown) and the output interface must be set to 0x3FFFFFFF,
   indicating that the target of the operation is the local server.  If
   the sFlow agent is running as part of the client, then the input
   interface must be set to 0x3FFFFFFF and the output interface must be
   set to the ifIndex corresponding to the interface the request was
   sent on (0 if unknown).

   An sFlow sub-agent embedded within the application entity is
   responsible for reporting on the application logical entity data
   source.  The sub_agent_id and the data source index must be unique
   within the host.
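
   As an illustration of the service direction rule for the
   flow_sample interface fields, the following Python sketch derives
   the input and output interface values for an agent acting as a
   server or as a client.  The function and constant names are
   hypothetical and are not defined by this specification; only the
   values 0x3FFFFFFF and 0 are taken from the text.

```python
# Illustrative sketch only: these names are hypothetical, not part of the spec.
LOCAL_ENDPOINT = 0x3FFFFFFF  # marks the local end of the operation
UNKNOWN_IFINDEX = 0          # used when the ifIndex is not known


def flow_sample_interfaces(role, if_index=None):
    """Return (input, output) interface values for a sampled operation.

    role is "server" or "client".  if_index is the ifIndex the request
    was received on (server) or sent on (client), or None if unknown.
    """
    ifx = UNKNOWN_IFINDEX if if_index is None else if_index
    if role == "server":
        # Request arrived on ifx; the target is the local server.
        return ifx, LOCAL_ENDPOINT
    if role == "client":
        # Request originates locally and leaves on ifx.
        return LOCAL_ENDPOINT, ifx
    raise ValueError("role must be 'server' or 'client'")
```

   For example, a server agent that received a request on ifIndex 2
   would report (2, 0x3FFFFFFF), while a client agent that does not
   know the outgoing interface would report (0x3FFFFFFF, 0).
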
   To ensure that the sub_agent_id and data source index are unique and
   persistent, an embedded sub-agent must use the lowest numbered port
   that is being used to receive application requests as both the
   sub_agent_id and the data source index.  For example, if an
   application server is listening for application requests on TCP
   port 1234, then sub_agent_id = 1234, data source type = 3
   (entLogicalEntry) and index = 1234.

   In the case of intermediate application entities, such as load
   balancers and proxies, the entity may act as both a server and a
   client.  An intermediate entity must report itself as the server and
   should include an extended_proxy_socket [3] structure indicating the
   connection used to retrieve the response from the downstream server.

   An application may make use of other downstream applications
   (services) in order to process a request.  Client and server
   interactions are treated independently.  A data source associated
   with the application reports on requests to the server.  Requests
   made by the server in order to process requests will typically be
   monitored by the downstream servers hosting each application, but
   may be monitored as client requests using a separate sFlow data
   source.  The sampled client transactions should include an
   app_parent_context structure providing information about the
   enclosing request.

   The same sub-agent reports on both server and dependent client
   operations.  However, the client data source index must be unique.
   Data source index numbers in the range 200001-265535 are reserved
   for client data sources.  The client index is the server index plus
   200000, providing a one-to-one correspondence between server data
   source index and client data source index.  Continuing the example,
   the client data source associated with server data source 1234
   would have sub_agent_id = 1234, data source type = 3 and
   index = 201234.

   The following sFlow structures are defined to export application
   performance:

   /* UTF-8 encoded string */

   typedef opaque utf8string;

   /* Application name */
   /* Encode hierarchical names using '.'
      as a separator, with the most general category on the left and
      the specific name on the right,
      e.g. payment, mail.smtp, mail.exchange, db.oracle, db.oracle.mysql */

   typedef utf8string<32> application;

   /* Operation name */
   /* Encode hierarchical names using '.' as a separator, with the most
      general category on the left and the specific operation name on
      the right,
      e.g. get.customer.name, upload.photo, upload.audio */

   typedef utf8string<32> operation;

   /* Operation attributes */
   /* name=value pairs encoded as an HTTP query string,
      e.g. cc=visa&loc=mobile */

   typedef utf8string<255> attributes;

   /* Status codes */
   /* The status enumeration may be expanded over time.
      Applications receiving sFlow must be prepared to receive
      app_operation structures with unknown status values.

      The authoritative list of status values will be maintained
      at www.sflow.org */

   enum status {
      SUCCESS         = 0,
      OTHER           = 1,
      TIMEOUT         = 2,
      INTERNAL_ERROR  = 3,
      BAD_REQUEST     = 4,
      FORBIDDEN       = 5,
      TOO_LARGE       = 6,
      NOT_IMPLEMENTED = 7,
      NOT_FOUND       = 8,
      UNAVAILABLE     = 9,
      UNAUTHORIZED    = 10
   }

   /* Operation context */
   struct context {
      application application;
      operation operation;
      attributes attributes;
   }

   /* Sampled Application Operation */
   /* opaque = flow_data; enterprise = 0; format = 2202 */
   struct app_operation {
      context context;             /* attributes describing the operation */
      utf8string<64> status_descr; /* additional text describing status (e.g.
                                      "unknown client") */
      unsigned hyper req_bytes;    /* size of request body
                                      (exclude headers) */
      unsigned hyper resp_bytes;   /* size of response body
                                      (exclude headers) */
      unsigned int uS;             /* duration of the operation
                                      (microseconds) */
      status status;               /* status code */
   }

   /* Optional parent context information for sampled client operation.
      The parent context represents the server operation that resulted
      in the sampled client operation being initiated. */
   /* opaque = flow_data; enterprise = 0; format = 2203 */
   struct app_parent_context {
      context context;
   }

   /* Actor */
   /* A business level identifier associated with a transaction.
      Examples include customer id, vendor id, merchant id, etc. */
   typedef utf8string<64> actor;

   /* Actor initiating the request */
   /* e.g. customer sending a payment */
   /* opaque = flow_data; enterprise = 0; format = 2204 */
   struct app_initiator {
      actor actor;
   }

   /* Actor targeted by the request */
   /* e.g. recipient of payment */
   /* opaque = flow_data; enterprise = 0; format = 2205 */
   struct app_target {
      actor actor;
   }

   /* Application counters */

   /* Count of operations by status code */
   /* opaque = counter_data; enterprise = 0; format = 2202 */
   struct app_operations {
      application application;
      unsigned int success;
      unsigned int other;
      unsigned int timeout;
      unsigned int internal_error;
      unsigned int bad_request;
      unsigned int forbidden;
      unsigned int too_large;
      unsigned int not_implemented;
      unsigned int not_found;
      unsigned int unavailable;
      unsigned int unauthorized;
   }

   /* Application resources */
   /* See getrusage, getrlimit.
      Values represent totals across all application
      processes/threads. */
   /* opaque = counter_data; enterprise = 0; format = 2203 */
   struct app_resources {
      unsigned int user_time;   /* time spent executing application user
                                   instructions (in milliseconds) */
      unsigned int system_time; /* time spent in operating system on
                                   behalf of application (in milliseconds) */
      unsigned hyper mem_used;  /* memory used in bytes */
      unsigned hyper mem_max;   /* max. memory in bytes */
      unsigned int fd_open;     /* number of open file descriptors */
      unsigned int fd_max;      /* max. number of file descriptors */
      unsigned int conn_open;   /* number of open network connections */
      unsigned int conn_max;    /* max. number of network connections */
   }

   /* Application workers */
   /* Each worker processes requests concurrently with other workers.
      Workers may be represented by threads, processes, or, in the case
      of an asynchronous server, by the number of requests that are in
      progress. */
   /* opaque = counter_data; enterprise = 0; format = 2206 */
   struct app_workers {
      unsigned int workers_active; /* number of active workers */
      unsigned int workers_idle;   /* number of idle workers */
      unsigned int workers_max;    /* max. number of workers */
      unsigned int req_delayed;    /* number of times processing of a
                                      client request was delayed because
                                      of a lack of resources */
      unsigned int req_dropped;    /* number of times a client request
                                      was dropped because of a lack of
                                      resources */
   }

4.  References

   [1]  Phaal, P. and Lavine, M., "sFlow Version 5",
        http://www.sflow.org/sflow_version_5.txt, July 2006

   [2]  Phaal, P. and Jordan, R., "sFlow Host Structures",
        http://www.sflow.org/sflow_host.txt, July 2010

   [3]  Phaal, P. and Mangot, D., "sFlow HTTP Structures",
        http://sflow.org/sflow_http.txt, December 2011

   [4]  Phaal, P. and Mangot, D., "sFlow Memcache Structures",
        http://sflow.org/sflow_memcache.txt, December 2011

5.  Authors' Addresses

   Peter Phaal
   InMon Corp.
   580 California Street, 5th Floor
   San Francisco, CA 94104

   Phone: (415) 283-3263
   EMail: peter.phaal@inmon.com

   Brian Bogard
   PayPal
   9999 N. 9th Street
   Scottsdale, AZ 85258

   Phone: (480) 862-7295
   EMail: bbogard@paypal.com

   Josep M. Ferrandiz
   PayPal
   2211 North First Street
   San Jose, CA 95131

   Phone: (408) 967-3268
   EMail: jferrandiz@ebay.com