Re: Re: sFlow Datagram Extensibility

From: Marc Lavine
Date: 09/13/02


    Thanks for the quick response.

    > > I think there's an additional change that is worth making as well,
    > > which would be to add length information for each overall sample.
    > > ...
    > > Note that I haven't used the "enterprise" extension mechanism here,
    > > although you could if you want to keep it consistent with the other
    > > structures.
    > Good idea. It's not a bad idea to use the same mechanism for defining
    > structure types. It would allow vendor-specific data to be specified at
    > level.


    > > ... my intent was that the organization that had defined a particular
    > > structure could later define an extended version of that same structure
    > > with some additional fields added to the end. Existing fields in the
    > > structure would not be allowed to be modified, only new ones could be
    > > added to the end. The presence of the length information would allow a
    > > collector to determine which version of a structure was in use, so that
    > > it could take advantage of additional fields if it had been updated to
    > > understand them.
    > Agreed. Vendors would be allowed to extend their own structures, but not
    > standard structures or those of other vendors. Keeping a single authority
    > for each structure definition ensures that there won't be any clashes on
    > extensions.

    To support this, I think it should be documented that any software decoding
    an sFlow packet should always use the encoded length information, and not
    assume that a structure is of a particular length, since structures may
    grow. This should apply to standard structures as well, since presumably
    they could also be extended as a result of an update to the standard.
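A minimal sketch of length-driven decoding, under stated assumptions: the record layout (a 32-bit data_format, a 32-bit length, then the body), the table `KNOWN_FORMATS`, and the structure named `example_v1` are all hypothetical and are not taken from the sFlow drafts. XDR padding is ignored for brevity. The point is only that the decoder advances by the encoded length, so a structure that has grown, or one it does not recognize, does not break parsing.

```python
import struct

# Hypothetical record layout (an illustration, not the sFlow specification):
#   uint32 data_format | uint32 length | <length> bytes of body
# KNOWN_FORMATS maps a data_format to the fields this decoder understands.
KNOWN_FORMATS = {
    1: ("example_v1", "!II"),  # hypothetical structure: two uint32 fields
}


def decode_records(buf):
    """Walk a buffer of records, always advancing by the encoded length."""
    offset = 0
    records = []
    while offset < len(buf):
        data_format, length = struct.unpack_from("!II", buf, offset)
        offset += 8
        body = buf[offset:offset + length]
        if len(body) < length:
            raise ValueError("truncated record")
        known = KNOWN_FORMATS.get(data_format)
        if known is None:
            # Unknown structure: keep its identity, skip its body.
            records.append((data_format, None, length))
        else:
            _name, fmt = known
            if length < struct.calcsize(fmt):
                raise ValueError("record shorter than the known fields")
            # Read only the fields we know; trailing bytes are newer fields.
            records.append((data_format, struct.unpack_from(fmt, body, 0), length))
        # Never assume a fixed size: the encoded length decides the next offset.
        offset += length
    return records
```

A collector written this way keeps working when a structure gains fields at the end, and can compare the encoded length with the size it knows to tell which version it received.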

    > The current data_format definition does provide a very large (excessive?)
    > name space. It allows 2^32-1 enterprises and 2^32-1 structures per
    > enterprise.
    > Currently the largest enterprise number assignment is 14609.
    > A 24 bit enterprise and 8 bit struct number does seem a little
    > How about 20 bits for the enterprise and 12 bits for the structure? This
    > would allow for over a million enterprises and allows each enterprise over
    > 4000 structures. I don't see hitting either of these limits any time soon.

    Yes, this sounds fine. I'm not sure why I was fixated on using a byte
    boundary in my earlier message ;-).
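The 20/12 split can be sketched as follows. This assumes the enterprise number occupies the high 20 bits and the structure number the low 12 bits of a single 32-bit data_format value; the thread does not state the bit order, and the function names are hypothetical.

```python
ENTERPRISE_BITS = 20   # allows 2**20 = 1,048,576 enterprise numbers
FORMAT_BITS = 12       # allows 2**12 = 4,096 structures per enterprise

ENTERPRISE_MAX = (1 << ENTERPRISE_BITS) - 1
FORMAT_MAX = (1 << FORMAT_BITS) - 1


def pack_data_format(enterprise, structure):
    """Combine enterprise and structure numbers into one 32-bit value.

    Assumption: enterprise in the high bits, structure in the low bits.
    """
    if not 0 <= enterprise <= ENTERPRISE_MAX:
        raise ValueError("enterprise does not fit in 20 bits")
    if not 0 <= structure <= FORMAT_MAX:
        raise ValueError("structure does not fit in 12 bits")
    return (enterprise << FORMAT_BITS) | structure


def unpack_data_format(value):
    """Split a 32-bit data_format into (enterprise, structure)."""
    return value >> FORMAT_BITS, value & FORMAT_MAX
```

With this layout, enterprise 0 (the standard structures) yields data_format values equal to the structure number itself, and the largest enterprise number quoted above (14609) fits with ample room.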

    > > I noticed the addition of the sampled_ethernet format. Since the data
    > > structures no longer use a union to restrict a flow sample to being
    > > represented by a single structure, I think there need to be some
    > > guidelines on how sampled_* structures should be used.
    > I agree that this should be clarified. How about adding the following
    > comment to the standard structures file?
    > /* Flow Data types
    > A flow_sample must contain packet header information. The
    > preferred format for reporting packet header information is
    > the sampled_header. However, if the packet header is not
    > available to the sampling process then one or more of
    > sampled_ethernet, sampled_ipv4, sampled_ipv6 may be used.
    > enterprise = 0 refers to standard sFlow structures. An
    > sFlow implementor should use the standard structures
    > where possible, even if they can only be partially
    > populated. Vendor specific structures are allowed, but
    > should only be used to supplement the existing
    > structures, or to carry information that hasn't yet
    > been standardized. */
    > Does this sufficiently clarify the issue?

    Yes, that seems sufficient.

    Overall, things look pretty good. Originally, I was a little concerned
    about the long data format ids making the encoding inefficient, and I'd
    tried composing an alternate design (with mixed success), but the change to
    using the shorter ids makes that a moot point.

    Here are some other issues that have occurred to me:

    You might want to explicitly specify whether data_format values need to be
    unique across different structure types. In other words, can a single value
    be used to identify a flow structure, a counter structure, and a sample
    structure, or should values not be reused in that manner?

    You might consider renaming counter_block to counter_record for consistency
    with the other structures.

    In the data structures document, I think it would be good to have some
    documentation about how the structures should be filled in, particularly
    when not all information is available. For example, for extended_switch, if
    either the source or destination VLAN information is not available, should
    the corresponding fields be set to zero? Likewise for extended_user, I
    presume it's acceptable to encode a zero-length string if one of the user
    ids is not available.

    In the extended_router documentation, it is not clearly specified whether
    the mask fields' format is a bit mask or a count of bits.
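To make the ambiguity concrete, here is a small sketch of the two possible encodings for an IPv4 mask and how one maps to the other. It is illustrative only: the function names are hypothetical, and nothing here says which form extended_router actually intends.

```python
def prefix_len_to_mask(bits):
    """Convert a count of leading one bits (0..32) to a 32-bit IPv4 bit mask."""
    if not 0 <= bits <= 32:
        raise ValueError("prefix length must be in 0..32")
    return (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF


def mask_to_prefix_len(mask):
    """Convert a contiguous 32-bit IPv4 bit mask to a count of bits."""
    bits = bin(mask).count("1")
    if prefix_len_to_mask(bits) != mask:
        raise ValueError("mask is not a contiguous run of leading ones")
    return bits
```

The same 32-bit field holds 24 under one reading and 0xFFFFFF00 under the other, so a collector cannot reliably guess which was meant; the documentation needs to say.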

    For flow_sample.drops, I think it would be good to clarify the documentation
    with regard to what kind of packets are being counted (i.e. are they only
    sFlow packet drops that are being counted?).

    Should the ETHERNET-ISO8023 enum be named ETHERNET-ISO88023 instead?

    In flow_sample, the input and output fields have special values to represent
    the case where the interface is "unknown". If packets originating or
    terminating at the switch itself are sampled, then one of the two interface
    fields will not apply. I'm wondering if it might be good to have an
    additional special "none" value to indicate this, rather than using the
    "unknown" value, which might wind up getting used for other cases as well.

    In the extended_user data, there is an issue of what character set and
    encoding the user ids are expressed in. I'm sure there will be contexts in
    which they will not be in ASCII. In an ideal world, I'd just say these
    should be encoded in UTF-8, but agents may receive the data in different
    encodings, and it seems better for the agents not to need to delve into
    character set translations. Therefore, I think it would be a good idea to
    be able to include information about the character set of each user id (for
    each field independently). This may assist a collector in being able to
    properly display the ids or map them into different character sets. For
    character set issues, see RFCs 2277 and 2978. RFC 2978 defines a scheme for
    registering character sets and encodings (collectively dubbed "charsets").
    The registry contents are published by IANA. Fortunately, the registry
    includes a "MIBenum" integer for each charset. I propose that these values
    be used to identify the charset for each user id string, with the reserved
    value zero being used to indicate that the charset is unknown. So, for
    example, if an agent knows that a user id is in UTF-8, the MIBenum value
    would be 106. UTF-8 could probably be considered the preferred charset, if
    the agent is able to obtain the data in different charsets.
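A minimal sketch of how a collector might use a per-string MIBenum value. The MIBenum numbers shown (3 for US-ASCII, 4 for ISO-8859-1, 106 for UTF-8) come from the IANA charset registry; the table name, the function name, and the fallback behaviour for unknown charsets are assumptions made for illustration, not part of the proposal.

```python
# MIBenum values from the IANA charset registry, mapped to Python codec names.
# Only a few entries are listed; a real collector would carry a larger table.
MIBENUM_TO_CODEC = {
    3: "ascii",      # US-ASCII
    4: "latin-1",    # ISO-8859-1
    106: "utf-8",    # UTF-8
}
CHARSET_UNKNOWN = 0  # reserved value proposed above for "charset unknown"


def decode_user_id(charset, raw):
    """Decode a user id using the charset the agent reported.

    Assumption: for an unknown or unsupported charset, fall back to ASCII
    and replace undecodable bytes rather than failing.
    """
    codec = MIBENUM_TO_CODEC.get(charset)
    if codec is None:
        return raw.decode("ascii", errors="replace")
    return raw.decode(codec)
```

The agent only has to pass along the bytes and the MIBenum it already knows; any character set translation stays on the collector side.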


    Marc Lavine
    Foundry Networks, Inc.
