From: Marc Lavine (mlavine@foundrynet.com)
Date: 09/13/02
Peter,
Thanks for the quick response.
> > I think there's an additional change that is worth making as well, which
> > would be to add length information for each overall sample.
> > ...
> > Note that I haven't used the "enterprise" extension mechanism here,
> > although you could if you want to keep it consistent with the other
> > structures.
>
> Good idea. It's not a bad idea to use the same mechanism for defining
> structure types. It would allow vendor specific data to be specified at
> this level.
Agreed.
> > ... my intent was that the organization that had defined a particular
> > structure could later define an extended version of that same structure
> > with some additional fields added to the end. Existing fields in the
> > structure would not be allowed to be modified, only new ones could be
> > added to the end. The presence of the length information would allow a
> > collector to determine which version of a structure was in use, so that
> > it could take advantage of the additional fields if it had been updated
> > to understand them.
>
> Agreed. Vendors would be allowed to extend their own structures, but not
> the standard structures or those of other vendors. Keeping a single
> authority for each structure definition ensures that there won't be any
> clashes on the extensions.
To support this, I think it should be documented that any software decoding
an sFlow packet should always use the encoded length information and not
assume that a structure is of a particular length, since structures may
grow. This should apply to standard structures as well, since presumably
they could also be extended as a result of an update to the standard.
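As a rough illustration of length-driven decoding, here is a minimal Python
sketch. The record layout (a 4-byte big-endian data_format, a 4-byte
big-endian length, then the body) and the two-field structure read by
decode_example_fields are assumptions made up for this example, not the
actual sFlow definitions.

```python
import struct

def decode_records(buf):
    """Split a buffer into (data_format, body) pairs.

    Assumed layout per record (hypothetical, for illustration only):
    4-byte big-endian data_format, 4-byte big-endian body length, body.
    """
    records = []
    offset = 0
    while offset + 8 <= len(buf):
        data_format, length = struct.unpack_from(">II", buf, offset)
        offset += 8
        if offset + length > len(buf):
            raise ValueError("record length runs past end of buffer")
        records.append((data_format, buf[offset:offset + length]))
        # Always advance by the encoded length, never by a compiled-in size.
        offset += length
    return records

def decode_example_fields(body):
    """Read the two 32-bit fields this decoder knows about.

    The structure is hypothetical. If a later version appends fields,
    the extra bytes at the end of the body are simply ignored.
    """
    if len(body) < 8:
        raise ValueError("body shorter than the original structure")
    return struct.unpack_from(">II", body, 0)
```

With this approach, a collector that only knows the original two fields can
still walk past an extended record and find the next one, because it never
assumes the structure's size.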
> The current data_format definition does provide a very large (excessive?)
> name space. It allows 2^32-1 enterprises and 2^32-1 structures per
> enterprise.
>
> Currently the largest enterprise number assignment is 14609
> http://www.iana.org/assignments/enterprise-numbers
>
> A 24 bit enterprise and 8 bit struct number does seem a little
> constraining. How about 20 bits for the enterprise and 12 bits for the
> structure? This would allow for over a million enterprises and allows each
> enterprise over 4000 structures. I don't see hitting either of these
> limits any time soon.
Yes, this sounds fine. I'm not sure why I was fixated on using a byte
boundary in my earlier message ;-).
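For concreteness, the 20/12 split could be packed into a single 32-bit
data_format value roughly as in the Python sketch below. Placing the
enterprise number in the high-order bits is my assumption here, as are the
function names. The discussion above only fixes the field widths.

```python
ENTERPRISE_BITS = 20
FORMAT_BITS = 12
FORMAT_MASK = (1 << FORMAT_BITS) - 1

def pack_data_format(enterprise, fmt):
    """Combine an enterprise number and a structure number into 32 bits.

    Assumes the enterprise occupies the high-order 20 bits.
    """
    if not 0 <= enterprise < (1 << ENTERPRISE_BITS):
        raise ValueError("enterprise does not fit in 20 bits")
    if not 0 <= fmt < (1 << FORMAT_BITS):
        raise ValueError("structure number does not fit in 12 bits")
    return (enterprise << FORMAT_BITS) | fmt

def unpack_data_format(value):
    """Split a 32-bit data_format into (enterprise, structure number)."""
    return value >> FORMAT_BITS, value & FORMAT_MASK
```

The limits work out to 2^20 = 1048576 enterprises and 2^12 = 4096
structures per enterprise, so the current largest enterprise assignment
(14609) fits comfortably.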
> > I noticed the addition of the sampled_ethernet format. Since the data
> > structures no longer use a union to restrict a flow sample to being
> > represented by a single structure, I think there need to be some
> > guidelines on how the sampled_* structures should be used.
>
> I agree that this should be clarified. How about adding the following
> comment to the standard structures file?
>
> /* Flow Data types
>
> A flow_sample must contain packet header information. The
> preferred format for reporting packet header information is
> the sampled_header. However, if the packet header is not
> available to the sampling process then one or more of
> sampled_ethernet, sampled_ipv4, sampled_ipv6 may be used.
>
> enterprise = 0 refers to standard sFlow structures. An
> sFlow implementor should use the standard structures
> where possible, even if they can only be partially
> populated. Vendor specific structures are allowed, but
> should only be used to supplement the existing
> structures, or to carry information that hasn't yet
> been standardized. */
>
> Does this sufficiently clarify the issue?
Yes, that seems sufficient.
Overall, things look pretty good. Originally, I was a little concerned
about the long data format ids making the encoding inefficient, and I'd
tried composing an alternate design (with mixed success), but the change to
using the shorter ids makes that a moot point.
Here are some other issues that have occurred to me:
You might want to explicitly specify whether data_format values need to be
unique across different structure types. In other words, can a single value
be used to identify a flow structure, a counter structure, and a sample
structure, or should values not be reused in that manner?
You might consider renaming counter_block to counter_record for consistency
with the other structures.
In the data structures document, I think it would be good to have some
documentation about how the structures should be filled in, particularly
when not all information is available. For example, for extended_switch, if
either the source or destination VLAN information is not available, should
the corresponding fields be set to zero? Likewise for extended_user, I
presume it's acceptable to encode a zero-length string if one of the user
ids is not available.
In the extended_router documentation, it is not clearly specified whether
the mask fields' format is a bit mask or a count of bits.
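To show why the distinction matters, here is a small Python sketch of the
two possible interpretations for a 32-bit IPv4 mask field. The function
names are made up for this example, and nothing here says which form the
specification intends.

```python
def prefix_len_to_mask(bits):
    """Convert a count of bits (e.g. 24) to a 32-bit mask (0xFFFFFF00)."""
    if not 0 <= bits <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF

def mask_to_prefix_len(mask):
    """Convert a contiguous 32-bit mask back to a count of bits."""
    bits = bin(mask).count("1")
    if prefix_len_to_mask(bits) != mask:
        raise ValueError("mask bits are not contiguous")
    return bits
```

A collector that expects one form and receives the other would read the
value 24 as the mask 0.0.0.24, which is why the documentation should state
the format explicitly.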
For flow_sample.drops, I think it would be good to clarify the documentation
with regard to what kind of packets are being counted (i.e. are they only
sFlow packet drops that are being counted?).
Should the ETHERNET-ISO8023 enum be named ETHERNET-ISO88023 instead?
In flow_sample, the input and output fields have special values to represent
the case where the interface is "unknown". If packets originating or
terminating at the switch itself are sampled, then one of the two interface
fields will not apply. I'm wondering if it might be good to have an
additional special "none" value to indicate this, rather than using the
"unknown" value, which might wind up getting used for other cases as well.
In the extended_user data, there is an issue of what character set and
encoding the user ids are expressed in. I'm sure there will be contexts in
which they will not be in ASCII. In an ideal world, I'd just say these
should be encoded in UTF-8, but agents may receive the data in different
encodings, and it seems better for the agents not to need to delve into
character set translations. Therefore, I think it would be a good idea to
be able to include information about the character set of each user id (for
each field independently). This may assist a collector in being able to
properly display the ids or map them into different character sets. For
character set issues, see RFCs 2277 and 2978. RFC 2978 defines a scheme for
registering character sets and encodings (collectively dubbed "charsets").
The registry contents can be found at
http://www.iana.org/assignments/character-sets. Fortunately, the registry
includes a "MIBenum" integer for each charset. I propose that these values
be used to identify the charset for each user id string, with the reserved
value zero being used to indicate that the charset is unknown. So, for
example, if an agent knows that a user id is in UTF-8, the MIBenum value
would be 106. UTF-8 could probably be considered the preferred charset, if
the agent is able to obtain the data in different charsets.
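On the collector side, handling the proposed charset field might look
something like the following Python sketch. The MIBenum values (3 for
US-ASCII, 4 for ISO-8859-1, 106 for UTF-8) come from the IANA registry. The
table, the function name, and the fallback behaviour for unknown charsets
are assumptions for illustration only.

```python
# Partial mapping from IANA charset MIBenum values to Python codec names.
MIBENUM_TO_CODEC = {
    3: "ascii",      # US-ASCII
    4: "latin-1",    # ISO-8859-1
    106: "utf-8",    # UTF-8
}
CHARSET_UNKNOWN = 0  # proposed reserved value: charset not known to the agent

def decode_user_id(mibenum, raw):
    """Turn the raw bytes of a user id into a displayable string."""
    codec = MIBENUM_TO_CODEC.get(mibenum)
    if codec is None:
        # Unknown or unsupported charset: keep ASCII bytes as they are and
        # show everything else as escapes rather than guessing.
        return raw.decode("ascii", errors="backslashreplace")
    return raw.decode(codec, errors="replace")
```

The agent does no translation at all in this scheme. It only passes along
the bytes it received together with the MIBenum, and the collector decides
how to display them.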
Regards,
Marc
--- Marc Lavine Foundry Networks, Inc.