IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                             Hitachi Europe
Expires: October 1, 2009                                  March 30, 2009

                     IP Flow Anonymisation Support
                     draft-boschi-ipfix-anon-03.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on October 1, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

This document describes anonymisation techniques for IP flow data and the export of anonymised data using the IPFIX protocol. It provides a categorization of common anonymisation schemes and defines the parameters needed to describe them.
It provides guidelines for the implementation of anonymised data export and storage over IPFIX, and describes an Options-based method for anonymization metadata export within the IPFIX protocol, providing the basis for the definition of information models for configuring anonymisation techniques within an IPFIX Metering or Exporting Process, and for reporting the technique in use to an IPFIX Collecting Process.

Table of Contents

1.  Open Issues
2.  Introduction
  2.1.  IPFIX Protocol Overview
  2.2.  IPFIX Documents Overview
3.  Terminology
4.  Categorisation of Anonymisation Techniques
5.  Anonymisation of IP Flow Data
  5.1.  IP Address Anonymisation
    5.1.1.  Truncation
    5.1.2.  Random Permutation
    5.1.3.  Prefix-preserving Pseudonymisation
  5.2.  Timestamp Anonymisation
    5.2.1.  Precision Degradation
    5.2.2.  Enumeration
    5.2.3.  Random Time Shifts
  5.3.  Counter Anonymisation
    5.3.1.  Precision Degradation
    5.3.2.  Binning
    5.3.3.  Random Noise Addition
  5.4.  Anonymisation of Other Flow Fields
    5.4.1.  Binning
    5.4.2.  Random Permutation
6.  Parameters for the Description of Anonymisation Techniques
  6.1.  Stability
  6.2.  Truncation Length
  6.3.  Bin Map
  6.4.  Permutation
  6.5.  Shift Amount
7.  Anonymisation Export Support in IPFIX
  7.1.  Anonymisation Options Template
  7.2.  Recommended Information Elements for Anonymisation Metadata
    7.2.1.  anonymisationStability
    7.2.2.  anonymisationTechnique
    7.2.3.  informationElementIndex
8.  Applying Anonymisation Techniques to IPFIX Export and Storage
  8.1.  Arrangement of Processes in IPFIX Anonymisation
  8.2.  IPFIX-Specific Anonymisation Guidelines
    8.2.1.  Appropriate Use of Information Elements for Anonymised Data
    8.2.2.  Anonymisation of Header Data
    8.2.3.  Anonymisation of Options Data
9.  Examples
10. Security Considerations
11. IANA Considerations
12. Acknowledgments
13. References
  13.1.  Normative References
  13.2.  Informative References
Authors' Addresses

1.  Open Issues

There is not yet a mechanism for exporting information about defined-time anonymisation stability.

The terminology section is incomplete; we should decide which of the terms introduced in this document are to be treated as terminology.

Between "classes" of techniques and "parameters", there may be "properties" as well; for example, binning and timestamp anonymisation may be "ordered" or not (x>y in real --> x>y in anonymized). We should verify that we're splitting these up correctly.

In parallel with this, the anonymisationTechnique values might be useful as a bitfield, with properties and classes being represented by some set of the bits in the field. We'll have to make sure that the properties and classes are exhaustive, if we do this.

Both anonymisationStability and anonymisationTechnique might benefit from the creation of IANA registries; HOWEVER, in this case, it would be very important to ensure that such a registry contains only classes and properties of anonymised data, not information about specific algorithms.

Certain technique/IE combinations (e.g. structure-preserving counters) don't make any sense; these should be noted in "IPFIX-Specific Anonymisation Guidelines".

Guidelines should be provided for the evaluation of _new_ IEs added to the IANA registry after the publication of this draft for their anonymisation potential.

This document does not cover the anonymisation of sub-IP level information, specifically MAC addresses. It should.

2.  Introduction

The standardisation of an IP flow information export protocol [RFC5101] and associated representations removes a technical barrier to the sharing of IP flow data across organizational boundaries and with network operations, security, and research communities for a wide variety of purposes. However, with wider dissemination come greater risks to the privacy of the users of networks under measurement, and to the security of those networks.
While it is not a complete solution to the issues posed by distribution of IP flow information, anonymisation is an important tool for the protection of privacy within network measurement infrastructures.

This document presents a mechanism for representing anonymised data within IPFIX and guidelines for using it. It begins with a categorization of anonymisation techniques. It then describes applicability of each technique to commonly anonymisable fields of IP flow data, organized by information element data type and semantics as in [RFC5102]; enumerates the parameters required by each of the applicable anonymisation techniques; and provides guidelines for the use of each of these techniques in accordance with best practices in data protection. Finally, it specifies a mechanism for exporting anonymised data and binding anonymisation metadata to templates using IPFIX Options.

2.1.  IPFIX Protocol Overview

In the IPFIX protocol, { type, length, value } tuples are expressed in templates containing { type, length } pairs, specifying which { value } fields are present in data records conforming to the Template, giving great flexibility as to what data is transmitted.

Since Templates are sent very infrequently compared with Data Records, this results in significant bandwidth savings.

Various different data formats may be transmitted simply by sending new Templates specifying the { type, length } pairs for the new data format. See [RFC5101] for more information.

The IPFIX information model [RFC5102] defines a large number of standard Information Elements which provide the necessary { type } information for Templates.

The use of standard elements enables interoperability among different vendors' implementations. Additionally, non-standard enterprise-specific elements may be defined for private use.

2.2.  IPFIX Documents Overview

"Specification of the IPFIX Protocol for the Exchange of IP Traffic Flow Information" [RFC5101] and its associated documents define the IPFIX Protocol, which provides network engineers and administrators with access to IP traffic flow information.

"Architecture for IP Flow Information Export" [I-D.ietf-ipfix-architecture] defines the architecture for the export of measured IP flow information out of an IPFIX Exporting Process to an IPFIX Collecting Process, and the basic terminology used to describe the elements of this architecture, per the requirements defined in "Requirements for IP Flow Information Export" [RFC3917]. The IPFIX Protocol document [RFC5101] then covers the details of the method for transporting IPFIX Data Records and Templates via a congestion-aware transport protocol from an IPFIX Exporting Process to an IPFIX Collecting Process.

"Information Model for IP Flow Information Export" [RFC5102] describes the Information Elements used by IPFIX, including details on Information Element naming, numbering, and data type encoding. Finally, "IPFIX Applicability" [I-D.ietf-ipfix-as] describes the various applications of the IPFIX protocol and their use of information exported via IPFIX, and relates the IPFIX architecture to other measurement architectures and frameworks.

Additionally, the "Specification of the IPFIX File Format" [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX Protocol for the storage of flow data.

This document references the Protocol and Architecture documents for terminology, and extends the IPFIX Information Model to provide new Information Elements for anonymisation metadata. The anonymisation techniques described herein are equally applicable to the IPFIX Protocol and data stored in IPFIX Files.

3.  Terminology

Terms used in this document that are defined in the Terminology section of the IPFIX Protocol [RFC5101] document are to be interpreted as defined there.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

4.  Categorisation of Anonymisation Techniques

Anonymisation modifies a data set in order to protect the identity of the people or entities described by the data set from disclosure. With respect to network traffic data, anonymisation generally attempts to preserve some set of properties of the network traffic useful for a given application or applications, while ensuring the data cannot be traced back to the specific networks, hosts, or users generating the traffic.

Anonymisation may be broadly classified according to two properties: recoverability and countability. All anonymisation techniques map the real space of identifiers or values into a separate, anonymised space, according to some function. A technique is said to be recoverable when the function used is invertible or can otherwise be reversed and a real identifier can be recovered from a given replacement identifier.

Countability compares the dimension of the anonymised space (N) to the dimension of the real space (M), and denotes how the count of unique values is preserved by the anonymisation function. If the anonymised space is smaller than the real space, then the function is said to generalise the input, mapping more than one input point to each anonymous value (e.g., as with aggregation). By definition, generalisation is not recoverable. If the dimensions of the anonymised and real spaces are the same, such that the count of unique values is preserved, then the function is said to be a direct substitution function.
If the dimension of the anonymised space is larger, such that each real value maps to a set of anonymised values, then the function is said to be a set substitution function. Note that with set substitution functions, the sets of anonymised values are not necessarily disjoint. Either direct or set substitution functions are said to be one-way if there exists no method for recovering the real data point from an anonymised one.

This classification is summarised in the table below.

+------------------------+-----------------+------------------------+
| Recoverability /       | Recoverable     | Non-recoverable        |
| Countability           |                 |                        |
+------------------------+-----------------+------------------------+
| N < M                  | N.A.            | Generalisation         |
| N = M                  | Direct          | One-way Direct         |
|                        | Substitution    | Substitution           |
| N > M                  | Set             | One-way Set            |
|                        | Substitution    | Substitution           |
+------------------------+-----------------+------------------------+

5.  Anonymisation of IP Flow Data

Due to the restricted semantics of IP flow data, there is a relatively limited set of specific anonymisation techniques applicable to flow data, though each falls into the broad categories above. Each type of field that may commonly appear in a flow record may have its own applicable specific techniques.

While anonymisation is generally applied at the resolution of single fields within a flow record, attacks against anonymisation use entire flows and relationships between hosts and flows within a given data set. Therefore, fields which may not necessarily be identifying by themselves may be anonymised in order to increase the anonymity of the data set as a whole.

Of all the fields in an IP flow record, only IP addresses directly identify entities in the real world. Each IP address is associated with an interface on a network host, and can potentially be identified with a single user.
Additionally, IP addresses are structured identifiers; that is, partial IP address prefixes may be used to identify networks just as full IP addresses identify hosts. This makes anonymisation of IP addresses particularly important.

Port numbers identify abstract entities (applications) as opposed to real-world entities, but they can be used to classify hosts and user behavior. Passive port fingerprinting, both of well-known and ephemeral ports, can be used to determine the operating system running on a host. Relative data volumes by port can also be used to determine the host's function (workstation, web server, etc.); this information can be used to identify hosts and users.

While not identifiers in and of themselves, timestamps and counters can reveal the behavior of the hosts and users on a network. Any given network activity is recognizable by a pattern of relative time differences and data volumes in the associated sequence of flows, even without host address information. They can therefore be used to identify hosts and users. Timestamps and counters are also vulnerable to traffic injection attacks, where traffic with a known pattern is injected into a network under measurement, and this pattern is later identified in the anonymised data set.

The simplest and most extreme form of anonymisation, which can be applied to any field of a flow record, is black-marker anonymisation, or complete deletion of a given field. Note that black-marker anonymisation is equivalent to simply not exporting the field(s) in question. While black-marker anonymisation completely protects the data in the deleted fields from the risk of disclosure, it also reduces the utility of the anonymised data set as a whole.
Techniques that retain some information while reducing (though not eliminating) the disclosure risk will be extensively discussed in the following sections; note that the techniques specifically applicable to IP addresses, timestamps, ports, and counters will be discussed in separate sections.

5.1.  IP Address Anonymisation

Since IP addresses are the most common identifiers within flow data that can be used to directly identify a person, organization, or host, most of the work on flow and trace data anonymisation has gone into IP address anonymisation techniques. Indeed, the aim of most attacks against anonymisation is to recover the map from anonymised IP addresses to original IP addresses, thereby identifying the hosts. There is therefore a wide range of IP address anonymisation schemes that fit into the following categories.

+------------------------------------+---------------------+
| Scheme                             | Action              |
+------------------------------------+---------------------+
| Truncation                         | Generalisation      |
| Random Permutation                 | Direct Substitution |
| Prefix-preserving Pseudonymisation | Direct Substitution |
+------------------------------------+---------------------+

5.1.1.  Truncation

Truncation removes "n" of the least significant bits from an IP address, replacing them with zeroes. In effect, it replaces a host address with a network address for some fixed netblock; for IPv4 addresses, 8-bit truncation corresponds to replacement with a /24 network address. Truncation is a non-reversible generalisation scheme. Note that while truncation is effective for making hosts non-identifiable, it preserves information which can be used to identify an organization, a geographic region, a country, or a continent (or RIR region of responsibility).

Truncation to an address length of 0 is equivalent to black-marker anonymisation.
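As a non-normative illustration of truncation, the sketch below zeroes the "n" least significant bits of an IPv4 address using Python's standard ipaddress module. The function name truncate_ipv4 is hypothetical and is not defined by this document; IPv6 would need a 128-bit mask instead.

```python
import ipaddress


def truncate_ipv4(address: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 address (hypothetical helper)."""
    if not 0 <= n <= 32:
        raise ValueError("n must be between 0 and 32")
    value = int(ipaddress.IPv4Address(address))
    # Build a mask that keeps the (32 - n) most significant bits.
    mask = (0xFFFFFFFF >> n) << n
    return str(ipaddress.IPv4Address(value & mask))


# 8-bit truncation replaces the host address with its /24 network address.
print(truncate_ipv4("192.0.2.77", 8))   # 192.0.2.0
# Removing all 32 bits leaves no address information at all.
print(truncate_ipv4("192.0.2.77", 32))  # 0.0.0.0
```

Because every address in the same netblock maps to the same output, the mapping cannot be inverted, which is what makes truncation a generalisation.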
Removal of IP address information is only recommended for analysis tasks which have no need to separate flow data by host or network; e.g. as a first stage to per-application (port) or time-series total volume analyses.

5.1.2.  Random Permutation

Random permutation is a direct substitution technique, replacing each IP address with an address randomly selected from the set of possible IP addresses, guaranteeing that each anonymised address represents a unique original address. The random permutation does not preserve any structural information about a network, but it does preserve the unique count of IP addresses. Any application that requires more structure than host-uniqueness will not be able to use randomly permuted IP addresses.

5.1.3.  Prefix-preserving Pseudonymisation

Prefix-preserving pseudonymisation is a direct substitution technique, further restricted such that the structure of subnets is preserved at each level while anonymising IP addresses. If two real IP addresses match on a prefix of "n" bits, the two anonymised IP addresses will match on a prefix of "n" bits as well. This is useful when relationships among networks must be preserved for a given analysis task, but introduces structure into the anonymised data which can be exploited in attacks against the anonymisation technique.

5.2.  Timestamp Anonymisation

The particular time at which a flow began or ended is not particularly identifying information, but it can be used as part of attacks against other anonymisation techniques or for user profiling. Precise timestamps can be used in injected-traffic fingerprinting attacks [CITE] as well as to identify certain activity by response delay and size fingerprinting [CITE]. Therefore, timestamp information may be anonymised in order to ensure the protection of the entire dataset.
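As a rough, non-normative illustration of the timestamp schemes covered in Section 5.2, the sketch below shows one possible reading of precision degradation, enumeration, and random time shifts on millisecond timestamps. All function names are hypothetical. It assumes that a random time shift applies a single offset to the whole data set and that equal timestamps enumerate to the same value; neither choice is mandated by this document.

```python
import random


def degrade_precision(timestamp_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    return timestamp_ms - (timestamp_ms % interval_ms)


def enumerate_timestamps(timestamps_ms, start, step=1):
    """Replace timestamps with equidistant values that keep chronological order."""
    # Rank each distinct timestamp; equal inputs share a rank (an assumption).
    rank = {t: i for i, t in enumerate(sorted(set(timestamps_ms)))}
    return [start + rank[t] * step for t in timestamps_ms]


def shift_all(timestamps_ms, max_shift_ms, rng=None):
    """Add one random offset to every timestamp, keeping all intervals intact."""
    # random.Random is used for brevity only; it is not a cryptographic source.
    rng = rng or random.Random()
    offset = rng.randint(-max_shift_ms, max_shift_ms)
    return [t + offset for t in timestamps_ms]


# Degrading to one-second precision collapses all events within that second.
print(degrade_precision(1238400123456, 1000))  # 1238400123000
# Enumeration keeps order but discards the real distances between events.
print(enumerate_timestamps([50, 10, 10, 900], start=1000))  # [1001, 1000, 1000, 1002]
```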
+-----------------------+----------------------------+
| Scheme                | Action                     |
+-----------------------+----------------------------+
| Precision Degradation | Generalisation             |
| Enumeration           | Direct or Set Substitution |
| Random Shifts         | Direct Substitution        |
+-----------------------+----------------------------+

5.2.1.  Precision Degradation

Precision Degradation is a generalisation technique that removes the most precise components of a timestamp, accounting all events occurring in each given interval (e.g. one millisecond for millisecond level degradation) as simultaneous. This has the effect of potentially collapsing many timestamps into one. With this technique time precision is reduced, and sequencing may be lost, but the approximate time at which the event occurred is preserved. The anonymised data may not be generally useful for applications which require strict sequencing of flows.

Note that flow meters with low time precision (e.g. second precision, or millisecond precision on high-capacity networks) perform the equivalent of precision degradation anonymisation by their design. Note also that degradation to a very low precision (e.g. on the order of minutes, hours, or days) is commonly used in analyses operating on time-series aggregated data, and is referred to as binning; though the time scales are longer and applicability more restricted, this is in principle the same operation.

Precision degradation to infinitely low precision is equivalent to black-marker anonymisation. Removal of timestamp information is only recommended for analysis tasks which have no need to separate flows in time, for example for counting total volumes or unique occurrences of other flow keys in an entire dataset.

5.2.2.  Enumeration

Enumeration is a substitution function that retains the chronological order in which events occurred while eliminating time information.
Timestamps are substituted by equidistant timestamps (or numbers) starting from a randomly chosen start value. The resulting data is useful for applications requiring strict sequencing, but not for those requiring good timing information (e.g. delay or jitter measurement for QoS applications or SLA validation).

5.2.3.  Random Time Shifts

Random time shifts add a random offset to every timestamp within a dataset. This reversible substitution technique therefore retains duration and inter-event interval information as well as chronological order of flows. It is primarily intended to defeat traffic injection fingerprinting attacks.

5.3.  Counter Anonymisation

Counters (such as packet and octet volumes per flow) are subject to fingerprinting and injection attacks against anonymisation, and can be used for user profiling, just as timestamps can. Counter anonymisation can help defeat these attacks, but is only usable for analysis tasks for which relative or imprecise magnitudes of activity are useful.

+-----------------------+----------------------------+
| Scheme                | Action                     |
+-----------------------+----------------------------+
| Precision Degradation | Generalisation             |
| Binning               | Generalisation             |
| Random noise addition | Direct or Set Substitution |
+-----------------------+----------------------------+

5.3.1.  Precision Degradation

As with precision degradation in timestamps, precision degradation of counters removes lower-order bits of the counters, treating all the counters in a given range as having the same value. Depending on the precision reduction, this loses information about the relationships between sizes of similarly-sized flows, but keeps relative magnitude information.

5.3.2.  Binning

Binning can be seen as a special case of precision degradation; the operation is identical, except that in precision degradation the counter ranges are uniform, and in binning they need not be. For example, a common counter binning scheme for packet counters could be to bin values 1-2 together, and 3-infinity together, thereby separating potentially completely-opened TCP connections from unopened ones. Binning schemes are generally chosen to keep precisely the amount of information required in a counter for a given analysis task. Note that, also unlike precision degradation, the bin label need not be within the bin's range.

Binning counters to a single bin 0-infinity, or alternately precision degradation to infinitely low precision, is equivalent to black-marker anonymisation. Removal of counter information is only recommended for analysis tasks which have no need to evaluate the removed counter, for example for counting only unique occurrences of other flow keys.

5.3.3.  Random Noise Addition

Random noise addition adds a random amount to a counter in each flow; this is used to keep relative magnitude information and minimize the disruption to size relationship information while avoiding fingerprinting attacks against anonymisation. Note that there is no guarantee that random noise addition will maintain ranking order by a counter among members of a set. Random noise addition is particularly useful when the derived analysis data will not be presented in such a way as to require the lower-order bits of the counters.

5.4.  Anonymisation of Other Flow Fields

Other fields, particularly port numbers and protocol numbers, can be used to partially identify the applications that generated the traffic in a given flow trace. This information can be used in fingerprinting attacks, and may be of interest on its own (e.g., to reveal that a certain application with suspected vulnerabilities is running on a given network).
These fields are generally anonymised using one of two techniques.

+--------------------+---------------------+
| Scheme             | Action              |
+--------------------+---------------------+
| Binning            | Generalisation      |
| Random Permutation | Direct Substitution |
+--------------------+---------------------+

5.4.1.  Binning

Binning is a generalisation technique mapping a set of potentially non-uniform ranges into a set of arbitrarily labeled bins. Common bin arrangements depend on the field type and the analysis application. For example, an IP protocol bin arrangement may preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all other protocols into a single bin, to mitigate the use of uncommon protocols in fingerprinting attacks. Another example arrangement may bin source and destination ports into low (0-1023) and high (1024-65535) bins in order to tell service from ephemeral ports without identifying individual applications.

Binning other flow key fields to a single bin is equivalent to black-marker anonymisation. Removal of other flow key information is only recommended for analysis tasks which have no need to differentiate flows on the removed keys, for example for total traffic counts or unique counts of other flow keys.

5.4.2.  Random Permutation

Random permutation is a direct substitution technique, replacing each key value with a value randomly selected from the set of possible values, guaranteeing that each anonymised value represents a unique original value. This is used to preserve the count of unique flow key values without preserving information about the keys themselves.

6.  Parameters for the Description of Anonymisation Techniques

This section details the abstract parameters used to describe the anonymisation techniques examined in the previous section, on a per-parameter basis.
These parameters and their export safety inform the design of the IPFIX anonymisation metadata export specified in the following section.

6.1.  Stability

Any given anonymisation technique may be applied with a varying range of stability. Stability is important for assessing the comparability of anonymised information in different data sets, or in the same data set over different time periods. In general, stability ranges from completely stable to completely unstable; however, note that the completely unstable case is indistinguishable from black-marker anonymisation. A completely stable anonymisation will always map a given value in the real space to the same value in the anonymised space. In practice, an anonymisation may also be stable for every data set published by a particular producer to a particular consumer, stable for a stated time period within a dataset or across datasets, or stable only for a single data set.

If no information about stability is available, users of anonymised data may assume that the techniques used are stable across the entire dataset, but unstable across datasets. Note that stability presents a risk-utility tradeoff, as completely stable anonymisation can be used for longer-term trend analysis tasks but also presents more risk of attack given the stable mapping.

6.2.  Truncation Length

Truncation and precision degradation are described by the truncation length, or the amount of data still remaining in the anonymised field after anonymisation.

Truncation length can be inferred from a given data set, and need not be specially exported or protected.

6.3.  Bin Map

Binning is described by the specification of a bin mapping function.
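As an illustration only, the two bin arrangements mentioned in Section 5.4.1 could be written as bin mapping functions along the following lines. The names bin_port, bin_protocol, PROTOCOL_BINS, and OTHER_PROTOCOL_LABEL are hypothetical, and the label chosen for the catch-all protocol bin is an arbitrary assumption.

```python
# Protocols kept distinct: 1 (ICMP), 6 (TCP), 17 (UDP).
PROTOCOL_BINS = {1: 1, 6: 6, 17: 17}

# Arbitrary, hypothetical label for the single bin holding all other protocols.
OTHER_PROTOCOL_LABEL = 255


def bin_protocol(protocol: int) -> int:
    """Map an IP protocol number to its bin label via an associative array."""
    return PROTOCOL_BINS.get(protocol, OTHER_PROTOCOL_LABEL)


def bin_port(port: int) -> str:
    """Map a port to the low (0-1023) or high (1024-65535) bin."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    return "low" if port <= 1023 else "high"


print(bin_protocol(6), bin_protocol(47))  # 6 255
print(bin_port(80), bin_port(49152))      # low high
```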
This function can be generally expressed in terms of an associative array that maps each point in the original space to a bin, although from an implementation standpoint most bin functions are much simpler and more efficient.

Since knowledge of the bin mapping function can be used to partially deanonymise binned data, depending on the degree of generalisation, no information about the bin mapping function should be exported.

6.4.  Permutation

Like binning, permutation is described by the specification of a permutation function. In the general case, this can be expressed in terms of an associative array that maps each point in the original space to a point in the anonymised space. Unlike binning, each point in the anonymised space must correspond to a single, unique point in the original space.

Since knowledge of the permutation function can be used to completely deanonymise permuted data, no information about the permutation function or its parameters should be exported.

6.5.  Shift Amount

Shifting requires an amount to shift each value by. Since the shift amount can be used to deanonymize data protected by shifting, no information about the shift amount should be exported.

7.  Anonymisation Export Support in IPFIX

Anonymised data exported via IPFIX SHOULD be annotated with anonymisation metadata, which details which fields described by which Templates are anonymised, and provides appropriate information on the anonymisation techniques used. This metadata SHOULD be exported in Data Records described by the recommended Options Templates described in this section; these Options Templates use the additional Information Elements described in the following subsection.

Note that fields anonymised using the black-marker (removal) technique do not require any special metadata support.
Black-marker anonymised fields SHOULD NOT be exported at all; the
absence of the field in a given Data Set is implicitly declared by not
including the corresponding Information Element in the Template
describing that Data Set. Exporting "empty" data elements is inefficient
and, in the general case, impossible, as many non-counter Information
Elements do not have semantically distinct null values.

7.1.  Anonymisation Options Template

The Anonymisation Options Template describes anonymisation records,
which allow anonymisation metadata to be exported inline over IPFIX or
stored in an IPFIX File, by binding information about anonymisation
techniques to Information Elements within defined Templates. IPFIX
Exporting Processes SHOULD export anonymisation records for any Template
describing exported anonymised Data Records; IPFIX Collecting Processes
and processes downstream from them MAY use anonymisation records to
treat anonymised data differently depending on the applied technique.

An Exporting Process SHOULD export anonymisation records after the
Templates they describe have been exported, and SHOULD export
anonymisation records reliably.

Anonymisation records, like Templates, MUST be handled by Collecting
Processes as scoped to the Transport Session in which they are sent.
While the anonymisationStability IE can be used to declare that a given
anonymisation technique's mapping will remain stable across multiple
sessions, each session MUST re-export the anonymisation records along
with the Templates.

[EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the same
time is indicated with multiple elements of the same type (in
application order as in PSAMP). Need to verify this is actually useful
given the defined techniques.]
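
As a rough illustration of how an anonymisation record binds a technique
to a field, the following Python sketch packs and unpacks the Data
Record portion of one such record. It is not part of this specification.
The field order, the omission of the optional informationElementIndex,
and all class and function names are assumptions made for illustration;
the code points are those listed in Section 7.2, and the field widths
assume templateId and informationElementId are unsigned16 and the two
anonymisation Information Elements are unsigned8.

```python
import struct
from enum import IntEnum


class Stability(IntEnum):
    """anonymisationStability code points as listed in Section 7.2.1."""
    UNDEFINED = 0
    SESSION = 1
    EXPORTER_COLLECTOR_PAIR = 2
    STABLE = 3


class Technique(IntEnum):
    """anonymisationTechnique code points as listed in Section 7.2.2."""
    UNDEFINED = 0
    NONE = 1
    PRECISION_DEGRADATION = 2
    BINNING = 3
    ENUMERATION = 4
    PERMUTATION = 5
    PREFIXED_PERMUTATION = 6


# Assumed (hypothetical) record layout, in network byte order:
#   templateId (unsigned16), informationElementId (unsigned16),
#   anonymisationStability (unsigned8), anonymisationTechnique (unsigned8)
# The optional informationElementIndex scope field is left out.
_RECORD = struct.Struct("!HHBB")


def pack_anon_record(template_id, ie_id, stability, technique):
    """Encode one anonymisation record; rejects unknown code points."""
    return _RECORD.pack(template_id, ie_id,
                        Stability(stability), Technique(technique))


def unpack_anon_record(data):
    """Decode one anonymisation record into typed values."""
    template_id, ie_id, stability, technique = _RECORD.unpack(data)
    return template_id, ie_id, Stability(stability), Technique(technique)
```

For example, pack_anon_record(256, 8, Stability.SESSION,
Technique.PREFIXED_PERMUTATION) would describe Information Element 8
(sourceIPv4Address) in Template 256 as anonymised with a
prefix-preserving permutation that is stable for the Transport Session.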

+-------------------------+-----------------------------------------+
| IE                      | Description                             |
+-------------------------+-----------------------------------------+
| templateId [scope]      | The Template ID of the Template         |
|                         | containing the Information Element      |
|                         | described by this anonymisation record. |
|                         | This Information Element MUST be        |
|                         | defined as a Scope Field.               |
| informationElementId    | The Information Element identifier of   |
| [scope]                 | the Information Element described by    |
|                         | this anonymisation record. This         |
|                         | Information Element MUST be defined as  |
|                         | a Scope Field.                          |
| informationElementIndex | The Information Element index of the    |
| [scope] [optional]      | instance of the Information Element     |
|                         | described by this anonymisation record  |
|                         | identified by the informationElementId  |
|                         | within the Template. Optional; need     |
|                         | only be present when describing         |
|                         | Templates that have multiple instances  |
|                         | of the same Information Element. This   |
|                         | Information Element MUST be defined as  |
|                         | a Scope Field if present. This          |
|                         | Information Element is defined in       |
|                         | Section 7.2, below.                     |
| anonymisationStability  | The stability class of the anonymised   |
|                         | data. MUST be present. This             |
|                         | Information Element is defined in       |
|                         | Section 7.2, below.                     |
| anonymisationTechnique  | The technique used to anonymise the     |
|                         | data. MUST be present. This             |
|                         | Information Element is defined in       |
|                         | Section 7.2, below.                     |
+-------------------------+-----------------------------------------+

7.2.  Recommended Information Elements for Anonymisation Metadata

7.2.1.  anonymisationStability

Description: A description of the stability class of the anonymisation
technique applied to a referenced Information Element within a
referenced Template. Stability classes refer to the stability of the
parameters of the anonymisation technique, and therefore the
comparability of the mapping between the real and anonymised values over
time.
This determines which anonymised datasets may be compared with each
other.

+-------+-----------------------------------------------------------+
| Value | Description                                               |
+-------+-----------------------------------------------------------+
| 0     | Undefined: the Exporting Process makes no representation  |
|       | as to how stable the mapping is, or over what time period |
|       | values of this field will remain comparable; while the    |
|       | Collecting Process MAY assume Session level stability,    |
|       | Session level stability is not guaranteed. This is        |
|       | equivalent to 0x01 Session level stability while advising |
|       | the Collecting Process that no special effort has been    |
|       | made to ensure stability. Collecting Processes SHOULD     |
|       | assume this is the case in the absence of stability class |
|       | information; this is the default stability class.         |
| 1     | Session: the Exporting Process will ensure that the       |
|       | parameters of the anonymisation technique are stable      |
|       | during the Transport Session. All the values of the       |
|       | described Information Element for each Record described   |
|       | by the referenced Template within the Transport Session   |
|       | are comparable. The Exporting Process SHOULD endeavour    |
|       | to ensure at least this stability class.                  |
| 2     | Exporter-Collector Pair: the Exporting Process will       |
|       | ensure that the parameters of the anonymisation technique |
|       | are stable across Transport Sessions over time with the   |
|       | given Collecting Process, but may use different           |
|       | parameters for different Collecting Processes. Data       |
|       | exported to different Collecting Processes is not         |
|       | comparable.                                               |
| 3     | Stable: the Exporting Process will ensure that the        |
|       | parameters of the anonymisation technique are stable      |
|       | across Transport Sessions over time, regardless of the    |
|       | Collecting Process to which it is sent.                   |
+-------+-----------------------------------------------------------+

Abstract Data Type: unsigned8
ElementId: TBD1
Status: Proposed

7.2.2.  anonymisationTechnique

Description: A description of the anonymisation technique applied to a
referenced Information Element within a referenced Template.

+-------+-----------------------------------------------------------+
| Value | Description                                               |
+-------+-----------------------------------------------------------+
| 0     | Undefined: the Exporting Process makes no representation  |
|       | as to whether the defined field is anonymised or not.     |
|       | While the Collecting Process MAY assume that the field is |
|       | not anonymised, it is not guaranteed not to be. This is   |
|       | the default anonymisation technique.                      |
| 1     | None: the values exported are real.                       |
| 2     | Precision Degradation/Truncation: the values exported are |
|       | anonymised using simple precision degradation or          |
|       | truncation. The new precision is implicit in the          |
|       | exported data, and can be deduced by the Collecting       |
|       | Process.                                                  |
| 3     | Binning: the values exported are anonymised into bins.    |
| 4     | Enumeration: the values exported are anonymised by        |
|       | enumeration.                                              |
| 5     | Permutation: the values exported are anonymised by random |
|       | permutation.                                              |
| 6     | Prefixed Permutation: the values exported are anonymised  |
|       | by random permutation, preserving bit-level structure;    |
|       | this represents prefix-preserving IP address              |
|       | anonymisation.                                            |
+-------+-----------------------------------------------------------+

Abstract Data Type: unsigned8
ElementId: TBD2
Status: Proposed

7.2.3.  informationElementIndex

Description: A zero-based index of an Information Element referenced by
informationElementId within a Template referenced by templateId; used to
disambiguate scope for templates containing multiple identical
Information Elements.
Abstract Data Type: unsigned16
ElementId: TBD3
Status: Proposed

8.  Applying Anonymisation Techniques to IPFIX Export and Storage

When exporting or storing anonymised flow data using IPFIX, certain
interactions between the IPFIX Protocol and the anonymisation techniques
in use must be considered; these are treated in the subsections below.

8.1.  Arrangement of Processes in IPFIX Anonymisation

Anonymisation may be applied to IPFIX data at three stages within the
collection infrastructure: on initial export, at a mediator, or after
collection, as shown in Figure 1. Each of these locations has specific
considerations and applicability.

                   +--------------------+
                   | IPFIX File Storage |
                   +--------------------+
                              ^
                              | (Anonymised after collection)
                              |
          +=======================================+
          |          Collecting Process           |
          +=======================================+
              ^                                ^
              | (Anonymised at mediator)       |
              |                                |
+=============================+                |
|          Mediator           |                |
+=============================+                |
              ^                                |
              | (Anonymised on initial export) |
              |                                |
          +=======================================+
          |           Exporting Process           |
          +=======================================+

             Figure 1: Potential Anonymisation Locations

Anonymisation is generally performed before the wider dissemination or
repurposing of a flow data set, e.g., adapting operational measurement
data for research. Therefore, direct anonymisation of flow data on
initial export is only applicable in certain restricted circumstances:
when the Exporting Process is "publishing" data to a Collecting Process
directly, and the Exporting Process and Collecting Process are operated
by different entities.
Note that certain guidelines in Section 8.2.2 with respect to timestamp
anonymisation may not apply in this case, as the Collecting Process may
be able to deduce certain timing information from the time at which each
Message is received.

A much more flexible arrangement is to anonymise data within a Mediator
[I-D.ietf-ipfix-mediators-framework]. Here, original data is sent to a
Mediator, which performs the anonymisation function and re-exports the
anonymised data. Such a Mediator could be located at the administrative
domain boundary of the initial Exporting Process operator, exporting
anonymised data to other consumers outside the organisation. In this
case, the original Exporter SHOULD use TLS as specified in [RFC5101] to
secure the channel to the Mediator, and the Mediator should follow the
guidelines in Section 8.2, to mitigate the risk of original data
disclosure.

When data is to be published as an anonymised data set in an IPFIX File
[I-D.ietf-ipfix-file], the anonymisation may also be done at the final
Collecting Process before storage and dissemination. In this case, the
Collector should follow the guidelines in Section 8.2, especially as
regards File-specific Options in Section 8.2.3.

Note that anonymisation may occur at more than one location within a
given collection infrastructure, to provide varying levels of
anonymisation reversal risk and utility for specific purposes.

8.2.  IPFIX-Specific Anonymisation Guidelines

In implementing and deploying the anonymisation techniques described in
this document, care must be taken that data structures supporting the
operation of the protocol itself do not leak data that could be used to
reverse the anonymisation applied to the flow data. Such data structures
may appear in the header, or within the data stream itself, especially
as options data.
Each of these and their impact on specific anonymisation techniques is
noted in a separate subsection below.

8.2.1.  Appropriate Use of Information Elements for Anonymised Data

[TODO: reiterate black-marker guidelines here]

[TODO: note that precision degradation SHOULD use appropriately-sized
fields]

8.2.2.  Anonymisation of Header Data

Each IPFIX Message contains a Message Header; within this Message Header
are contained two fields which may be used to break certain
anonymisation techniques: the Export Time, and the Observation Domain
ID.

An Exporting Process that exports IPFIX Messages containing anonymised
timestamp data, where the original Export Time Message Header field has
some relationship to the anonymised timestamps, SHOULD anonymise the
Export Time header field using an equivalent technique, if possible.
Otherwise, relationships between export and flow time could be used to
partially or totally reverse timestamp anonymisation.

The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation Domain
ID. If this address bears some relation to the IP addresses in the flow
data (e.g., shares a network prefix with internal addresses) and the IP
addresses in the flow data are anonymised in a structure-preserving way,
then the Observation Domain ID may be used to break the IP address
anonymisation. Use of an IPv4 interface address on the Metering or
Exporting Process as the Observation Domain ID is NOT RECOMMENDED in
this case.

8.2.3.  Anonymisation of Options Data

IPFIX uses the Options mechanism to export, among other things, metadata
about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in
[RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing
flow timestamps and network addresses of Exporting and Collecting
Processes may be used to break certain anonymisation techniques; care
should be taken while using them with anonymised data export and
storage.

The Exporting Process Reliability Statistics Options Template,
recommended in [RFC5101], contains an Exporting Process ID field, which
may be an exportingProcessIPv4Address Information Element or an
exportingProcessIPv6Address Information Element. If the Exporting
Process address bears some relation to the IP addresses in the flow data
(e.g., shares a network prefix with internal addresses) and the IP
addresses in the flow data are anonymised in a structure-preserving way,
then the Exporting Process address may be used to break the IP address
anonymisation. Exporting Processes exporting anonymised data in this
situation SHOULD mitigate the risk of attack either by omitting Options
described by the Exporting Process Reliability Statistics Options
Template, or by anonymising the Exporting Process address using a
similar technique to that used to anonymise the IP addresses in the
exported data.

Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the IPFIX File Format
[I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address
Information Element or the exportingProcessIPv6Address Information
Element to identify an Exporting Process from which a flow record was
received, and the collectingProcessIPv4Address Information Element or
the collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it.
If the Exporting Process or Collecting Process address bears some
relation to the IP addresses in the flow data (e.g., shares a network
prefix with internal addresses) and the IP addresses in the flow data
are anonymised in a structure-preserving way, then the Exporting Process
or Collecting Process address may be used to break the IP address
anonymisation. Since these Options Templates are primarily intended for
storing IPFIX Transport Session data for auditing, replay, and testing
purposes, it is NOT RECOMMENDED that storage of anonymised data include
these Options Templates in order to mitigate the risk of attack.

The Message Details Options Template specified for the IPFIX File Format
[I-D.ietf-ipfix-file] also contains the collectionTimeMilliseconds
Information Element. As with the Export Time Message Header field, if
the exported flow data contains anonymised timestamp information, and
the collectionTimeMilliseconds Information Element in a given Message
has some relationship to the anonymised timestamp information, then this
relationship can be exploited to reverse the timestamp anonymisation.
Since this Options Template is primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes, it is
NOT RECOMMENDED that storage of anonymised data include this Options
Template in order to mitigate the risk of attack.

Since the Time Window Options Template specified for the IPFIX File
Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow
data to provide partial table of contents information for an IPFIX File,
care must be taken to ensure that Options described by this template are
written using the anonymised timestamps instead of the original ones.

9.  Examples

[TODO: write this section.]

10.  Security Considerations

[TODO: write this section.]

11.
IANA Considerations

This document contains no actions for IANA.

[EDITOR'S NOTE: creation of anonymisationStability and
anonymisationTechnique registries may change this.]

12.  Acknowledgments

We thank Paul Aitken for his comments and insight, and the PRISM project
for its support of this work.

13.  References

13.1.  Normative References

[RFC5101]  Claise, B., "Specification of the IP Flow Information Export
           (IPFIX) Protocol for the Exchange of IP Traffic Flow
           Information", RFC 5101, January 2008.

[RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
           Meyer, "Information Model for IP Flow Information Export",
           RFC 5102, January 2008.

13.2.  Informative References

[I-D.ietf-ipfix-as]
           Zseby, T., "IPFIX Applicability", draft-ietf-ipfix-as-12
           (work in progress), July 2007.

[I-D.ietf-ipfix-architecture]
           Sadasivan, G., "Architecture for IP Flow Information Export",
           draft-ietf-ipfix-architecture-12 (work in progress),
           September 2006.

[I-D.ietf-ipfix-file]
           Trammell, B., Boschi, E., Mark, L., Zseby, T., and A. Wagner,
           "Specification of the IPFIX File Format",
           draft-ietf-ipfix-file-03 (work in progress), October 2008.

[I-D.ietf-ipfix-mediators-framework]
           Kobayashi, A., Nishida, H., and B. Claise, "IPFIX Mediation:
           Framework", draft-ietf-ipfix-mediators-framework-02 (work in
           progress), February 2009.

[RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
           "Requirements for IP Flow Information Export (IPFIX)",
           RFC 3917, October 2004.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

Elisa Boschi
Hitachi Europe
c/o ETH Zurich
Gloriastrasse 35
8092 Zurich
Switzerland

Phone: +41 44 632 70 57
Email: elisa.boschi@hitachi-eu.com

Brian Trammell
Hitachi Europe
c/o ETH Zurich
Gloriastrasse 35
8092 Zurich
Switzerland

Phone: +41 44 632 70 13
Email: brian.trammell@hitachi-eu.com