Network Working Group                                     Greg Bernstein
Internet Draft                                         Grotto Networking
Intended status: Informational                                 Young Lee
                                                                  Huawei
                                                          March 12, 2012

     Use Cases for High Bandwidth Query and Control of Core Networks
            draft-bernstein-alto-large-bandwidth-cases-01.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 12, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Bernstein & Lee, et al.   Expires September 12, 2012            [Page 1]
Internet-Draft   Cross Stratum Optimization Use-cases         March 2012

Abstract

This draft describes two generic use-cases that illustrate application layer traffic optimization applied to high bandwidth core networks. The type of information and interactions needed to perform various optimizations is described. In addition, extensions to the existing ALTO protocol are suggested that provide this functionality.
Table of Contents

   1. Introduction
      1.1. Computing Clouds, Data Centers, and End Systems
   2. End System Aggregate Networking
      2.1. Aggregated Bandwidth Scaling
      2.2. Cross Stratum Optimization Example
      2.3. Data Center and Network Faults and Recovery
      2.4. Cross Stratum Control Interfaces
   3. Data Center to Data Center Networking
      3.1. Cross Stratum Optimization Examples
      3.2. Network and Data Center Faults and Reliability
   4. Potential ALTO Protocol Extensions
      4.1. High Bandwidth Network Information
         4.1.1. Maximum Reservable Bandwidth
         4.1.2. Latency Information
         4.1.3. Endpoint Access Bandwidth Capacity
      4.2. Network Information via Constraint and Cost Graph
      4.3. Network Updates and Notifications
         4.3.1. Notification Interface
      4.4. Application-Network Reservation Interface
         4.4.1. IP Bypass/Traffic Engineering
         4.4.2. High Bandwidth Reservation/Recovery Interface
   5. Conclusion
   6. Security Considerations
   7. IANA Considerations
   8. References
      8.1. Informative References
   Author's Addresses
   Intellectual Property Statement
   Disclaimer of Validity

1. Introduction

Cloud computing, network applications, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) are just a few of the terms used to describe situations where multiple computation entities interact with one another across a network. When the communication resources consumed by these interacting entities are significant compared with link or network capacity, opportunities may exist for more efficient utilization of available computation and network resources if the computation and network stratums cooperate in some way. The application layer traffic optimization (ALTO) working group is tackling the similar problem of "better-than-random peer selection" for distributed applications based on peer-to-peer (P2P) or client-server architectures [1]. In addition, such optimization is important in content distribution networks (CDNs) as illustrated in [2].

In the network stratum, particularly at the lower layers such as MPLS and optical, there are many restoration and recovery mechanisms to deal with network faults. The emergence of network based applications and cloud based disaster recovery/business recovery brings a new dimension to fault management, but also opportunities to deliver higher levels of reliability more efficiently.

For example, the reliability requirements for mission critical applications are typically quantified by two key time parameters. The first is the Recovery Time Objective (RTO), which is the time to get the application back up and functioning and is similar to network recovery time notions.
The second is the Recovery Point Objective (RPO), which quantifies, in terms of time, the amount of data loss that can be tolerated when a disaster occurs. Different applications and organizations can have greatly different demands, from milliseconds to 12 hours. In addition, the amount of data that may need to be transferred to meet these objectives can vary greatly amongst different application types. With recovery point objectives of, say, an hour or more, a dynamic optical network layer could be very efficiently shared so as to reduce the overall cost to achieve a given level of reliability. However, doing so requires cooperation between the application and network stratums.

Generalized multi-protocol label switching (GMPLS) [3] can be and is being applied to various core networking technologies such as SONET/SDH and wavelength division multiplexing (WDM) [4]. GMPLS provides dynamic network topology and resource information, and the capability to dynamically allocate resources (provision label switched paths). Furthermore, the path computation element (PCE) [5] provides for traffic engineered path optimization.

However, neither GMPLS nor PCE provides interfaces that are appropriate for an application layer entity to use, for the following reasons:

   o  GMPLS routing exposes full network topology information, which tends to be proprietary to a carrier or to require specialized knowledge and techniques to make use of, e.g., the routing and wavelength assignment (RWA) problem in WDM networks [4].

   o  Core networks typically consist of two or more layers, while applications typically only know about the IP layer and above. Hence applications would not be able to make direct use of PCE capabilities.

   o  GMPLS signaling interfaces are defined for either peer GMPLS nodes or via a user network interface (UNI) [6].
      Neither of these is appropriate for direct use by an application entity.

In this draft we discuss two general use-cases that can generate core network flows whose bandwidth is significant and may vary considerably over time. The "cross stratum optimization" problems generated by these use cases are discussed. Finally, we look at interfaces between the application and network "stratums" that can enable these types of optimizations and how they can be created via extensions to the current ALTO protocol [7].

1.1. Computing Clouds, Data Centers, and End Systems

While the definition of cloud computing or compute clouds is somewhat nebulous (or "foggy" if you will) [8], the physical instantiation of compute resources with network connectivity is very real and bounded by physical and logical constraints. For the purposes of this draft, we will call any network connected compute resource a data center if its network connectivity is significant compared either to the bandwidth of an individual WDM wavelength or to the network links where it is located. Hence we include in our definition very large data centers that feature multiple fiber access and consume more than 10 MW of power, moderate to large content distribution network (CDN) installations located in or near major internet exchange points, medium sized business centers, etc. We will refer to those computational entities that don't meet our bandwidth criteria for a data center as "end systems".

2. End System Aggregate Networking

In this section we consider the fundamental use case of end systems communicating with data centers as shown in Figure 1. In this figure the "clients" are end systems with relatively small access bandwidth compared to a WDM wavelength, e.g., under 100 Mbps. We show these clients roughly partitioned into three network related end user regions ("A", "B", and "C").
Given a particular network application, in a static network application situation, each client in a region would be associated with a particular data center.

   [Figure: Clients A1..AN (Region A), B1..BM (Region B), and C1..CK (Region C), together with Data Centers 1, 2, and 3, all attached to a central Network.]

   Figure 1. End system to data center communications.

2.1. Aggregated Bandwidth Scaling

One of the simplest examples where the aggregation of end system bandwidth can quickly become significant to the "network" is video on demand (VoD) streaming. Unlike a live streaming service, where IP or lower layer multicast techniques can generally be applied, in VoD the transmissions are unique between the data center and each client. For regular quality VoD we'll use an estimate of 1.5 Mbps per stream (assuming H.264 coding); for HD VoD we'll use an estimate of 10 Mbps per stream. Filling a 10 Gbps optical wavelength therefore requires about 6,666 clients at regular quality or 1,000 clients at high definition. Note that special multicasting techniques such as those discussed in [9] and peer assistance techniques such as provided in some commercial systems [10] can reduce the overall network bandwidth requirements.
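
The client counts quoted above follow from dividing the wavelength capacity by the per-stream rate. A minimal Python sketch of that arithmetic, using only the rates and capacity given in the text (the name streams_to_fill is an illustrative assumption, and the sketch ignores protocol overhead and any multicast or peer-assistance savings):

```python
# Figures taken from the text, all in Mbps.
WAVELENGTH_MBPS = 10_000                    # one 10 Gbps optical wavelength
RATE_MBPS = {"regular": 1.5, "hd": 10.0}    # per-stream VoD estimates


def streams_to_fill(capacity_mbps: float, rate_mbps: float) -> int:
    """Whole number of unicast streams that fit within the capacity."""
    return int(capacity_mbps // rate_mbps)


clients = {quality: streams_to_fill(WAVELENGTH_MBPS, rate)
           for quality, rate in RATE_MBPS.items()}
print(clients)  # {'regular': 6666, 'hd': 1000}
```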
With current high speed internet deployment, such numbers of clients are easily achieved; in addition, demand for VoD services can vary significantly over time, e.g., with new video releases, inclement weather (which increases the number of viewers), etc.

2.2. Cross Stratum Optimization Example

In an ideal world both data centers and networks would have unlimited capacity; in practice both can have constraints and marginal costs that vary with load or time of day. For example, suppose that in Figure 1 Data Center 3 has been primarily serving VoD to region "C" but that it has, at a particular period in time, run out of computation capacity to serve all the client requests coming from region "C". At this point we have a fundamental cross stratum optimization (CSO) problem. We want to see if we can accommodate additional client requests from region "C" by using a data center other than the fully utilized data center #3. To answer this question we need to know (a) the available capacity on other data centers to meet a request, (b) the marginal (incremental) cost of servicing the request on a particular data center with spare capacity, (c) the ability of the network to provide bandwidth between region "C" and a data center, and (d) the incremental cost of bandwidth from region "C" to a data center.

   [Figure: the clients, regions, and data centers of Figure 1, with aggregated flows (marked with "X" characters) drawn across the Network.]

   Figure 2. Aggregated flows between end systems and data centers.

In Figure 2 we show a possible result of solving the previously mentioned CSO problem. Here we show the additional client requests from region "C" being serviced by data center #2 across the network. Figure 2 also illustrates the possibility of setting up "express" routes across the network at the MPLS level or below. Such techniques, known as "optical grooming" or "optical bypass" [11], [12] at the optical layer, can result in significant equipment and power savings for the network by "bypassing" higher level routers and switches.

2.3. Data Center and Network Faults and Recovery

Data center failures, whether partial or complete, can have a major impact on revenues in the VoD example previously described. If there is excess capacity in other data centers within the network associated with the same application, then clients could be redirected to those other centers if the network has the capacity. Moreover, MPLS and GMPLS controlled networks have the ability to reroute traffic very quickly while preserving QoS. As with general network recovery techniques [13], various combinations of pre-planning and "on the fly" approaches can be used to trade off between recovery time and the excess network capacity needed for recovery.

In the case of network failures there is the potential for clients to be redirected to other data centers to avoid failed or over utilized links.

2.4. Cross Stratum Control Interfaces

Two types of load balancing techniques are currently utilized in cloud computing.
The first is load balancing within a data center and is sometimes referred to as local load balancing. Here one is concerned with distributing requests to appropriate machines (or virtual machines) in a pool based on the current machine utilization. The second type of load balancing is known as global load balancing and is used to assign clients to a particular data center out of a choice of more than one within the network; it is our concern here. A number of commercial vendors offer both local and global load balancing products. Currently, global load balancing systems have very little knowledge of the underlying network. To make better assignments of clients to data centers, many of these systems use geographic information based on IP addresses. Hence we see that current systems are attempting to perform cross stratum optimization, albeit with very coarse network information.

A more elaborate interface for CSO in the client aggregation case would be:

   1. A Network Query Interface - Where the global load balancer can inquire as to the bandwidth availability between "client regions" and data centers.

   2. A Network Resource Reservation Interface - Where the global load balancer can make explicit requests for bandwidth between client regions and data centers.

   3. A Fault Recovery Interface - For the global load balancer to make requests for expedited bulk rerouting of client traffic from one data center to another, or for the network layer to make requests to the application to help deal with network faults.

The network query interface can be considered a superset of the functionality supported by the current ALTO protocol [7]. Potential extensions are detailed in section 4.

3. Data Center to Data Center Networking

There are a number of motivations for data center to data center communications: on demand capacity expansion ("cloud bursting"), cooperative exchanges between business partners, offsite data backup, "rent before building", etc. In Figure 3 we show an example where a number of businesses, each with an "internal data center", contract with a large external data center for additional computational capacity (which may include storage). The data centers may connect to each other via IP transit type services or, more typically, via some type of Ethernet virtual private line or LAN service.

   [Figure: a Large Data Center and the data centers of Business #1, Business #2, ..., Business #N, each attached to a central Network.]

   Figure 3. Basic data center to data center networking.

3.1. Cross Stratum Optimization Examples

In the DC-to-DC example of Figure 3 we can have computational constraints/limits at both local and remote data centers; fixed and marginal computational costs at local and remote data centers; and network bandwidth costs and constraints between data centers. Note that computing costs could vary by the time of day along with the cost of power and demand. Some cloud providers have quite sophisticated compute pricing models including reserved, on demand, and spot (auction) variants.

In addition to possibly dynamically changing pricing, traffic loads between data centers can be quite dynamic. Data movement between data centers is another source of large network usage variation.
Such peaks can be due to scheduled daily or weekly offsite data backup, bulk VM migration to a new data center, periodic virtual machine migration, etc.

3.2. Network and Data Center Faults and Reliability

For networked applications that require high levels of reliability/availability, the network diagram of Figure 3 could be enhanced with redundant business locations and external data centers as shown in Figure 4. For example, cell phone subscriber databases and financial transactions generally require what is called geographic database replication, which results in extra communication between sites supporting high availability. For example, if business #1 in Figure 4 required a highly available database related service, then there would be additional communication flows from data center "1a" to data center "1b". Furthermore, if business #1 has outsourced some of its computation and storage needs to independent data center X, then for resilience it may want or need to replicate (hot-hot redundancy) this information at independent data center Y.

   [Figure: Independent Data Centers X and Y and the redundant data centers of Business #1, Business #2, ..., Business #N (labeled DC-a and DC-b), each attached to a central Network.]

   Figure 4. Data center to data center networking with redundancy.

4. Potential ALTO Protocol Extensions

This section discusses the applicability of the ALTO protocol and the extensions necessary to support the high bandwidth consuming use cases previously covered.
Before doing so we discuss general properties of the high bandwidth scenarios that may differ significantly from other uses of the ALTO protocol.

The first has to do with scope and scale. The consumer of high bandwidth ALTO extensions is typically some type of application controller within a data center, as opposed to an individual end user. The number of such entities with a need for the high bandwidth related information is orders of magnitude smaller than, say, the number of peer to peer networking users or applications closer to the end user. Since a network provider may consider this information sensitive, there may be a desire to limit its distribution to a "pre-registered" set of entities.

Secondly, there is the notion of time scales. In cloud services we already see variants such as "on demand" compute instances and "reserved" compute instances. For network resource queries we may be concerned with (a) current bandwidth availability, (b) bandwidth availability at a future time, or (c) bandwidth for a bulk data transfer of a given amount that must take place within a given time window. Time-dependent bandwidth information can be, and typically is, considered in network planning and provisioning systems. For example, a VoD provider knows ahead of time when the latest "blockbuster" film will be available via its service and can use historical data to estimate the bandwidth that it will need to deal with the subsequent demand. The following discussions, however, are restricted to "current time" for now.

Finally, another goal in the design of an interface between the application and networking stratums is to minimize the need for either stratum to know too much about the inner workings of the other. Hence, as much as possible, it is desired to insulate the application stratum from technology specifics of the network.
That said, data centers providing IaaS may prefer to specify flows and connectivity at a layer below IP such as Ethernet.

4.1. High Bandwidth Network Information

ALTO's network map and cost map concepts can be used to support the aforementioned high bandwidth use cases. In this section we will explore both how they could be used in high bandwidth "core" networks and how they might be extended to better support large bandwidth optimization.

The ALTO concept of a provider defined network location identifier (PID) is a powerful network abstraction mechanism that is also appropriate for optical/high bandwidth scenarios. For example, a network provider could assign PIDs to WDM ROADMs or OTN switches providing access to an optical core network. All subtending data centers or hosts would have their IP addresses grouped with such a PID. The collection of these would form an ALTO network map. Furthermore, a corresponding ALTO cost map can be used by the network to indicate preferred connectivity. Since not all these entities necessarily connect directly to an edge WDM ROADM or OTN switch, ALTO's Endpoint Property Service can be used to denote the type of interface supported by an end system or data center and its bandwidth capabilities.

4.1.1. Maximum Reservable Bandwidth

The amount of bandwidth available between two sites or subnetworks can be of prime interest to large bandwidth consuming applications. Unlike "unused" IP bandwidth, sub-IP bandwidth such as that from SDH, OTN, and WDM cannot be probed from a network edge or application. The only way to find out if such bandwidth could be allocated to a particular application data flow is to query the network.
One may want to query the network as to the reservable bandwidth in a number of different cases:

   (a) Bandwidth available between a single source destination pair (two PIDs)

   (b) Bandwidth between one particular source (PID) and several other destinations (PIDs)

   (c) Bandwidth between one set of sources (PIDs) and another set of destinations (PIDs).

Case (a), bandwidth between two points, is well defined; in cases (b) and (c), however, there is some ambiguity. For example, in (c) are we considering multiple sources communicating with multiple destinations at the same time? Do some of these pairs interfere with each other? To fully understand such constraints some type of constrained graph abstraction would be needed. However, if we restrict the question in cases (b) and (c) to the maximum reservable bandwidth between each source and destination pair within the sets, considered individually, then the question is unambiguous, useful, and can fit within ALTO's existing cost map structure (section 5.2 of [7]).

A new ALTO cost type of "reservable bandwidth" can be defined for this purpose. This would be a "numeric" cost type that represents the actual bandwidth in units of, say, Mbps.

From the point of view of an optical network, an extended ALTO request would arrive at our extended ALTO server asking for the "reservable bandwidth" between multiple Source Network Locations, say [Src_1, Src_2, ..., Src_m], and a list of multiple Destination Network Locations, say [Dst_1, Dst_2, ..., Dst_n]. The network computing entity would calculate the "reservable bandwidth" between all of these individual source destination pairs. The extended ALTO Server would then return the "reservable bandwidth" as an ALTO Path Cost for each communicating pair (i.e., Src_1 -> Dst_1, ..., Src_1 -> Dst_n, ..., Src_m -> Dst_1, ..., Src_m -> Dst_n).

4.1.2. Latency Information

Latency information, either fixed due to propagation delay or statistical due to queuing induced delays, can be similarly represented via ALTO's cost map structure. When choosing amongst flows between multiple data centers utilizing significant amounts of bandwidth, alternative routes with differing latency may need to be considered. In such a situation, a simple latency cost map may need to be replaced by an abstract graph model to allow for more effective optimization of resources.

4.1.3. Endpoint Access Bandwidth Capacity

There are a number of standard sized pipes used to access high bandwidth networks, and these can be either larger or smaller than the bandwidth available within various portions of the network. Hence, to make good use of network resources, it is desirable to advertise an endpoint's access bandwidth capacity. Typically this would be a number in terms of Mbps or Gbps and would reflect the true bandwidth available to the endpoint after upstream bottlenecks and overhead are taken into account. This information could be advertised via ALTO's endpoint property service.

4.2. Network Information via Constraint and Cost Graph

As discussed in the previous section, as the desired connectivity between locations becomes more complex (rather than exclusively point to point), the basic ALTO cost map structure can be insufficient to reveal network bottlenecks and hence optimization decision points. Consider the network shown in Figure 5, where DC indicates a data center, ER an end user region (as in the end user aggregation use case), N a switching node of some sort, and L a link. The link capacities and costs are also shown on the figure, as well as a cost map between [ER1, ER2] and [DC1, DC2, DC3]. Since the network has a tree structure (very unusual but easier to draw in ASCII art), the cost map is unique.
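
The pairwise cost map and the pairwise "reservable bandwidth" of Section 4.1.1 can be derived mechanically for this tree. The following Python sketch assumes the link endpoints, capacities, and costs given with Figure 5; the helper names (path_links, path_cost, reservable_bw) are illustrative and are not part of the ALTO protocol.

```python
# Links of the Figure 5 tree: name -> (a-end, z-end, capacity, cost).
LINKS = {
    "L1": ("ER1", "N1", 5, 1),
    "L2": ("ER2", "N1", 5, 2),
    "L3": ("N1", "N2", 8, 1),
    "L4": ("N2", "N3", 6, 2),
    "L5": ("N3", "DC1", 5, 1),
    "L6": ("N3", "DC2", 5, 1),
    "L7": ("N2", "DC3", 10, 6),
}


def path_links(src, dst):
    """Return the link names on the unique src-dst path of the tree."""
    adjacency = {}
    for name, (a_end, z_end, _cap, _cost) in LINKS.items():
        adjacency.setdefault(a_end, []).append((z_end, name))
        adjacency.setdefault(z_end, []).append((a_end, name))
    stack = [(src, None, [])]  # (node, parent, links used so far)
    while stack:
        node, parent, used = stack.pop()
        if node == dst:
            return used
        for neighbor, link in adjacency[node]:
            if neighbor != parent:  # a tree has no cycles to guard against
                stack.append((neighbor, node, used + [link]))
    raise ValueError(f"no path from {src} to {dst}")


def path_cost(src, dst):
    """Sum of link costs along the path (the cost map entry)."""
    return sum(LINKS[link][3] for link in path_links(src, dst))


def reservable_bw(src, dst):
    """Smallest link capacity along the path, each pair taken alone."""
    return min(LINKS[link][2] for link in path_links(src, dst))


SOURCES = ["ER1", "ER2"]
DESTINATIONS = ["DC1", "DC2", "DC3"]
cost_map = {s: {d: path_cost(s, d) for d in DESTINATIONS} for s in SOURCES}
bw_map = {s: {d: reservable_bw(s, d) for d in DESTINATIONS} for s in SOURCES}
print(cost_map)
print(bw_map)
```

Under these assumptions every pair individually sees 5 units of reservable bandwidth, even though pairs whose paths share L3 or L4 cannot all obtain that amount at the same time; a pairwise map has no way to express this.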
As an illustration, assume that the maximum available capacity between any individual end region and a data center is 5 units (i.e., L1=L2=L5=L6=5). However, link L3 (capacity 8 units) represents a bottleneck to all the data centers (L3 is on all the paths to DC1, DC2, or DC3 from all end regions, ER1 and ER2). In a similar way, link L4 (capacity 6 units) represents a bottleneck to data centers DC1 and DC2 from all end regions, ER1 and ER2. A simple "cost map" like structure misses these bottlenecks.

    ,---.   L1                                    +----+
   ( ER1 )-----.                            L5 .--|DC1 |
    `---'       \                             /   +----+
                (N1)----L3----(N2)----L4----(N3)
    ,---.       /               |             \   +----+
   ( ER2 )-----'                | L7        L6 `--|DC2 |
    `---'   L2                  |                 +----+
                                |                 +----+
                                `-----------------|DC3 |
                                                  +----+

      Link  Capacity  Cost          Cost Map
      L1        5       1                 DC1  DC2  DC3
      L2        5       2           ER1     5    5    8
      L3        8       1           ER2     6    6    9
      L4        6       2
      L5        5       1
      L6        5       1
      L7       10       6

   Figure 5. Example network illustrating bottlenecks

With the current ALTO cost map structure, the least cost path from ER1 would be either to DC1 or DC2. However, with the proposed capacitated cost map, the connection from ER1 to DC3 could be a better choice than the rest, depending on the relative cost of network resources to data center resources.

A more general and relatively efficient alternative is to provide the requestor with a capacitated and multiply weighted graph that approximates and abstracts the capabilities of the network as seen by the source and destination location sets.

The creation of an approximate graph model to represent the network for cross layer optimization purposes is similar to the well-known topology aggregation problem [14], [15], but differs in a number of respects.
First, the goal is not to approximate the network structure for general route computation, but to provide a view of only the portion of the network relevant to the participating locations, one that approximates the costs and constraints among these locations. Second, the specific technologies underlying the costs and constraints are of no interest to the application layer; hence much of the technology-specific layer information found in GMPLS link state routing databases would be absent from such a graph.

Like the current ALTO filtered cost map, a request for a capacitated, weighted graph would take source and destination PIDs as inputs. In JSON notation we could represent the resulting graph as a JSON object containing link objects. A first cut encoding could be something like:

   object {
       LinkEntry [LinkName]<0..*>;
   } CostConstraintGraphData;

   object {
       PIDName: a-end;       // Node name at one side of the link
       PIDName: z-end;       // Node name at the other side of the link
       Weight: wt;
       JSONNumber: latency;
       Capacity: r-cap;      // Reservable capacity
   } LinkEntry;

Here a link name is formatted like a PIDName (but names a link), and PID names are used to identify both provider defined locations and provider defined internal model nodes. A graph representation of the network of Figure 5 might look like:

   {
     "meta" : {},
     "data" : {
       "graph": {
         "L1": {"a-end":"ER1", "z-end":"N1",  "wt":1, "r-cap":5},
         "L2": {"a-end":"ER2", "z-end":"N1",  "wt":2, "r-cap":5},
         "L3": {"a-end":"N1",  "z-end":"N2",  "wt":1, "r-cap":8},
         "L4": {"a-end":"N2",  "z-end":"N3",  "wt":2, "r-cap":6},
         "L5": {"a-end":"N3",  "z-end":"DC1", "wt":1, "r-cap":5},
         "L6": {"a-end":"N3",  "z-end":"DC2", "wt":1, "r-cap":5},
         "L7": {"a-end":"N2",  "z-end":"DC3", "wt":6, "r-cap":10}
       }
     }
   }

4.3. Network Updates and Notifications

Changing conditions in the network, such as costs or capacity, may need to be relayed to the application layer in a suitable form and in a time frame relative to their importance to service QoS, service delivery, or cross layer optimization.

Network fault conditions can affect service QoS in a number of ways. The most obvious is a significant reduction in the capacity available to current application flows. In such a case the application would want to be notified as soon as possible so that it can take remedial action. In other cases a network fault may only be observable as an increase in latency (due to the increased length of a recovered optical path). Such an increase may not immediately result in a breach of a service level agreement (SLA), but could do so cumulatively over time. Hence notification of such a change in condition would need to be timely, and the network may indicate whether the change of state is relatively permanent or, if not, what its duration may be.

Some applications, such as those involving bulk file transfer, may have flexible time windows, with the exact time the service is rendered dictated by network availability. In particular, the network takes advantage of application flexibility in the exact scheduling of the network resources to be used. Such occurrences may be non-recurring, e.g., a one-off bulk file transfer, or recurring, as would be common in cloud based system backup and restore applications. In this case the notification from the network needs to be relatively timely (but most likely on the order of seconds rather than milliseconds), is specific to a particular network service instance rather than to raw network cost or capacity, and the entire notification process may require a non-repudiation security assurance.

Changes in the network that only affect costs but not QoS can affect the cross layer optimization of an existing application. The time frame for such notifications would typically range from fractions of an hour to days.

4.3.1. Notification Interface

With the exception of the "notification of network service instance availability", all other notifications can be made via modifications or updates to suitably extended network or cost maps, or graphs. Since the high bandwidth use cases deal with a rather restricted user group, a number of implementation mechanisms may be possible that would not be viable in a more general ALTO deployment. For example, with a capacitated graph representation we may selectively update specific links of the graph for particular application entities. Note that in order to do this the network layer would need to keep track of the graph models in use by specific application entities and update them as appropriate.

4.4. Application-Network Reservation Interface

The network query interfaces previously discussed allow the application layer to find out about the options, costs, and capabilities available from the network layer in a suitably high level but actionable format. However, it remains to specify an interface for the application layer to communicate its usage intent to the network layer and possibly make firm commitments for scarce network resources. Before delving into this interface we first look briefly at what happens behind the scenes in high bandwidth networks.

4.4.1. IP Bypass/Traffic Engineering

There are various ways to alter the path that IP flows take through a network. Two IETF standard ways are via DiffServ [16] and MPLS-TE [17]. Both mechanisms start with IP packet classification, but in MPLS-TE a packet belonging to a flow matching an MPLS forwarding equivalence class (FEC) will be "pulled" from normal IP packet forwarding and placed in an MPLS tunnel, known as a label switched path.
It will then be forwarded on via MPLS mechanisms, bypassing the IP layer, until it "pops" out of its MPLS tunnel and rejoins the IP forwarding world (hopefully much closer to its intended destination and having made better use of network resources along the way).

In the SONET, SDH, G.709, and WDM world a similar process can take place, but is known by the term "grooming" [11], [12]. In both cases network resources including bandwidth, equipment, and power can be significantly optimized by essentially setting up "express lanes" at a lower layer in the network's protocol stack. Note that with optical transport networks there can be many layers below "layer 2", i.e., one can think of the "physical" layer as possibly consisting of a number of different sub-layers.

If the application layer, by knowing its usage patterns or required network usage, can let the network know its needs, then IP/Optical bypass can be more readily performed on a dynamic basis, particularly if the network has a GMPLS infrastructure. The application layer should not need to know the specifics of how the IP bypass occurs, e.g., via MPLS, OpenFlow, or other standard or proprietary techniques.

4.4.2. High Bandwidth Reservation/Recovery Interface

As previously stated, the application layer should not be exposed to the details of the networking mechanisms that will provide the bandwidth and QoS guarantees. Hence the application layer would specify its demands in terms of IP flows, as is done when specifying an MPLS FEC. It is for further study whether some IaaS applications may want to deal with layer 2 (Ethernet) flows rather than IP. In either case the basic principles would be the same.

Note that a bandwidth reservation interface such as this could also be used when the application layer is seeking network help in dealing with disaster recovery and business continuity.
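As a minimal sketch of the kind of interaction described in this section, the Python fragment below models a reservation request expressed as an IP flow between two PIDs, together with a network-side admission check against the capacitated graph of Section 4.2. The request fields, the function names, the example prefixes, and the depth-first search are all hypothetical choices made for illustration; this draft does not define a request format or an admission algorithm. The application only learns whether the request was accepted, and the selected links stay inside the network layer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Abstract graph of Section 4.2: link -> [a-end, z-end, reservable capacity].
GRAPH: Dict[str, list] = {
    "L1": ["ER1", "N1", 5], "L2": ["ER2", "N1", 5], "L3": ["N1", "N2", 8],
    "L4": ["N2", "N3", 6], "L5": ["N3", "DC1", 5], "L6": ["N3", "DC2", 5],
    "L7": ["N2", "DC3", 10],
}


def fresh_graph() -> Dict[str, list]:
    """Return an independent copy so reservations do not alter GRAPH."""
    return {name: list(link) for name, link in GRAPH.items()}


@dataclass
class ReservationRequest:
    # Hypothetical fields; the draft does not define a request format.
    src_pid: str
    dst_pid: str
    src_prefix: str  # IP flow description, in the spirit of an MPLS FEC
    dst_prefix: str
    bandwidth: int   # same units as r-cap


def find_path(graph: Dict[str, list], src: str, dst: str,
              need: int) -> Optional[List[str]]:
    """Depth-first search using only links with at least `need` capacity left."""
    stack = [(src, [], {src})]
    while stack:
        node, links, seen = stack.pop()
        if node == dst:
            return links
        for name, (a_end, z_end, cap) in graph.items():
            if cap < need:
                continue
            # Links are treated as bidirectional in this sketch.
            nxt = z_end if a_end == node else a_end if z_end == node else None
            if nxt is not None and nxt not in seen:
                stack.append((nxt, links + [name], seen | {nxt}))
    return None


def reserve(graph: Dict[str, list], req: ReservationRequest) -> bool:
    """Admit the request if a path with enough capacity exists.

    Only accept/reject is reported; the chosen path is not exposed.
    """
    path = find_path(graph, req.src_pid, req.dst_pid, req.bandwidth)
    if path is None:
        return False
    for name in path:
        graph[name][2] -= req.bandwidth
    return True


if __name__ == "__main__":
    g = fresh_graph()
    for src, dst in [("ER1", "DC1"), ("ER2", "DC2"), ("ER2", "DC3")]:
        req = ReservationRequest(src, dst, "192.0.2.0/24", "198.51.100.0/24", 4)
        print(src, "->", dst, "accepted" if reserve(g, req) else "rejected")
```

With 4 units requested each time, the first request consumes most of L4, so the ER2 to DC2 request is rejected while ER2 to DC3 is still admitted. This mirrors the bottleneck behavior discussed for Figure 5.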
A number of current protocols come close to the features desired of such an interface, but none is completely appropriate. A short summary follows:

(a) PCE: The PCE interface takes requests for connections, with various optimization conditions supported. PCEs, however, return the computed paths to the requester, which is undesirable in our reservation interface. Note that PCE is built directly on TCP.

(b) UNIs (GMPLS and OIF): UNIs provide RSVP-TE based signaling interfaces for connection requests at a particular layer. Such interfaces expect the requester to know something about the network layers being utilized. Typically, if these are used, they are used between access and core network equipment.

(c) Cloud IaaS interfaces for reserving instances: These are typically RESTful or XML-RPC type interfaces. With these interfaces compute, storage, and other IaaS related resources are requested (setup/teardown).

We note that such an interface is currently out of the scope of ALTO or any current IETF working group. One reason to consider this within ALTO is the tight coupling between the network information (PIDs, network map, cost map, capacitated graph) and the requests that would be made by the application layer. In the high bandwidth case both query and reservation have similar security/privacy requirements.

5. Conclusion

In this draft we have discussed two generic use cases that motivate the usefulness of general interfaces for cross stratum optimization in the network core. In our first use case, network resource usage became significant due to the aggregation of many individually unique client demands. In the second use case, where data centers were communicating with each other, bandwidth usage was already significant enough to warrant the use of private line/LAN type network services.
Both use cases result in optimization problems that trade off computational versus network costs and constraints. Both featured scenarios where advance reservation, on demand, and recovery type service interfaces could prove beneficial. In the later sections of this document we showed how ALTO concepts [1] and the ALTO protocol could be used and extended to support joint application network optimization for large network bandwidth consuming applications.

6. Security Considerations

TBD

7. IANA Considerations

This informational document does not make any requests for IANA action.

8. References

8.1. Informative References

[1] "draft-ietf-alto-reqs-09." [Online]. Available: http://datatracker.ietf.org/doc/draft-ietf-alto-reqs/. [Accessed: 17-May-2011].

[2] J. Medved, N. Bitar, S. Previdi, B. Niven-Jenkins, and G. Watson, "Use Cases for ALTO within CDNs." [Online]. Available: http://tools.ietf.org/html/draft-jenkins-alto-cdn-use-cases-02. [Accessed: 06-Mar-2012].

[3] E. Mannie, Ed., "Generalized Multi-Protocol Label Switching (GMPLS) Architecture, RFC 3945." Oct-2004.

[4] Y. Lee, G. Bernstein, and W. Imajuku, Eds., "Framework for GMPLS and PCE Control of Wavelength Switched Optical Networks (WSON), RFC 6163." Apr-2011.

[5] A. Farrel, J. P. Vasseur, and J. Ash, "A Path Computation Element (PCE)-Based Architecture, RFC 4655." Aug-2006.

[6] G. Swallow, J. Drake, H. Ishimatsu, and Y. Rekhter, "Generalized Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI): Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Support for the Overlay Model, RFC 4208." Oct-2005.

[7] Y. R. Yang, R. Alimi, and R. Penno, "ALTO Protocol." [Online]. Available: http://tools.ietf.org/html/draft-ietf-alto-protocol-10. [Accessed: 05-Mar-2012].

[8] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, pp. 50-58, Apr. 2010.

[9] K. A. Hua and S. Sheu, "Skyscraper broadcasting: a new broadcasting scheme for metropolitan video-on-demand systems," in Proceedings of the ACM SIGCOMM '97 conference on Applications, technologies, architectures, and protocols for computer communication, Cannes, France, 1997, pp. 89-100.

[10] "Adobe Flash Media Server 4.0 * Building peer-assisted networking applications." [Online]. Available: http://help.adobe.com/en_US/flashmediaserver/devguide/WSa4cb07693d123884520b86f312a354ba36d-8000.html. [Accessed: 13-May-2011].

[11] Rudra Dutta and George N. Rouskas, "Traffic grooming in WDM networks: Past and future," IEEE Network, vol. 16, no. 6, pp. 46-56, 2002.

[12] Keyao Zhu and B. Mukherjee, "Traffic grooming in an optical WDM mesh network," Selected Areas in Communications, IEEE Journal on, vol. 20, no. 1, pp. 122-133, 2002.

[13] G. Bernstein, B. Rajagopalan, and D. Saha, Optical Network Control: Architecture, Protocols, and Standards. Addison-Wesley Professional, 2003.

[14] B. Awerbuch and Y. Shavitt, "Topology aggregation for directed graphs," Networking, IEEE/ACM Transactions on, vol. 9, no. 1, pp. 82-90, 2001.

[15] S. Uludag, K.-S. Lui, K. Nahrstedt, and G. Brewster, "Analysis of Topology Aggregation techniques for QoS routing," ACM Comput. Surv., vol. 39, Sep. 2007.

[16] K. Nichols, D. L. Black, S. Blake, and F. Baker, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers." RFC 2474. Available: http://tools.ietf.org/html/rfc2474.

[17] D. O. Awduche and J. Agogbua, "Requirements for Traffic Engineering Over MPLS." RFC 2702. Available: http://tools.ietf.org/html/rfc2702.

Author's Addresses

Greg M. Bernstein
Grotto Networking
Fremont California, USA
Phone: (510) 573-2237
Email: gregb@grotto-networking.com

Young Lee
Huawei Technologies
5340 Legacy Drive, Building 3
Plano, TX 75024 USA
Phone: (469) 277-5838
Email: leeyoung@huawei.com

Intellectual Property Statement

The IETF Trust takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in any IETF Document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.

Copies of Intellectual Property disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement any standard or specification contained in an IETF Document. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

All IETF Documents and the information contained therein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.