Network Working Group                                    Greg Bernstein
Internet Draft                                        Grotto Networking
Intended status: Informational                                Young Lee
                                                                 Huawei



                                                         March 12, 2012

      Use Cases for High Bandwidth Query and Control of Core Networks


             draft-bernstein-alto-large-bandwidth-cases-01.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 12, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents




Bernstein & Lee, et al. Expires September 12, 2012             [Page 1]

Internet-Draft   Cross Stratum Optimization Use-cases        March 2012


   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   This draft describes two generic use-cases that illustrate
   application layer traffic optimization applied to high bandwidth
   core networks.  The type of information and interactions needed to
   perform various optimizations is described. In addition, extensions
   to the existing ALTO protocol are suggested that provide this
   functionality.

Table of Contents


   1. Introduction...................................................3
      1.1. Computing Clouds, Data Centers, and End Systems...........4
   2. End System Aggregate Networking................................5
      2.1. Aggregated Bandwidth Scaling..............................5
      2.2. Cross Stratum Optimization Example........................6
      2.3. Data Center and Network Faults and Recovery...............7
      2.4. Cross Stratum Control Interfaces..........................8
   3. Data Center to Data Center Networking..........................9
      3.1. Cross Stratum Optimization Examples.......................9
      3.2. Network and Data Center Faults and Reliability...........10
   4. Potential ALTO Protocol Extensions............................11
      4.1. High Bandwidth Network Information.......................12
         4.1.1. Maximum Reservable Bandwidth........................13
         4.1.2. Latency Information.................................14
         4.1.3. Endpoint Access Bandwidth Capacity..................14
      4.2. Network Information via Constraint and Cost Graph........14
      4.3. Network Updates and Notifications........................17
         4.3.1. Notification Interface..............................17
      4.4. Application-Network Reservation Interface................18
         4.4.1. IP Bypass/Traffic Engineering.......................18
         4.4.2. High Bandwidth Reservation/Recovery Interface.......19
   5. Conclusion....................................................19
   6. Security Considerations.......................................20
   7. IANA Considerations...........................................20
   8. References....................................................20
      8.1. Informative References...................................20
   Author's Addresses...............................................22
   Intellectual Property Statement..................................22
   Disclaimer of Validity...........................................22







1. Introduction

   Cloud computing, network applications, Software as a Service (SaaS),
   Platform as a Service (PaaS), and Infrastructure as a Service
   (IaaS) are just a few of the terms used to describe situations
   where multiple computation entities interact with one another across
   a network. When the communication resources consumed by these
   interacting entities are significant compared with link or network
   capacity, opportunities may exist for more efficient utilization
   of available computation and network resources if the computation
   and network stratums cooperate in some way. The Application-Layer
   Traffic Optimization (ALTO) working group is tackling the similar
   problem of "better-than-random peer selection" for distributed
   applications based on peer-to-peer (P2P) or client-server
   architectures [1]. In addition, such optimization is important in
   content distribution networks (CDNs), as illustrated in [2].

   In the network stratum, particularly at the lower layers such as
   MPLS and optical, there are many restoration and recovery mechanisms
   to deal with network faults. The emergence of network based
   applications or cloud based disaster recovery/business recovery
   brings a new dimension to fault management, but also opportunities
   to more efficiently deliver higher levels of reliability. For
   example, the reliability requirements for mission critical
   applications are typically quantified by two key time parameters.
   The first is the Recovery Time Objective (RTO), which is the time to
   get the application back up and functioning and is similar to
   network recovery time notions. The second is the Recovery Point
   Objective (RPO), which quantifies, in terms of time, the amount of
   data loss that can be tolerated when a disaster occurs. Different
   applications and organizations can have greatly different demands,
   from milliseconds to 12 hours. In addition, the amount of data that
   may need to be transferred to meet these objectives can vary greatly
   amongst different application types. With recovery point objectives
   of, say, an hour or more, a dynamic optical network layer could be
   very efficiently shared so as to reduce the overall cost of
   achieving a given level of reliability. However, doing so requires
   cooperation between the application and network stratums.

   Generalized Multi-Protocol Label Switching (GMPLS) [3] can be, and
   is being, applied to various core networking technologies such as
   SONET/SDH and wavelength division multiplexing (WDM) [4]. GMPLS
   provides dynamic network topology and resource information, and the
   capability to dynamically allocate resources (provision label
   switched paths). Furthermore, the path computation element (PCE) [5]
   provides for traffic engineered path optimization.




   However, neither GMPLS nor PCE provides interfaces that are
   appropriate for an application layer entity to use, for the
   following reasons:

     . GMPLS routing exposes full network topology information, which
        tends to be proprietary to a carrier or to require specialized
        knowledge and techniques to make use of, e.g., the routing and
        wavelength assignment (RWA) problem in WDM networks [4].

     . Core networks typically consist of two or more layers, while
        applications typically only know about the IP layer and
        above. Hence applications would not be able to make direct use
        of PCE capabilities.

     . GMPLS signaling interfaces are defined for either peer GMPLS
        nodes or via a user network interface (UNI) [6]. Neither of
        these is appropriate for direct use by an application entity.

   In this draft we discuss two general use-cases that can generate
   core network flows that consume significant bandwidth and may vary
   significantly over time. The "cross stratum optimization" problems
   generated by these use cases are discussed. Finally, we look at
   interfaces between the application and network "stratums" that can
   enable these types of optimizations and how they can be created via
   extensions to the current ALTO protocol [7].

1.1. Computing Clouds, Data Centers, and End Systems

   While the definition of cloud computing or compute clouds is
   somewhat nebulous (or "foggy" if you will) [8], the physical
   instantiation of compute resources with network connectivity is very
   real and bounded by physical and logical constraints. For the
   purposes of this draft, we will call any network connected compute
   resource a data center if its network connectivity is significant
   compared either to the bandwidth of an individual WDM wavelength or
   to the capacity of the network links where it is located. Hence we
   include in our definition very large data centers that feature
   multiple fiber access and consume more than 10MW of power, moderate
   to large content distribution network (CDN) installations located in
   or near major Internet exchange points, medium sized business
   centers, etc.

   We will refer to computational entities that don't meet our
   bandwidth criteria for a data center as "end systems".






2. End System Aggregate Networking

   In this section we consider the fundamental use case of end systems
   communicating with data centers as shown in Figure 1. In this figure
   the "clients" are end systems with relatively small access bandwidth
   compared to a WDM wavelength, e.g., under 100Mbps. We show these
   clients roughly partitioned into three network related end user
   regions ("A", "B", and "C"). For a given network application in a
   static situation, each client in a region would be associated with
   a particular data center.

                                           Region B
                             +---------+  +------+
                             |  Data   |  |Client|
                             |Center 2 |  |  B1  |+------+
             +------+        +----+----+  +--+---+|Client|
             |Client|             |         /     |  B2  |
             |  A1  `.         _.-+--------+-.    +--+---+
   Region A  +------+ `-.  ,-''               `--.  /   ...
        +------+        ,`:                       `+.     +------+
        |Client|       /                             \    |Client|
        |  A2  +------+                               \---+  BM  |
        +------+     (             Network             )  +------+
         ...        .-'                               /
     +------+   _.-'   \                             `.
     |Client|.-'        `=.                       ,-'  `.
     |  AN  |       _.-''  `--.               _.-\   +---`.----+
     +------+ +----'----+      `----+------+''    \  |  Data   |
              |  Data   |           |       \      | |Center 3 |
              |Center 1 |        +--+---+ +--+---+ \ +---------+
              +---------+        |Client| |Client|  \------+
                                 |  C1  | |  C2  |  |Client|
                                 +------+ +------+  |  CK  |
                                       Region C     +------+

            Figure 1. End system to data center communications.

2.1. Aggregated Bandwidth Scaling

    One of the simplest examples where the aggregation of end system
   bandwidth can quickly become significant to the "network" is video
   on demand (VoD) streaming services. Unlike a live streaming
   service, where IP or lower layer multicast techniques can generally
   be applied, in VoD the transmissions are unique between the data
   center and each client. For regular quality VoD we'll use an
   estimate of 1.5Mbps per stream (assuming H.264 coding); for HD VoD
   we'll use an estimate of 10Mbps per stream. Filling a 10Gbps
   capacity optical wavelength requires roughly 6,667 or 1,000 clients
   for regular or high definition respectively.  Note that special
   multicasting techniques such as those discussed in [9] and peer
   assistance techniques such as provided in some commercial systems
   [10] can reduce the overall network bandwidth requirements.
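
   The client counts above follow from simple division. As a rough
   sketch (the function name and constants below are illustrative
   assumptions, not part of any protocol), the calculation is:

```python
import math


def clients_to_fill(link_mbps: float, stream_mbps: float) -> int:
    """Smallest number of unicast streams whose sum reaches the link rate."""
    return math.ceil(link_mbps / stream_mbps)


WAVELENGTH_MBPS = 10_000  # one 10Gbps optical wavelength

regular = clients_to_fill(WAVELENGTH_MBPS, 1.5)   # regular quality VoD
hd = clients_to_fill(WAVELENGTH_MBPS, 10.0)       # HD VoD
print(regular, hd)
```

   Rounding up gives 6,667 regular quality streams (6,666 streams
   total 9,999Mbps, just under the wavelength rate) and 1,000 HD
   streams.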

    With current high speed Internet deployment such numbers of clients
   are easily reached; in addition, demand for VoD services can vary
   significantly over time, e.g., with new video releases, inclement
   weather (which increases the number of viewers), etc.

2.2. Cross Stratum Optimization Example

    In an ideal world both data centers and networks would have
   unlimited capacity; in practice, however, both can have constraints
   and possibly marginal costs that vary with load or time of
   day.  For example, suppose that in Figure 1 Data Center 3 has
   been primarily serving VoD to region "C" but that it has, at a
   particular period in time, run out of computation capacity to serve
   all the client requests coming from region "C". At this point we
   have a fundamental cross stratum optimization (CSO) problem. We want
   to see if we can accommodate additional client requests from region
   "C" by using a different data center than the fully utilized data
   center #3. To answer this question we need to know (a) the available
   capacity on other data centers to meet a request, (b) the marginal
   (incremental) cost of servicing the request on a particular data
   center with spare capacity, (c) the ability of the network to
   provide bandwidth between region "C" and a data center, and (d) the
   incremental cost of bandwidth from region "C" to a data center.

                                             Region B
                               +---------+  +------+
                               |  Data   |  |Client|
                               |Center 2 |  |  B1  |+------+
               +------+        +----+----+  +--+---+|Client|
               |Client|             |         /     |  B2  |
               |  A1  `.         _.-+--------+-.    +--+---+
     Region A  +------+ `-.  ,-'' XXXXX     XX  `--.  /   ...
          +------+        ,`:       ``---..__ XXXX  `+.     +------+
          |Client|       /  X        |       ```--XX   \    |Client|
          |  A2  +------+..X`.       \              XX--+---+  BM  |
          +------+     (  X   `-/     \                  )  +------+
           ...        .-'     .'       |        +----.X /
       +------+   _.-'   \  X/         \        |    X `.
       |Client|.-'        `=.X          \      XXXX ,-'  `.
       |  AN  |       _.-''  `--.    XXXXXXXXX  _.-\   +---`.----+
       +------+ +----'----+      `----+------+''    \  |  Data   |
                |  Data   |           |       \      | |Center 3 |
                |Center 1 |        +--+---+ +--+---+ \ +---------+
                +---------+        |Client| |Client|  \------+
                                   |  C1  | |  C2  |  |Client|
                                   +------+ +------+  |  CK  |
                                         Region C     +------+

     Figure 2. Aggregated flows between end systems and data centers.



   In Figure 2 we show a possible result of solving the previously
   mentioned CSO problem. Here we show the additional client requests
   from region "C" being serviced by data center #2 across the network.
   Figure 2 also illustrates the possibility of setting up "express"
   routes across the network at the MPLS level or below. Such
   techniques, known as "optical grooming" or "optical bypass" [11],
   [12] at the optical layer, can result in significant equipment and
   power savings for the network by "bypassing" higher level routers
   and switches.

2.3. Data Center and Network Faults and Recovery

    Data center failures, whether partial or complete, can have a major
   impact on revenues in the VoD example previously described. If there
   is excess capacity in other data centers within the network
   associated with the same application then clients could be
   redirected to those other centers if the network has the capacity.
   Moreover, MPLS and GMPLS controlled networks have the ability to
   reroute traffic very quickly while preserving QoS. As with general
   network recovery techniques [13], various combinations of
   pre-planning and "on the fly" approaches can be used to trade off
   recovery time against the excess network capacity needed for
   recovery.

    In the case of network failures there is the potential for clients
   to be redirected to other data centers to avoid failed or
   over-utilized links.

2.4. Cross Stratum Control Interfaces

    Two types of load balancing techniques are currently utilized in
   cloud computing. The first is load balancing within a data center
   and is sometimes referred to as local load balancing. Here one is
   concerned with distributing requests to appropriate machines (or
   virtual machines) in a pool based on the current machine
   utilization. The second type of load balancing is known as global
   load balancing and is used to assign clients to a particular data
   center out of a choice of more than one within the network and is
   our concern here.  A number of commercial vendors offer both local
   and global load balancing products.  Currently global load balancing
   systems have very little knowledge of the underlying network. To
   make better assignments of clients to data centers many of these
   systems use geographic information based on IP addresses. Hence we
   see that current systems are attempting to perform cross stratum
   optimization albeit with very coarse network information. A more
   elaborate interface for CSO in the client aggregation case would be:

       1. A Network Query Interface - Where the global load balancer
          can inquire as to the bandwidth availability between "client
          regions" and data centers.

       2. A Network Resource Reservation Interface - Where the global
          load balancer can make explicit requests for bandwidth
          between client regions and data centers.

       3. A Fault Recovery Interface - For the global load balancer to
          make requests for expedited bulk rerouting of client traffic
          from one data center to another. Or for the network layer to
          make requests to the application to help deal with network
          faults.
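
   The three interfaces above could take roughly the following shape.
   This is only a sketch: the class and method names are hypothetical,
   the in-memory ToyNetwork stands in for a real network controller,
   and the reverse direction of the fault recovery interface (network
   to application requests) is omitted.

```python
from abc import ABC, abstractmethod


class CrossStratumInterface(ABC):
    """Hypothetical shape of the three interfaces listed above."""

    @abstractmethod
    def query_bandwidth(self, region, data_center):
        """1. Query: reservable Mbps between a region and a data center."""

    @abstractmethod
    def reserve_bandwidth(self, region, data_center, mbps):
        """2. Reservation: True if the network granted the bandwidth."""

    @abstractmethod
    def reroute_clients(self, region, old_dc, new_dc):
        """3. Fault recovery: bulk move of a region's reserved traffic."""


class ToyNetwork(CrossStratumInterface):
    """In-memory stand-in for a network controller, illustration only."""

    def __init__(self, free_mbps):
        self.free = dict(free_mbps)  # (region, dc) -> unreserved Mbps
        self.reserved = {}           # (region, dc) -> reserved Mbps

    def query_bandwidth(self, region, data_center):
        return self.free.get((region, data_center), 0.0)

    def reserve_bandwidth(self, region, data_center, mbps):
        key = (region, data_center)
        if self.free.get(key, 0.0) < mbps:
            return False
        self.free[key] = self.free.get(key, 0.0) - mbps
        self.reserved[key] = self.reserved.get(key, 0.0) + mbps
        return True

    def reroute_clients(self, region, old_dc, new_dc):
        old_key = (region, old_dc)
        mbps = self.reserved.get(old_key, 0.0)
        if not self.reserve_bandwidth(region, new_dc, mbps):
            return False
        # Release the old reservation only after the new one succeeds.
        self.free[old_key] = self.free.get(old_key, 0.0) + mbps
        self.reserved[old_key] = 0.0
        return True


# Hypothetical scenario: region "C" is moved from DC3 to DC2.
net = ToyNetwork({("C", "DC3"): 4000.0, ("C", "DC2"): 3000.0})
granted = net.reserve_bandwidth("C", "DC3", 2500.0)
moved = net.reroute_clients("C", "DC3", "DC2")
print(granted, moved, net.query_bandwidth("C", "DC2"))
```

   A global load balancer would call the query method before placing
   clients, the reservation method when it commits to a placement, and
   the reroute method when a data center fails.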

    The network query interface can be considered a superset of the
   functionality supported by the current ALTO protocol [7]. Potential
   extensions are detailed in section 4.







3. Data Center to Data Center Networking

    There are a number of motivations for data center to data center
   communications: on demand capacity expansion ("cloud bursting"),
   cooperative exchanges between business partners, offsite data
   backup, "rent before building", etc. In Figure 3 we show an
   example where a number of businesses, each with an "internal data
   center", contract with a large external data center for additional
   computational (which may include storage) capacity. The data centers
   may connect to each other via IP transit type services or more
   typically via some type of Ethernet virtual private line or LAN
   service.

                         +-------------------+
                         |                   |
                         | Large Data Center |
                         |                   |
                         +----------+--------+
                                    |
                                  _.+-----------.
                             ,--''               `---.
                          ,-'                         `-.
                        ,'                               `.
                      ,'                                   `.
     +--------+      ;                Network                :
     |Business|  __..+                                       |
     | #1 DC  +-'    :                                       ;
     +--------+       `.                                   ,'
                        `.                               ;:
                          `-.                         ,-'  \
                             `---.               _.--'   +--`.----+
                                  `+-----------''        |Business|
                                   /                     | #N DC  |
                                  |                      +--------+
                             +----+---+
                             |Business|
                             | #2 DC  |
                             +--------+

          Figure 3. Basic data center to data center networking.



3.1. Cross Stratum Optimization Examples

    In the DC-to-DC example of Figure 3 we can have computational
   constraints/limits at both local and remote data centers; fixed and
   marginal computational costs at local and remote data centers; and
   network bandwidth costs and constraints between data centers. Note
   that computing costs could vary by the time of day along with the
   cost of power and demand. Some cloud providers have quite
   sophisticated compute pricing models including: reserved, on demand,
   and spot (auction) variants.

    In addition to possibly dynamically changing pricing, traffic
   loads between data centers can be quite dynamic. Data movement
   between data centers is another source of large network usage
   variation. Such peaks can be due to scheduled daily or weekly
   offsite data backup, bulk VM migration to a new data center,
   periodic virtual machine migration, etc.



3.2. Network and Data Center Faults and Reliability

    For networked applications that require high levels of
   reliability/availability the network diagram of Figure 3 could be
   enhanced with redundant business locations and external data centers
   as shown in Figure 4. For example, cell phone subscriber databases
   and financial transactions generally require what is called
   geographic database replication, which results in extra
   communication between sites supporting high availability. For
   example, if business #1 in Figure 4 required a highly available
   database related service then there would be additional
   communication flows from data center "1a" to data center "1b".
   Furthermore, if business #1 has outsourced some of its computation
   and storage needs to independent data center X then for resilience
   it may want/need to replicate (hot-hot redundancy) this information
   at independent data center Y.


              +-------------+              +-------------+
              |Independent  |              |Independent  |
              |Data Center X|              |Data Center Y|
              +-----+-------+              +------+------+
                     \                           /
                      `.     _.------------.   .'
                        \--''               `-+-.
                     ,-'                         `-.       +--------+
                   ,'                               `.    .'Business|
                 ,'                                   `.-' |#N DC-a |
                ;                Network                :  +--------+
    +--------+  |                                       |
    |Business+---                                       ;
    |#1 DC-a |   `.                                   +:
    +--------+     `.                               ;/  \
                     `-.                         ,-'     `.
                      .'`---.               _.--'       +--`.----+
        +--------+   /       `+-+---------\'            |Business|
        |Business| .'           |          \            |#N DC-b |
        |#1 DC-b .'             /           \           +--------+
        +--------+             |             \
                          +----+---+    +--------+
                          |Business|    |Business|
                          |#2 DC-a |    |#2 DC-b |
                          +--------+    +--------+

     Figure 4. Data center to data center networking with redundancy.



4. Potential ALTO Protocol Extensions

   This section discusses the applicability of the ALTO protocol and
   necessary extensions to support the high bandwidth consuming use
   cases previously covered. Before doing so we discuss general
   properties of the high bandwidth scenarios that may differ
   significantly from other uses of the ALTO protocol.

   The first has to do with scope and scale. The consumer of high
   bandwidth ALTO extensions is typically some type of application
   controller within a data center, as opposed to an individual end
   user. The number of such entities with a need for the high bandwidth
   related information is orders of magnitude smaller than, say, peer
   to peer networking users, or applications closer to the end user.
   Since a network provider may consider this information sensitive,
   there may be a desire to limit its distribution to a "pre-
   registered" set of entities.



   Secondly, there is the notion of time scales. In cloud services we
   already see variants such as "on demand" compute instances and
   "reserved" compute instances. For network resource queries we may be
   concerned with (a) current bandwidth availability, (b) bandwidth
   availability at a future time, or (c) bandwidth for a bulk data
   transfer of a given amount that must take place within a given time
   window.
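
   Case (c) can be reduced to an average rate requirement by dividing
   the transfer size by the length of the window. The sketch below
   assumes decimal units and ignores protocol overhead; the function
   name, the 9TB backup, and the 2 hour window are made-up example
   values.

```python
def required_mbps(data_gigabytes: float, window_hours: float) -> float:
    """Average rate (Mbps) needed to move the data within the window."""
    megabits = data_gigabytes * 8_000   # 1 GB = 8,000 Mb (decimal units)
    seconds = window_hours * 3_600
    return megabits / seconds


# Hypothetical example: a 9TB offsite backup that must finish in 2 hours.
rate = required_mbps(data_gigabytes=9_000, window_hours=2)
print(f"{rate:,.0f} Mbps")
```

   With these example numbers the transfer needs an average of
   10,000Mbps, i.e., a full 10Gbps wavelength for the whole window.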

   Time-dependent bandwidth information can be, and typically is,
   considered in network planning and provisioning systems. For
   example, a VoD provider knows ahead of time when the latest
   "blockbuster" film will be available via its service and can make
   estimates based on historical data on the bandwidth that it will
   need to deal with the subsequent demand. The following discussions,
   however, are restricted to "current time" for now.

   Finally, another goal in the design of an interface between the
   application and networking stratums is to minimize the need for
   either stratum to know too much about the inner workings of the
   other. Hence as much as possible it is desired to insulate the
   application stratum from technology specifics of the network. That
   said, data centers providing IaaS may prefer to specify flows and
   connectivity at a layer below IP such as Ethernet.

4.1. High Bandwidth Network Information

   ALTO's network map and cost map concepts can be used to support the
   aforementioned high bandwidth use cases.  In this section we will
   explore both how they could be used in high bandwidth "core"
   networks and how they might be extended to better support large
   bandwidth optimization.

   The ALTO concept of a provider defined network location identifier
   (PID) is a powerful network abstraction mechanism that is also
   appropriate for optical/high bandwidth scenarios. For example, a
   network provider could assign PIDs to WDM ROADMs or OTN switches
   providing access to an optical core network.  All subtending
   data centers or hosts would have their IP addresses grouped with
   such a PID. The collection of these would form an ALTO network map.
   Furthermore, a corresponding ALTO cost map can be used by the
   network to indicate preferred connectivity. Since not all these
   entities necessarily connect directly to an edge WDM ROADM or OTN
   switch, ALTO's Endpoint Property Service can be used to denote the
   type of interface supported by an end system or data center and its
   bandwidth capabilities.
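
   As an illustration of the grouping described above, the following
   sketch models a network map and a cost map as plain Python
   dictionaries. This is a simplification, not the ALTO wire
   encoding; the PID names, the prefixes (taken from the documentation
   address ranges), and the cost values are all hypothetical.

```python
import ipaddress

# Hypothetical network map: one PID per optical access node.
NETWORK_MAP = {
    "pid-roadm-east": ["192.0.2.0/25", "198.51.100.0/24"],
    "pid-otn-west": ["192.0.2.128/25", "203.0.113.0/24"],
}

# Hypothetical cost map: lower values mean preferred connectivity.
COST_MAP = {
    "pid-roadm-east": {"pid-roadm-east": 1, "pid-otn-west": 5},
    "pid-otn-west": {"pid-roadm-east": 5, "pid-otn-west": 1},
}


def pid_for(address: str):
    """Return the PID whose longest matching prefix holds the address."""
    ip = ipaddress.ip_address(address)
    best = None  # (prefix length, pid)
    for pid, prefixes in NETWORK_MAP.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if ip in net and (best is None or net.prefixlen > best[0]):
                best = (net.prefixlen, pid)
    return best[1] if best else None


def cost_between(src_address: str, dst_address: str):
    """Look up the cost between the PIDs of two endpoints, if known."""
    src, dst = pid_for(src_address), pid_for(dst_address)
    if src is None or dst is None:
        return None
    return COST_MAP[src][dst]


print(pid_for("192.0.2.200"), cost_between("192.0.2.200", "198.51.100.7"))
```

   An application controller only sees PIDs and costs; the ROADM or
   OTN equipment behind each PID stays hidden.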





4.1.1. Maximum Reservable Bandwidth

   The amount of bandwidth available between two sites or
   subnetworks can be of prime interest to large bandwidth consuming
   applications. Unlike "unused" IP bandwidth, sub-IP bandwidth such as
   that from SDH, OTN, and WDM cannot be probed from a network edge or
   application. The only way to find out if such bandwidth could be
   allocated to a particular application data flow is to query the
   network.

   One may want to query the network as to the reservable bandwidth in
   a number of different cases:

   (a)   Bandwidth available between a single source destination pair
        (two PIDs)

   (b)   Bandwidth between one particular source (PID) and several
        other destinations (PIDs)

   (c)   Bandwidth between one set of sources (PIDs) and another set of
        destinations (PIDs).

   Case (a), bandwidth between two points, is well defined; however, in
   cases (b) and (c) there is some ambiguity.  For example, in (c) are
   we considering multiple sources communicating with multiple
   destinations at the same time? Do some of these pairs interfere with
   each other? To fully understand such constraints some type of
   constrained graph abstraction would be needed.

   However, if we restrict the question in cases (b) and (c) to what is
   the maximum reservable bandwidth between each source and destination
   pair within the sets considered individually, then the question is
   unambiguous, useful, and can fit within ALTO's existing cost map
   structure (section 5.2 [7]). A new ALTO cost type of "reservable
   bandwidth" can be defined for this purpose. This would be a
   "numeric" cost type that represents the actual bandwidth in the unit
   of, say, Mbps.
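
   The restricted question above (the maximum reservable bandwidth for
   each source destination pair considered individually) could be
   answered as sketched below. The draft does not say how the network
   computes these values; this sketch assumes a single path per pair,
   an undirected graph with per-link free capacity in Mbps, and a
   widest path (maximum bottleneck) search. The topology, node names,
   and function names are hypothetical.

```python
import heapq


def widest_path(links, src, dst):
    """Largest bottleneck capacity over any single path from src to dst."""
    graph = {}
    for (a, b), cap in links.items():
        graph.setdefault(a, []).append((b, cap))
        graph.setdefault(b, []).append((a, cap))
    best = {src: float("inf")}
    heap = [(-best[src], src)]  # max-heap on path width via negation
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            return width
        if width < best.get(node, 0):
            continue  # stale heap entry
        for nxt, cap in graph.get(node, []):
            w = min(width, cap)
            if w > best.get(nxt, 0):
                best[nxt] = w
                heapq.heappush(heap, (-w, nxt))
    return 0  # dst is unreachable


def reservable_cost_map(links, sources, destinations):
    """Pairwise "reservable bandwidth" entries, each computed alone."""
    return {s: {d: widest_path(links, s, d) for d in destinations}
            for s in sources}


# Hypothetical free capacity per link, in Mbps.
LINKS = {
    ("Src_1", "N1"): 10_000, ("Src_2", "N1"): 2_500,
    ("N1", "N2"): 10_000, ("N2", "Dst_1"): 10_000,
    ("N2", "Dst_2"): 1_000, ("N1", "Dst_2"): 2_500,
}

cost_map = reservable_cost_map(LINKS, ["Src_1", "Src_2"],
                               ["Dst_1", "Dst_2"])
print(cost_map)
```

   Because each entry is computed as if no other pair were active, the
   entries cannot necessarily be reserved together: here the best
   paths for Src_1 -> Dst_1 (10,000Mbps) and Src_2 -> Dst_1
   (2,500Mbps) both cross the 10,000Mbps link between N1 and N2.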

   From the point of view of an optical network, an extended ALTO
   request would arrive at our extended ALTO server asking for the
   "reservable bandwidth" between multiple Source Network Locations,
   say [Src_1, Src_2, ..., Src_m], and a list of multiple Destination
   Network Locations, say [Dst_1, Dst_2, ...,   Dst_n]. The network
   computing entity would calculate the "reservable bandwidth" between
   all of these individual source-destination pairs. The extended ALTO
   Server would then return the "reservable bandwidth" as an ALTO Path
   Cost for each communicating pair (i.e., Src_1 -> Dst_1, ...,
   Src_1 -> Dst_n, ..., Src_m -> Dst_1, ..., Src_m -> Dst_n).
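
   As a minimal sketch of the exchange just described, the following
   Python fragment builds a cost-map-style response holding one
   "reservable bandwidth" value per source-destination pair. The cost
   type name "reservable-bandwidth", the response field names, and the
   lookup hook pair_bandwidth_mbps are assumptions made for
   illustration only; they are not defined by the ALTO protocol [7].

```python
import json

def build_reservable_bandwidth_map(sources, destinations, pair_bandwidth_mbps):
    """Return a cost-map-like dict with one entry per source/destination pair.

    pair_bandwidth_mbps(src, dst) is a hypothetical hook into the network
    computing entity. Each pair is evaluated on its own, ignoring any
    interference between pairs.
    """
    cost_map = {
        src: {dst: pair_bandwidth_mbps(src, dst) for dst in destinations}
        for src in sources
    }
    return {
        "meta": {},
        "data": {
            "cost-mode": "numerical",             # assumed field and value
            "cost-type": "reservable-bandwidth",  # assumed new cost type
            "map": cost_map,
        },
    }

# Made-up per-pair values in Mbps, standing in for a real path computation.
EXAMPLE_BW = {
    ("Src_1", "Dst_1"): 10000, ("Src_1", "Dst_2"): 2500,
    ("Src_2", "Dst_1"): 40000, ("Src_2", "Dst_2"): 10000,
}

response = build_reservable_bandwidth_map(
    ["Src_1", "Src_2"], ["Dst_1", "Dst_2"],
    lambda src, dst: EXAMPLE_BW[(src, dst)],
)
print(json.dumps(response, indent=2))
```

   Because every pair is computed individually, the m x n result says
   nothing about whether several of the listed bandwidths could be
   reserved at the same time.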


4.1.2. Latency Information

   Latency information, whether fixed (due to propagation delay) or
   statistical (due to queuing-induced delay), can be represented
   similarly via ALTO's cost map structure.

   When choosing amongst flows between multiple data centers utilizing
   significant amounts of bandwidth, alternative routes with differing
   latency may need to be considered. In such a situation, a simple
   latency cost map may need to be replaced by an abstract graph model
   to allow for more effective optimization of resources.

4.1.3. Endpoint Access Bandwidth Capacity

   There are a number of standard sized pipes used to access high
   bandwidth networks, and these can be either larger or smaller than
   the bandwidth available within various portions of the network.
   Hence, to make good use of network resources, it is desirable to
   advertise an endpoint's access bandwidth capacity. Typically this
   would be a number in terms of Mbps or Gbps and would reflect the
   true bandwidth available to the endpoint after upstream bottlenecks
   and overhead are taken into account. This information could be
   advertised via ALTO's endpoint property service.
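
   One way an application might combine advertised access capacities
   with a per-pair reservable bandwidth is to take the smallest of the
   three values. The sketch below assumes exactly that; the property
   name "access-capacity-mbps" and the endpoint names are hypothetical
   and are not part of any ALTO specification.

```python
def usable_bandwidth_mbps(src_access, dst_access, reservable):
    """Bandwidth usable end to end, limited by the narrowest of the three.

    All arguments are in Mbps. Treating the three limits as independent
    is a simplifying assumption of this sketch.
    """
    return min(src_access, dst_access, reservable)

# Hypothetical endpoint properties, shaped as an endpoint property
# service might return them (the property name is an assumption).
endpoint_props = {
    "dc-a.example.net": {"access-capacity-mbps": 10000},
    "dc-b.example.net": {"access-capacity-mbps": 40000},
}

rate = usable_bandwidth_mbps(
    endpoint_props["dc-a.example.net"]["access-capacity-mbps"],
    endpoint_props["dc-b.example.net"]["access-capacity-mbps"],
    reservable=25000,
)
print(rate)  # 10000: the 10 Gbps access pipe at dc-a is the limit
```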

4.2. Network Information via Constraint and Cost Graph

   As discussed in the previous section, as the desired connectivity
   between locations becomes more complex (rather than exclusively
   point to point), the basic ALTO cost map structure can be
   insufficient to reveal network bottlenecks and hence optimization
   decision points.

   Consider the network shown in Figure 5, where DC indicates a data
   center, ER an end user region (as in the end user aggregation use
   case), N a switching node of some sort, and L a link. The link
   capacities and costs are also shown on the figure as well as a cost
   map between [ER1, ER2] and [DC1, DC2, DC3]. Since the network has a
   tree structure (very unusual but easier to draw in ASCII art), the
   cost map is unique.

   As an illustration, assume that the maximum available capacity
   between any individual end region and a data center is 5 units
   (i.e., L1=L2=L5=L6=5). However, link L3 (capacity 8 units)
   represents a bottleneck to all the data centers (L3 is on all the
   paths to DC1, DC2, or DC3 from both end regions, ER1 and ER2). In a
   similar way, link L4 (capacity 6 units) represents a bottleneck to
   data centers DC1 and DC2 from both end regions. A simple "cost map"
   like structure misses these bottlenecks.



      ,---.    L1                                    +----+
     ( ER1 )`-.                                L5  .'|DC1 |
      `---'    `-._ ,-.                           /  +----+
                   (N1 )    L3               ,-..'
                 .-'`-' `-.__         L4__.+(N3 )
      ,---.    .'            `-.,-..--''     `-'`.   +----+
     ( ER2 ).-'L2              (N2 )           L6 `-.|DC2 |
      `---'                     `-'`-._              +----+
                                       `-.
        Link Capacity Cost                `-._
        L1    5        1                   L7 `-._
        L2    5        2                          `-._
        L3    8        1                              `-.
        L4    6        2              Cost Map           `-._    +----+
        L5    5        1              DC1  DC2  DC3          `-._|DC3 |
        L6    5        1          ER1  5    5    8               +----+
        L7    10       6          ER2  6    6    9

            Figure 5. Example network illustrating bottlenecks

   With the current ALTO cost map structure, the least cost path from
   ER1 would be either to DC1 or DC2. However, with the proposed
   capacitated cost map, the connection from ER1 to DC3 could be a
   better choice than the rest depending on the relative cost of
   network resources to data center resources.
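
   The bottlenecks of Figure 5 can be checked numerically. The sketch
   below hard-codes the link table and the unique tree paths of the
   figure, recomputes the cost map, and tests whether a set of
   simultaneous demands fits within the link capacities. The function
   names and the demand values are illustrative assumptions.

```python
# Link table from Figure 5: name -> (capacity, cost)
LINKS = {
    "L1": (5, 1), "L2": (5, 2), "L3": (8, 1), "L4": (6, 2),
    "L5": (5, 1), "L6": (5, 1), "L7": (10, 6),
}

# Unique paths in the tree of Figure 5.
PATHS = {
    ("ER1", "DC1"): ["L1", "L3", "L4", "L5"],
    ("ER1", "DC2"): ["L1", "L3", "L4", "L6"],
    ("ER1", "DC3"): ["L1", "L3", "L7"],
    ("ER2", "DC1"): ["L2", "L3", "L4", "L5"],
    ("ER2", "DC2"): ["L2", "L3", "L4", "L6"],
    ("ER2", "DC3"): ["L2", "L3", "L7"],
}

def path_cost(pair):
    """Sum of link costs along the pair's path (the cost map entry)."""
    return sum(LINKS[link][1] for link in PATHS[pair])

def pair_capacity(pair):
    """Smallest link capacity along the path, with the pair considered alone."""
    return min(LINKS[link][0] for link in PATHS[pair])

def overloaded_links(demands):
    """Return {link: load} for links whose summed load exceeds capacity."""
    load = {}
    for pair, units in demands.items():
        for link in PATHS[pair]:
            load[link] = load.get(link, 0) + units
    return {link: used for link, used in load.items() if used > LINKS[link][0]}

cost_map = {pair: path_cost(pair) for pair in PATHS}
print(cost_map[("ER1", "DC1")], cost_map[("ER2", "DC3")])  # 5 9

# Each pair on its own can reserve 5 units ...
print({pair: pair_capacity(pair) for pair in PATHS})

# ... but two 5-unit demands at once overload L3 (8) and L4 (6).
print(overloaded_links({("ER1", "DC1"): 5, ("ER2", "DC2"): 5}))
```

   A per-pair map reports 5 units for every entry, yet the last call
   shows that two such reservations cannot coexist, which is the
   information a plain cost map cannot convey.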

   A more general and relatively efficient alternative is to provide
   the requestor with a capacitated and multiply weighted graph that
   approximates and abstracts the capabilities of the network as seen
   by the source and destination location sets.

   The creation of an approximate graph model to represent the network
   for cross layer optimization purposes is similar to the well-known
   topology aggregation problem [14] and [15], but different in a
   number of respects. First, the goal is not the approximation of the
   network structure for general route computation use, but a view of
   only a portion of the network relevant to the participating
   locations that approximates the costs and constraints amongst these
   locations. Second, the specific technologies underlying the costs
   and constraints are of no interest to the application layer and
   hence much technology specific layer information that one sees in
   GMPLS link state routing databases would be absent in such a graph.




   Like the current ALTO filtered cost map, a request for a
   capacitated, weighted graph would take source and destination PIDs
   as inputs. In JSON notation we could represent the resulting graph
   as a JSON object containing link objects. A first cut encoding
   could be something like:

   object {
     LinkEntry [LinkName]<0..*>;
   } CostConstraintGraphData;

   object {
     PIDName:    a-end; // Node name at one side of the link
     PIDName:    z-end; // Node name at the other side of the link
     Weight:     wt;
     JSONNumber: latency;
     Capacity:   r-cap; // Reservable capacity
   } LinkEntry;


   Where a link name is formatted like a PIDName (but names a link),
   and PID names are used for both provider defined location and
   provider defined internal model node identification. A graph
   representation of the network of Figure 5 might look like:

   {
     "meta" : {},
     "data" : {
       "graph": {
         "L1": {"a-end":"ER1", "z-end":"N1", "wt":1,"r-cap":5},
         "L2": {"a-end":"ER2", "z-end":"N1", "wt":2,"r-cap":5},
         "L3": {"a-end":"N1", "z-end":"N2", "wt":1,"r-cap":8},
         "L4": {"a-end":"N2", "z-end":"N3", "wt":2,"r-cap":6},
         "L5": {"a-end":"N3", "z-end":"DC1", "wt":1,"r-cap":5},
         "L6": {"a-end":"N3", "z-end":"DC2", "wt":1,"r-cap":5},
         "L7": {"a-end":"N2", "z-end":"DC3", "wt":6,"r-cap":10}
       }
     }
   }
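
   To suggest how a client might consume such a graph, the sketch
   below parses the example encoding, runs Dijkstra's algorithm on the
   "wt" field, and reports the bottleneck "r-cap" along the resulting
   path. Treating links as bidirectional and the helper names used are
   assumptions of this sketch, not part of the proposed encoding.

```python
import heapq
import json

GRAPH_JSON = """
{"meta": {}, "data": {"graph": {
  "L1": {"a-end": "ER1", "z-end": "N1",  "wt": 1, "r-cap": 5},
  "L2": {"a-end": "ER2", "z-end": "N1",  "wt": 2, "r-cap": 5},
  "L3": {"a-end": "N1",  "z-end": "N2",  "wt": 1, "r-cap": 8},
  "L4": {"a-end": "N2",  "z-end": "N3",  "wt": 2, "r-cap": 6},
  "L5": {"a-end": "N3",  "z-end": "DC1", "wt": 1, "r-cap": 5},
  "L6": {"a-end": "N3",  "z-end": "DC2", "wt": 1, "r-cap": 5},
  "L7": {"a-end": "N2",  "z-end": "DC3", "wt": 6, "r-cap": 10}
}}}
"""

def build_adjacency(links):
    """Index links by node; every link is assumed usable in both directions."""
    adj = {}
    for name, link in links.items():
        a, z = link["a-end"], link["z-end"]
        adj.setdefault(a, []).append((z, link["wt"], link["r-cap"], name))
        adj.setdefault(z, []).append((a, link["wt"], link["r-cap"], name))
    return adj

def least_weight_path(adj, src, dst):
    """Dijkstra on 'wt'. Returns (total weight, bottleneck r-cap, link names),
    or None if dst cannot be reached from src."""
    heap = [(0, src, float("inf"), [])]
    done = set()
    while heap:
        weight, node, cap, path = heapq.heappop(heap)
        if node == dst:
            return weight, cap, path
        if node in done:
            continue
        done.add(node)
        for nxt, wt, rcap, name in adj.get(node, []):
            if nxt not in done:
                heapq.heappush(
                    heap, (weight + wt, nxt, min(cap, rcap), path + [name]))
    return None

adj = build_adjacency(json.loads(GRAPH_JSON)["data"]["graph"])
print(least_weight_path(adj, "ER1", "DC1"))  # (5, 5, ['L1', 'L3', 'L4', 'L5'])
print(least_weight_path(adj, "ER2", "DC3"))  # (9, 5, ['L2', 'L3', 'L7'])
```

   The weights returned agree with the cost map entries of Figure 5,
   while the link lists expose which links are shared between pairs.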








4.3. Network Updates and Notifications

   Changing conditions in the network such as costs or capacity may
   need to be relayed to the application layer in suitable form and in
   a time frame relative to their importance to service QoS, service
   delivery, or cross layer optimization.

   Network fault conditions can affect service QoS in a number of ways.
   The most obvious is a significant reduction in the capacity
   available to current application flows. In such a case the
   application would want to be notified as soon as possible so that
   it can take remedial action. In other cases a network fault may
   only be observable as an increase in latency (due to the increased
   length of a recovered optical path). Such an increase may not
   immediately result in a breach of a service level agreement (SLA)
   but could do so cumulatively over time. Hence notification of such
   a change in condition would need to be timely, and the network may
   qualify it by indicating whether the change of state is relatively
   permanent or what its duration may be.

   Some applications, such as those involving bulk file transfer, may
   have flexible time windows, with the exact time the service is
   rendered dictated by network availability. In particular, the
   network takes advantage of application flexibility in the exact
   scheduling of the network resources to be used. Such occurrences may
   be non-recurring, e.g. a one off bulk file transfer, or recurring as
   would be common in cloud based system backup and restore
   applications. In this case the notification from the network needs
   to be relatively timely (but most likely on the order of seconds
   rather than milliseconds) and is specific to a particular network
   service instance rather than to raw network cost or capacity; the
   entire notification process may also require a non-repudiation
   security assurance.

   Changes in the network that only affect costs but not QoS can affect
   the cross layer optimization of an existing application. The time
   frame for such notifications would typically be in terms of
   fractions of an hour to days.

4.3.1. Notification Interface

   With the exception of the "notification of network service instance
   availability", all other notifications can be made via modifications
   or updates to suitably extended network or cost maps, or graphs.

   Since the high bandwidth use cases deal with a rather restricted
   user group, a number of implementation mechanisms may be possible
   that may not be viable in a more general ALTO deployment. For
   example, with a capacitated graph representation we may selectively
   update specific links of the graph for particular application
   entities. Note that in order to do this the network layer would need
   to keep track of the graph models in use by specific application
   entities and update them as appropriate.
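
   A selective update of this kind could be as simple as a per-link
   patch applied to the graph model held for an application entity.
   The following sketch assumes the graph is a mapping from link names
   to attribute dictionaries and that a value of None deletes a link;
   both conventions are hypothetical.

```python
import copy

def apply_link_updates(graph, updates):
    """Return a copy of graph with the given per-link fields overwritten.

    graph and updates both map link names to dicts of link attributes.
    A value of None in updates removes the link (a convention assumed here).
    """
    new_graph = copy.deepcopy(graph)
    for link_name, changes in updates.items():
        if changes is None:
            new_graph.pop(link_name, None)
        else:
            new_graph.setdefault(link_name, {}).update(changes)
    return new_graph

graph = {
    "L3": {"a-end": "N1", "z-end": "N2", "wt": 1, "r-cap": 8},
    "L4": {"a-end": "N2", "z-end": "N3", "wt": 2, "r-cap": 6},
}

# Hypothetical notification: a fault halves the reservable capacity of L3.
updated = apply_link_updates(graph, {"L3": {"r-cap": 4}})
print(updated["L3"]["r-cap"], graph["L3"]["r-cap"])  # 4 8
```

   Returning a new graph rather than mutating the old one lets the
   network layer keep the previous model around until the application
   entity has acknowledged the change.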

4.4. Application-Network Reservation Interface

   The network query interfaces previously discussed allow the
   application layer to find out about the options, costs, and
   capabilities available from the network layer in a suitably high
   level but actionable format. However, it remains to specify an
   interface for the application layer to communicate its usage intent
   to the network layer and possibly make firm commitments for scarce
   network resources. Before delving into this interface we first look
   a bit at what happens behind the scenes in high bandwidth networks.

4.4.1. IP Bypass/Traffic Engineering

   There are various ways to alter the path that IP flows take through
   a network. Two IETF standard ways are via DiffServ [16] and MPLS-TE
   [17]. Both mechanisms start with IP packet classification, but in
   MPLS-TE a packet belonging to a flow matching an MPLS forwarding
   equivalence class (FEC) will be "pulled" from normal IP packet
   forwarding and placed in an MPLS tunnel, known as a label switched
   path. It will then be forwarded via MPLS mechanisms, bypassing the
   IP layer until it "pops" out of its MPLS tunnel and rejoins the IP
   forwarding world (hopefully much closer to its intended destination
   and having made better use of network resources along the way).
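
   The classification step can be pictured with a small sketch. It
   assumes, for simplicity, that a FEC is identified by a destination
   prefix alone and that the longest matching prefix wins; the table
   contents and tunnel names are made up for illustration.

```python
import ipaddress

# Hypothetical FEC table: destination prefix -> label switched path name.
FEC_TABLE = [
    (ipaddress.ip_network("203.0.113.0/24"), "LSP-to-DC1"),
    (ipaddress.ip_network("198.51.100.0/24"), "LSP-to-DC2"),
]

def classify(dst_ip):
    """Return the LSP for the longest matching FEC prefix, or None when the
    packet stays on ordinary hop-by-hop IP forwarding."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net.prefixlen, lsp) for net, lsp in FEC_TABLE if addr in net]
    return max(matches)[1] if matches else None

print(classify("203.0.113.7"))  # LSP-to-DC1
print(classify("192.0.2.1"))    # None
```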

   In the SONET, SDH, G.709, and WDM world a similar process can take
   place, but is known by the term "grooming" [11],[12]. In both cases
   network resources including bandwidth, equipment, and power can be
   significantly optimized by essentially setting up "express lanes" at
   a lower layer in the network's protocol stack. Note that with
   optical transport networks there can be many layers below "layer 2",
   i.e., one can think of the "physical" layer as possibly consisting
   of a number of different sub-layers.

   If the application layer, by knowing its usage patterns or required
   network usage, can let the network know its needs, then IP/optical
   bypass can more readily be performed on a dynamic basis,
   particularly if the network has a GMPLS infrastructure. The
   application layer should not need to know the specifics of how the
   IP bypass occurs, e.g., via MPLS, OpenFlow, or other standard or
   proprietary techniques.






4.4.2. High Bandwidth Reservation/Recovery Interface

   As previously stated, the application layer should not be exposed to
   the details of the networking mechanisms that will provide the
   bandwidth and QoS guarantees. Hence the application layer would
   specify its demands in terms of IP flows, much as when specifying
   an MPLS FEC. It is for further study whether some IaaS applications
   may want to deal with layer 2 (Ethernet) flows rather than IP. In
   either case the basic principles would be the same. Note that a
   bandwidth reservation interface such as this could also be used
   when the application layer is seeking network help in dealing with
   disaster recovery and business continuity.
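
   To make the idea of a flow-level demand concrete, the sketch below
   shows what a reservation request body might carry. Every field
   name here is hypothetical; no such interface is defined by ALTO or
   by this document, and the values are examples only.

```python
import ipaddress
import json
from dataclasses import asdict, dataclass

@dataclass
class FlowReservationRequest:
    """Hypothetical request body; none of these field names come from ALTO."""
    src_prefix: str          # source IP prefix of the flow aggregate
    dst_prefix: str          # destination IP prefix of the flow aggregate
    bandwidth_mbps: int      # requested bandwidth in Mbps
    start_time: str          # ISO 8601 start of the reservation window
    duration_s: int          # length of the reservation in seconds
    purpose: str = "normal"  # e.g. "normal" or "disaster-recovery"

    def validate(self):
        """Raise ValueError on malformed prefixes or non-positive amounts."""
        ipaddress.ip_network(self.src_prefix)
        ipaddress.ip_network(self.dst_prefix)
        if self.bandwidth_mbps <= 0 or self.duration_s <= 0:
            raise ValueError("bandwidth and duration must be positive")
        return self

request = FlowReservationRequest(
    src_prefix="198.51.100.0/24",
    dst_prefix="203.0.113.0/24",
    bandwidth_mbps=10000,
    start_time="2012-03-12T02:00:00Z",
    duration_s=3600,
).validate()

body = json.dumps(asdict(request), indent=2)
print(body)
```

   Note that the request names flows and amounts only; it carries no
   layer, path, or technology information, in keeping with the
   principle stated above.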

   A number of current protocols come close to the features desired of
   such an interface, but none are completely appropriate. A short
   summary follows:

   (a) PCE: The PCE interface takes requests for connections with
   various optimization conditions supported. PCEs, though, return the
   computed paths to the requester, which is undesirable in our
   reservation interface. Note that PCE is built directly on TCP.

   (b) UNIs (GMPLS and OIF): UNIs provide RSVP-TE based signaling
   interfaces for connection requests at a particular layer. Such
   interfaces expect the requester to know something about the network
   layers being utilized. Typically, if these are used, they are used
   between access and core network equipment.

   (c) Cloud IaaS interfaces for reserving instances: These are
   typically RESTful or XML-RPC type interfaces. With these interfaces
   compute, storage and other IaaS related resources are requested
   (setup/teardown).

   We note that such an interface is currently out of the scope of
   ALTO or any current IETF working group. One reason to
   consider this within ALTO is the tight coupling between the network
   information (PIDs, network map, cost map, capacitated graph) and
   requests that would be made by the application layer. In the high
   bandwidth case both query and reservation have similar
   security/privacy requirements.

5. Conclusion

   In this draft we have discussed two generic use cases that motivate
   the usefulness of general interfaces for cross stratum optimization
   in the network core. In our first use case network resource usage
   became significant due to the aggregation of many individually
   unique client demands. In the second use case, where data centers
   were communicating with each other, bandwidth usage was already
   significant enough to warrant the use of private line/LAN type
   network services.

   Both use cases result in optimization problems that trade off
   computational versus network costs and constraints. Both featured
   scenarios where advanced reservation, on demand, and recovery type
   service interfaces could prove beneficial. In the latter part of
   this document we showed how ALTO concepts [1] and the ALTO protocol
   could be used and extended to support joint application network
   optimization for large network bandwidth consuming applications.



6. Security Considerations

   TBD

7. IANA Considerations

   This informational document does not make any requests for IANA
   action.

8. References

8.1. Informative References

[1] "draft-ietf-alto-reqs-09." [Online]. Available:
   http://datatracker.ietf.org/doc/draft-ietf-alto-reqs/. [Accessed:
   17-May-2011].
[2]  J. Medved, N. Bitar, S. Previdi, B. Niven-Jenkins, and G. Watson,
   "Use Cases for ALTO within CDNs." [Online]. Available:
   http://tools.ietf.org/html/draft-jenkins-alto-cdn-use-cases-02.
   [Accessed: 06-Mar-2012].
[3]  E. Mannie, Ed., "Generalized Multi-Protocol Label Switching (GMPLS)
   Architecture, RFC 3945." Oct-2004.
[4]  Y. Lee, G. Bernstein, and W. Imajuku, Eds., "Framework for GMPLS
   and PCE Control of Wavelength Switched Optical Networks (WSON), RFC
   6163." Apr-2011.
[5]  A. Farrel, J. P. Vasseur, and J. Ash, "A Path Computation Element
   (PCE)-Based Architecture, RFC 4655." Aug-2006.
[6]  G. Swallow, J. Drake, H. Ishimatsu, and Y. Rekhter, "Generalized
   Multiprotocol Label Switching (GMPLS) User-Network Interface (UNI):
   Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Support
   for the Overlay Model, RFC 4208." Oct-2005.
[7]  Y. R. Yang, R. Alimi, and R. Penno, "ALTO Protocol." [Online].
   Available: http://tools.ietf.org/html/draft-ietf-alto-protocol-10.
   [Accessed: 05-Mar-2012].
[8]  M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A.
   Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M.
   Zaharia, "A view of cloud computing," Commun. ACM, vol. 53, pp. 50-
   58, Apr. 2010.
[9]  K. A. Hua and S. Sheu, "Skyscraper broadcasting: a new broadcasting
   scheme for metropolitan video-on-demand systems," in Proceedings of
   the ACM SIGCOMM  '97 conference on Applications, technologies,
   architectures, and protocols for computer communication, Cannes,
   France, 1997, pp. 89-100.
[10] "Adobe Flash Media Server 4.0 * Building peer-assisted networking
   applications." [Online]. Available:
   http://help.adobe.com/en_US/flashmediaserver/devguide/WSa4cb07693d12
   3884520b86f312a354ba36d-8000.html. [Accessed: 13-May-2011].
[11]  Rudra Dutta and George N. Rouskas, "Traffic grooming in WDM
   networks: Past and future," IEEE Network, vol. 16, no. 6, pp. 46 -
   56, 2002.
[12]  Keyao Zhu and B. Mukherjee, "Traffic grooming in an optical WDM
   mesh network," Selected Areas in Communications, IEEE Journal on,
   vol. 20, no. 1, pp. 122-133, 2002.
[13]  G. Bernstein, B. Rajagopalan, and D. Saha, Optical Network
   Control: Architecture, Protocols, and Standards. Addison-Wesley
   Professional, 2003.
[14]  B. Awerbuch and Y. Shavitt, "Topology aggregation for directed
   graphs," Networking, IEEE/ACM Transactions on, vol. 9, no. 1, pp.
   82-90, 2001.
[15]  S. Uludag, K.-S. Lui, K. Nahrstedt, and G. Brewster, "Analysis of
   Topology Aggregation techniques for QoS routing," ACM Comput. Surv.,
   vol. 39, Sep. 2007.
[16]  K. Nichols, D. L. Black, S. Blake, and F. Baker, "Definition of
   the Differentiated Services Field (DS Field) in the IPv4 and IPv6
   Headers." RFC2474. Available: http://tools.ietf.org/html/rfc2474.
[17]  D. O. Awduche and J. Agogbua, "Requirements for Traffic
   Engineering Over MPLS." RFC2702. Available:
   http://tools.ietf.org/html/rfc2702.








Authors' Addresses


   Greg M. Bernstein
   Grotto Networking
   Fremont California, USA
   Phone: (510) 573-2237
   Email: gregb@grotto-networking.com

   Young Lee
   Huawei Technologies
   5340 Legacy Drive, Building 3
   Plano, TX 75024
   USA
   Phone: (469) 277-5838
   Email: leeyoung@huawei.com




Intellectual Property Statement

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line
   IPR repository at http://www.ietf.org/ipr

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document. Please
   address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

   All IETF Documents and the information contained therein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION
   HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY,
   THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.