System Handbook

Java was chosen as the programming language for the implementation of the proposed solution, mainly because of its multiplatform support. The implementation of the solution discussed above can be divided into these main parts:
  • Preconditions
  • Initial settings
  • Monitoring
  • Change
The following chapters will discuss the mentioned parts in detail.

Preconditions

As mentioned in the previous chapter, the application focuses on Quality of Service configuration and management on an end-to-end connection. The basic topology for the implementation is shown in Figure 1. Router is the monitored device. Responder is the device at the other end, configured as an IP SLA responder. BasicMeter is the exporter monitoring the traffic flowing outbound from the Responder interface. PC is the computer running the optimizing application. The type of connection between the routers is not important. If the BasicMeter computer is not running the collector and hosting the database, the PC also needs a connection to the database.

Implementation topology.jpg

Figure 1 Implementation topology

The application assumes that both the Router and the Responder are in production state, meaning that the LAN and WAN interfaces are operational, routing is configured properly and the routers are forwarding traffic. The application uses SNMP both for gathering data from the routers and for managing them, so it expects the devices to be configured with a known read-write SNMP community string. To perform a change, a TFTP server must be running on the PC so that the changed configuration can be sent to the managed Router. If the PC is not on the same local segment as the router, nothing may block the SNMP and TFTP traffic between the optimizing application and both the Router and the Responder.

Initial settings

Both the Router and the Responder need a special configuration for the application to work, so the first action the optimizing application performs is to apply a specific QoS configuration to both devices. The steps the application takes to tailor the QoS configuration to a specific router are described in the next few paragraphs.

Autodiscovery

The first step is to perform autodiscovery on the router to identify its interfaces and choose the correct interface for applying QoS (the WAN interface). The application pulls the list of all interfaces on the router using SNMP. Interfaces that are not in an operational state are not considered. Also excluded are all software interfaces, such as Loopback or Null interfaces, and interfaces that are part of a logical bundle, such as a Multilink interface or an ATM IMA interface. Even after excluding all these interfaces it is not always possible to tell which interface is the WAN interface, so the final decision on which interface to apply the QoS configuration to is left to the user. The autodiscovery method returns the interface index of the interface the configuration should be applied to.
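The filtering step described above can be illustrated with a minimal sketch. It assumes the interface data has already been read from the router; the class and field names (InterfaceFilter, Iface, bundleMember) are hypothetical and are not taken from the application's Javadoc, and recognizing software interfaces by name prefix is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of the autodiscovery filter; all names here are hypothetical. */
final class InterfaceFilter {

    /** One interface row, assumed to be already retrieved via SNMP. */
    static final class Iface {
        final int ifIndex;
        final String name;
        final boolean operUp;        // interface is in an operational state
        final boolean bundleMember;  // part of a Multilink or ATM IMA bundle

        Iface(int ifIndex, String name, boolean operUp, boolean bundleMember) {
            this.ifIndex = ifIndex;
            this.name = name;
            this.operUp = operUp;
            this.bundleMember = bundleMember;
        }
    }

    /** Assumption: software interfaces are recognized by their name prefix. */
    static boolean isSoftware(String name) {
        return name.startsWith("Loopback") || name.startsWith("Null");
    }

    /** Returns the interfaces that remain as WAN candidates for the user to choose from. */
    static List<Iface> wanCandidates(List<Iface> all) {
        List<Iface> candidates = new ArrayList<Iface>();
        for (Iface i : all) {
            if (i.operUp && !isSoftware(i.name) && !i.bundleMember) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Iface> all = Arrays.asList(
                new Iface(1, "FastEthernet0/0", true, false),
                new Iface(2, "Serial0/0", true, false),
                new Iface(3, "Serial0/1", false, false),  // not operational
                new Iface(4, "Loopback0", true, false),   // software interface
                new Iface(5, "Null0", true, false),       // software interface
                new Iface(6, "Serial0/2", true, true));   // bundle member
        for (Iface i : wanCandidates(all)) {
            System.out.println(i.ifIndex + " " + i.name);
        }
    }
}
```

In this example two candidates remain (indexes 1 and 2), which shows why the final choice still has to be made by the user.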

Configuration generation and application

Based on the information gathered in the autodiscovery phase, the application prepares an initial QoS configuration consisting of the following:
  • Class maps (see Figure 2)
  • Policy maps (see Figure 3)
  • Interface specific configuration (see Figure 4)
  • IP SLA monitor probes (see Figure 5)

class_maps.jpg
Figure 2 Class maps configuration

As shown in Figure 2, the configuration consists of four class maps for traffic types:
  • GOLD – delay sensitive traffic
  • SILVER – mission critical and transactional traffic
  • BRONZE – other preferred traffic types
  • BEST_EFFORT – all that does not match the classes above
Three class maps for TCP flow preference change:
  • SILVER_TCP_MARKER
  • BRONZE_TCP_MARKER
  • BEST-EFFORT_TCP_MARKER
Three class maps for UDP flow preference change:
  • SILVER_POLICE
  • BRONZE_POLICE
  • BEST-EFFORT_POLICE
As shown in Figure 2, all the class maps classify traffic based on access control lists.

policy_maps.jpg
Figure 3 Policy maps configuration

The main policy map is the WAN_QUEUING policy map. It defines the bandwidth reservation for the traffic type classes, marks the traffic with the selected marking and sets up WRED with two definitions for certain markings: one for normal traffic (the lower drop precedence number) and one that serves to penalize aggressive TCP flows. Within each of the classes except the GOLD class (the priority class) there is a nested policy called the POLICIER. These service policies enforce the proposed preference alteration scheme. The POLICE classes police UDP flows at a rate of 2% of the total interface bandwidth, which ensures that a policed flow does not die out but no longer interferes with the other flows. The TCP_MARKER classes set the marking of an aggressive TCP flow to an AF marking with a higher drop precedence, which causes the packets of the flow to be dropped more aggressively, thus signaling the sender to back off.

interface_conf.jpg
Figure 4 Interface specific configuration

The interface to which the policy map should be applied is chosen in the autodiscovery phase. The max-reserved-bandwidth command allows all the available interface bandwidth to be reserved for the particular classes. An important factor is to tailor the bandwidth command to the Layer 2 interface type. The most common queuing strategies work on the network layer and do not take Layer 2 overhead into consideration. For example, assume an Asynchronous Transfer Mode (ATM) WAN link and a constant flow of 64-byte packets. ATM networks transfer data in units called cells. Each cell is 53 bytes long, of which 5 bytes are the cell header and 48 bytes are payload. Furthermore, ATM adds an 8-byte LLC/SNAP header and an 8-byte AAL5 trailer to each packet. The IP packet together with this header and trailer is divided into multiple cells. In the chosen scenario the resulting 80 bytes are divided into two cells, which adds another 10 bytes of overhead for the cell headers. ATM does not allow multiple packets to be combined into one cell, so it introduces padding: if the packet does not fill the last cell exactly, the leftover bytes are transferred empty. In this scenario the padding adds another 16 bytes of overhead. Thus the real amount of data transferred is 106 bytes rather than the original 64 bytes. Not taking the overhead into account when configuring QoS can lead to unpredictable behavior during congestion. To avoid this, the bandwidth command needs to be applied carefully to the WAN interface to keep drops, and thus performance, under control.
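The ATM arithmetic above can be checked with a short calculation. The sketch below only uses the figures given in the text (53-byte cells with a 5-byte header, an 8-byte LLC/SNAP header and an 8-byte AAL5 trailer); the class and method names are hypothetical and not part of the application.

```java
/** Worked example of the ATM overhead calculation; names are hypothetical. */
final class AtmOverhead {
    static final int CELL_SIZE = 53;
    static final int CELL_HEADER = 5;
    static final int CELL_PAYLOAD = CELL_SIZE - CELL_HEADER; // 48 bytes
    static final int LLC_SNAP_HEADER = 8;
    static final int AAL5_TRAILER = 8;

    /** Number of cells needed to carry one IP packet of the given size. */
    static int cellsFor(int ipPacketBytes) {
        int pdu = ipPacketBytes + LLC_SNAP_HEADER + AAL5_TRAILER;
        return (pdu + CELL_PAYLOAD - 1) / CELL_PAYLOAD; // round up to whole cells
    }

    /** Bytes actually sent on the link, including cell headers and padding. */
    static int wireBytes(int ipPacketBytes) {
        return cellsFor(ipPacketBytes) * CELL_SIZE;
    }

    /** Padding bytes left empty in the last cell. */
    static int paddingBytes(int ipPacketBytes) {
        int pdu = ipPacketBytes + LLC_SNAP_HEADER + AAL5_TRAILER;
        return cellsFor(ipPacketBytes) * CELL_PAYLOAD - pdu;
    }

    public static void main(String[] args) {
        int packet = 64;
        System.out.println(packet + "-byte packet: " + cellsFor(packet) + " cells, "
                + paddingBytes(packet) + " bytes padding, "
                + wireBytes(packet) + " bytes on the wire");
        // prints: 64-byte packet: 2 cells, 16 bytes padding, 106 bytes on the wire
    }
}
```

For the 64-byte scenario the link therefore carries about 1.66 times the Layer 3 volume, which is the difference the bandwidth command has to account for.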

SLA_probes.jpg
Figure 5 IP SLA monitor probes configuration

The application configures four IP SLA udpEcho probes. Each probe sends a control packet to the Responder IP address every 10 seconds and keeps one record of the performed operation in memory, which the application reads via SNMP. The last parts of the configuration sent to the router are the access lists for traffic classification. The access lists are built uniformly: each application has its own entry in the form of a protocol and a destination port number. The application then sends the router an SNMP command to download and apply the configuration. Upon receiving the SNMP command, the router initiates a TFTP session to the IP address that is part of the SNMP object identifier and downloads the configuration file whose name was sent as the parameter of the SNMP command. The router reads the file and merges the partial configuration stored in it with the running configuration.
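The uniform access list entries described above (protocol plus destination port) lend themselves to simple generation. The following is a minimal sketch only: the real access lists are not reproduced in this handbook, so the list name GOLD_ACL, the example ports and the use of "any" for source and destination are assumptions, and the class AclBuilder is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of uniform access list generation; names and values are hypothetical. */
final class AclBuilder {

    /** Builds one entry from a protocol and a destination port. */
    static String entry(String protocol, int destinationPort) {
        if (!protocol.equals("tcp") && !protocol.equals("udp")) {
            throw new IllegalArgumentException("unsupported protocol: " + protocol);
        }
        if (destinationPort < 1 || destinationPort > 65535) {
            throw new IllegalArgumentException("invalid port: " + destinationPort);
        }
        // Assumption: traffic is matched regardless of source and destination address.
        return "permit " + protocol + " any any eq " + destinationPort;
    }

    /** Builds a named extended access list from the given entries. */
    static String accessList(String name, List<String> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("ip access-list extended ").append(name).append('\n');
        for (String e : entries) {
            sb.append(' ').append(e).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example applications and ports are illustrative only.
        List<String> entries = Arrays.asList(entry("udp", 5060), entry("tcp", 1720));
        System.out.print(accessList("GOLD_ACL", entries));
    }
}
```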

Monitoring

After the configuration has been applied successfully, the optimizing program advances to the monitoring phase. To achieve the best performance for the given link, the interval for collecting the relevant data from the router has to be set properly. For statistical purposes, intervals of up to five minutes are sufficient, but for altering the congestion behavior on the router a five-minute interval is too long. Intervals that are too short, on the other hand, may cause too much processor overhead on the interrogated router. The monitoring interval is set to 10 seconds to match the frequency of the IP SLA probes. In each interval the router is interrogated for the following information:
  • Packet drops in each QoS class
  • Round trip time sample for each QoS class
  • Class utilization
  • Timestamps for the collected data
The selected interval of 10 seconds is long enough to account for the bursty nature of network traffic, so the collected samples are sufficient to prove that congestion is present at the interface. The thresholds against which the measured values are compared are derived from values measured under low or no link utilization. Should the measured values breach these thresholds at some point, the application uses the algorithm described above to decide whether the setup is eligible for a change. If so, the application performs the change. The input for the change functions is the class that breached its threshold, so that the method knows where to look for the aggressive flows.
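The threshold comparison performed in each monitoring interval can be sketched as follows. This is an illustration under assumptions, not the application's decision algorithm: it assumes per-class limits on drops per interval and on round trip time, and the names ThresholdCheck, Sample and Threshold as well as the numeric values are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the per-interval threshold check; names and limits are hypothetical. */
final class ThresholdCheck {

    /** Values collected for one QoS class in one 10-second interval. */
    static final class Sample {
        final String qosClass;
        final long drops;        // packet drops seen in this interval
        final double rttMillis;  // round trip time sample from the IP SLA probe

        Sample(String qosClass, long drops, double rttMillis) {
            this.qosClass = qosClass;
            this.drops = drops;
            this.rttMillis = rttMillis;
        }
    }

    /** Limits assumed to be derived from measurements under low or no utilization. */
    static final class Threshold {
        final long maxDrops;
        final double maxRttMillis;

        Threshold(long maxDrops, double maxRttMillis) {
            this.maxDrops = maxDrops;
            this.maxRttMillis = maxRttMillis;
        }
    }

    static boolean breached(Sample s, Threshold t) {
        return s.drops > t.maxDrops || s.rttMillis > t.maxRttMillis;
    }

    /** Returns the first class that breached its threshold, or null if none did. */
    static String firstBreachedClass(List<Sample> samples, Map<String, Threshold> thresholds) {
        for (Sample s : samples) {
            Threshold t = thresholds.get(s.qosClass);
            if (t != null && breached(s, t)) {
                return s.qosClass;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Threshold> thresholds = new HashMap<String, Threshold>();
        thresholds.put("GOLD", new Threshold(0, 20.0));
        thresholds.put("SILVER", new Threshold(5, 40.0));
        List<Sample> samples = Arrays.asList(
                new Sample("GOLD", 0, 12.5),
                new Sample("SILVER", 9, 35.0));
        System.out.println(firstBreachedClass(samples, thresholds)); // prints SILVER
    }
}
```

The returned class name corresponds to the input of the change functions mentioned above.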

Change

Before any configuration change can occur, the cause of the overutilization within the class that violated the threshold needs to be identified. Based on the time when the thresholds were breached, the change method builds a query for the BasicMeter collector database to get all the flows with the following characteristics:
  • Timestamp between the current time minus the active timeout of the BEEM and the present
  • TOS value equal to the marking associated with the specific class
  • Records sorted by the total number of bytes per flow
The program then takes the biggest flow according to the byte count and checks whether the flow is protected. If it is, the program skips the flow and continues with the next one. Once it finds the most aggressive flow that is not protected, it decides how to perform the change based on the transport layer protocol of the flow. As discussed earlier, if the aggressive flow is a TCP flow, the access lists are modified to shift the flow to a predefined class that changes its marking to an AF marking with a higher drop precedence. According to the applied WRED configuration, the flow then experiences more drops and throttles back. Should the aggressive flow be a UDP flow, the access list change shifts it into a class with a predefined policer, which keeps the flow within the specified limits. This change should ease the congestion at the interface and allow the monitoring to record a better RTT than before the change. However, as the main point of the change is dropping packets from the aggressive flows, a higher drop rate is expected to be recorded. As for the timeout of the executed change, the aggressive flow should remain under the changed conditions until the overall available bandwidth of the interface that previously experienced congestion is higher than the drop rate of packets due to the executed change. Once this condition is met, the application removes the access list entries that shifted the aggressive flow into the penalty classes, restores them to their original place and returns to the monitoring phase.
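The selection of the aggressive flow and the protocol-based decision can be sketched as follows. The sketch assumes the flow records have already been fetched from the collector database; the names AggressiveFlowSelector, Flow and Action are hypothetical, and returning NONE for protocols other than TCP and UDP is an assumption, since the text does not cover that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Sketch of aggressive flow selection; all names are hypothetical. */
final class AggressiveFlowSelector {

    enum Action { SHIFT_TO_TCP_MARKER, SHIFT_TO_POLICE, NONE }

    static final int TCP = 6;   // IP protocol number for TCP
    static final int UDP = 17;  // IP protocol number for UDP

    /** One flow record, assumed to be already read from the collector database. */
    static final class Flow {
        final String id;
        final int protocol;
        final long bytes;
        final boolean isProtected;

        Flow(String id, int protocol, long bytes, boolean isProtected) {
            this.id = id;
            this.protocol = protocol;
            this.bytes = bytes;
            this.isProtected = isProtected;
        }
    }

    /** Returns the biggest flow by byte count that is not protected, or null. */
    static Flow mostAggressiveUnprotected(List<Flow> flows) {
        List<Flow> sorted = new ArrayList<Flow>(flows);
        Collections.sort(sorted, new Comparator<Flow>() {
            public int compare(Flow a, Flow b) {
                return Long.compare(b.bytes, a.bytes); // descending by bytes
            }
        });
        for (Flow f : sorted) {
            if (!f.isProtected) {
                return f;
            }
        }
        return null;
    }

    /** TCP flows are re-marked, UDP flows are policed. */
    static Action actionFor(Flow f) {
        if (f == null) {
            return Action.NONE;
        }
        if (f.protocol == TCP) {
            return Action.SHIFT_TO_TCP_MARKER;
        }
        if (f.protocol == UDP) {
            return Action.SHIFT_TO_POLICE;
        }
        return Action.NONE; // assumption: other protocols are left untouched
    }

    public static void main(String[] args) {
        List<Flow> flows = Arrays.asList(
                new Flow("A", TCP, 9000, true),   // biggest, but protected
                new Flow("B", UDP, 7000, false),
                new Flow("C", TCP, 3000, false));
        Flow f = mostAggressiveUnprotected(flows);
        System.out.println(f.id + " " + actionFor(f)); // prints B SHIFT_TO_POLICE
    }
}
```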

Description of classes, fields and methods

The description of the classes, fields and methods can be found in the generated Javadoc attached in the SVN repository.

-- MartinKusnirik - 19 May 2009

Topic revision: r1 - 20 May 2009 - 00:01:38 - MartinKusnirik