Master of the rings with ERP

Feb. 04, 2023

Introduction

Ethernet ring protection switching (ERP) is a widely used technology that provides sub-second convergence, a lightweight control plane, and simplified deployment, maintenance and troubleshooting. It is designed for networks built as a physical ring or as interconnected rings, and it can be deployed in the network core as well as in the distribution and access layers. In such a ring topology, often seen with customers in the transportation vertical, ERP provides a single protected path of communication between any two nodes in the network. The core of ERP consists of the ring automatic protection switching (R-APS) protocol and a protection switching mechanism. Both are specified by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T); the latest version at the time of writing is “Recommendation ITU-T G.8032/Y.1344 (2020) – Corrigendum 1”, released in February 2022.

The principle of operation

Normal condition

The ring protection architecture is based on traffic flowing on all ring links except one, called the Ring Protection Link (RPL). In normal operation, when the ring nodes are in the idle state, the RPL is blocked by the RPL owner node, which is directly attached to the RPL, to prevent network loops. The RPL owner node blocks traffic for all protected VLANs. In this state, the RPL owner node uses R-APS protocol messages to tell all other ring nodes that the RPL is blocked and that no ring node is requesting any action. This architecture allows loop-free communication between all ring nodes and the layer 2 domains connected to them on non-ring ports. Figure 1 shows an ERP network under normal conditions, with the MAC addresses of users A and B learned on the respective ports of the nodes in the network.
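The loop-free property of a ring with a blocked RPL can be illustrated with a small Python sketch. It is a toy model and not part of any ERP implementation: the node names, the choice of RPL and the helpers `ring_links` and `find_path` are assumptions made for illustration only. With one link blocked the ring becomes a chain, so exactly one path exists between any two nodes.

```python
from collections import deque


def ring_links(nodes):
    """Return the links of a physical ring, each as a frozenset of two node names."""
    return [frozenset((nodes[i], nodes[(i + 1) % len(nodes)]))
            for i in range(len(nodes))]


def find_path(nodes, blocked, src, dst):
    """Breadth-first search that only uses ring links which are not blocked."""
    neighbours = {n: [] for n in nodes}
    for link in ring_links(nodes):
        if link in blocked:
            continue
        a, b = tuple(link)
        neighbours[a].append(b)
        neighbours[b].append(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in neighbours[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path: the nodes are disconnected


nodes = ["N1", "N2", "N3", "N4"]
rpl = frozenset(("N4", "N1"))  # hypothetical RPL between N4 and N1
print(find_path(nodes, {rpl}, "N1", "N4"))
```

Although N1 and N4 are physical neighbours, the blocked RPL forces their traffic the long way around the ring, which is the price paid for a loop-free layer 2 topology.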

Failure condition

In the event of a ring failure (either link or node), the RPL is unblocked so that traffic can flow over it and network connectivity is maintained. The R-APS protocol coordinates the ring protection actions on all nodes in the following way:

Nodes that are directly connected to either side of the failed link internally block traffic forwarding on the port attached to that link. They then start sending R-APS messages with the SF (signal fail) code, informing the other nodes in the ring that a directly connected link has failed. Both nodes also flush their forwarding database (FDB) for faster convergence of the network.
Upon receiving these messages for the first time, all ring nodes except the RPL owner flush their FDBs as well.
The RPL owner also flushes its FDB and, in addition, unblocks traffic over the previously blocked RPL.

When a link failure occurs in the ring, the ring nodes enter the protection state. After the FDBs are flushed, the MAC addresses of users A and B are relearned, where necessary on different ports than before. This protection switching process, from the indication of the link failure to the re-establishment of communication between all nodes in the network, should complete within 50 milliseconds.
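The failure handling steps can be sketched as a minimal state machine. This is an illustrative model, not the R-APS state machine from the standard: the `RingNode` class, the port names and the `("SF", sender)` message tuples are all hypothetical, and message transport around the ring is reduced to a simple loop.

```python
class RingNode:
    """Toy ring node; port names and message tuples are illustrative only."""

    def __init__(self, name, rpl_port=None):
        self.name = name
        self.rpl_port = rpl_port                      # set only on the RPL owner
        self.blocked_ports = {rpl_port} if rpl_port else set()
        self.fdb = {}                                 # MAC address -> port
        self.state = "IDLE"

    def local_link_failure(self, port):
        """Block the failed port, flush the FDB, return the R-APS message to send."""
        self.blocked_ports.add(port)
        self.fdb.clear()
        self.state = "PROTECTION"
        return ("SF", self.name)

    def receive(self, message):
        """React to the first R-APS (SF) message; later copies are ignored."""
        code, _sender = message
        if code != "SF" or self.state != "IDLE":
            return
        self.fdb.clear()
        if self.rpl_port is not None:
            self.blocked_ports.discard(self.rpl_port)  # RPL owner unblocks the RPL
        self.state = "PROTECTION"


# Ring N1 - N2 - N3 - N4 - N1, with N1 as RPL owner blocking its port towards N4.
n1 = RingNode("N1", rpl_port="to_N4")
n2, n3, n4 = RingNode("N2"), RingNode("N3"), RingNode("N4")
ring = [n1, n2, n3, n4]
for node in ring:
    node.fdb = {"user-A": "some_port", "user-B": "other_port"}

# The link between N2 and N3 fails: both ends block their port and announce SF.
messages = [n2.local_link_failure("to_N3"), n3.local_link_failure("to_N2")]
for message in messages:
    for node in ring:
        if node.name != message[1]:
            node.receive(message)

for node in ring:
    print(node.name, node.state, sorted(node.blocked_ports), node.fdb)
```

After the run, the only blocked ports are the two facing the failed link, the RPL is open, and every FDB is empty so that addresses can be relearned along the new path.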

Failure recovery

Upon detecting that a failed link has recovered, the nodes on both ends of the link start a guard timer and change the type of R-APS message they send. Instead of reporting a link failure, the new R-APS messages carry the NR (no request) code and inform the ring nodes about the link recovery. Even though the physical link has recovered, both nodes continue to internally block traffic forwarding on the ports attached to it. Each of the two nodes also receives R-APS messages from the other, but does not process them until its guard timer expires. After that, both nodes process the R-APS NR message from the other node and decide which of the two will internally enable forwarding over the recovered link, while the other continues to block.

In revertive operation, the network is expected to return to the normal condition as soon as all links are available. In this case, upon receiving R-APS NR messages, the RPL owner starts the Wait-To-Restore (WTR) timer, which guards against intermittent link defects; while the timer runs, traffic is still forwarded over the RPL. Once the timer expires, the RPL owner will:

block traffic over the RPL,
flush its FDB, and
start sending R-APS messages with the code No Request, RPL Blocked (NR, RB), informing all ring nodes that the RPL is blocked.

Upon receiving this R-APS (NR, RB) message, all ring nodes flush their FDBs to improve network convergence, and the node that is still internally blocking traffic over the recovered link starts forwarding again. The network thus returns to the state shown in figure 1, with the RPL blocked and traffic flowing normally on all other links.
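The role of the Wait-To-Restore timer at the RPL owner can be sketched as follows. The `RplOwner` class, its method names and the 300-second timer value are hypothetical and chosen for illustration. The sketch also assumes that a new signal fail cancels a running timer, which is one way the timer can guard against intermittent defects; the text above does not spell out that detail.

```python
class RplOwner:
    """Toy model of the RPL owner while the ring is in the protection state."""

    def __init__(self, wtr_seconds):
        self.wtr_seconds = wtr_seconds
        self.rpl_blocked = False        # the RPL currently carries traffic
        self.fdb = {"user-A": "rpl_port"}
        self.wtr_deadline = None        # None means the WTR timer is not running
        self.sent = []                  # R-APS messages emitted by this node

    def receive_nr(self, now):
        """Start the WTR timer on the first R-APS (NR) message."""
        if not self.rpl_blocked and self.wtr_deadline is None:
            self.wtr_deadline = now + self.wtr_seconds

    def receive_sf(self):
        """Assumption: a new signal fail cancels a running WTR timer."""
        self.wtr_deadline = None

    def tick(self, now):
        """Call periodically; reverts the ring once the WTR timer has expired."""
        if self.wtr_deadline is None or now < self.wtr_deadline:
            return
        self.wtr_deadline = None
        self.rpl_blocked = True         # 1. block traffic over the RPL
        self.fdb.clear()                # 2. flush the FDB
        self.sent.append("NR,RB")       # 3. announce that the RPL is blocked


owner = RplOwner(wtr_seconds=300)       # 300 s is an illustrative value
owner.receive_nr(now=0)
owner.tick(now=299)
print(owner.rpl_blocked, owner.sent)    # timer still running, RPL still open
owner.tick(now=300)
print(owner.rpl_blocked, owner.sent)    # timer expired, ring has reverted
```

Keeping the three revert actions in a single `tick` step mirrors the ordering in the list above: the RPL is blocked before the owner announces (NR, RB) to the rest of the ring.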

In non-revertive operation, the ring does not return to its original state when the failed link recovers: traffic continues to flow over the RPL and the recovered link stays blocked, which avoids a second network interruption. Reverting the network to the normal condition must then be performed by an administrator, typically during a maintenance window.

ERP convergence

ERP is used in mission-critical networks because it is designed to complete protection switching within 50 ms. However, the standard specifies this interval only under the following conditions:

there is no congestion in the Ethernet ring network
all nodes are in the idle state
the number of nodes in the ring is fewer than 16
the ring fiber length is less than 1200 km

If any of the above conditions is not fulfilled, the protection switching operation may take longer than 50 ms.
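The four conditions lend themselves to a simple pre-deployment check. The sketch below is a hypothetical helper, not a tool from the standard or from any vendor: the function name `violated_conditions` and its parameters are assumptions, and only the thresholds (16 nodes, 1200 km) come from the list above.

```python
def violated_conditions(congested, all_nodes_idle, node_count, ring_fibre_km):
    """Return the conditions that are NOT met; an empty list means all four hold."""
    problems = []
    if congested:
        problems.append("ring is congested")
    if not all_nodes_idle:
        problems.append("not all nodes are in the idle state")
    if node_count >= 16:
        problems.append("ring has 16 or more nodes")
    if ring_fibre_km >= 1200:
        problems.append("ring fiber length is 1200 km or more")
    return problems


# A hypothetical 20-node ring with 80 km of fiber.
print(violated_conditions(congested=False, all_nodes_idle=True,
                          node_count=20, ring_fibre_km=80))
```

An empty result only says that the 50 ms target for protection switching applies; it says nothing about the end-to-end convergence time of the services running on top of the ring.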

The 50 ms period for the protection switching operation should not be interpreted as the end-to-end network convergence time. The actual network convergence time may be longer due to other factors, such as:

Convergence of routing protocols, including the Virtual Router Redundancy Protocol (VRRP)
Layer 2 loop prevention protocols used between the access and core layers
Virtual chassis configurations, which add the complexity of synchronizing control planes and FDB flushes among nodes

ERP version 2

Modern mission-critical networks use a hierarchical design in which the core layer is built on a ring topology, and access layer nodes are connected to it redundantly to prevent a single point of failure. Redundantly connected devices require a technology that prevents Layer 2 loops by automatically detecting and disabling forwarding over all but one link. The recommended way to connect one or more access devices to the core ring is to deploy a multi-ring topology, as seen in figure 3. Multi-ring, or ladder, topology is supported by ERP version 2.

A multi-ring topology consists of a major ring and one or more sub-rings, which can be connected in multiple ways as long as the following conditions are respected:

The R-APS protocol is not shared across Ethernet ring interconnections
On each ring port, the R-APS control protocol and the protected VLANs are controlled by only one Ethernet ring
Each major ring or sub-ring must have its own RPL

Following these principles, the network in figure 3 is designed with two ERP instances, one running in the major ring and one in the sub-ring. The ring link between the two interconnection nodes, and the related ring ports, are controlled only by the ERP1 instance. On the interconnection nodes, the ERP2 instance is configured on a single port only, as opposed to the two ring ports controlled by the ERP1 instance. The RPL owner functionality is also configured on two different nodes.
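The multi-ring design rules can be expressed as a small consistency check over a planned configuration. Everything in this sketch is hypothetical: the dictionary layout, the node and port names, the VLAN numbers and the function `validate_multi_ring` are illustrative and do not correspond to any switch CLI or API. Treating "not shared" as "each ring uses its own R-APS VLAN" is a simplifying assumption.

```python
def validate_multi_ring(rings):
    """Return a list of rule violations for a dict of ring name -> ring definition."""
    errors = []
    port_owner = {}   # (node, port) -> ring name that controls it
    vlan_owner = {}   # R-APS VLAN -> ring name that uses it
    for name, ring in rings.items():
        vlan = ring["raps_vlan"]
        if vlan in vlan_owner:
            errors.append(f"{name} shares R-APS VLAN {vlan} with {vlan_owner[vlan]}")
        else:
            vlan_owner[vlan] = name
        for port in sorted(ring["ports"]):
            if port in port_owner:
                errors.append(f"port {port} is in both {port_owner[port]} and {name}")
            else:
                port_owner[port] = name
        if ring.get("rpl") not in ring["ports"]:
            errors.append(f"{name} has no RPL on one of its own ring ports")
    return errors


# Major ring A-B-C-D and a sub-ring attached to interconnection nodes C and D.
rings = {
    "ERP1": {
        "raps_vlan": 1001,
        "ports": {("A", "p1"), ("A", "p2"), ("B", "p1"), ("B", "p2"),
                  ("C", "p1"), ("C", "p2"), ("D", "p1"), ("D", "p2")},
        "rpl": ("A", "p1"),
    },
    "ERP2": {
        "raps_vlan": 1002,
        # One port on each interconnection node, two on the access node E.
        "ports": {("C", "p3"), ("D", "p3"), ("E", "p1"), ("E", "p2")},
        "rpl": ("E", "p1"),
    },
}
print(validate_multi_ring(rings))
```

In this layout the link between C and D belongs to ERP1 only, and ERP2 touches each interconnection node on a single port, matching the description of figure 3.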

ALE Portfolio supporting ERP v2

ALE provides a broad range of products that support the ERPv2 protocol, such as:

Mission-critical enterprise core product lines OmniSwitch 9900 & OmniSwitch 6900
Premium access switching product lines OmniSwitch 6860E/N
Advance value access switching product lines OmniSwitch 6560 & 6570M
Ruggedized and extended temperature product lines OmniSwitch 6865 & OmniSwitch 6465 / 6465T

Such a wide range of nodes supporting the ERP protocol can fulfill various requirements for mission-critical networks.
