Differentiated quality-of-protection provisioning with probabilistic-shared risk link group in survivable flexi-grid optical networks

Abstract. A multilink failure model, the probabilistic-shared risk link group (PSRLG), is adopted to investigate the problem of differentiated quality-of-protection (QoP) provisioning in flexi-grid optical networks. Service failure probability (SFP), which denotes the failure probability of a connection during transmission, is introduced as a metric to precisely examine the feasibility of differentiated QoP schemes. According to their reliability requirements, connection requests are divided into three classes: class high, class middle, and class low. Based on this class division, two differentiated QoP provisioning schemes are proposed: the intraclass-shared resource scheme (ICSR scheme) and the cross-class-shared resource scheme (CCSR scheme). The former allows a connection to share backup resources only with connections in the same class, whereas the latter enables connections in different classes to share backup resources. Simulation results show that the proposed schemes provide differentiated reliability under the PSRLG constraint and achieve a good balance between reliability and resource efficiency. Moreover, compared with the ICSR scheme, the CCSR scheme achieves lower blocking probability, lower resource redundancy, and higher spectrum utilization without sacrificing reliability.


Introduction
Recently, flexi-grid optical networks have attracted much attention from academia and industry. 1,2 Unlike conventional wavelength division multiplexing (WDM) networks, flexi-grid optical networks can allocate just enough spectrum resources to each client and support subwavelength, super-wavelength, and multiple-rate data traffic, thereby achieving higher spectrum efficiency. Meanwhile, different services in current networks often require different levels of reliability: critical services, such as telesurgery, require much higher reliability, whereas other services, such as entertainment video, have relatively low reliability requirements. 3 For this reason, multiple quality-of-protection (QoP) was introduced, in which services are divided into different classes. 4 When failures occur, the failure probability of a service depends on its QoP. A quantitative QoP framework against single-link failures was proposed in Ref. 4, where connection requests are divided into four classes, i.e., guaranteed protection, best-effort protection, unprotected traffic, and pre-emptable traffic, and a different path protection algorithm is provided for each class. Reference 5 extended this QoP framework and studied the optimal capacity design of span-restorable mesh networks. Partial protection mechanisms were used to achieve differentiated QoP provisioning in WDM mesh networks. 6 Reference 7 then put forward two backup bandwidth aggregation schemes, i.e., backup bandwidth sharing and backup bandwidth multiplexing, by which differentiated QoP can be provided. Preconfigured protection structures (p-structures) were employed to support multiple QoP in Ref. 8, and differentiated QoP against arbitrary double-link failures was studied in Ref. 9.
Nevertheless, the research above is limited to single-link and double-link failures. As network topologies become more complex, failures in optical networks increasingly occur simultaneously and are correlated with each other. In this article, a multifailure model, the probabilistic-shared risk link group (PSRLG), which is developed from the conventional shared risk link group (SRLG), is introduced to study QoP. 10 Unlike SRLG, PSRLG takes a probabilistic view: links belonging to a PSRLG fail with some probability instead of failing deterministically. This makes PSRLG applicable to a wider range of failure scenarios. For example, when an electromagnetic pulse attack or an earthquake occurs, communication links in the vicinity of the disaster may have a higher failure probability than those distant from it. 10 Among the common protection methods for optical networks, path protection, and especially shared-path protection, is widely employed [11][12][13][14][15] because it is easy to implement at the current stage and achieves high resource utilization. In Ref. 14, full SRLG-disjoint protection (FSDP) was proposed, where each connection request is assigned one working path and one SRLG-disjoint backup path, and two backup paths can share a common backup resource if their corresponding working paths are fully SRLG-disjoint. Reference 15 extended FSDP and proposed partial SRLG-disjoint protection, where backup resources can be shared as long as the joint reliability of the corresponding working paths satisfies the survivability requirements, rather than requiring the working paths to be strictly SRLG-disjoint.
Based on these works, three shared-path protection algorithms are designed: full PSRLG-disjoint protection (FPDP), partial PSRLG-disjoint protection (PPDP), and full link-disjoint protection (FLDP). Then, two differentiated QoP provisioning schemes, the intraclass-shared resource scheme (ICSR scheme) and the cross-class-shared resource scheme (CCSR scheme), are proposed to provide differentiated QoP for flexi-grid optical networks. The rest of this article is organized as follows. In Sec. 2, we first describe the network model and give the definition and derivation of service failure probability (SFP). Section 3 proposes the differentiated QoP schemes with PSRLG. Simulation results and analysis are given in Sec. 4. Section 5 concludes this article.
2 Network Model and Service Failure Probability

Network Model
For flexi-grid optical networks, PSRLG is adopted as the multiple-link failure model. It is assumed that there are $N$ PSRLG events in total (analogous to a service matrix), represented by the set $R = \{r_1, r_2, \ldots, r_N\}$, where $r_n$ denotes the failure event of the $n$th PSRLG in $R$ and $\pi_r$ denotes the probability of event $r \in R$. In this article, we employ mutually exclusive PSRLGs, 3 which means that only one PSRLG failure event can take place at a time, i.e., $\sum_{r \in R} \pi_r = 1$. Within one PSRLG failure event, however, multiple links may fail simultaneously. The flexi-grid optical network is described as $G(V, E, S)$, where $V$, $E$, and $S$ denote the set of nodes, the set of bidirectional links, and the set of frequency slots, respectively. Each optical node is equipped with bandwidth-variable optical cross-connects and flexible transponders. A link in $E$ is denoted by $l_{k,l}$ for $k, l \in V$, meaning that the link starts at node $k$ and ends at node $l$. Because there are $N$ PSRLG events in total, each link $(k,l)$ has an $N$-dimensional vector $P_{k,l} = \{p^{r_1}_{k,l}, p^{r_2}_{k,l}, \ldots, p^{r_N}_{k,l}\}$, where $p^r_{k,l} \in [0, 1]$ represents the failure probability of link $(k,l)$ when PSRLG event $r$ occurs. If $p^r_{k,l} > 0$, we say that link $(k,l)$ belongs to PSRLG $r$. Each connection request arrives with a source node, a destination node, a random bandwidth requirement, and a QoP requirement. According to the QoP requirement, an appropriate algorithm is chosen to compute a pair of paths and to assign frequency slots, subject to both the spectral continuity constraint and the spectral consecutiveness constraint. 16 Guard bands are assumed to be allocated together with the service spectrum, so they are not considered separately.
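As an illustration of the model above, the following Python sketch stores the event probabilities $\pi_r$ and each link's vector $P_{k,l}$, checks the mutual-exclusivity condition, and reads PSRLG membership off $p^r_{k,l} > 0$. The class name `PsrlgNetwork`, its methods, and the list/dict layout are hypothetical choices for this sketch; the sample link reuses the annotation of link (1, 8) from the CERNET example in Sec. 3.1.

```python
from dataclasses import dataclass, field


def edge(k, l):
    """Bidirectional link key: (k, l) and (l, k) refer to the same link."""
    return (min(k, l), max(k, l))


@dataclass
class PsrlgNetwork:
    """Minimal container for G(V, E, S) plus PSRLG data (hypothetical layout)."""
    num_slots: int                                # |S|: frequency slots per fiber link
    pi: list                                      # pi[n - 1] = probability of event r_n
    vectors: dict = field(default_factory=dict)   # edge -> [p^{r_1}, ..., p^{r_N}]

    def add_link(self, k, l, psrlg_probs):
        """Add link (k, l); psrlg_probs maps event index n -> p^{r_n}_{k,l}."""
        vec = [0.0] * len(self.pi)
        for n, p in psrlg_probs.items():
            if not 0.0 <= p <= 1.0:
                raise ValueError("p^r_{k,l} must lie in [0, 1]")
            vec[n - 1] = p
        self.vectors[edge(k, l)] = vec

    def is_mutually_exclusive(self, tol=1e-9):
        """Mutually exclusive PSRLG events: the pi_r sum to 1."""
        return abs(sum(self.pi) - 1.0) <= tol

    def psrlgs_of(self, k, l):
        """Events r with p^r_{k,l} > 0, i.e., the PSRLGs the link belongs to."""
        return {n + 1 for n, p in enumerate(self.vectors[edge(k, l)]) if p > 0}


# Link (1, 8) of the Fig. 1 example: {(3, 0.1), (4, 0.5)}, eight events, pi_r = 1/8.
net = PsrlgNetwork(num_slots=300, pi=[1 / 8] * 8)
net.add_link(1, 8, {3: 0.1, 4: 0.5})
print(net.psrlgs_of(8, 1))          # {3, 4}
print(net.is_mutually_exclusive())  # True
```

Keeping each $P_{k,l}$ as a dense list mirrors the $N$-dimensional vector in the text; a sparse dict per event would serve equally well when most links belong to few PSRLGs.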

Service Failure Probability
To precisely evaluate the performance of the differentiated QoP schemes, the reliability of a connection must be measured quantitatively. We therefore define a new metric, SFP, which denotes the failure probability of a connection during transmission when shared-path protection is employed. Below, we derive the expression of SFP under two premises: the primary path and its backup path are link-disjoint, and backup paths can share a spectrum resource only if the corresponding primary paths are link-disjoint.
We now derive the SFP $P_{C_m}$ of a connection request $C_m$. $P_{C_m}$ consists of two components: the probability that the primary path and the corresponding backup path fail simultaneously, denoted $P_{PBF,C_m}$, and the probability of failing in the competition for a backup resource, denoted $P_{CRF,C_m}$. Hence, the SFP is

$$P_{C_m} = P_{PBF,C_m} + P_{CRF,C_m}.$$

$P_{PBF,C_m}$ is derived as follows. First, we obtain the failure probability $P^r_h$ of a path $h$ when PSRLG event $r$ occurs. A path survives only if none of the links it traverses is affected by the failure. The survival probability of each link on the lightpath is $1 - p^r_{k,l}$, $(k,l) \in h$, so the survival probability of the lightpath is $\prod_{(k,l) \in h} (1 - p^r_{k,l})$, and the failure probability of path $h$ is

$$P^r_h = 1 - \prod_{(k,l) \in h} \left(1 - p^r_{k,l}\right).$$

Because the primary path $h_{W,C_m}$ and the backup path $h_{B,C_m}$ are disjoint, their failures are mutually independent. Hence, the probability that both fail when PSRLG event $r$ occurs is

$$P^r_{PBF,C_m} = P^r_{h_{W,C_m}} \cdot P^r_{h_{B,C_m}}.$$

As mentioned in Sec. 2.1, mutually exclusive PSRLGs are adopted in this article, so

$$P_{PBF,C_m} = \sum_{r \in R} \pi_r \, P^r_{PBF,C_m}.$$

Now we derive $P_{CRF,C_m}$. Suppose that when $z$ connections compete for a backup resource simultaneously, each of them succeeds with probability $1/z$. Let $P^r_{SB,C_i}$ denote the probability that connection $C_i$ uses its backup resource when PSRLG event $r$ occurs. A connection is switched to its backup path only if its primary path fails while its backup path works, so

$$P^r_{SB,C_i} = P^r_{h_{W,C_i}} \left(1 - P^r_{h_{B,C_i}}\right).$$

To compute $P_{CRF,C_m}$, we consider the set $S^r_{C_m}$ with $M$ elements, which contains all the connections that may compete with $C_m$ for the backup resource when PSRLG event $r$ occurs. A connection $s_1 \in S^r_{C_m}$ competes with $C_m$ for the backup resource when the two connections are switched to their backup paths simultaneously.
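The quantities derived so far translate directly into code. The sketch below computes $P^r_h$, $P_{PBF,C_m}$, and $P^r_{SB,C_i}$ under these assumptions: paths are node lists, links are undirected, and `p[r]` maps each link to $p^r_{k,l}$, with links absent from the map treated as not belonging to PSRLG $r$. The function names and the toy probabilities are hypothetical.

```python
from math import prod


def edge(k, l):
    return (min(k, l), max(k, l))


def path_links(nodes):
    """Node sequence such as [0, 3, 4] -> list of bidirectional links."""
    return [edge(a, b) for a, b in zip(nodes, nodes[1:])]


def path_fail(nodes, p_r):
    """P^r_h = 1 - prod_{(k,l) in h} (1 - p^r_{k,l}); links outside PSRLG r have p = 0."""
    return 1.0 - prod(1.0 - p_r.get(e, 0.0) for e in path_links(nodes))


def p_pbf(work, backup, pi, p):
    """P_PBF = sum_r pi_r * P^r_{h_W} * P^r_{h_B} (disjoint paths, exclusive events)."""
    return sum(pi[r] * path_fail(work, p[r]) * path_fail(backup, p[r]) for r in pi)


def p_switch(work, backup, p_r):
    """P^r_SB = P^r_{h_W} * (1 - P^r_{h_B}): primary fails while backup survives."""
    return path_fail(work, p_r) * (1.0 - path_fail(backup, p_r))


# Hypothetical toy data: two equiprobable events; p[r] maps link -> p^r_{k,l}.
pi = {"r1": 0.5, "r2": 0.5}
p = {"r1": {(0, 1): 0.2, (1, 2): 0.5, (2, 3): 0.5}, "r2": {(0, 3): 0.4}}
work, backup = [0, 1, 2], [0, 3, 2]
print(round(p_pbf(work, backup, pi, p), 4))       # 0.15
print(round(p_switch(work, backup, p["r1"]), 4))  # 0.3
```

Under $r_1$ the primary fails with probability $1 - 0.8 \cdot 0.5 = 0.6$ and the backup with $0.5$; under $r_2$ the primary is untouched, so only $r_1$ contributes $0.5 \cdot 0.6 \cdot 0.5 = 0.15$.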
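For a single competitor $s_1$, both connections are switched with probability $P^r_{SB,C_m} \cdot P^r_{SB,s_1}$, so the probability that only one connection competes with service $C_m$ for the backup resource is $\sum_{s_1 \in S^r_{C_m}} P^r_{SB,C_m} P^r_{SB,s_1}$. Similarly, for two competitors the probability is $\sum_{s_1 \in S^r_{C_m}} \sum_{s_2 \in S^r_{C_m},\, s_2 \neq s_1} P^r_{SB,C_m} P^r_{SB,s_1} P^r_{SB,s_2}$. By analogy, the probability $P^r_{C,C_m,n}$ that $n$ ($n \le M$) connections compete with service $C_m$ for the backup resource is

$$P^r_{C,C_m,n} = \sum_{s_1 \in S^r_{C_m}} \sum_{s_2 \in S^r_{C_m} \setminus \{s_1\}} \cdots \sum_{s_n \in S^r_{C_m} \setminus \{s_1, \ldots, s_{n-1}\}} P^r_{SB,C_m} \prod_{j=1}^{n} P^r_{SB,s_j}.$$

With $n$ competitors, $z = n + 1$ connections contend for the resource, so $C_m$ loses with probability $1 - 1/(n+1) = n/(n+1)$. Based on the equations above, the probability that service $C_m$ fails in the competition when PSRLG event $r$ occurs is

$$P^r_{CRF,C_m} = \sum_{n=1}^{M} \frac{n}{n+1} \, P^r_{C,C_m,n}.$$

Then, over all mutually exclusive PSRLG events, $P_{CRF,C_m}$ can be written as

$$P_{CRF,C_m} = \sum_{r \in R} \pi_r \, P^r_{CRF,C_m}.$$

3 Differentiated Quality-of-Protection Provisioning with Probabilistic-Shared Risk Link Group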

Shared-Path Protection Algorithms
In this section, we first present three shared-path protection algorithms, which are the basis of our proposed differentiated QoP schemes.

Full link-disjoint protection algorithm
For the FLDP algorithm, the primary path and the corresponding backup path must be link-disjoint, and backup paths can share a resource only if their corresponding primary paths are link-disjoint. We develop the FLDP algorithm from a greedy algorithm proposed in Ref. 3, which focuses only on path computation and does not consider resource allocation. Service requests are divided into two groups, i.e., arrival events (AE) and departure events (DE). 17 The former require computing routes and assigning spectrum resources, whereas the latter require tearing down routes and releasing spectrum resources. The general steps of FLDP are described in Table 1. The FLDP algorithm is further illustrated by the following example. Figure 1(a) shows the China Education and Research Network (CERNET) topology with eight PSRLG events. The probability of every PSRLG event is 1/8, i.e., $\pi_r = 1/8$, $r \in R$. Each link is annotated with the PSRLGs it belongs to and its failure probability when each of those PSRLGs occurs; for example, the label "{(3, 0.1), (4, 0.5)}" on link (1, 8) means that the link belongs to $r_3$ and $r_4$, with $p^{r_3}_{1,8} = 0.1$ and $p^{r_4}_{1,8} = 0.5$. Now consider a connection request from node 0 to node 4: the link costs are calculated, and the Dijkstra algorithm is used to compute the primary path 0-3-4, as shown in Fig. 1(b). In Fig. 1(c), to compute the backup path, all the links on the primary path are pruned and the costs of the remaining links are updated; the Dijkstra algorithm then yields the backup path 0-2-1-4. Another connection from node 0 to node 6 is established in the same way, as shown in Fig. 1(d). The two primary paths are link-disjoint, so backup resource sharing is allowed on link (0, 2).
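A minimal Python sketch of the routing part of FLDP (Steps 4a, 4b, 4d, and 4e of Table 1) and of the link-disjoint sharing test of Step 4f follows. It assumes an adjacency-list graph and per-event maps `p[r]` from link to $p^r_{k,l}$; spectrum slot reservation is omitted, and the function names are hypothetical, not the authors' implementation.

```python
import heapq


def edge(a, b):
    return (min(a, b), max(a, b))


def dijkstra(adj, cost, src, dst, banned=frozenset()):
    """Least-cost path on an undirected graph, ignoring links in `banned`."""
    dist, prev, done = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in adj[u]:
            e = edge(u, v)
            if e in banned:
                continue
            nd = d + cost[e]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return None  # no path: the request is blocked
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def fldp_routes(adj, pi, p, src, dst):
    """Steps 4a-4e of Table 1 (routing only; slot reservation is omitted)."""
    links = {edge(u, v) for u in adj for v in adj[u]}
    # Step 4a: w_ij = sum_r pi_r * p^r_ij
    w = {e: sum(pi[r] * p[r].get(e, 0.0) for r in pi) for e in links}
    work = dijkstra(adj, w, src, dst)
    if work is None:
        return None
    work_links = [edge(a, b) for a, b in zip(work, work[1:])]
    # Step 4d: prune primary links; w'_ij = sum_{(k,l) in P_W} sum_r pi_r p^r_ij p^r_kl
    w2 = {e: sum(pi[r] * p[r].get(e, 0.0) * p[r].get(f, 0.0)
                 for f in work_links for r in pi) for e in links}
    backup = dijkstra(adj, w2, src, dst, banned=frozenset(work_links))
    return None if backup is None else (work, backup)


def fldp_can_share(new_work, protected_works):
    """Step 4f: a used backup slot is reusable only if every primary path it
    protects is link-disjoint with the new primary path."""
    new = {edge(a, b) for a, b in zip(new_work, new_work[1:])}
    return all(new.isdisjoint(edge(a, b) for a, b in zip(w, w[1:]))
               for w in protected_works)


# Hypothetical four-node ring 0-1-2-3 where only link (0, 1) lies in one event.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(fldp_routes(adj, {1: 1.0}, {1: {(0, 1): 0.5}}, 0, 2))  # ([0, 3, 2], [0, 1, 2])
```

With $\pi_r = 1/8$, Step 4a gives link (1, 8) of the CERNET example a cost of $0.1/8 + 0.5/8 = 0.075$. Using the expected failure probability as the link cost steers the primary path away from failure-prone links, while the Step 4d cost penalizes backup links that tend to fail in the same events as the chosen primary.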
This simple analysis shows that the reliability of a connection established by this algorithm is affected by two factors. The first is the simultaneous failure of the primary and backup paths; for example, in Fig. 1(d), when PSRLG $r_5$ occurs, link (5, 6) on $P_{W2}$ and link (6, 9) on $P_{B2}$ may fail simultaneously. The second is failure in competing for the backup resource. For example, in Fig. 1(d), when PSRLG $r_1$ occurs, link (0, 3) on $P_{W1}$ and link (0, 5) on $P_{W2}$ may fail simultaneously, so the two services are switched to $P_{B1}$ and $P_{B2}$, respectively. One of the two services then cannot be restored, because only one backup resource is available on link (0, 2).
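The second factor is exactly the term $P_{CRF,C_m}$ of Sec. 2.2. The sketch below evaluates it from given switching probabilities $P^r_{SB}$, assuming the loss probability $n/(n+1)$ for $n$ competitors (derived from the $1/z$ success rate) and enumerating the nested sums over distinct competitors literally, as ordered tuples. The function names and the numbers are hypothetical.

```python
from itertools import permutations
from math import prod


def p_compete(p_sb_m, p_sb_others, n):
    """P^r_{C,C_m,n}: the nested sums over distinct s_1, ..., s_n written out
    literally, i.e., over ordered n-tuples of distinct competitors."""
    return sum(p_sb_m * prod(t) for t in permutations(p_sb_others, n))


def p_crf_event(p_sb_m, p_sb_others):
    """P^r_CRF = sum_{n=1}^{M} n/(n+1) * P^r_{C,C_m,n}; with z = n + 1 contenders,
    C_m wins with probability 1/z and loses with probability n/(n+1)."""
    m = len(p_sb_others)
    return sum(n / (n + 1) * p_compete(p_sb_m, p_sb_others, n) for n in range(1, m + 1))


def p_crf(pi, p_sb_m, p_sb_others):
    """P_CRF = sum_r pi_r * P^r_CRF; inputs are dicts keyed by PSRLG event r."""
    return sum(pi[r] * p_crf_event(p_sb_m[r], p_sb_others[r]) for r in pi)


# Hypothetical values: under r1, C_m and one competitor sharing a backup slot
# (cf. link (0, 2) in Fig. 1(d)) are switched with probabilities 0.3 and 0.4.
pi = {"r1": 0.5, "r2": 0.5}
value = p_crf(pi, {"r1": 0.3, "r2": 0.0}, {"r1": [0.4], "r2": []})
print(round(value, 4))  # 0.03
```

Adding this term to $P_{PBF,C_m}$ gives the SFP $P_{C_m}$. Because the number of ordered tuples grows factorially with $M$, the literal enumeration is only practical for the small competitor sets of an illustration.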

Partial probabilistic-shared risk link group-disjoint protection algorithm
To obtain better reliability, PPDP eliminates spectrum resource competition by adopting a stricter constraint for backup resource sharing: besides being link-disjoint, the corresponding primary paths must also be PSRLG-disjoint. Considering the same connection requests from node 0 to node 4 and from node 0 to node 6 in Fig. 1, PPDP obtains the same primary and backup paths as FLDP, as shown in Fig. 1(d). However, $P_{B1}$ and $P_{B2}$ cannot share the backup resource on link (0, 2), because link (0, 3) on $P_{W1}$ and link (0, 5) on $P_{W2}$ belong to the same PSRLG $r_1$, so the primary paths are not PSRLG-disjoint. This example shows that PPDP has a lower degree of backup resource sharing than FLDP. Thus, although PPDP improves reliability, it has lower resource efficiency than FLDP.
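The PPDP sharing rule can be sketched as a predicate over primary paths, assuming PSRLG membership is read off $p^r_{k,l} > 0$ as in Sec. 2.1. Apart from the $r_1$ membership of links (0, 3) and (0, 5) taken from the example, the path $P_{W2}$ = 0-5-6 (inferred from the links named in the text) and all probability values are assumptions for illustration; the function names are hypothetical.

```python
def edge(a, b):
    return (min(a, b), max(a, b))


def path_links(nodes):
    return {edge(a, b) for a, b in zip(nodes, nodes[1:])}


def psrlgs_of_path(nodes, p):
    """PSRLG events r for which some link on the path has p^r_{k,l} > 0."""
    links = path_links(nodes)
    return {r for r, p_r in p.items() if any(p_r.get(e, 0.0) > 0 for e in links)}


def ppdp_can_share(new_work, protected_works, p):
    """PPDP sharing rule: the new primary path must be link-disjoint AND
    PSRLG-disjoint with every primary path already protected by the slot."""
    new_links, new_r = path_links(new_work), psrlgs_of_path(new_work, p)
    return all(new_links.isdisjoint(path_links(w)) and
               new_r.isdisjoint(psrlgs_of_path(w, p))
               for w in protected_works)


# Fig. 1(d): (0, 3) on P_W1 and (0, 5) on P_W2 both belong to r_1.
p = {1: {(0, 3): 0.3, (0, 5): 0.2}, 5: {(5, 6): 0.4}}
print(ppdp_can_share([0, 5, 6], [[0, 3, 4]], p))  # False: both primaries touch r_1
```

The two primaries are link-disjoint, so FLDP would allow the sharing; only the extra PSRLG test rejects it.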

Full probabilistic-shared risk link group-disjoint protection algorithm
The FPDP algorithm further improves reliability on the basis of the PPDP algorithm. PSRLG-disjointness is an indispensable constraint for both route calculation and backup resource sharing. On one hand, making the primary path and the corresponding backup path PSRLG-disjoint ensures that they never fail simultaneously, so once the primary path fails, the connection can be switched to the backup path. On the other hand, connections can share a backup resource only if their primary paths are PSRLG-disjoint. This constraint ensures that these primary paths cannot fail simultaneously, i.e., the connections are never switched to their backup paths at the same time, which completely avoids competition for the backup resource. Therefore, under mutually exclusive PSRLG events, FPDP can restore any affected connection with probability 1. Despite this high reliability, however, its poor blocking performance cannot be ignored. For example, the connection request from node 0 to node 6 in Fig. 1(d) would be blocked if FPDP were employed, because no two PSRLG-disjoint paths exist from node 0 to node 6. The descriptions of these shared-path protection algorithms show that an improvement in reliability generally comes at the cost of a lower resource sharing degree or a higher blocking probability. Employing protection algorithms with different reliabilities therefore introduces a tradeoff: providing differentiated QoP means balancing reliability against network resource efficiency. This is of great importance for both service demands and network operation.
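Why FPDP removes both failure causes can be checked numerically. With mutually exclusive events, every event $r$ leaves at least one of two PSRLG-disjoint paths untouched, so $P^r_h = 0$ for that path and the product $P^r_{h_W} P^r_{h_B}$ vanishes (and likewise $P^r_{SB,C_m} P^r_{SB,s}$ for PSRLG-disjoint primaries). The sketch below assumes the same `p[r]` link maps as before; the helper names and data are hypothetical.

```python
from math import prod


def edge(a, b):
    return (min(a, b), max(a, b))


def links_of(nodes):
    return [edge(a, b) for a, b in zip(nodes, nodes[1:])]


def path_fail(nodes, p_r):
    """P^r_h = 1 - prod (1 - p^r_{k,l}) over the links of the path."""
    return 1.0 - prod(1.0 - p_r.get(e, 0.0) for e in links_of(nodes))


def psrlgs_of_path(nodes, p):
    """Events r in which at least one link of the path can fail."""
    return {r for r, p_r in p.items() if any(p_r.get(e, 0.0) > 0 for e in links_of(nodes))}


def fpdp_route_ok(work, backup, p):
    """FPDP route constraint: primary and backup are link- and PSRLG-disjoint."""
    return (set(links_of(work)).isdisjoint(links_of(backup))
            and psrlgs_of_path(work, p).isdisjoint(psrlgs_of_path(backup, p)))


def p_pbf(work, backup, pi, p):
    return sum(pi[r] * path_fail(work, p[r]) * path_fail(backup, p[r]) for r in pi)


# Hypothetical data: the primary lies only in r1 and the backup only in r2.
pi = {"r1": 0.6, "r2": 0.4}
p = {"r1": {(0, 1): 0.7, (1, 2): 0.9}, "r2": {(0, 3): 0.8}}
work, backup = [0, 1, 2], [0, 3, 2]
print(fpdp_route_ok(work, backup, p), p_pbf(work, backup, pi, p))  # True 0.0
```

However large the individual $p^r_{k,l}$ are, $P_{PBF}$ stays at zero; the price is that such path pairs often do not exist, which is the blocking penalty described above.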

Differentiated Quality-of-Protection Schemes
Based on the shared-path protection algorithms proposed in Sec. 3.1, we propose two differentiated QoP schemes, the ICSR scheme and the CCSR scheme, which provide a unified solution for jointly supporting services with different reliability requirements.

Intraclass-shared resource scheme
In this scheme, connection requests are divided into three classes according to their reliability requirements: class high, class middle, and class low, for which FPDP, PPDP, and FLDP are employed, respectively. For example, when a connection request of class high arrives, FPDP is adopted to calculate routes and assign spectrum resources. Backup resource sharing is implemented only among services of the same class. This is a natural combination of the different shared-path protection algorithms.
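A sketch of the ICSR admission test for an already used backup slot follows, assuming each connection is summarized by its class and by the link and PSRLG sets of its primary path. The dict layout, the function names, and the links of the existing connections are hypothetical; the new class-low connection loosely follows the Fig. 2 example discussed next.

```python
ALGORITHM = {"high": "FPDP", "middle": "PPDP", "low": "FLDP"}


def disjoint_for(alg, a, b):
    """Sharing test between two primary paths given as dicts with 'links' and
    'psrlgs' sets. FLDP: link-disjoint; PPDP/FPDP: link- and PSRLG-disjoint."""
    ok = a["links"].isdisjoint(b["links"])
    if alg in ("PPDP", "FPDP"):
        ok = ok and a["psrlgs"].isdisjoint(b["psrlgs"])
    return ok


def icsr_can_share(new, protected):
    """ICSR: a used backup slot is reusable only if every connection it already
    protects is in the new connection's class and passes that class's test."""
    alg = ALGORITHM[new["cls"]]
    return all(c["cls"] == new["cls"] and disjoint_for(alg, new, c) for c in protected)


# New class-low request with primary 4-5-6-9 (PSRLGs r_5, r_7, as in Fig. 2).
new = {"cls": "low", "links": {(4, 5), (5, 6), (6, 9)}, "psrlgs": {5, 7}}
old_low = {"cls": "low", "links": {(5, 6), (6, 7)}, "psrlgs": {5}}   # hypothetical
old_mid = {"cls": "middle", "links": {(0, 3), (3, 8)}, "psrlgs": {6, 8}}  # hypothetical
print(icsr_can_share(new, [old_low]), icsr_can_share(new, [old_mid]))  # False False
```

The class-middle slot is rejected purely because of the class boundary, even though its primary path is PSRLG-disjoint from the new one; this is the sharing opportunity that the CCSR scheme recovers.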

Cross-class-shared resource scheme
Different from the ICSR scheme, the CCSR scheme tries to improve the backup resource sharing degree without affecting the reliability of connections, which could be achieved only if the backup resource competition ratio does not increase with the improvement of the backup resource sharing degree. So, the CCSR scheme enables backup resource sharing among connections of different classes only if their primary paths are PSRLG-disjoint.
We use the CERNET topology to illustrate the ICSR and CCSR schemes. Assume that three connections of different classes already exist in the network, as shown in Fig. 2(a). When a new connection request of class low from node 4 to node 9 arrives, its routes are given directly, without a calculation process: primary path 4-5-6-9 and backup path 4-1-2-9. The ICSR scheme is illustrated in Fig. 2(b): the new connection can share a backup resource only with the existing connection of class low, and connections of other classes are not considered. According to FLDP, the two class-low connections are not link-disjoint, so backup resource sharing is not allowed. In Fig. 2(c), for the CCSR scheme, connections of other classes are also considered in addition to those of the same class. The primary path of the new connection belongs to $r_5$ and $r_7$, and the primary path of the existing connection of class middle belongs to $r_6$ and $r_8$. They are PSRLG-disjoint, so backup resource sharing can be implemented on link (2, 9). Similarly, the new connection can share a backup resource with the existing connection of class high on link (1, 4).
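The CCSR rule differs from ICSR only in how it treats connections of other classes. The sketch below keeps the per-class test for same-class connections and admits other classes only if their primary paths are PSRLG-disjoint, as stated in the scheme description. The dict layout, function names, and the links and PSRLGs of the existing connections (other than $r_6$, $r_8$ for class middle) are hypothetical.

```python
ALGORITHM = {"high": "FPDP", "middle": "PPDP", "low": "FLDP"}


def disjoint_for(alg, a, b):
    """FLDP: link-disjoint primaries; PPDP/FPDP: link- and PSRLG-disjoint."""
    ok = a["links"].isdisjoint(b["links"])
    if alg in ("PPDP", "FPDP"):
        ok = ok and a["psrlgs"].isdisjoint(b["psrlgs"])
    return ok


def ccsr_can_share(new, protected):
    """CCSR: same-class connections use their class's rule; connections of
    other classes may share only if the primary paths are PSRLG-disjoint."""
    alg = ALGORITHM[new["cls"]]
    for c in protected:
        if c["cls"] == new["cls"]:
            if not disjoint_for(alg, new, c):
                return False
        elif not new["psrlgs"].isdisjoint(c["psrlgs"]):
            return False
    return True


new = {"cls": "low", "links": {(4, 5), (5, 6), (6, 9)}, "psrlgs": {5, 7}}
old_low = {"cls": "low", "links": {(5, 6), (6, 7)}, "psrlgs": {5}}       # hypothetical
old_mid = {"cls": "middle", "links": {(0, 3), (3, 8)}, "psrlgs": {6, 8}}
old_high = {"cls": "high", "links": {(0, 1), (1, 3)}, "psrlgs": {1, 2}}  # hypothetical
print(ccsr_can_share(new, [old_mid]), ccsr_can_share(new, [old_high]))  # True True
print(ccsr_can_share(new, [old_low]))                                   # False
```

Because cross-class sharing still requires PSRLG-disjoint primaries, those connections can never be switched to their backups in the same event, so the extra sharing adds no backup competition.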
The examples above show that the CCSR scheme clearly improves the degree of backup resource sharing, which is of great importance for network efficiency. Each algorithm runs the Dijkstra algorithm twice, once for the primary path and once for the backup path, so its complexity is $2 \cdot O(n^2)$, where $n$ is the number of nodes; this can easily be seen from the steps of the FLDP algorithm shown in Table 1. On the other hand, in order to implement these QoP schemes, the information of the PSRLG group, the probability of each PSRLG

Simulation Results
In this section, we conduct simulations on the 14-node National Science Foundation Network (NSFNET) topology and the 24-node USA Network (USNET) topology, as shown in Fig. 3. Six and nine PSRLG events are generated, respectively, by the method of Gerstel and Sasaki, 3 i.e., each PSRLG event is associated with a circle whose center is randomly located on the plane and whose radius is a uniformly distributed random number in (1, 1.5). All the links touched by the circle belong to this PSRLG. The probabilities of the PSRLG events, $\pi_r$, are uniformly distributed, so that $\sum_{r \in R} \pi_r = 1$. If a link $(i,j)$ belongs to PSRLG $r$, its failure probability $p^r_{i,j}$ is set to a uniformly distributed random number in (0.1, 0.9). Traffic arrivals follow a Poisson process with arrival rate $\lambda$, and the holding time follows a negative exponential distribution with departure rate $\mu$. The traffic load equals $\lambda/\mu$ (Erlang), and 100,000 connection requests are simulated. The bandwidth requirement is uniformly distributed within [2, 5] frequency slots, and there are 300 available frequency slots on each fiber link.
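The simulation setup can be sketched as follows. A link is treated as "touched" by a circle when its distance to the circle center is at most the radius; equal event probabilities $\pi_r = 1/N$, the node coordinates, the plane size, and all function names are assumptions of this sketch rather than details given in the text.

```python
import math
import random


def point_segment_distance(c, a, b):
    """Euclidean distance from point c to the segment ab."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((cx - ax) * dx + (cy - ay) * dy) / seg2))
    return math.hypot(cx - (ax + t * dx), cy - (ay + t * dy))


def generate_psrlgs(coords, links, n_events, width, height, rng):
    """Circle-based PSRLG generation; equal pi_r = 1/N is an assumption."""
    pi = {r: 1.0 / n_events for r in range(1, n_events + 1)}
    p = {}
    for r in pi:
        center = (rng.uniform(0, width), rng.uniform(0, height))
        radius = rng.uniform(1.0, 1.5)
        # every link touched by the circle joins PSRLG r with p^r in (0.1, 0.9)
        p[r] = {(i, j): rng.uniform(0.1, 0.9) for (i, j) in links
                if point_segment_distance(center, coords[i], coords[j]) <= radius}
    return pi, p


def generate_traffic(nodes, n_requests, lam, mu, rng):
    """Poisson arrivals (rate lam), exponential holding times (rate mu),
    bandwidth uniform over 2..5 slots; offered load = lam / mu Erlang."""
    events, t = [], 0.0
    for cid in range(n_requests):
        t += rng.expovariate(lam)
        src, dst = rng.sample(nodes, 2)
        slots = rng.randint(2, 5)
        events.append((t, "AE", cid, src, dst, slots))
        events.append((t + rng.expovariate(mu), "DE", cid, src, dst, slots))
    events.sort(key=lambda ev: ev[0])
    return events


# Hypothetical 4-node square on a 2 x 2 plane; 100 Erlang offered load.
rng = random.Random(1)
coords = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (0, 2)}
links = [(0, 1), (1, 2), (2, 3), (0, 3)]
pi, p = generate_psrlgs(coords, links, 6, 2.0, 2.0, rng)
events = generate_traffic(list(coords), 1000, lam=10.0, mu=0.1, rng=rng)
```

Sorting the merged AE/DE list by time yields the event sequence that Steps 3 to 5 of Table 1 consume; a priority queue would avoid pre-generating all departures in a longer run.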
Table 1 Full link-disjoint protection (FLDP) algorithm.
Step 1: Initialize the network information (physical topology, PSRLG events, and parameters) and generate $X$ connection requests.
Step 3: Wait for a connection request. If it is an AE, go to Step 4; if it is a DE, go to Step 5.
Step 4: For an AE:
a) Calculate the cost of each link by $w_{i,j} = \sum_{r \in R} \pi_r \, p^r_{i,j}$, $(i,j) \in E$.
b) Compute the primary path $P_W$ from source node $s$ to destination node $d$ by the Dijkstra algorithm.
c) Reserve spectrum resources along the primary path.
d) Prune all links on the primary path obtained in (b), and update the costs of the remaining links by $w'_{i,j} = \sum_{(k,l) \in P_W} \sum_{r \in R} \pi_r \, p^r_{i,j} \, p^r_{k,l}$, $(i,j) \in E$.
e) Compute the backup path from source node $s$ to destination node $d$ by the Dijkstra algorithm.
f) Reserve spectrum resources along the backup path. A slot already used by other backup paths is available only if all the primary paths it protects are link-disjoint with $P_W$.
If all the steps above succeed, the connection is established successfully; otherwise, the connection request is blocked. Let $X = X - 1$ and return to Step 2.
Step 5: For a DE, tear down the primary path and the corresponding backup path and release their spectrum resources. Let $X = X - 1$ and return to Step 2.
Step 6: All connection requests have been processed; the simulation ends.

We compare the performance of the ICSR and CCSR schemes with that of FLDP, PPDP, and FPDP, each adopted individually. The metrics adopted for performance evaluation include SFP, blocking probability, redundancy, and spectrum utilization ratio. As mentioned in Sec. 2.2, SFP is the failure probability of a connection during transmission. Blocking probability is the ratio of the number of connection requests rejected by the network to the number of all connection requests arriving at the network. Redundancy is the ratio of the total consumed backup frequency slots to the total consumed primary frequency slots. Spectrum utilization ratio is the ratio of the sum of the consumed backup and primary frequency slots to the total number of frequency slots.
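The three resource metrics reduce to simple ratios. The sketch below assumes that consumed slots are counted as (link, slot) pairs at a given snapshot, so a backup slot shared by several connections is counted once; the function name, its parameters, and the numbers are hypothetical.

```python
def evaluate(n_arrived, n_blocked, primary_used, backup_used, n_links, slots_per_link=300):
    """primary_used / backup_used: sets of (link, slot) pairs currently reserved.
    A shared backup slot appears once in backup_used, so sharing lowers redundancy."""
    blocking = n_blocked / n_arrived
    redundancy = len(backup_used) / len(primary_used)
    utilization = (len(backup_used) + len(primary_used)) / (n_links * slots_per_link)
    return blocking, redundancy, utilization


# Hypothetical snapshot: two 4-slot primaries on disjoint links whose backups share
# the same 4 slots on link (0, 2); 4 links with 10 slots each.
primary = {((0, 3), s) for s in range(4)} | {((0, 5), s) for s in range(4)}
backup = {((0, 2), s) for s in range(4)}
print(evaluate(10, 1, primary, backup, n_links=4, slots_per_link=10))  # (0.1, 0.5, 0.3)
```

Without sharing, the two backups would need 8 slot-link pairs and the redundancy would be 1.0, which is why higher sharing degrees show up directly as lower redundancy.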
As shown in Figs. 4(a) and 4(b), the ICSR scheme provides distinct SFPs to the three classes of services, i.e., the SFP clearly decreases from class low to class high, which shows that the ICSR scheme achieves differentiated QoP provisioning. The same holds when the CCSR scheme is employed. It can also be observed that for the same class of services, such as class middle, the SFP of the CCSR scheme is almost the same as that of the ICSR scheme. The reason is that the CCSR scheme improves the degree of backup resource sharing only under the premise of PSRLG-disjointness, which guarantees that the SFP is not affected. Figure 5 shows the blocking probability versus traffic load in NSFNET (a) and USNET (b). For the two differentiated QoP provisioning schemes, the ratio of the numbers of requests for class high, class middle, and class low is 1:1:1. For FPDP, PPDP, and FLDP, no QoP requirement is associated with a connection request. As expected, FPDP has the highest blocking probability and FLDP the lowest. FPDP has stricter constraints on route calculation, i.e., the primary path and the corresponding backup path must be link-disjoint as well as PSRLG-disjoint, which directly increases the blocking probability. Moreover, FPDP is stricter on backup resource sharing: connections can share a backup resource only if their primary paths are link-disjoint as well as PSRLG-disjoint. This leads to a lower resource sharing degree, which is another reason for its higher blocking probability. We also observe that the CCSR scheme achieves a lower blocking probability than the ICSR scheme. As mentioned before, the CCSR scheme has a higher degree of backup resource sharing, so more resources remain available for incoming services. Figure 6 plots the redundancy for the NSFNET and USNET topologies. First, PPDP has a lower redundancy than FPDP.
Although both algorithms must satisfy the same constraint, PSRLG-disjointness, when sharing a backup resource, PPDP has a lower blocking probability than FPDP, and the backup resource sharing degree increases with the number of connections. The redundancy of FLDP is further reduced compared with the other two algorithms. One reason is that FLDP has the lowest blocking probability; another important reason is that its constraint for backup resource sharing is relaxed from PSRLG-disjointness to link-disjointness. Moreover, Fig. 6 also shows that the CCSR scheme achieves lower redundancy than the ICSR scheme. One reason is that the CCSR scheme directly increases the degree of backup resource sharing; another is that the CCSR scheme has a lower blocking probability. The spectrum utilization ratios of the different schemes are shown in Fig. 7. FPDP has the lowest spectrum utilization ratio in both the NSFNET and USNET topologies, which is caused by its clearly higher blocking probability. A further effect, barely visible in the figure but present, is that the CCSR scheme has a lower spectrum utilization ratio than the ICSR scheme. Together with the blocking performance of the two schemes, this leads to the conclusion that the CCSR scheme supports more connections with fewer spectrum resources, namely, it achieves higher resource efficiency.
Optical Engineering 066111-7 June 2014 • Vol. 53 (6)

Conclusions
In this article, we motivate the need for providing differentiated QoP under a new multilink failure model named PSRLG. Shared-path protection algorithms, including FLDP, PPDP, and FPDP, are proposed, based on which two differentiated QoP schemes, ICSR and CCSR, are put forward to provide differentiated reliability. To precisely measure the effectiveness of the schemes, a new metric called SFP is proposed. Numerical results show that both differentiated QoP schemes achieve a tradeoff between reliability and resource efficiency. Compared with the ICSR scheme, the CCSR scheme obtains a lower blocking probability, lower redundancy, and a better spectrum utilization ratio without sacrificing reliability.