Document by: Gianrico Fichera
Release: 0.2

Problem: on an ASR9001 platform used as a BNG terminating PPPoE sessions, there are limits on the maximum
number of subscribers that can be handled simultaneously, depending on the resources available in the
hardware configuration and on the installed licenses. We want to check the resources in use and still
available on a machine in production.

Checks:

The ASR9001 has two Typhoon NPs and two FIAs (Fabric Interface ASICs).

Subscribers consume a "QoS chunk" as soon as they have a bandwidth profile. There is a limit of 32k
sessions per Typhoon NPU and of 32k shapers. These 32k are divided into 4 chunks of 8k each. To reach
the 32k limit you must have four physical interfaces in use. The four on-board TenGig ports are split
between the two NPs, and the ports of the add-on modules are also split between the two NPs; in practice
they form two autonomous groups of ports. The two FIAs are "skytrain 0 and 1", corresponding to
"NP0 and NP1", all connected to a "sacramento" ASIC.

RP/0/RSP0/CPU0:ASR9001-A# show controller np ports all
Tue Jun 9 15:14:56.705 MEDT

                Node: 0/0/CPU0:
----------------------------------------------------------------
NP Bridge Fia                       Ports
-- ------ --- ---------------------------------------------------
0  0      0   GigabitEthernet0/0/0/0 - GigabitEthernet0/0/0/19
0  0      0   TenGigE0/0/2/0, TenGigE0/0/2/1    <--- ONBOARD
1  1      1   TenGigE0/0/1/0 - TenGigE0/0/1/1   <--- A9K-MPA-2X10GE
1  1      1   TenGigE0/0/2/2, TenGigE0/0/2/3    <--- ONBOARD

In the configuration under examination, the subscriber sessions terminate on TenGigE0/0/2/3.

RP/0/RSP0/CPU0:ASR9001-B#show controllers np portMap all | include 0/0/2/3
Wed Jun 10 11:23:38.954 MEDT
 9  TenGigE0/0/2/3  0 (bundle)

Let's look at the total number of connected subscribers:
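The chunk arithmetic above can be sketched as follows. This is a minimal illustration, not a Cisco tool; the constant and function names are ours:

```python
# Typhoon QoS chunk arithmetic as described in the text (illustrative only).
CHUNKS_PER_NP = 4            # each Typhoon NP has 4 QoS chunks
QUEUES_PER_CHUNK = 8 * 1024  # 8k entities per chunk

def np_queue_capacity(ports_in_use: int) -> int:
    """Usable queue entities on one NP: each physical interface (or
    subinterface) in use unlocks one chunk, capped at 4 chunks."""
    usable_chunks = min(ports_in_use, CHUNKS_PER_NP)
    return usable_chunks * QUEUES_PER_CHUNK

print(np_queue_capacity(1))  # 8192  -> a single port only unlocks one 8k chunk
print(np_queue_capacity(4))  # 32768 -> reaching the 32k limit needs four ports
```

This makes explicit why a single access port, as in the setup examined here, effectively caps queued subscribers well below the headline 32k figure.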
RP/0/RSP0/CPU0:ASR9001#show subscriber session all summary
Tue Jun 9 14:53:49.632 MEDT
Session Summary Information for all nodes

Type            PPPoE   IPSub   IPSub
                        (DHCP)  (PKT)
====            =====   ======  =====
Session Counts by State:
 initializing       0       0      0
 connecting         0       0      0
 connected          3       0      0
 activated       7937       0      0
 idle               0       0      0
 disconnecting      3       0      0
 end                0       0      0
 Total:          7943       0      0

Session Counts by Address-Family/LAC:
 in progress        6       0      0
 ipv4-only       7937       0      0
 ipv6-only          0       0      0
 dual-partial-up    0       0      0
 dual-up            0       0      0
 lac                0       0      0
 Total:          7943       0      0
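As a quick cross-check, the per-state PPPoE counts above do add up to the reported total; a minimal sketch of the arithmetic:

```python
# Per-state PPPoE session counts transcribed from the summary above.
counts = {
    "initializing": 0, "connecting": 0, "connected": 3,
    "activated": 7937, "idle": 0, "disconnecting": 3, "end": 0,
}
total_reported = 7943

# 3 connected + 7937 activated + 3 disconnecting = 7943
print(sum(counts.values()))                      # 7943
print(sum(counts.values()) == total_reported)    # True
```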

Let's look at the resources in use:

RP/0/RSP0/CPU0:ASR9001#show qoshal resource summary location 0/0/CPU0
Tue Jun  9 14:46:31.661 MEDT
TY Options argc:5
nphal_show_chk -p 2081 resource summary
Done
-1
Counters: X(Y/Z): X -> Resources Allocated in HW
                  Y -> Resource Allocated in SW
                  Z -> Refcount of each resource
                  Sanity Check: X==Y && Z >= X
        :X (Y):   X -> Resource Allocated in HW
                  Y -> Resource Allocated in SW
Client - 0, General - Not any Specific Client
 SW and RefCount for Entities are accounted in Client 0(general)
 Only HW count for Entities is per client
NP 0
=============================================================== 

CLIENT : QoS-EA
   Policy Instances: Ingress 0 Egress 18  Total: 18
    Entities: (L4 level: Queues)
     Level        Chunk 0           Chunk 1           Chunk 2           Chunk 3
     L4         0(    0/    0)    0(    0/    0)   12(   12/   12)   24(   24/   24)
     L3(8Q)     0(    0/    0)    0(    0/    0)    6(    6/    6)   12(   12/   12)
     L3(16Q)    0(    0/    0)    0(    0/    0)    0(    0/    0)    0(    0/    0)
     L2         0(    0/    0)    0(    0/    0)    0(    0/    0)    0(    0/    0)
     L1         0(    0/    0)    0(    0/    0)    0(    0/    0)    0(    0/    0)
        Policers: Internal 18(18) Regular 0(0)  Parent 0(0)  Child 0(0)  Total 18(18)

NP 1
===============================================================
...
 CLIENT : QoS-EA
   Policy Instances: Ingress 7869 Egress 7870  Total: 15739
    Entities: (L4 level: Queues)
     Level        Chunk 0           Chunk 1           Chunk 2           Chunk 3
     L4         0(    0/    0)23607(23607/23607)    0(    0/    0)    0(    0/    0)
     L3(8Q)     0(    0/    0) 7869( 7869/ 7869)    0(    0/    0)    0(    0/    0)
     L3(16Q)    0(    0/    0)    0(    0/    0)    0(    0/    0)    0(    0/    0)
     L2         0(    0/    0)    0(    0/    0)    0(    0/    0)    0(    0/    0)
     L1         0(    0/    0)    0(    0/    0)    0(    0/    0)    0(    0/    0)
        Policers: Internal 2047(2047) Regular 21816(21816)  Parent 0(0)  Child 0(0)  Total 23863(23863)
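The legend printed by this command (X == Y and Z >= X) suggests a simple consistency check on each counter field. A hypothetical helper, assuming the X(Y/Z) field format shown above:

```python
def counter_ok(field: str) -> bool:
    """Apply the sanity rule from the command legend to an 'X(Y/Z)' field:
    HW count must equal SW count, and the refcount must be >= HW count."""
    hw_part, rest = field.replace(" ", "").split("(")
    sw_part, ref_part = rest.rstrip(")").split("/")
    hw, sw, ref = int(hw_part), int(sw_part), int(ref_part)
    return hw == sw and ref >= hw

print(counter_ok("23607(23607/23607)"))  # True  - NP 1, L4, Chunk 1
print(counter_ok("12(   12/   12)"))     # True  - NP 0, L4, Chunk 2
print(counter_ok("5(4/5)"))              # False - HW/SW mismatch example
```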


Note: H-QoS (3-layer hierarchical): L2 grand-parent policy, L3 parent policy, L4 child policy. In other
words, the policy can be built on three nested levels. We use two of them, which is why there is nothing
at L2. For each L3 entity we have a child with voice, data and default classes, which is why there are
23607 L4 queues. The 4 chunks can also be spread over 4 subinterfaces as well as over 4 physical
interfaces. There are hardware queues for each NP. Here one chunk is almost saturated.

The default 8Q mode is: 8Q -> 1 priority-1 queue, 1 priority-2 queue, 6 normal-priority queues.

A physical interface on Typhoon has 4 CoS queues in egress (and 4 in ingress):
- 1 High priority (traffic such as routing protocols, used by default) - CoS 6-7
- 2 Medium priority (to be used via policy-map) - CoS 5
- 3 Medium priority (to be used via policy-map)
- 4 Low priority (used by default by all other traffic) - CoS < 5

A VOQ is a set of queues and represents a 10G entity in the system. Each queue has 4 priorities: P1, P2,
BE (best-effort) and mcast; these are also called VQIs. The priority of the 4 VQIs is internal, i.e. the
priority of packets inside the fabric. For example, if two ports of one NP are sending 10+6 Gbps and the
egress side has a 12G and a 4G port, the 4G port will still receive all of its traffic: the egress card
at 12G signals its congestion backwards, so it is guaranteed that packets for the 4G port are not dropped
anywhere along the router's internal path.

"In order for packets to forward from one interface to another in an ASR9K, packets must traverse the
fabric. There is no local switching in an ASR9K. How does a packet get from one interface to another
though? This is accomplished through the use of VQIs which are assigned to each interface. This way the
fabric knows which linecard (LC) and Network Processor (NP) to route the packet."
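The 23607 figure seen on NP 1 can be reproduced from this description (one L4 queue per child class per session); a back-of-the-envelope sketch, where the 8k chunk size comes from the earlier discussion:

```python
# Reproducing the NP 1 figures from the 'show qoshal resource summary' output.
sessions = 7869       # L3(8Q) entities in Chunk 1 (one per PPPoE session)
child_classes = 3     # voice, data, class-default in the child policy

l4_queues = sessions * child_classes
print(l4_queues)      # 23607 -> matches the L4 count reported for NP 1

chunk_size = 8 * 1024
print(chunk_size - sessions)  # 323 -> L3 headroom left: the chunk is nearly full
```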
(source: Cisco) See the note below (**).

RP/0/RSP0/CPU0:ASR9001-A#show qos capability location 0/0/CPU0
Thu Jun 11 10:28:00.867 MEDT
Capability Information:
======================
Max Policy maps supported on this LC: 16384
Max policy hierarchy: 3
Max policy name length: 64
Max classes per child-policy: 1024
Max classes per policy: 1024
Max classes per grand-parent policy: 1
Max police actions per class: 2
Max marking actions per class: 2
Max matches per class : 8
Max class-map name length: 64
Max members in a bundle: 64
Max instance name length: 32

Here are the queues of TenGigE0/0/2/3:

RP/0/RSP0/CPU0:ASR9001-B#show qoshal default-queue subslot 2 port 3 location 0/0/CPU0
Wed Jun 10 11:23:05.669 MEDT
TY Options argc:8
nphal_show_chk -p 2081 default-queue -s 2 -q 3
Done
show subslot (2), ifsub 0, port 3, card=15795202
Interface Default Queues : Subslot 2, Ifsubsys 0, Port 3
===============================================================
Port 131 NP 1 TM Port 9
Ingress: QID 0x10000 Entity: 1/0/1/4/0/0 Priority: Priority 1 Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x640000/0x636/0x640001
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Fast TX: 10702843896/2957012152200
     Total Xmt 10702843896/2957012152200 Dropped 0/0
Ingress: QID 0x10001 Entity: 1/0/1/4/0/1 Priority: Priority 2 Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x640005/0x637/0x640006
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Fast TX: 0/0
     Total Xmt 0/0 Dropped 0/0
Ingress: QID 0x10003 Entity: 1/0/1/4/0/3 Priority: Priority 3 Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x64000f/0x0/0x640010
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Total Xmt 0/0 Dropped 0/0
Ingress: QID 0x10002 Entity: 1/0/1/4/0/2 Priority: Priority Normal Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x64000a/0x638/0x64000b
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Fast TX: 0/0
     Total Xmt 0/0 Dropped 0/0
Egress: QID 0x10020 Entity: 1/0/1/4/4/0 Priority: Priority 1 Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x6400a0/0x639/0x6400a1
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Fast TX: 14853799610/783385141422
     Total Xmt 14853799610/783385141422 Dropped 0/0
Egress: QID 0x10021 Entity: 1/0/1/4/4/1 Priority: Priority 2 Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x6400a5/0x63a/0x6400a6
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Fast TX: 0/0
     Total Xmt 0/0 Dropped 0/0
Egress: QID 0x10023 Entity: 1/0/1/4/4/3 Priority: Priority 3 Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x6400af/0x0/0x6400b0
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Total Xmt 0/0 Dropped 0/0
Egress: QID 0x10022 Entity: 1/0/1/4/4/2 Priority: Priority Normal Qdepth: 0
   StatIDs: commit/fast_commit/drop: 0x6400aa/0x63b/0x6400ab
   Statistics(Pkts/Bytes):
     Tx_To_TM 0/0
     Fast TX: 1489909192/1768202091336
     Total Xmt 1489909192/1768202091336 Dropped 0/0
----------------------

QoS refers both to CoS and to ToS. CoS lets you group together completely different traffic flows; ToS
is an IP concept and is therefore tied to more specific traffic. Queuing is a FIFO structure.

MQC Hierarchy

The Cisco Modular QoS CLI (MQC) framework is the Cisco IOS QoS user language, introduced together with
CB-WFQ in IOS 12.0.5(T). It consists of the command triad "class-map", "policy-map" and "service-policy".
MQC supports the concept of hierarchical policies.

Example of a hierarchical policy:
- Take a portion of bandwidth from a link, e.g. 100 Mbps out of a 1 Gbps link. This is level 1.
- Within those 100 Mbps, give priority to voice and leave the rest best-effort within the 100 Mbps. This
  is level 2. If an application then does not use its bandwidth, it becomes available to the others.

This is typical, for example, of a BNG on an ASR serving customers: all customers arrive on the same
physical interface, but each has a maximum reserved bandwidth, and within that bandwidth they have a
second policy giving priority to voice.

!
! Example of a hierarchical policy on IOS-XR ASR9001
!
policy-map policy_livello2_child
 class voice
  priority level 1
  !
  ! strict-priority up to a ceiling of 200 kbps, which can be exceeded but without guarantees
  police rate 200 kbps
  ! CIR - alternatively "police rate percent"; if unused, the bandwidth is freed for the others
 !
 class data
  bandwidth remaining percent 20   <-- I have doubts that this is effective here
 !
 class class-default
 !
end-policy-map
!
policy-map policy_livello1_parent
 class class-default
  service-policy policy_livello2_child
  shape average 30500 kbps   <-- maximum value, with queuing when possible
 !
end-policy-map
!

Typhoon: max 8 child queues per parent. Example: 1 priority 1, 2 priority 2, 5 normal priority.
2-level nested policy maps are supported.

The hardware counterpart of the policy above is (for simplicity the policy name is not the original one):

RP/0/RSP0/CPU0:ASR9001#show qos interface bundle-Ether 1.19.pppoe45731 output
Wed Jun 10 12:14:32.368 MEDT
Interface: TenGigE0_0_2_3 output
Bandwidth configured: 10000000 kbps Bandwidth programed: 10000000 kbps
ANCP user configured: 0 kbps ANCP programed in HW: 0 kbps
Port Shaper programed in HW: 0 kbps
Policy: policy_livello1_parent Total number of classes: 4
----------------------------------------------------------------------
Level: 0 Policy: policy_livello1_parent Class: class-default
QueueID: N/A
Shape CIR : NONE
Shape PIR Profile : 1/3(S) Scale: 476 PIR: 30464 kbps PBS: 380800 bytes
WFQ Profile: 1/9 Committed Weight: 10 Excess Weight: 10
Bandwidth: 0 kbps, BW sum for Level 0: 0 kbps, Excess Ratio: 1
----------------------------------------------------------------------
Level: 1 Policy: policy_livello2_child Class: voice_out
Parent Policy: policy_livello1_parent Class: class-default
QueueID: 68024 (Priority 1)
Queue Limit: 3 kbytes Abs-Index: 3 Template: 0 Curve: 6
Shape CIR Profile: INVALID
Policer Profile: 60 (Single)
Conform: 200 kbps (200 kbps) Burst: 2500 bytes (0 Default)
Child Policer Conform: TX
Child Policer Exceed: DROP
Child Policer Violate: DROP
----------------------------------------------------------------------
Level: 1 Policy: policy_livello2_child Class: data_out
Parent Policy: policy_livello1_parent Class: class-default
QueueID: 68026 (Priority Normal)
Queue Limit: 78 kbytes Abs-Index: 21 Template: 0 Curve: 0
Shape CIR Profile: INVALID
WFQ Profile: 1/19 Committed Weight: 20 Excess Weight: 20
Bandwidth: 0 kbps, BW sum for Level 1: 0 kbps, Excess Ratio: 20
----------------------------------------------------------------------
Level: 1 Policy: policy_livello2_child Class: class-default
Parent Policy: policy_livello1_parent Class: class-default
QueueID: 68027 (Priority Normal)
Queue Limit: 304 kbytes Abs-Index: 45 Template: 0 Curve: 0
Shape CIR Profile: INVALID
WFQ Profile: 1/71 Committed Weight: 81 Excess Weight: 81
Bandwidth: 0 kbps, BW sum for Level 1: 0 kbps, Excess Ratio: 80
----------------------------------------------------------------------
RP/0/RSP0/CPU0:ASR9001-B#

Here there are 1757 connected subscribers:

RP/0/RSP0/CPU0:ASR9001-B# sh controllers np struct EGR-UIDB summary location 0/0/CPU0
Wed Jun 10 12:38:30.738 MEDT

                Node: 0/0/CPU0:
----------------------------------------------------------------
NP: 0
Struct 10: EGR_UIDB
  Struct is a PHYSICAL entity
  Reserved Entries: 0, Used Entries: 1834, Max Entries: 65536
NP: 1
Struct 10: EGR_UIDB
  Struct is a PHYSICAL entity
  Reserved Entries: 0, Used Entries: 1841, Max Entries: 65536
RP/0/RSP0/CPU0:ASR9001-B#

If queues are not used, subscribers can be created until the 65536 entries are filled. There are 1834
entries instead of 1757 because of an allocation/release dynamic tied to flapping subscribers. So in
theory you can reach 64k subscribers per NP without queueing.

--------------------------------

Note (**)
Here we try to follow the ingress and egress VQI inside the ASR9001 for one prefix.
Ingress interface TenGigE0/0/2/3:

RP/0/RSP0/CPU0:ASR9001-A#show ip cef 172.27.121.137 detail   <-- BE1.18.pppoe39965 tengig0/0/2/3
Thu Jun 11 10:41:20.411 MEDT
172.27.121.137/32, version 5576304, attached, subscriber, internal 0x1000041 0x0 (ptr 0x9f4cd2ac) [1], 0x0 (0x9ee30d00), 0x0 (0x0)
 Updated Jun 11 03:23:04.193
 Prefix Len 32, traffic index 0, precedence n/a, priority 3
  gateway array (0x9dc980c8) reference count 211, flags 0x200000, source rib (7), 0 backups
                [212 type 3 flags 0x1008401 (0x9ddc6d10) ext 0x0 (0x0)]
  LW-LDI[type=3, refc=1, ptr=0x9ee30d00, sh-ldi=0x9ddc6d10]
  gateway array update type-time 1 Apr 29 04:12:44.116
 LDI Update time Apr 29 04:12:44.117
 LW-LDI-TS Jun 11 03:23:04.193
 SUBS-INFO[0x9eea0e40 IFH=0x1065e0 (Subs I/F Bundle-Ether1.18.pppoe39965) NH=0x9d7ca844 Flags=0x28]
   via Bundle-Ether1.18.pppoe39965, 2 dependencies, weight 0, class 0 [flags 0x8]
    path-idx 0 NHID 0x0 [0x9f499ba0 0x0]
    local adjacency
    Load distribution: 0 (refcount 212)
    Hash  OK  Interface                 Address
    0     Y   Bundle-Ether1.18          point2point
RP/0/RSP0/CPU0:ASR9001-A#

Egress towards the internet, TenGigE0/0/1/1:

RP/0/RSP0/CPU0:ASR9001-A#show ip cef 100.126.0.1 detail   <-- tengiga0/0/1/1
Thu Jun 11 10:42:02.367 MEDT
100.126.0.1/32, version 0, internal 0x1020001 0x0 (ptr 0x9de95148) [3], 0x0 (0x9de5a090), 0x0 (0x0)
 Updated Nov 6 01:00:21.402
 Prefix Len 32, traffic index 0, Adjacency-prefix, precedence n/a, priority 15
  gateway array (0x9dc9bec8) reference count 1, flags 0x0, source internal (11), 0 backups
                [2 type 3 flags 0x8401 (0x9ddc4910) ext 0x0 (0x0)]
  LW-LDI[type=3, refc=1, ptr=0x9de5a090, sh-ldi=0x9ddc4910]
  gateway array update type-time 1 Nov 6 01:00:21.402
 LDI Update time Nov 6 01:00:21.403
 LW-LDI-TS Nov 6 01:00:21.403
   via 100.126.0.1/32, Bundle-Ether200.2, 2 dependencies, weight 0, class 0 [flags 0x0]
    path-idx 0 NHID 0x0 [0x9d7c17e4 0x0]
    next hop 100.126.0.1/32
    local adjacency
    Load distribution: 0 (refcount 2)
    Hash  OK  Interface                 Address
    0     Y   Bundle-Ether200.2         100.126.0.1
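Reading the outgoing interface out of the 'show ip cef ... detail' text by eye can be automated; a hypothetical sketch (the parsing helper is ours, not a Cisco API):

```python
# Extract the egress interface from the 'via ...' line of a CEF detail dump.
import re

cef_text = "via Bundle-Ether1.18.pppoe39965, 2 dependencies, weight 0, class 0 [flags 0x8]"

def egress_interface(text: str) -> str:
    """Return the interface named on the first 'via <ifname>,' line."""
    m = re.search(r"via\s+([A-Za-z0-9./_-]+),", text)
    if m is None:
        raise ValueError("no 'via' line found")
    return m.group(1)

print(egress_interface(cef_text))  # Bundle-Ether1.18.pppoe39965
```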
RP/0/RSP0/CPU0:ASR9001-A#show controller np ports all loc 0/0/CPU0
Thu Jun 11 10:46:32.993 MEDT

                Node: 0/0/CPU0:
----------------------------------------------------------------
NP Bridge Fia                       Ports
-- ------ --- ---------------------------------------------------
0  0      0   GigabitEthernet0/0/0/0 - GigabitEthernet0/0/0/19
0  0      0   TenGigE0/0/2/0, TenGigE0/0/2/1
1  1      1   TenGigE0/0/1/0 - TenGigE0/0/1/1
1  1      1   TenGigE0/0/2/2, TenGigE0/0/2/3

The VQIs in use should appear here:

RP/0/RSP0/CPU0:ASR9001-A# show cef 100.126.0.1 hardware egress detail loc 0/0/CPU0 | include vqi
Thu Jun 11 11:17:50.356 MEDT
 out_lbl_invalid: 0 match: 0 vqi/lag-id: 0x0
 out_lbl_invalid: 0 match: 0 vqi/lag-id: 0x0
RP/0/RSP0/CPU0:ASR9001-A# show cef 100.126.0.1 hardware ingress detail loc 0/0/CPU0 | include vqi
Thu Jun 11 11:18:00.378 MEDT
 out_lbl_invalid: 0 match: 0 vqi/lag-id: 0x0
 out_lbl_invalid: 0 match: 0 vqi/lag-id: 0x0
RP/0/RSP0/CPU0:ASR9001-A#

This should then match the value of switch_fabric_port below. The switch_fabric_port value should be the
VQI, but I cannot find a match, unlike what the documentation shows.
This is probably because on an ASR9001 the two NPs belong to a single CPU, so the switch-fabric traffic
does not go towards other modules but all loops back internally.
RP/0/RSP0/CPU0:ASR9001-A#show controllers pm interface tengig 0/0/2/3 location 0/0/CPU0
Thu Jun 11 11:04:03.147 MEDT
Ifname(1): TenGigE0_0_2_3, ifh: 0x4000680 :
  iftype 0x1e
  egress_uidb_index 0x18, 0x18
  ingress_uidb_index 0x18, 0x18
  port_num 0x3
  subslot_num 0x2
  ifsubinst 0x0
  ifsubinst port 0x3
  phy_port_num 0x3
  channel_id 0x1
  channel_map 0x0
  lag_id 0x1
  virtual_port_id 0x1
  switch_fabric_port 103   <-- 0x67
  in_tm_qid_fid0 0x10002
  in_tm_qid_fid1 0xffffffff
  in_qos_drop_base 0x640001
  out_tm_qid_fid0 0x10022
  out_tm_qid_fid1 0xffffffff
  np_port 0x9
  ...
RP/0/RSP0/CPU0:ASR9001-A#show controllers pm interface tengig 0/0/1/1 location 0/0/CPU0
Thu Jun 11 11:05:15.965 MEDT
Ifname(1): TenGigE0_0_1_1, ifh: 0x4000700 :
  iftype 0x1e
  egress_uidb_index 0x1a, 0x1a
  ingress_uidb_index 0x1a, 0x1a
  port_num 0x1
  subslot_num 0x1
  ifsubinst 0x0
  ifsubinst port 0x1
  phy_port_num 0x1
  channel_id 0x1
  channel_map 0x0
  lag_id 0x5
  virtual_port_id 0x0
  switch_fabric_port 97   <-- 0x61
  in_tm_qid_fid0 0x30002
  in_tm_qid_fid1 0xffffffff
  in_qos_drop_base 0x6e0001
  out_tm_qid_fid0 0x30022
  out_tm_qid_fid1 0xffffffff
  np_port 0x6
  ...
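The manual cross-check between switch_fabric_port and the VQI reported by CEF can be sketched as follows. This is a hypothetical helper; the sample line is taken from the output above, and the zero VQI is what 'show cef ... | include vqi' returned on this box:

```python
# Cross-check switch_fabric_port against the VQI reported by CEF hardware detail.
import re

def fabric_port(pm_text: str) -> int:
    """Pull the decimal switch_fabric_port value out of
    'show controllers pm interface' output."""
    m = re.search(r"switch_fabric_port\s+(\d+)", pm_text)
    if m is None:
        raise ValueError("switch_fabric_port not found")
    return int(m.group(1))

sample = "virtual_port_id 0x1 switch_fabric_port 103   <-- 0x67"
vqi_from_cef = 0x0  # 'vqi/lag-id: 0x0' as seen in the CEF output above

port = fabric_port(sample)
print(port, hex(port))       # 103 0x67
print(port == vqi_from_cef)  # False: no match, as observed on this chassis
```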