Multicast networking III - PIM

Please be advised that this article is work-in-progress. The information here may be vague, incomplete, misleading or plainly wrong.

PIM -> definition -> Protocol Independent Multicast -> layer 3 protocol -> multicast routing protocol -> unlike the multicast routing protocols that came before it (DVMRP, MOSPF), PIM does not maintain its own multicast routing database -> it uses the information in the unicast routing table -> one of its main duties is to perform reliable RPF checks on the multicast traffic -> it is 'independent' of the unicast routing protocol used in the network: as long as some protocol(s) populate the unicast routing table, PIM can perform its duties.

-> messages -> HELLO MESSAGE -> usage -> used to establish neighbour adjacencies throughout the network -> Hello messages are used to discover directly connected, PIM-enabled nodes -> this leads to the creation of multicast trees (the topology of links over which multicast traffic can flow) -> routed multicast traffic will only flow between two directly connected PIM nodes -> Hellos also advertise PIM capabilities (such as Proxy, Generation ID capable, State Refresh capable) -> a PIM Hello does not indicate whether the device is in Dense or Sparse mode. This means that PIM-DM routers can form adjacencies with PIM-SM routers. -> format ->

|--IP header--|--PIM Version--|--Type--|--Reserved--|--Checksum--|--Option (TLV)--|--Option (TLV)--|...|--Option (TLV)--|
              |<---- PIM header, common to all PIM messages ---->|
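A minimal Python sketch of how the Hello layout above could be packed, assuming IPv4 (where the checksum is the standard Internet checksum computed over the whole PIM message) and using three of the options listed in the details below: Hold Time (1, 2-byte value), DR Priority (19, 4-byte value) and Generation ID (20, 4-byte value). The function names and default arguments are illustrative only, not taken from any router implementation.

```python
import struct

PIM_VERSION = 2
PIM_TYPE_HELLO = 0
OPT_HOLDTIME = 1         # 2-byte value, seconds
OPT_DR_PRIORITY = 19     # 4-byte value
OPT_GENERATION_ID = 20   # 4-byte value


def inet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tlv(opt_type: int, value: bytes) -> bytes:
    """One Hello option: 16-bit Type, 16-bit Length, then the Value."""
    return struct.pack("!HH", opt_type, len(value)) + value


def build_hello(hello_period: int = 30, dr_priority: int = 1, gen_id: int = 0) -> bytes:
    holdtime = int(hello_period * 3.5)      # 30 s Hellos -> 105 s Hold Time
    options = (
        tlv(OPT_HOLDTIME, struct.pack("!H", holdtime))
        + tlv(OPT_DR_PRIORITY, struct.pack("!I", dr_priority))
        + tlv(OPT_GENERATION_ID, struct.pack("!I", gen_id))
    )
    first_byte = (PIM_VERSION << 4) | PIM_TYPE_HELLO          # Version (4 bits) | Type (4 bits)
    unchecked = struct.pack("!BBH", first_byte, 0, 0) + options  # checksum field = 0 while summing
    checksum = inet_checksum(unchecked)
    return struct.pack("!BBH", first_byte, 0, checksum) + options


def parse_options(msg: bytes) -> dict:
    """Walk the TLVs after the 4-byte PIM header; returns {type: raw value}."""
    opts, i = {}, 4
    while i < len(msg):
        opt_type, length = struct.unpack_from("!HH", msg, i)
        opts[opt_type] = msg[i + 4 : i + 4 + length]
        i += 4 + length
    return opts


hello = build_hello(gen_id=0x12345678)
print(len(hello), struct.unpack("!H", parse_options(hello)[OPT_HOLDTIME])[0])  # 26 105
```

Running the checksum over a finished message returns 0, which is a quick way to validate a received Hello.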

-> details -> IP -> Destination IP: 224.0.0.13 (ALL-PIM-ROUTERS) -> Protocol: 103 (PIM) -> TTL: 1 -> PIM -> Version -> there are 2 versions -> version 1 is so old that it is very rarely (if at all) found deployed -> in practice, this is 2 -> Type: 0 (PIM Hello) -> Reserved -> not used (set to 0) -> Checksum -> the standard Internet (one's complement) checksum of the PIM message, not a hash -> Options -> carried in TLV (Type, Length, Value) format, which makes PIM a very extensible protocol -> Option 1 -> Hold Time -> how long the receiving router should keep the sender as a neighbour without hearing another Hello from it, before considering it DEAD (same reasoning as communicating the Dead Interval in an OSPF Hello) -> usually 3.5x the interval at which Hellos are sent (Hellos are sent every 30s, so the Hold Time is 105s) -> Option 19 -> Designated Router priority, used for DR election -> Option 20 -> Generation ID -> optional capability used to handle device failures -> a random value, regenerated whenever PIM forwarding (as a process) is (re)started on the interface (reconfiguration, device reboots, etc.) -> if a PIM device sees a different GenID from a neighbour across Hellos, then (depending on whether the neighbour is upstream or downstream) it interrupts normal PIM behaviour and takes specific actions to ensure the validity of the information in the (S,G) and (*,G) states -> Option 21 -> State Refresh Capable -> indicates that the router sending the Hello supports the State Refresh mechanism -> used in PIM Dense Mode

PIM JOIN/PRUNE MESSAGE -> this type of message is used to either -> Prune -> tell the upstream router to stop sending a particular multicast traffic flow towards the sender of the Prune message (the downstream router) -> this is the most common use of the message in PIM Dense Mode, since joining all routers to all multicast streams is implicit -> Join -> in PIM-SM, used by the Designated Router to ask the Rendezvous Point (hop by hop, via the upstream router) to send a particular multicast stream downstream -> in PIM-DM, used in only one particular case: Prune Override -> Join and Prune are distinguished by the message's content (the joined and pruned source lists), not by a different message type -> format -> verbose format:

|--IP header--|--PIM Version--|--Type--|--Reserved--|--Checksum--|  <- PIM header, common to all PIM messages
|--Upstream Neighbour (Encoded Unicast format)--|--Reserved--|--Num Groups--|--Hold Time--|
then, for each Multicast Group 1 to X in the message:
|--Multicast Group (Encoded Group format)--|--Number of Joined Sources (J)--|--Number of Pruned Sources (P)--|
|--Joined Source Address 1 (Encoded Source format)--|....|--Joined Source Address J (Encoded Source format)--|
|--Pruned Source Address 1 (Encoded Source format)--|....|--Pruned Source Address P (Encoded Source format)--|

where:
Encoded Unicast format: |--AddrFam--|--EncType--|--UnicastAddr--|
Encoded Group format:   |--AddrFam--|--EncType--|--B--|--Rsvd--|--Z--|--MskLen--|--GrpMcastAddr--|
Encoded Source format:  |--AddrFam--|--EncType--|--Rsvd--|--S--|--W--|--R--|--MskLen--|--SrcAddr--|

-> details -> IP -> Destination IP: 224.0.0.13 (ALL-PIM-ROUTERS) -> Protocol: 103 (PIM) -> TTL: 1 -> PIM -> Version: 2 -> Type: 3 (Join/Prune) -> Upstream Neighbour -> has its own specific format of three fields (Encoded Unicast Address) -> |--Address Family--|--Encoding Type--|--Unicast Address--| -> Address Family -> 1: IPv4 -> 2: IPv6 -> values 0-255 accommodate the address families defined by IANA -> Encoding Type -> 0: Native -> Unicast Address -> the unicast address itself -> its length and format are determined by the Address Family and Encoding Type fields (for IPv4 with native encoding it is simply the 4-byte IPv4 address) -> its purpose is to identify the upstream neighbour -> Join/Prune messages flow towards the multicast source -> since the destination IP is the multicast address 224.0.0.13, they are addressed to a specific upstream router via this field -> especially important on multi-access media, where there may be multiple possible upstream neighbours -> the address of the upstream neighbour is based on the unicast reverse path to the source -> for example -> routers A, B, C and D are on the same network segment -> router A forwards (S,G) traffic, sourced in another network, onto that segment -> if router D wants to send a Prune message, it checks which router is the next hop on its unicast route towards S -> that router must then be the upstream neighbour for the multicast stream -> Num Groups -> indicates the number of Multicast Group blocks in the message (Multicast Group 1 to X in the diagram) -> multiple groups can be signaled with a single message -> Hold Time -> indicates how long the receiver of this message should keep its interface in the PRUNE state -> when the message is processed, the router resets the Prune Timer for the interface to the Prune Holdtime value received.
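The upstream-neighbour lookup from the A/B/C/D example can be sketched in Python as a longest-prefix match on the unicast routing table, followed by packing the result into the Encoded Unicast Address format (IPv4, native encoding). The routing table, addresses and function names are hypothetical; a real router takes the RPF neighbour from its unicast routing table rather than from a dictionary.

```python
import ipaddress
import struct

# Hypothetical unicast routing table on router D: prefix -> next hop on the shared segment.
# 10.0.0.1 plays the role of router A from the example.
ROUTES = {
    "0.0.0.0/0": "10.0.0.2",
    "192.0.2.0/24": "10.0.0.3",
    "192.0.2.128/25": "10.0.0.1",
}


def rpf_neighbour(source: str, routes: dict) -> str:
    """Next hop of the longest-prefix-match unicast route towards the source."""
    src = ipaddress.ip_address(source)
    best_net, best_hop = None, None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if src in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    if best_hop is None:
        raise LookupError(f"no unicast route to {source}")
    return best_hop


def encode_unicast(address: str) -> bytes:
    """Encoded Unicast Address: Address Family 1 (IPv4), Encoding Type 0 (native), address."""
    return struct.pack("!BB", 1, 0) + ipaddress.IPv4Address(address).packed


upstream = rpf_neighbour("192.0.2.200", ROUTES)   # S = 192.0.2.200
print(upstream, encode_unicast(upstream).hex())   # 10.0.0.1 01000a000001
```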

-> Multicast Group 1 -> uses the Encoded Group Address format -> |--Address Family--|--Encoding Type--|--B--|--Reserved--|--Z--|--Mask Len--|--Group Multicast Address--|
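A sketch of packing and unpacking the Encoded Group Address for IPv4, following the field list above: the B bit is the most significant bit of the flags byte and Z the least significant. The helper names are hypothetical; for PIM-DM it would be called with the defaults (B = 0, Mask Len = 32 for a single group), as the field details that follow explain.

```python
import ipaddress
import struct

B_BIT = 0x80  # Bidirectional bit (most significant bit of the flags byte)
Z_BIT = 0x01  # admin scope zone bit (least significant bit)


def encode_group(group: str, mask_len: int = 32, bidir: bool = False, zone: bool = False) -> bytes:
    """Encoded Group Address for IPv4: AF 1, encoding 0, B|Reserved|Z, Mask Len, group."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    if not 0 <= mask_len <= 32:
        raise ValueError("IPv4 mask length must be between 0 and 32")
    flags = (B_BIT if bidir else 0) | (Z_BIT if zone else 0)
    return struct.pack("!BBBB", 1, 0, flags, mask_len) + addr.packed


def group_range(encoded: bytes) -> ipaddress.IPv4Network:
    """Recover the group range (Group Multicast Address + Mask Len) from the encoding."""
    _family, _enc_type, _flags, mask_len = struct.unpack_from("!BBBB", encoded)
    addr = ipaddress.IPv4Address(encoded[4:8])
    return ipaddress.ip_network(f"{addr}/{mask_len}", strict=False)


print(encode_group("239.1.1.1").hex())           # 01000020ef010101 (single group)
print(group_range(encode_group("239.1.1.1", 16)))  # 239.1.0.0/16
```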

-> Address Family: same as for the other encoded fields -> Encoding Type: same as for the other encoded fields -> B -> Bidirectional bit: indicates that the group range (defined by Mask Len + Group Multicast Address) should use Bidirectional PIM -> 0 for PIM-DM -> Z -> admin scope zone bit, used by the Bootstrap Router (BSR) mechanism only -> Mask Len -> has the same function as a subnet mask -> in combination with the Group Multicast Address, it defines a range of multicast addresses that the other fields (such as Number of Joined Sources) refer to -> when a single group is signaled, it equals the address length (32 for IPv4) -> Group Multicast Address -> the multicast address of this particular instance (1) of Multicast Group -> Number of Joined Sources -> comes immediately after a Multicast Group (Encoded Group format) field and refers to that particular Multicast Group (1) only -> indicates the number of Joined Source Address instances (in the list) for that Multicast Group (1) -> in PIM-DM, this number is greater than 0 only for a Prune Override -> it is 0 (and there are no Joined Source Address fields) for Prune operations -> Number of Pruned Sources -> comes immediately after Number of Joined Sources (for a Multicast Group) and refers to that particular Multicast Group (1) only -> indicates the number of Pruned Source Address instances for that Multicast Group (1) -> it is greater than 0 for Prune operations -> Joined Source Address -> indicates that the router wants to receive a particular multicast stream (Multicast Group (1)) sent by a particular source or list of sources [1 to J]

-> uses the Encoded Source Address format -> |--Address Family--|--Encoding Type--|--Rsrvd--|--S--|--W--|--R--|--Mask Len--|--Source Address--|
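The Encoded Source Address can be sketched the same way; S, W and R occupy the three lowest bits of the flags byte. The second helper assembles one Multicast Group block of a Join/Prune (group, the two 16-bit source counts, then the joined and pruned sources), here used to build a hypothetical PIM-DM Prune with zero joined sources and one pruned source, as the field details describe. Names and addresses are illustrative.

```python
import ipaddress
import struct

S_BIT, W_BIT, R_BIT = 0x04, 0x02, 0x01   # low three bits of the flags byte


def encode_source(source: str, mask_len: int = 32,
                  sparse: bool = False, wildcard: bool = False, rpt: bool = False) -> bytes:
    """Encoded Source Address for IPv4: AF 1, encoding 0, Rsrvd|S|W|R, Mask Len, source."""
    flags = (S_BIT if sparse else 0) | (W_BIT if wildcard else 0) | (R_BIT if rpt else 0)
    return struct.pack("!BBBB", 1, 0, flags, mask_len) + ipaddress.IPv4Address(source).packed


def group_record(encoded_group: bytes, joined: list, pruned: list) -> bytes:
    """One Multicast Group block: group, Number of Joined/Pruned Sources (16 bits each),
    then the joined source list followed by the pruned source list."""
    record = encoded_group + struct.pack("!HH", len(joined), len(pruned))
    for source in joined + pruned:
        record += encode_source(source)
    return record


# Hypothetical PIM-DM Prune for (192.0.2.10, 239.1.1.1): no joined sources, one pruned source.
group_239_1_1_1 = bytes([1, 0, 0, 32, 239, 1, 1, 1])   # Encoded Group Address, mask 32
prune = group_record(group_239_1_1_1, joined=[], pruned=["192.0.2.10"])
print(len(prune), prune[8:12].hex())   # 20 00000001
```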

-> Address Family: same as for the other encoded fields -> Encoding Type: same as for the other encoded fields -> S -> Sparse bit, indicates PIM-SM -> W -> Wildcard bit -> R -> RPT (Rendezvous Point Tree) bit -> in PIM-DM, S, W and R are set to 0 and ignored on receipt -> Mask Len -> describes a range of source addresses (Mask Len in combination with Source Address) -> PIM-DM sends a single Source Address per Joined Source Address field, not a range -> Source Address -> the encoded source address itself -> Pruned Source Address -> indicates that the router doesn't want to receive multicast traffic for a particular group (Multicast Group (1)) sent by a particular source or list of sources [1 to P] -> uses the Encoded Source Address format described above -> Multicast Group 2 to X -> as many Multicast Group instances as needed, up to X -> each Multicast Group instance has its own Number of Joined Sources, Number of Pruned Sources, Joined Source Address (1 to J) and Pruned Source Address (1 to P) fields, which refer to it and dictate the actions of the Join/Prune message's intended receiver.

-> modes -> PIM-DM -> definition -> PIM Dense Mode -> a variant of the protocol which uses a 'flood-and-prune' mechanism -> designed for building Shortest Path Trees across a multicast-enabled network -> there is no Rendezvous Point in a PIM-DM network -> mechanics -> PIM-DM neighbours -> PIM-DM uses Hello messages to form neighbour adjacencies -> Hello messages are sent -> every Hello_Period of 30 seconds -> after a random delay of at most 5 seconds (Triggered_Hello_Delay). This occurs when the device first boots, when PIM-DM is enabled on an interface, or in response to a Hello from a new (or restarted) neighbour.
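The two Hello Timer values above can be modelled in a small Python sketch: the first Hello after PIM is enabled uses a random delay in [0, Triggered_Hello_Delay], later ones use Hello_Period. The function names are hypothetical, and the sketch ignores the other triggers (such as a Hello from a new neighbour).

```python
import random

HELLO_PERIOD = 30.0           # seconds between periodic Hellos
TRIGGERED_HELLO_DELAY = 5.0   # upper bound of the random delay for triggered Hellos


def next_hello_delay(triggered: bool, rng: random.Random) -> float:
    """Value loaded into the Hello Timer: random in [0, 5] s if triggered, else 30 s."""
    if triggered:
        return rng.uniform(0.0, TRIGGERED_HELLO_DELAY)
    return HELLO_PERIOD


def hello_send_times(count: int, rng: random.Random) -> list:
    """Send times after PIM is enabled: one triggered Hello, then periodic ones."""
    t = next_hello_delay(True, rng)
    times = [t]
    for _ in range(count - 1):
        t += next_hello_delay(False, rng)
        times.append(t)
    return times


print([round(t, 2) for t in hello_send_times(4, random.Random(1))])
```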
-> when a Hello message is received, the router must update its state for that neighbour and reset the Neighbour Liveness Timer -> set to the Hold Time received in the Hello message -> an internal timer which indicates the maximum interval to wait for a Hello before considering the neighbour dead and erasing the adjacency -> PIM-DM Designated Router -> a Designated Router (DR) is a router used to communicate with the Rendezvous Point (using PIM control messages) on behalf of the clients (multicast sources and receivers) -> a DR is elected for every multi-access network segment -> if there are 4 routers on a segment, it is enough for one of them to communicate with the RP (via the upstream router) -> on point-to-point segments there is no need for a DR, because PIM messages would be forwarded to the RP by both routers of the point-to-point link anyway -> PIM-DM itself has no need for a DR, because there is no RP to communicate with -> however, one DR per multi-access segment is elected anyway to act as the IGMPv1 querier: IGMPv1 has no querier election of its own, so the election and the existence of a DR in PIM-DM are solely for the benefit of IGMPv1 -> for the election, PIM-DM uses Option 19 in the Hello message: the highest priority value is preferred, and the highest IP address breaks ties when priorities are equal -> from a protocol standpoint, any Ethernet interface, even one that directly connects just two routers, is considered a multi-access interface (this may be configurable, depending on the vendor)

-> multicast tree creation -> PIM-DM is a 'dense' protocol, which means that it implements a 'flood-and-prune' mechanism -> the assumption is that every router needs every multicast stream -> also known as the 'push' or 'implicit join' model -> all (S,G) traffic must be forwarded on every non-RPF multicast interface -> non-RPF because RPF interfaces are the multicast ingress interfaces (IIF) -> in other words, the multicast traffic is forwarded on all multicast-enabled interfaces except the ingress one (these form the OIL) -> the receiving routers then prune themselves from the multicast tree by telling the upstream router to stop sending a particular (S,G) stream -> because of this model, and because there is no Rendezvous Point in PIM-DM, PIM-DM works with Source Distribution Trees (also called Shortest Path Trees, SPT: described by (S,G) states and rooted at the multicast source) -> overview -> a First Hop Router receives a multicast packet on the IIF -> the router performs the RPF check -> the router adds ALL OTHER dense-mode multicast interfaces to the OIL -> the router forwards the packet on all the interfaces in the OIL -> the process starts at the FHR (first hop from the source of the traffic) and repeats until the packets reach the Last Hop Routers (the leaves of the tree) -> if the LHRs have receivers registered (statically or via IGMP) for the group, they simply forward the traffic towards the receivers; no specific PIM action is required.
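Two pieces of the mechanics above, sketched in Python under simplifying assumptions: the DR election on a segment (highest Option 19 priority, then highest IP) and the initial flood, where a packet that passes the RPF check is copied to every other PIM-DM interface. Interface names and addresses are hypothetical, and the flood deliberately ignores pruned interfaces and Assert state.

```python
import ipaddress


def elect_dr(hello_priorities: dict) -> str:
    """hello_priorities: router IP -> DR priority (Hello option 19), including ourselves.
    Highest priority wins; the highest IP address breaks ties."""
    return max(hello_priorities,
               key=lambda ip: (hello_priorities[ip], ipaddress.ip_address(ip)))


def flood(arrival_if: str, rpf_if: str, dense_ifs: set):
    """Initial flood of an (S,G) packet in PIM-DM.

    Returns the OIL (every other PIM-DM interface), or None if the packet arrived
    on an interface other than the RPF interface, i.e. it failed the RPF check.
    """
    if arrival_if != rpf_if:
        return None
    return dense_ifs - {rpf_if}


print(elect_dr({"10.0.0.1": 1, "10.0.0.2": 1, "10.0.0.3": 0}))  # 10.0.0.2
print(sorted(flood("eth0", "eth0", {"eth0", "eth1", "eth2"})))  # ['eth1', 'eth2']
```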
-> reactive SPT pruning -> if the LHRs have no receivers registered for the group, they send a Prune message on the IIF -> the upstream neighbour is identified in the Prune message -> when it receives the message, it places the (forwarding) interface in the Pruned state -> if all interfaces in a router's OIL have been pruned, the router sends a Prune message of its own on its IIF and the process repeats -> if a multicast packet fails the RPF check, a Prune message for that group is sent on the receiving interface -> taken together, these actions prune every branch without receivers all the way back towards the source -> if a router receives, on a point-to-point interface, a multicast packet that fails the RPF check, that packet is a duplicate and a Prune message is sent to the router that forwarded it. This process occurs on both routers. -> if a router receives, on a multi-access interface, a multicast packet that fails the RPF check -> there is another router on the multi-access segment that forwarded the packet -> the initial pruning is already done and there are receivers, otherwise the packet wouldn't be received -> this means there are multiple routers forwarding the packet. The Assert process takes place and a Designated Forwarder (the Assert winner) is selected. -> selective Prune Overrides -> on a multi-access interface, a Prune from an uninterested downstream neighbour may affect the traffic being received by the other, interested neighbours -> this is because the Prune message sent by the uninterested downstream neighbour to the upstream router would result in the upstream router's forwarding interface being placed in the Pruned state -> the Prune message is sent to the ALL-PIM-ROUTERS multicast address and is received and processed by all PIM-DM routers on the multi-access segment -> so the neighbour(s) interested in the multicast traffic also react to this message.
-> they send an explicit Join message on their IIF, towards the upstream neighbour, thus cancelling the Prune -> this action is called 'Prune Override' -> before pruning the interface, the upstream router starts the Prune Pending Timer (set to J/P_Override_Interval), to give the interested routers enough time to send the Join and override the Prune -> before a downstream router sends a Join to override a Prune, it waits for an Override Timer (set to t_override) to expire, to avoid unnecessary Joins from every router on the LAN, since a single Join is sufficient to override a Prune -> if a router sees another router's Join to the same upstream neighbour while its own Override Timer is running, it cancels its own Join -> on a multi-access network, the uninterested neighbour has to wait for the Prune Limit Timer (default 210s) before it can send another Prune message -> there is an Option that can be configured, called LAN Prune Delay (Prune Propagation Delay on LANs) -> it can be used only if all routers on the LAN support it -> it lets the routers agree on a common Override Interval -> it also allows specifying a Propagation Delay value, which expresses the time it takes a Prune message sent from a downstream router to reach the upstream router -> Propagation Delay is used to fine-tune the timer intervals -> the Assert mechanism -> on a multi-access segment with two or more upstream routers, multiple copies of the same multicast packet may be delivered -> to resolve this, one of the routers is chosen as the Designated Forwarder (the Assert winner), using the Assert mechanism -> if a router receives a multicast packet on a multi-access interface that is in its own OIL for that (S,G) (so the packet fails the RPF check), it responds with a PIM Assert message -> format ->

|--IP header--|--PIM Version--|--Type--|--Reserved--|--Checksum--|  <- PIM header, common to all PIM messages
|--Multicast Group Address (Encoded Group format)--|--Source Address (Encoded Unicast format)--|--R--|--Metric Preference--|--Metric--|

Multicast Group Address: |--AddrFam--|--EncType--|--B--|--Rsvd--|--Z--|--MskLen--|--GrpMcastAddr--|
Source Address:          |--AddrFam--|--EncType--|--UnicastAddr--|

-> details -> Version: 2 -> Type: 5 (PIM Assert) -> Multicast Group Address -> in Encoded Group format -> identifies the group address that the message refers to -> details described above -> Source Address -> identifies the source of the multicast stream, in Encoded Unicast format -> R -> RPT (Rendezvous Point Tree) bit -> ignored in PIM-DM -> Metric Preference -> route preference, based on the unicast protocol that learned the route used for the RPF check -> Cisco calls this 'Administrative Distance' -> Metric -> the metric of the route to the multicast source
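A sketch of the Assert comparison in Python, using the fields above and the selection rule described in these notes (lowest Metric Preference, then lowest Metric, then highest IP address), plus packing of the R bit and the 31-bit Metric Preference into one 32-bit word. The values 110 and 90 are Cisco's default administrative distances for OSPF and internal EIGRP; the addresses and function names are hypothetical.

```python
import ipaddress
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertMetric:
    metric_preference: int   # e.g. the administrative distance of the route to S
    metric: int              # metric of the unicast route to S
    address: str             # IP address of the router sending the Assert


def beats(a: AssertMetric, b: AssertMetric) -> bool:
    """True if a wins over b: lower preference, then lower metric, then higher IP."""
    if a.metric_preference != b.metric_preference:
        return a.metric_preference < b.metric_preference
    if a.metric != b.metric:
        return a.metric < b.metric
    return ipaddress.ip_address(a.address) > ipaddress.ip_address(b.address)


def assert_winner(candidates: list) -> AssertMetric:
    winner = candidates[0]
    for candidate in candidates[1:]:
        if beats(candidate, winner):
            winner = candidate
    return winner


def pack_assert_metrics(m: AssertMetric, rpt_bit: bool = False) -> bytes:
    """R bit + 31-bit Metric Preference, then the 32-bit Metric (R is 0 in PIM-DM)."""
    if not 0 <= m.metric_preference < 2**31:
        raise ValueError("Metric Preference must fit in 31 bits")
    first_word = (0x80000000 if rpt_bit else 0) | m.metric_preference
    return struct.pack("!II", first_word, m.metric)


ospf_a = AssertMetric(110, 20, "10.0.0.1")    # hypothetical OSPF-learned route
ospf_c = AssertMetric(110, 20, "10.0.0.5")    # same preference and metric, higher IP
eigrp_b = AssertMetric(90, 3000, "10.0.0.2")  # hypothetical EIGRP-learned route
print(assert_winner([ospf_a, ospf_c]).address)           # 10.0.0.5
print(assert_winner([ospf_a, ospf_c, eigrp_b]).address)  # 10.0.0.2
```

Comparing Metric Preference first matters because metrics from different routing protocols are not comparable: the EIGRP route wins on preference even though its metric is much larger.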

-> the router selected as the Designated Forwarder is the one with the lowest route preference; if preferences are equal, the one with the lowest metric; if those are equal too, the one with the highest IP address -> Assert messages are periodically sent (Assert Timer) to refresh the Assert state -> the losing routers -> have to prune the forwarding interface and send a Prune message on the multi-access segment, destined to the Assert winner on the segment where they lost the Assert (not on the upstream segment) -> store the metric and address of the Assert winner -> start the Assert Timer -> the Prune Timer (Prune Holdtime) in the Prune message is equal to this Assert Timer -> the Assert Timer is used to time out the Assert state, after which the whole Assert process is repeated -> the winning router -> stores its own address and metric -> sets the Assert Timer -> it is preferable to set a smaller Assert Timer on the winner, so that it refreshes the Assert state before the losers' Assert Timers expire and trigger a new election

-> steady state and scalability issues -> periodic flooding -> each pruned interface on a multicast tree has an expiration timer (the Prune Timer, which takes the Prune Holdtime value from the Prune message received from the downstream neighbour) -> once the timer expires, the interface moves back to the Forwarding state -> if the source is still active, multicast traffic floods the whole network again -> this flooding triggers the Prune, Prune Override and Assert mechanisms again -> the topology is once again pruned as needed -> the process repeats every time the timer expires -> another scaling drawback of PIM-DM is the amount of state information that each multicast router must keep (for each forwarding state, for each neighbour, etc.)
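The periodic re-flooding can be illustrated with a small timeline simulation. It assumes a Prune Holdtime of 210 s (an example value), a source that never stops, and a branch that is re-pruned immediately after each flood; the optional refresh_interval models the State Refresh mechanism described further on, which reloads the Prune Timer before it expires (60 s is the default refresh interval given there). The function name is hypothetical.

```python
def reflood_times(prune_holdtime: int, horizon: int, refresh_interval: int = 0) -> list:
    """Seconds after the first Prune at which a pruned interface floods again.

    Simplifying assumptions: the source stays active, the branch is still
    uninterested and is re-pruned immediately after each flood, and time moves in
    1-second steps. If refresh_interval > 0, a State Refresh reloads the Prune
    Timer to prune_holdtime every refresh_interval seconds.
    """
    floods = []
    prune_timer_expiry = prune_holdtime
    for t in range(1, horizon + 1):
        if refresh_interval and t % refresh_interval == 0:
            prune_timer_expiry = t + prune_holdtime      # State Refresh resets the timer
        if t == prune_timer_expiry:
            floods.append(t)                             # back to Forwarding -> flood
            prune_timer_expiry = t + prune_holdtime      # pruned again right away
    return floods


print(reflood_times(210, 1000))                        # [210, 420, 630, 840]
print(reflood_times(210, 1000, refresh_interval=60))   # []
```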
-> PIM Graft -> although keeping all this state information is a drawback, there is a good reason for it -> PIM-DM depends on this state to allow impromptu joins -> a router that has pruned itself from a tree may get a new receiver, so it must reattach to the tree -> but it cannot rejoin (graft onto) a tree it doesn't know about in the first place -> PIM-DM has no mechanism to discover trees, so it depends on the periodic floods to refresh the (S,G) states -> PIM Graft is the mechanism a router uses to reattach to a tree -> the router sends a PIM Graft message to its upstream neighbour -> the upstream neighbour moves the interface on which it received the Graft back to the Forwarding state -> replies with a Graft Ack, to make the process reliable -> repeats the Graft process further upstream as necessary -> if no Graft Ack is received in response to a Graft, another Graft message is sent after a short delay (default 3s, Graft Retry Timer) -> the PIM Graft message has the same format as the Join/Prune message, with a few exceptions -> IP -> Destination: unicast address of the upstream neighbour -> PIM -> Type: 6 (PIM Graft) -> the source address of the multicast stream MUST be in the Joined Source Address field -> the Hold Time field is set to 0 and ignored on reception.
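The Graft retransmission rule can be sketched as a loop that resends the Graft every retry period (3 s, the default given above) until an Ack arrives. The ack_lost callback simulates lost messages and max_attempts is an arbitrary cap added only for this sketch; both, like the function name, are hypothetical.

```python
GRAFT_RETRY_PERIOD = 3   # seconds, the default retry delay given in the text


def graft_send_times(ack_lost, max_attempts: int = 10) -> list:
    """Times (s) at which Grafts are sent until one is acknowledged.

    ack_lost(n) -> True if the Graft Ack for attempt n never arrives (or the Graft
    itself is lost). max_attempts is an arbitrary cap for this sketch only.
    """
    times = []
    for attempt in range(max_attempts):
        times.append(attempt * GRAFT_RETRY_PERIOD)
        if not ack_lost(attempt):
            return times          # Graft Ack received: stop retransmitting
    raise TimeoutError("no Graft Ack received")


print(graft_send_times(lambda n: n < 2))   # first two Acks lost -> [0, 3, 6]
```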
-> the PIM Graft Ack message, sent in response to a Graft message, must be identical to the Graft message, with a few exceptions -> PIM -> Type: 7 (PIM Graft Ack) -> the Upstream Neighbour address field contains the address of the Graft sender -> State Refresh -> the periodic flooding exists so that forwarding states are kept in the multicast routing table -> a forwarding state is created when a multicast packet is received on the IIF -> flooding happens just to force the creation of the state, but since the state already exists, it only refreshes it -> State Refresh is a mechanism that essentially resets the Prune Timer on a pruned (S,G) state without the unnecessary flooding -> a First Hop Router (the originator) with an active source generates a State Refresh message every RefreshInterval (default 60s; the timer is called the 'State Refresh Timer') -> each Middle Hop Router -> refreshes the timers on the pruned interfaces (the whole state is refreshed, not just the pruned interfaces, but the pruned ones are the most important) -> sends the message down the OIL regardless of whether the interfaces are pruned or forwarding -> the final result is the same as if the multicast flood had occurred -> the State Refresh process starts with the First Hop Router, which is the router closest to the multicast source -> this router is also called the 'originator' router -> the originator router keeps a 'Source Active Timer', which is reset every time a multicast packet for that (S,G) is received. When the timer expires, the source is considered dead. -> on a multi-access segment, this message is forwarded by the Assert winner only -> format:

|--IP header--|--PIM Version--|--Type--|--Reserved--|--Checksum--|  <- PIM header, common to all PIM messages
|--Multicast Group Address (Encoded Group format)--|--Source Address (Encoded Unicast format)--|--Originator Address (Encoded Unicast format)--|
|--R--|--Metric Preference--|--Metric--|--Masklen--|--TTL--|--P--|--N--|--O--|--Reserved--|--Interval--|

Multicast Group Address:         |--AddrFam--|--EncType--|--B--|--Rsvd--|--Z--|--MskLen--|--GrpMcastAddr--|
Source and Originator Addresses: |--AddrFam--|--EncType--|--UnicastAddr--|

-> details -> PIM -> Version: 2 -> Type: 9 (State Refresh) -> Multicast Group Address and Source Address -> together indicate the (S,G) -> Originator Address -> the encoded address of the First Hop Router -> R -> RPT (Rendezvous Point Tree) bit -> ignored in PIM-DM -> Metric Preference and Metric -> same as in the Assert message -> Masklen -> the mask length of the unicast route towards the source -> TTL -> Time-To-Live of the State Refresh message -> it is decremented each time the message is forwarded by a router, so it must be greater than 1 for the message to travel another hop -> this is separate from the IP TTL, which is always set to 1 for PIM messages -> P -> Prune Indicator flag -> set to 1 if the State Refresh is sent on a pruned interface -> this is done to correct errors in the process (such as an upstream router having pruned an interface that should in fact be forwarding; in this case the downstream router is able to react and send a Graft) -> N -> Prune Now flag -> used for compatibility with earlier versions of the State Refresh mechanism -> O -> Assert Override flag -> used for compatibility with earlier versions of the State Refresh mechanism -> Interval -> indicates the interval between State Refresh messages -> set by the originator router
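A Python sketch of the fixed-size fields that close a State Refresh message (after the three encoded addresses), following the layout above: R and Metric Preference share one 32-bit word, and P, N and O are the top three bits of the flags byte. forward_refresh models a middle hop: decrement the TTL, forward on every listed interface, and set P on the pruned ones. The TTL cut-off follows the reading in these notes, and all names are hypothetical.

```python
import struct

P_FLAG, N_FLAG, O_FLAG = 0x80, 0x40, 0x20   # top three bits of the flags byte


def pack_refresh_tail(metric_preference: int, metric: int, masklen: int, ttl: int,
                      prune_indicator: bool, interval: int) -> bytes:
    """Fixed fields after the encoded addresses of a State Refresh:
    R|Metric Preference (32 bits, R = 0), Metric, Masklen, TTL, P|N|O|Reserved, Interval.
    N and O are left at 0 here (they only exist for backwards compatibility)."""
    if not 0 <= metric_preference < 2**31:
        raise ValueError("Metric Preference must fit in 31 bits")
    if not 0 <= interval <= 255:
        raise ValueError("Interval is an 8-bit field")
    flags = P_FLAG if prune_indicator else 0
    return struct.pack("!IIBBBB", metric_preference, metric, masklen, ttl, flags, interval)


def forward_refresh(received_ttl: int, oif_states: dict) -> dict:
    """Per-interface (TTL, P flag) of the copies a middle-hop router sends downstream.

    Assumption: the message is only forwarded if its TTL is still at least 1
    after being decremented. P is set on pruned interfaces.
    """
    ttl = received_ttl - 1
    if ttl < 1:
        return {}
    return {ifname: (ttl, state == "Pruned") for ifname, state in oif_states.items()}


tail = pack_refresh_tail(110, 20, 24, 16, prune_indicator=True, interval=60)
print(tail.hex())   # 0000006e000000141810803c
print(forward_refresh(16, {"eth1": "Forwarding", "eth2": "Pruned"}))
# {'eth1': (15, False), 'eth2': (15, True)}
```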

-> notes -> timers -> when reading the RFC, note that different names are used for the timer itself and for the values it can take. For example -> "Hello Timer" is a timer; when it expires, a Hello message is sent. There are two values this timer can take -> "Hello_Period": default 30s, the usual interval during normal PIM activity -> "Triggered_Hello_Delay": default 5s (the timer is set to a random value between 0 and 5s), used during certain events such as device boot-up -> a timer must exist for a forwarding state to expire -> RFC 3973 specifies no such timer -> it would probably be a good idea to sync the state expiry timer with the Prune Timer -> maybe a couple of seconds longer -> if traffic still flows when the Prune Timer expires, flooding occurs -> since the flooding occurs before the state expires, it just refreshes the state timer -> if the traffic isn't flowing anymore when the Prune Timer expires, there is nothing to be flooded -> this causes the state timer on all routers to expire -> router failures -> one mechanism for handling router failures is the Generation ID, which triggers specific actions if it changes across Hellos -> if a router fails, the unicast routing protocol changes the unicast routes, which may change the reverse path to the source; when the reverse path to S changes, PIM takes specific actions.

-> PIM-SM -> PIM Sparse Mode
-> Bidir-PIM -> Bidirectional PIM
-> PIM-SSM -> PIM Source-Specific Multicast

-> PIM timers -> what happens if a PIM node dies (I) -> if the Assert winner's neighbour liveness (dead) timer expires, the losers clear their Assert state and the Assert negotiation is restarted -> if an Assert winner is being taken down, it should send an AssertCancel message (an Assert with an infinite metric) to force the other routers to restart the Assert negotiation

-> Override Timer (OT), set to the t_override value -> used by a downstream router when it sees a Prune addressed to the upstream router; when the timer expires, it sends a Join to the upstream router -> Prune Pending Timer (PPT), set to J/P_Override_Interval -> used by the upstream router when it receives a Prune, to give the downstream routers time to send a Join -> Prune Propagation Delay is a value that expresses the time it takes a Prune message sent by a downstream router to reach the upstream router
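The interplay between the Prune Pending Timer and the Override Timers on a LAN can be simulated in a few lines. The defaults used here (Override_Interval 2.5 s, Propagation_Delay 0.5 s, and J/P_Override_Interval as their sum) are assumptions for the sketch, as are the function and router names.

```python
import random

# Assumed defaults for this sketch.
OVERRIDE_INTERVAL = 2.5                                       # seconds
PROPAGATION_DELAY = 0.5                                       # seconds
JP_OVERRIDE_INTERVAL = OVERRIDE_INTERVAL + PROPAGATION_DELAY  # Prune Pending Timer


def prune_on_lan(interested: list, rng: random.Random):
    """One Prune seen on a LAN; returns (joiner, join_time, interface_pruned).

    Every interested downstream router starts an Override Timer with a random
    t_override in [0, OVERRIDE_INTERVAL]. The first timer to expire sends the Join;
    the others hear it on the LAN and cancel their own. The upstream router prunes
    only if no Join arrives before its Prune Pending Timer expires.
    """
    if not interested:
        return None, None, True          # nobody objects: PPT expires, interface pruned
    override_timers = {router: rng.uniform(0.0, OVERRIDE_INTERVAL) for router in interested}
    joiner = min(override_timers, key=override_timers.get)
    join_time = override_timers[joiner]
    join_arrives = join_time + PROPAGATION_DELAY
    return joiner, join_time, join_arrives > JP_OVERRIDE_INTERVAL


joiner, t, pruned = prune_on_lan(["B", "C"], random.Random(7))
print(joiner, round(t, 2), pruned)
```

With these values a Join always reaches the upstream router before the Prune Pending Timer expires, which is the point of sizing J/P_Override_Interval as Override_Interval plus Propagation_Delay.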

-> references -> https://www.rfc-editor.org/rfc/rfc3973.txt