Building Highly Available Wireless Infrastructures. Module 4.
Agenda
• Troublemakers
• Background: Alcatel network architecture
• High availability - pieces of the solution
• Putting it all together
High Availability – Self-Healing Wi-Fi
• Mobility controller detects AP failure
High Availability – Self-Healing Wi-Fi
• Controller automatically reconfigures AP to extend coverage to compensate
• Plug-and-play APs download original config
Troublemakers
Why is designing high-availability wireless networks a challenge?
• Many moving parts to deal with:
  • Access points, stations
  • Switches
  • Authentication and other external servers
  • Plus the usual fixed-network issues
• Configuration synchronization and distribution
  • Master-local Alcatel architecture
• User state information required in multiple nodes
  • Minimize switchovers
• Mobility
  • Adds more restrictions on network design
Background - Master/Local
What you need to know about the Alcatel architecture
• The Master switch holds the global configuration and pushes it to Local switches and APs
• The Master propagates information required for mobility and controls the global dynamic radio network data
• Local switches are configured for local parameters: VLANs, IPs, NAT pools, DHCP pools, VPN pools, L2 firewall rules, logging levels, logging and trap servers, non-RADIUS admin users, etc.
• What happens when the Master is down:
  • Global configuration cannot be changed
  • Mobility is not as smooth; a new IP address will have to be acquired more often
  • Local switches can be rebooted and be fully functional: a Local switch uses its local config to set up ports, then tries to reach the Master switch to get the latest global config. If that fails (after about 4 minutes), it uses the last saved global config, received in the last snapshot
  • APs can boot from a Local switch (local bootstrapping)
Background - APs
AP initial boot process
• AP boots and gets its IP address from static configuration in flash or from DHCP
• AP needs to find a master switch:
  • Static configuration in flash (master)
  • DHCP option 43
  • ADP
  • DNS
• Keep in mind that the IP address obtained for the master will be used for AP bootstrapping - this IP address must be up for APs to boot!
• A couple of notes on local bootstrapping:
  • Some options for finding a master switch are more universal throughout an enterprise network: DNS and DHCP
  • ADP will only work at Layer 2
  • Static configuration in flash works great, but requires you to configure each AP (time-consuming and costly)
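The discovery options above can be read as a fallback chain. The following is a minimal Python sketch, assuming the AP tries the methods in the order listed on the slide (the slide does not state the order). The lookup functions and the returned address are hypothetical stand-ins, not Alcatel APIs.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical lookup type: returns the master IP as a string, or None.
Lookup = Callable[[], Optional[str]]

def discover_master(methods: Sequence[Tuple[str, Lookup]]) -> Optional[Tuple[str, str]]:
    """Try each discovery method in order; return (method, master IP) for the first hit."""
    for name, lookup in methods:
        ip = lookup()
        if ip:
            return name, ip
    return None  # no master found: the AP cannot bootstrap

# Illustrative case: nothing in flash, no DHCP option 43, ADP gets no answer
# (the AP is not on the switch's Layer 2 segment), DNS resolves the master.
methods = [
    ("static", lambda: None),
    ("dhcp-option-43", lambda: None),
    ("adp", lambda: None),
    ("dns", lambda: "172.16.100.254"),  # assumed example address
]
result = discover_master(methods)
```

Whatever address the chain returns is the one that must be reachable for the AP to boot, which is why a VRRP address is a natural choice for it.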
Background - AP bootstrapping
Knowing how an AP boots is key to high availability design
• An AP gets its configuration, connects with its switch, and sets up its GRE tunnel before operating (each SSID creates a tunnel)
• Any time an AP-switch tunnel goes down (bootstrap-threshold is 7 heartbeats by default), the AP goes through the whole bootstrapping process
• Basic steps are:
  1) Contact the master (or any switch with local bootstrapping)
  2) Verify image version; download and reboot as needed
  3) Get the configuration for the AP location
  4) Contact the switch at the <lms-ip> address
  5) Set up the GRE tunnel
  6) Turn the radio on
  7) Start heartbeats with the switch
  8) When losing connectivity with the switch, go back to step 1
• For the quickest AP recovery, set bootstrap-threshold to 3 or 4; leave it at the default (7) if problems are encountered
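To make the bootstrap-threshold behavior concrete, here is a small Python sketch. It assumes the threshold counts consecutive missed heartbeats and that one answered heartbeat resets the count; the slide does not spell this out. The function name is hypothetical.

```python
def rebootstrap_index(heartbeats, threshold=7):
    """Return the index of the heartbeat at which the AP re-bootstraps, or None.

    heartbeats is a sequence of booleans: True if the switch answered.
    Assumption: only consecutive misses count toward the threshold.
    """
    missed = 0
    for i, answered in enumerate(heartbeats):
        missed = 0 if answered else missed + 1
        if missed >= threshold:
            return i
    return None

# With the default of 7, six misses in a row are tolerated.
six_misses = rebootstrap_index([False] * 6)
# With a threshold of 3, the AP gives up on the third consecutive miss.
fast = rebootstrap_index([True, False, False, False, True], threshold=3)
```

A lower threshold shortens detection time but makes the AP more sensitive to brief packet loss, which matches the advice to fall back to the default if problems appear.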
High Availability – AP Interleaving
A picture is worth 1000 words…
[Figure: AP interleaving diagram; labels: GRE Tunnels, IP Network, Floor 1, Floor 2]
High Availability – AP Interleaving
• Basics:
  • The idea is simply to interleave APs between 2 switches
  • If a switch goes down, all its APs go too
  • Mobility is key for this deployment to work: users are constantly roaming between the switches
  • After a switch failure, self-healing will update APs for the new radio environment
• When to use it?
  • Robust: no moving parts here; nothing needs to happen immediately when one switch fails
  • Customer requires all switches to be in use and fully populated with APs (no spares using VRRP)
  • All APs go down with a switch
  • Management is less straightforward
  • Creates a lot of mobility events; harder to keep track of users in the system
  • Addressing constraints, see mobility later
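As an illustration of the interleaving idea, the sketch below alternates physically adjacent APs between two switches, so that every second AP survives the loss of one switch. The AP and switch names are made up for the example.

```python
def interleave(aps_in_physical_order, switches):
    """Assign neighbouring APs to alternating switches (round-robin)."""
    return {ap: switches[i % len(switches)]
            for i, ap in enumerate(aps_in_physical_order)}

floor = ["ap1", "ap2", "ap3", "ap4"]           # hypothetical APs along a corridor
assignment = interleave(floor, ["A", "B"])      # hypothetical switch names

# If switch A fails, every second AP is still up; self-healing then adjusts
# the survivors to cover the gaps.
survivors = [ap for ap in floor if assignment[ap] != "A"]
```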
High Availability – APs using a backup switch
What is it; how does it work?
• APs are provisioned with a primary switch IP address (lms-ip) and a backup (bkplms-ip)
• Configuring bkplms-ip modifies the AP bootstrap process:
  • Everything works as previously described
  • If the GRE tunnel with the primary switch goes down, the AP does not re-bootstrap at this point; it tries to establish a GRE tunnel with the configured backup
  • A bit of optimism: the AP re-establishes the tunnel with the backup switch and starts heartbeats with it
  • If the backup switch goes down, the AP re-bootstraps at this point
• Observations:
  • APs will fail over from primary to backup without going back to the master
  • APs will need a master switch available to re-bootstrap if the tunnel with the backup switch fails
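The decision the AP makes when its tunnel drops can be sketched as a small function. This is an illustration of the behavior described on the slide, not Alcatel code; the function name, the state strings, and the example address are hypothetical.

```python
from typing import Optional

def next_action(serving: str, tunnel_up: bool, bkplms_ip: Optional[str]) -> str:
    """Decide what an AP does next.

    serving is "primary" or "backup", i.e. which switch currently
    terminates the AP's GRE tunnel.
    """
    if tunnel_up:
        return "stay"
    if serving == "primary" and bkplms_ip:
        # Fail over directly, without going back to the master.
        return "failover-to-backup"
    # No backup configured, or the backup itself failed:
    # full re-bootstrap, which needs a reachable master.
    return "re-bootstrap"

action = next_action("primary", tunnel_up=False, bkplms_ip="172.16.2.1")
```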
High Availability – APs using a backup switch
[Figure: APs at Locations 1.1.1, 1.1.2 and 1.1.3 connect over the IP network to a Primary switch (lms-ip = 172.16.1.0/24) and a Backup switch (bkplms-ip = 172.16.2.0/24)]
Configuring – APs using a backup switch
• Master Switch GUI – Configuration > WLAN > Advanced > General
• Specify the AP location code for APs that will be using a backup switch
• You can use a wildcard entry to configure many APs at once:
  • Global – 0.0.0
  • Building – 1.0.0 (example shown)
  • Floor – 1.1.0
  • Single AP – 1.1.2
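The wildcard scheme can be illustrated with a short matcher. This sketch assumes location codes have the form building.floor.ap, that 0 is the wildcard, and that the most specific matching entry wins; the last point is an assumption, since the slide only lists the wildcard levels. The profile names are hypothetical.

```python
from typing import Dict, Optional

def matches(pattern: str, location: str) -> bool:
    """True if each pattern field is 0 (wildcard) or equals the location field."""
    return all(p == "0" or p == l
               for p, l in zip(pattern.split("."), location.split(".")))

def resolve(location: str, profiles: Dict[str, str]) -> Optional[str]:
    """Pick the matching entry with the fewest wildcards (assumed precedence)."""
    candidates = [p for p in profiles if matches(p, location)]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: p.split(".").count("0"))
    return profiles[best]

profiles = {
    "0.0.0": "global",
    "1.0.0": "building-1",
    "1.1.0": "floor-1.1",
    "1.1.2": "ap-1.1.2",
}
chosen = resolve("1.1.3", profiles)
```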
High Availability - Active/Standby N to 1
Local Switches – L3 model
• Switches reside in different subnets, but users have access to data VLANs
[Figure: two Active switches (172.16.100.1 and 192.168.1.1) and one Standby switch (10.10.1.1), with L2 and L3 segments; VLAN labels: VLAN 4, VLAN 5, VLAN 4 & 5, VLAN 4 & 5]
• Location 1.1.0: lms-ip = 172.16.100.1, bkplms = 10.10.1.1
• Location 1.2.0: lms-ip = 192.168.1.1, bkplms = 10.10.1.1
High Availability – Load Balance Switch Failure
[Figure: Switch S1 (10.10.1.1), Switch S2 (192.168.1.1) and Switch S3 (172.16.1.1) on the IP network]
• Locations 1.1.0 and 1.2.0: lms-ip = S1, bkplms-ip = S2
• Locations 2.1.0 and 2.2.0: lms-ip = S2, bkplms-ip = S3
• Locations 3.1.0 and 3.2.0: lms-ip = S3, bkplms-ip = S1
• APs are load balanced across multiple Active switches
  • Remember that a switch has a max AP count
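Because each switch has a maximum AP count, it is worth checking the worst case after a single switch failure. The sketch below does that for a backup ring (S1 backed up by S2, S2 by S3, S3 by S1). The per-switch AP counts are invented for illustration, and the limit of 24 is only an example value.

```python
def worst_case_load(primary_aps, backup_of):
    """Per switch, the highest AP count it sees after any single switch failure.

    primary_aps: switch -> number of APs homed on it (lms-ip)
    backup_of:   switch -> switch its APs fail over to (bkplms-ip)
    """
    worst = dict(primary_aps)
    for failed, count in primary_aps.items():
        target = backup_of[failed]
        worst[target] = max(worst[target], primary_aps[target] + count)
    return worst

primary_aps = {"S1": 16, "S2": 12, "S3": 10}      # assumed example loads
backup_of = {"S1": "S2", "S2": "S3", "S3": "S1"}  # backup ring
worst = worst_case_load(primary_aps, backup_of)

MAX_APS = 24  # example per-switch limit
overloaded = sorted(s for s, n in worst.items() if n > MAX_APS)
```

In this example S1 and S2 would exceed the limit after a failure, so the active load per switch has to be reduced. That is the cost of load balancing noted in the review slide.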
High Availability – APs using a backup switch
When to use it?
• What is good about it?
  • Little additional configuration is needed to make it work
  • The primary and backup switches do not have to be in the same subnet! This is the main reason to use this mode
• What is not so good about it?
  • Each AP is individually programmed to go to only one other switch
  • After a failover, APs never go back to the primary switch until another failure happens
  • No way to decide whether APs should go to another switch because of connectivity issues with the backbone (see tracking later)
• Note on switch placement:
  • Both switches need to be in the same broadcast domain, or at least both must have access to the same user VLANs; otherwise the users will be unreachable on failover
VRRP – Virtual Router Redundancy Protocol
RFC 3768
• VRRP in 3 bullets:
  • Provides a protocol to elect a master among multiple devices, mostly based on a priority scheme and some conventions
  • Provides a way for the elected master to take over an IP address
  • Correspondent nodes use this VRRP IP address to reach the resource
• Was designed to allow hosts to point their default gateway at some fixed IP and not run routing (less chatty)
• Multiple VRRP groups on one node
• N members of one group, not just 2
High Availability - VRRP and APs
Basics
• We use standard VRRP to elect a master on a subnet shared between 2 or more switches
• The elected master gains ownership of the VRRP IP address
• APs are provisioned with lms-ip set to the VRRP IP address
• Note: Don’t be confused by the VRRP definition of “master”, which refers to the switch that assumes ownership in a VR group. It can refer to Alcatel Local switches configured for VRRP and doesn’t mean an Alcatel switch configured as a Master switch
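A simplified view of the election: per RFC 3768, the router with the highest priority becomes the VRRP master, and a tie is broken in favor of the higher primary IP address. The sketch below ignores timers and preemption and only shows that ordering; the function name is hypothetical.

```python
import ipaddress

def elect_master(routers):
    """routers: dict of primary IP -> VRRP priority. Returns the winner's IP.

    Highest priority wins; on a tie, the higher IP address wins.
    """
    return max(routers,
               key=lambda ip: (routers[ip], int(ipaddress.ip_address(ip))))

# Priorities as in the Active/Standby configuration example (110 vs 100).
winner = elect_master({"172.16.100.1": 110, "172.16.100.2": 100})
# Equal priorities: the higher address wins the tie.
tie = elect_master({"172.16.100.1": 100, "172.16.100.2": 100})
```

The winner owns the VR IP address, so APs whose lms-ip is the VR IP reach whichever switch is currently the VRRP master.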
High Availability – VRRP Failover
What happens when the active switch fails:
• Two things start to happen in parallel:
  • Standby switch(es) start missing advertisements from the active switch and transition to the active state
  • The APs start missing heartbeats with the serving switch
• The APs give up, shut down radios, and re-bootstrap
• Assuming no change to configuration (on the Master), they will just try to re-establish the tunnel with lms-ip, which has now moved to another switch
• On success, APs turn radios back on and accept stations
• Impact on users:
  • Total duration of the process: typically 7 to 15 seconds
  • With proper network design, stations will get back the same IP address
  • With transparent authentication, users will not even notice
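The two parallel timers can be estimated with a little arithmetic. The VRRP side follows RFC 3768: Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, with Skew_Time = (256 − Priority) / 256 seconds. The AP side is threshold × heartbeat interval. The heartbeat interval is not given in this deck, so the 1.0 s used below is purely an assumed placeholder.

```python
def master_down_interval(priority: int, adv_interval: float = 1.0) -> float:
    """Seconds a VRRP backup waits before taking over (RFC 3768)."""
    skew_time = (256 - priority) / 256
    return 3 * adv_interval + skew_time

def ap_detection_time(bootstrap_threshold: int, heartbeat_interval: float) -> float:
    """Seconds until an AP declares its tunnel down and re-bootstraps."""
    return bootstrap_threshold * heartbeat_interval

# Standby switch with priority 100 and the default 1 s advertisement interval.
vrrp_takeover = master_down_interval(100)
# Default threshold of 7; the 1.0 s heartbeat interval is an assumption.
ap_gives_up = ap_detection_time(7, 1.0)
```

Under these assumptions the standby already owns the VR IP by the time the APs give up, so the AP re-bootstrap, not the VRRP election, dominates the outage.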
High Availability – VRRP Active/Standby
Local Switch - Active/Standby, 1 to 1
[Figure: two switches on a shared L2 segment, connected to the IP network]
• Active switch: VLAN 1 IP = 172.16.100.1; VR ID 1, Active; VR IP = 172.16.100.254
• Standby switch: VLAN 1 IP = 172.16.100.2; VR ID 1, Standby; VR IP = 172.16.100.254
• Location 1.1.1: lms-ip = 172.16.100.254
• Standby switch doesn’t serve APs or clients unless there is a failure
Configuring – VRRP Active/Standby
[Screenshots: VRRP configuration on 172.16.100.1 and 172.16.100.2]
• GUI - Configuration > Switch > VRRP > Add Virtual Router
• Specify VR ID, VR IP address, and VLAN ID on both the Active and Standby switches (defaults: 1 sec advertisement interval, no password)
• To ensure a particular switch is selected as the Active switch, you can optionally configure a higher priority value (set the value to 110 on the Active switch and 100 on the Standby switch)
• Specify the AP location code for APs that will be using a VR IP address
• Again, you can use a wildcard entry to configure many APs at once:
  • Global – 0.0.0
  • Building – 1.0.0 (previous example shown)
  • Floor – 1.1.0
  • Single AP – 1.1.2
High Availability – VRRP Active/Active
Local Switches - Active/Active, 1 to 1
[Figure: two switches joined at L2 by a trunk port carrying VLAN 1 and VLAN 50, connected to the IP network]
• First switch: VLAN 1 IP = 172.16.100.1; VR ID 1, Active; VR IP = 172.16.100.254. VLAN 50 IP = 192.168.1.1; VR ID 50, Standby; VR IP = 192.168.1.254
• Second switch: VLAN 1 IP = 172.16.100.2; VR ID 1, Standby; VR IP = 172.16.100.254. VLAN 50 IP = 192.168.1.2; VR ID 50, Active; VR IP = 192.168.1.254
• Locations 1.1.0: lms-ip = 172.16.100.254
• Locations 1.2.0: lms-ip = 192.168.1.254
Configuring VR ID 1 – VRRP Active/Active
[Screenshots: VRRP configuration on 172.16.100.2 and 172.16.100.1]
• GUI - Configuration > Switch > VRRP > Add Virtual Router
• Specify VR ID, VR IP address, and VLAN ID on both the Active and Standby switch for VLAN 1
• Enable Router Pre-emption and set a higher priority number of 110 (default = 100) for the Active switch (elected “Master VRRP router”)
• Specify the AP location code for APs that will be using a VR IP address
• Again, you can use a wildcard entry to configure many APs at once:
  • Building – 1.0.0
  • Floor – 1.1.0 (specify lms-ip for each floor)
  • Single AP – 1.1.2
Configuring VR ID 50 – VRRP Active/Active
[Screenshots: VRRP configuration on 192.168.1.1 and 192.168.1.2]
• GUI - Configuration > Switch > VRRP > Add Virtual Router
• Specify VR ID, VR IP address, and VLAN ID on both the Active and Standby switch for VLAN 50
• Enable Router Pre-emption and set a higher priority number for the Active switch
• Specify the AP location code for APs that will be using a VR IP address
• Again, you can use a wildcard entry to configure many APs at once:
  • Building – 1.0.0
  • Floor – 1.2.0 (specify lms-ip for each floor)
  • Single AP – 1.1.2
High Availability - VRRP Active/Standby N to 1
Local Switches – L2 model
[Figure: two Active switches and one Standby switch share VLAN 1 at L2; APs reach them across an L3 network]
• Active switch: VLAN 1 IP = 172.16.100.1; VR ID 1, Active; VR IP = 172.16.100.254
• Active switch: VLAN 1 IP = 172.16.100.2; VR ID 2, Active; VR IP = 172.16.100.253
• Standby switch: VLAN 1 IP = 172.16.100.3; VR ID 1, Standby, VR IP = 172.16.100.254; VR ID 2, Standby, VR IP = 172.16.100.253
• Location 1.1.0: lms-ip = 172.16.100.254
• Location 1.2.0: lms-ip = 172.16.100.253
High Availability - VRRP Active/Active N to N
Local Switches - Active/Active, N to N
• Split the load on N switches on failover
• APs connecting to a switch are split in 2 lms-ip groups (lms-ip = VR 1, lms-ip = VR 2)
• Uses AP interleaving
[Figure: three switches on the IP network. Switch labels: (VR ID 3 active, VR ID 4 active, VR ID 1 backup, VR ID 6 backup); (VR ID 1 active, VR ID 2 active, VR ID 3 backup, VR ID 5 backup); (VR ID 5 active, VR ID 6 active, VR ID 2 backup, VR ID 4 backup)]
High Availability - Master Switch
What happens if you lose the Master Switch?
• A Master switch acts as a central controller and configuration point for an Alcatel network made up of 1 switch or a number of local switches
• If you lose the master, the only functionality lost is the ability to change global configuration such as RF and security parameters, access the local database, perform machine authentication and SecurID caching, and monitor the Alcatel network
• In the absence of the master switch, the network can still serve the clients on all local switches
• You can configure an Alcatel switch to back up a master switch and take over the responsibilities of the master switch in the event of a failure
High Availability - Master Switch
Masters have special requirements
• Master switch can be backed up with VRRP
  • A VR ID is designated as the one that will elect the master switch
  • The local switches then set their master IP address to the master VR IP
  • If APs are provisioned with a master IP address, it must be the VR IP
  • Same if the master IP is distributed with DNS or DHCP
  • ADP will hand out the master VR IP when defined
  • When everything is properly configured, the whole network sees only the active master
• Unlike Local switches, active and backup switches behave differently:
  • Active master synchronizes with standby master:
    • Configuration changes (real time)
    • RF Plan data (floor plans, etc. - optional)
    • WMS database (channel and power assignment – set by interval timer)
    • Local user database (set by interval timer)
  • For system performance, it is recommended that synchronization be performed at a frequency greater than every 30 minutes; the default is 60 minutes
• Design restriction - standby master switch will not handle APs in standby mode
High Availability - VRRP Master Backup
Master Switch - Active/Standby, Synchronized Database
[Figure: two master switches on a shared L2 segment, connected to the IP network]
• Active switch: VLAN 1 IP = 172.16.100.1; VR ID 1, Active; VR IP = 172.16.100.254
• Standby switch: VLAN 1 IP = 172.16.100.2; VR ID 1, Standby; VR IP = 172.16.100.254
• Location 1.1.1: lms-ip = 172.16.100.254
• The standby switch doesn’t serve APs or clients unless there is a failure
Configuring – VRRP Master Backup
[Screenshots: VRRP configuration on 172.16.100.1 and 172.16.100.2]
• GUI - Configuration > Switch > VRRP > Add Virtual Router
• Specify VR ID, VR IP address, and VLAN ID on both the Active and Standby switches (defaults: 1 sec advertisement interval, no password)
• Set a higher priority number of 110 (default = 100) for the Active switch (elected “Master VRRP router”)
• Enable periodic database synchronization (default = 60 minutes)
High Availability - VRRP tracking
Keeping all the pieces together
• Need more control over failover: unlike traditional VRRP used to link hosts to default gateways, wireless switches have state
• Tracking can be used to dynamically change the VRRP priority on a given VR ID
  • Config: (Alcatel4324) (config-vrrp)#tracking vrrp-master-state 1 add 50
  • Means that a priority of 50 should be added to the configured priority if the switch becomes the master on VR ID 1
• Tracking can also be used in the following ways:
  • VR ID up-time tracking: the assumption is that state builds over time, so you might not want to fail back to the preferred switch as soon as it comes back up
  • Config: (Alcatel4324) (config-vrrp)#tracking master-up-time 30 add 20
  • Means that a priority of 20 should be added to the configured priority if the switch stays as the master for this VRRP instance for 30 minutes or more
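The effect of the two tracking commands on priority can be sketched as follows. This is an illustration only: whether both tracking types can be combined on one virtual router is an assumption, and so is the cap at 254 (VRRP reserves priority 255 for the IP address owner). The function and parameter names are hypothetical.

```python
def effective_priority(configured: int,
                       tracked_vrid_is_master: bool,
                       minutes_as_master: int,
                       state_add: int = 50,
                       uptime_threshold: int = 30,
                       uptime_add: int = 20) -> int:
    """Configured priority plus tracking bonuses.

    state_add mirrors "tracking vrrp-master-state 1 add 50";
    uptime_threshold/uptime_add mirror "tracking master-up-time 30 add 20".
    """
    priority = configured
    if tracked_vrid_is_master:
        priority += state_add
    if minutes_as_master >= uptime_threshold:
        priority += uptime_add
    return min(priority, 254)  # assumed cap; 255 is reserved in VRRP

# A standby (configured 100) that has been master for 45 minutes.
after_takeover = effective_priority(100, tracked_vrid_is_master=False,
                                    minutes_as_master=45)
```

With these numbers the acting master ends up at 120, above a returning preferred switch configured at 110, so the accumulated state is not thrown away by an immediate failback.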
High Availability – Router Preemption
• What is Router Preemption?
  • When an Alcatel switch detects a master VRRP router with a lower priority than its own, it may either leave the current master alone or take over and become the master itself
  • When preemption is enabled, the higher-priority VRRP router always preempts, that is, takes over the responsibility of the master router
  • When preemption is disabled, the lower-priority VRRP router is left in the master state
• When to use Router Preemption
  • Local VRRP – when configured Active/Active
  • Master VRRP – when tracking is not enabled
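The preemption rule reduces to a single comparison, sketched below with a hypothetical function name. The example priorities are illustrative.

```python
def takes_over(local_priority: int, master_priority: int, preempt: bool) -> bool:
    """True if this router should replace the current VRRP master."""
    return preempt and local_priority > master_priority

# Preferred switch (110) returns while the standby (100) is acting master.
with_preempt = takes_over(110, 100, preempt=True)      # takes the role back
without_preempt = takes_over(110, 100, preempt=False)  # leaves the master alone
```

In an Active/Active pair, preemption is what moves each VR ID back to its intended switch after a repair, restoring the load split.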
High Availability - Mobility
Adds even more fun!
• Basics:
  • Alcatel’s mobility solution is based on enhanced mobile IP (RFC 3220)
  • The station first joins the network at some switch and acquires an IP address (DHCP)
  • This initial switch becomes the station’s fixed point of attachment (HA - Home Agent)
  • As the station moves around, it roams onto other switches (FA - Foreign Agent)
• Roaming behaviors:
  • Stations obviously try to keep their IP address
  • Better, as stations associate to APs on the same ESSID, they expect to keep their IP address
  • Station state is maintained in the L2 and L3 portions of the switch and also distributed to other switches
  • Alcatel’s mobility keeps track of the in-service state of Home Agent switches
  • Using more than one switch to provide service on a VLAN/subnet helps minimize outages
High Availability - Mobility
1) Client associates with local switch (Home Agent)
2) Home Agent reports client details to master
3) Master distributes client details to other locals in the domain
[Figure: steps 1 to 3 on a network of one Master and three Local switches; labels: MAC/IP/VLAN/HA, Roaming Client Associates with HA]
High Availability – Inter-Switch Mobility
1) Client roams to a different switch (foreign agent, FA)
2) FA recognizes client
3) FA builds tunnel to HA
4) Client’s traffic is tunneled through HA to destination
[Figure: steps 1 to 4 on a network of one Master and three Local switches; labels: FA, HA, Direction of Roam]
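The roaming steps can be pictured as a lookup in the client table that the master distributes. The sketch below is a toy model with hypothetical names and addresses: a switch that sees a known client homed elsewhere acts as FA and tunnels to the HA.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class ClientRecord:
    """Client details shared across the domain (MAC/IP/VLAN/HA)."""
    mac: str
    ip: str
    vlan: int
    home_agent: str

def forwarding_path(clients: Dict[str, ClientRecord], mac: str,
                    serving_switch: str) -> str:
    """How the serving switch handles traffic for this client."""
    record = clients.get(mac)
    if record is None:
        return "new-client"  # serving switch becomes the HA
    if record.home_agent == serving_switch:
        return "local"       # client is on its home switch
    return f"tunnel:{serving_switch}->{record.home_agent}"  # acting as FA

# Hypothetical client that first joined on switch "local-1".
clients = {"00:11:22:33:44:55":
           ClientRecord("00:11:22:33:44:55", "10.1.1.1", 4, "local-1")}
path = forwarding_path(clients, "00:11:22:33:44:55", "local-2")
```

Because the path depends on the HA being in service, losing a Home Agent switch affects its roamed clients too, which is why mobility constrains the high availability design.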
High Availability – Layer 3 Mobility
• Home Agent (HA): point 1
• Foreign Agents (FA): roaming points 2, 3, …
• User data forwarded to HA
[Figure: client address 10.1.1.1 shown at each roaming point; labels: DHCP, IP Network, HA, FA, IP-IP Tunnel, Subnet 1, Subnet 2, Subnet 3, 10.1.1.0/24, 10.2.2.0/24, 10.3.1.0/24]
High Availability - Various other issues
• Local user database
  • If used on the master switch, gets replicated on standby master
• DHCP server
  • Avoid integrated server, state not replicated
• External servers
  • Redundant access to Authentication servers
  • Local configuration of Authentication servers
  • Reliable DHCP server access is more critical for the wireless
• Local bootstrapping
High Availability - Review
• AP Interleaving
  • Pro – Simple, inexpensive, target small networks (2 switches)
  • Con – Harder to manage, creates lots of mobility events
• AP using Backup Switch
  • Active/Standby, 1 to 1 and N to 1
  • Pro – Simple to configure, can load balance across multiple active switches, standby switch can reside in a different subnet, target medium to large networks (2 or more switches)
  • Con – Harder to manage, expensive (need spare switch, load balancing limits AP switch loading), APs can only be directed to one (1) backup switch, all switches need access to client termination VLANs
High Availability – Review (continued)
• VRRP – Local Switch
  • Active/Standby, 1 to 1
    • Pro – Quick cutover time (7-15 sec), easy to manage and configure, target small networks with high availability applications
    • Con – Expensive (spare switch, no active APs), switches need L2 connectivity
  • Active/Active, 1 to 1
    • Pro – Quick cutover time (7-15 sec), all switches are used, target small networks with high availability applications
    • Con – Expensive (can’t fully load a switch with APs), more difficult to configure, switches need L2 connectivity
  • Active/Standby, N to 1
    • Pro – Quick cutover time (7-15 sec), target large networks with high availability applications
    • Con – Expensive (spare switch, no active APs), difficult to configure, switches need L2 connectivity
  • Active/Active, N to N
    • Pro – Quick cutover time (7-15 sec), all switches are used, target large networks with high availability applications
    • Con – Expensive (can’t fully load a switch with APs), difficult to configure, switches need L2 connectivity
• VRRP – Master Switch
  • Pro – Seamless cutover time, easy to manage and configure, target any networks with high availability applications requiring backup of global configurations and database synchronization
  • Con – Expensive (spare switch, no active APs), switches need L2 connectivity
High Availability – Putting it together #1
Small network with VRRP, Active/Active
[Figure: Master switch (VR ID 1 - active, VR ID 2 - standby) and Local switch (VR ID 2 - active, VR ID 1 - standby); APs are labeled "Served by Master VR ID 1" or "Served by Local VR ID 2"]
• Will not back up Master functionality (configuration)
• Maximum AP count per switch in this example = 24 (OAW-4324)
High Availability – Putting it together #2
Large Network
[Figure: two 4324 switches as Masters and four 6000 switches as Locals, with L2 access between switches and L3 access to the APs]
• Masters: Active/Standby, VRRP, no APs
• Backup for N: has all local switch VLANs
• Locals: N to 1; point to master VR IP
• Max AP count = max APs per switch
• APs point to a local VR IP