diff --git a/src/plugins/lb/lb_plugin_doc.md b/src/plugins/lb/lb_plugin_doc.md
index c7885ffb837..25a4cfa11df 100644
--- a/src/plugins/lb/lb_plugin_doc.md
+++ b/src/plugins/lb/lb_plugin_doc.md
@@ -8,19 +8,52 @@ Which also means feedback is really welcome regarding features, apis, etc...
 
 ## Overview
 
-This plugin provides load balancing for VPP in a way that is largely inspired 
+This plugin provides load balancing for VPP in a way that is largely inspired
 from Google's MagLev: http://research.google.com/pubs/pub44824.html
 
-The load balancer is configured with a set of Virtual IPs (VIP, which can be 
+The load balancer is configured with a set of Virtual IPs (VIP, which can be
 prefixes), and for each VIP, with a set of Application Server addresses (ASs).
 
+The following encap types are available to steer traffic towards the ASs:
+1). IPv4+GRE and IPv6+GRE encap types:
 Traffic received for a given VIP (or VIP prefix) is tunneled using GRE towards
-the different ASs in a way that (tries to) ensure that a given session will 
+the different ASs in a way that (tries to) ensure that a given session will
 always be tunneled to the same AS.
 
+2). IPv4+L3DSR encap type:
+L3DSR is used to overcome the Layer 2 limitations of Direct Server Return load balancing.
+It maps the VIP to DSCP bits and reuses the TOS bits to carry the DSCP bits
+to the server, which then recovers the VIP from its DSCP-to-VIP mapping.
+
 Both VIPs or ASs can be IPv4 or IPv6, but for a given VIP, all ASs must be using
-the same encap. type (i.e. IPv4+GRE or IPv6+GRE). Meaning that for a given VIP,
-all AS addresses must be of the same family.
+the same encap. type (i.e. IPv4+GRE, IPv6+GRE or IPv4+L3DSR),
+meaning that for a given VIP, all AS addresses must be of the same family.
+
+3). IPv4/IPv6 + NAT4/NAT6 encap types:
+This type provides a kube-proxy data plane in user space,
+which replaces the Linux kernel's iptables-based kube-proxy.
+
+Currently, the load balancer plugin supports three service types:
+a) Cluster IP plus Port: supports any protocol, including TCP and UDP.
+b) Node IP plus Node Port: currently only supports UDP.
+c) External Load Balancer.
+
+For the Cluster IP plus Port case:
+kube-proxy is configured with a set of Virtual IPs (VIP, which can be
+prefixes), and for each VIP, with a set of AS addresses (ASs).
+
+For a session received for a given VIP (or VIP prefix), the
+first packet selects an AS according to the internal load balancing algorithm,
+then a DNAT operation is performed and the packet is sent to the chosen AS.
+At the same time, a session entry is created to record the chosen AS.
+Subsequent packets of that session first look up the session table,
+which ensures that a given session will always be routed to the same AS.
+
+Packets returned by the AS go through a SNAT operation and are then sent out.
+
+Please refer to the following for details:
+https://schd.ws/hosted_files/ossna2017/1e/VPP_K8S_GTPU_OSSNA.pdf
+
 
 ## Performances
 
@@ -35,34 +68,42 @@ in next versions.
 
 The load balancer needs to be configured with some parameters:
 
-    lb conf [ip4-src-address <addr>] [ip6-src-address <addr>] 
+    lb conf [ip4-src-address <addr>] [ip6-src-address <addr>]
             [buckets <n>] [timeout <s>]
-
-ip4-src-address: the source address used to send encap. packets using IPv4.
-ip6-src-address: the source address used to send encap. packets using IPv6.
+ip4-src-address: the source address used to send encap. packets using IPv4 for GRE4 mode,
+                 or the node IPv4 address for NAT4 mode.
+
+ip6-src-address: the source address used to send encap. packets using IPv6 for GRE6 mode,
+                 or the node IPv6 address for NAT6 mode.
 
 buckets: the *per-thread* established-connexions-table number of buckets.
 
-timeout: the number of seconds a connection will remain in the 
+timeout: the number of seconds a connection will remain in the
     established-connexions-table while no packet for this flow is received.
-
 
 ### Configure the VIPs
 
-    lb vip <prefix> [encap (gre6|gre4)] [new_len <n>] [del]
-
+    lb vip <prefix> [encap (gre6|gre4|l3dsr|nat4|nat6)] \
+      [dscp <n>] [port <n> target_port <n> node_port <n>] [new_len <n>] [del]
+
 new_len is the size of the new-connection-table. It should be 1 or 2 orders of
 magnitude bigger than the number of ASs for the VIP in order to ensure a good
 load balancing.
+The l3dsr encap and the dscp option map a VIP to DSCP bits and rewrite the DSCP bits
+in packets, so the selected server can recover the VIP from the DSCP bits and perform DSR.
+The nat4/nat6 encaps and the port/target_port/node_port options implement the kube-proxy data plane.
 
 Examples:
-
+
     lb vip 2002::/16 encap gre6 new_len 1024
     lb vip 2003::/16 encap gre4 new_len 2048
     lb vip 80.0.0.0/8 encap gre6 new_len 16
     lb vip 90.0.0.0/8 encap gre4 new_len 1024
+    lb vip 100.0.0.0/8 encap l3dsr dscp 2 new_len 32
+    lb vip 90.1.2.1/32 encap nat4 port 3306 target_port 3307 node_port 30964 new_len 1024
+    lb vip 2004::/16 encap nat6 port 6306 target_port 6307 node_port 30966 new_len 1024
 
 ### Configure the ASs (for each VIP)
 
@@ -77,8 +118,18 @@ Examples:
     lb as 2003::/16 10.0.0.1 10.0.0.2
     lb as 80.0.0.0/8 2001::2
     lb as 90.0.0.0/8 10.0.0.1
-
-
+
+### Configure SNAT
+
+    lb set interface nat4 in <interface> [del]
+
+Enable the SNAT feature on a specific interface.
+(applicable in NAT4 mode only)
+
+    lb set interface nat6 in <interface> [del]
+
+Enable the SNAT feature on a specific interface.
+(applicable in NAT6 mode only)
 
 ## Monitoring
 
@@ -88,7 +139,7 @@ These are still subject to quite significant changes.
 
     show lb
     show lb vip
     show lb vip verbose
-
+
     show node counters
 
@@ -96,9 +147,9 @@
 
 ### Multi-Threading
 
-MagLev is a distributed system which pseudo-randomly generates a 
-new-connections-table based on AS names such that each server configured with 
-the same set of ASs ends up with the same table. Connection stickyness is then 
+MagLev is a distributed system which pseudo-randomly generates a
+new-connections-table based on AS names such that each server configured with
+the same set of ASs ends up with the same table. Connection stickiness is then
 ensured with an established-connections-table. Using ECMP, it is assumed (but
 not relied on) that servers will mostly receive traffic for different flows.
 
@@ -124,8 +175,8 @@ When an AS is removed, there are two possible ways to react.
   - Keep using the AS for established connections
   - Change AS for established connections (likely to cause error for TCP)
 
-In the first case, although an AS is removed from the configuration, its 
-associated state needs to stay around as long as it is used by at least one 
+In the first case, although an AS is removed from the configuration, its
+associated state needs to stay around as long as it is used by at least one
 thread. In order to avoid locks, a specific reference counter is used. The
 design is quite
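
As a worked example of the NAT4 (kube-proxy style) data plane added by this patch, the commands documented in the hunks above can be chained as follows. This is only an illustration: the addresses, ports, bucket/timeout values and the interface name are placeholders invented for the example, not values taken from the patch.

    lb conf ip4-src-address 10.10.10.10 buckets 512 timeout 40
    lb vip 90.1.2.1/32 encap nat4 port 3306 target_port 3307 node_port 30964 new_len 1024
    lb as 90.1.2.1/32 10.0.0.1 10.0.0.2
    lb set interface nat4 in GigabitEthernet0/8/0

With such a configuration, traffic received for 90.1.2.1 on port 3306 would be DNATed towards one of the two ASs on port 3307, and return traffic entering GigabitEthernet0/8/0 would be SNATed on its way back out, as described in the overview hunk.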
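
The design notes at the end of the diff rely on MagLev's core property: every node configured with the same set of AS names deterministically computes the same new-connections-table. The C program below is a minimal, self-contained sketch of that table-population idea as described in the MagLev paper cited in the overview; it is not the plugin's actual implementation, and the table size, hash function and AS names are assumptions made purely for illustration.

    /* Illustrative sketch (not VPP code) of MagLev-style population of a
     * new-connections-table. Table size, hash and AS names are made up. */
    #include <stdio.h>
    #include <stdint.h>

    #define TABLE_SIZE 13   /* must be prime; real tables are much larger (e.g. 65537) */
    #define N_BACKENDS 3

    /* FNV-1a; any stable string hash works, every node just has to use the same one. */
    static uint64_t hash_name(const char *s, uint64_t seed) {
        uint64_t h = 1469598103934665603ULL ^ seed;
        while (*s) { h ^= (uint8_t)*s++; h *= 1099511628211ULL; }
        return h;
    }

    int main(void) {
        const char *as_names[N_BACKENDS] = { "10.0.0.1", "10.0.0.2", "10.0.0.3" };
        int table[TABLE_SIZE];
        uint64_t offset[N_BACKENDS], skip[N_BACKENDS], next[N_BACKENDS] = { 0 };

        /* Each AS derives a deterministic permutation of table slots from its name,
         * so every node configured with the same AS set builds the same table. */
        for (int i = 0; i < N_BACKENDS; i++) {
            offset[i] = hash_name(as_names[i], 0) % TABLE_SIZE;
            skip[i]   = hash_name(as_names[i], 1) % (TABLE_SIZE - 1) + 1;
        }

        for (int j = 0; j < TABLE_SIZE; j++) table[j] = -1;

        /* Round-robin over ASs; each claims the next free slot of its permutation. */
        for (int filled = 0; filled < TABLE_SIZE; ) {
            for (int i = 0; i < N_BACKENDS && filled < TABLE_SIZE; i++) {
                uint64_t slot;
                do {
                    slot = (offset[i] + next[i] * skip[i]) % TABLE_SIZE;
                    next[i]++;
                } while (table[slot] != -1);
                table[slot] = i;
                filled++;
            }
        }

        for (int j = 0; j < TABLE_SIZE; j++)
            printf("bucket %2d -> AS %s\n", j, as_names[table[j]]);
        return 0;
    }

Because the slot assignment depends only on the AS names and the (prime) table size, two nodes running this sketch with the same AS list produce identical tables, which is what allows ECMP to spread flows across nodes without breaking connection stickiness.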