Quick Summary

  • Tunnel technology for virtualized networks
  • Flexible and extensible tunnel header with a TLV (type-length-value) design
  • Available in Open vSwitch (not included in 2.3.0, master branch only)

Existing Tunneling Technology

VLAN

  • VLAN tag in L2 header
  • Virtualized L2

MPLS

  • L2|MPLS|L3|L4|L5
  • Routing according to MPLS tags

VXLAN

  • L2|L3|UDP|L2|L3|L4|L5 (L2 over UDP)
  • VLAN that can cross L3 networks
  • Supports L2 multicast
  • VMware uses VXLAN to communicate with gateways to the non-virtualized world

NVGRE

  • L2|L3|GRE|L2|L3|L4|L5 (L2 over L3)
  • NVGRE is used mostly by Microsoft

STT

  • L2|L3|TCP-like|STT|L2|L3|L4|L5 (L2 over TCP-like)
  • STT header is only needed for the first L3 fragment
  • TCP-like header
    • The format is exactly the same as a TCP header
    • However, the processing is entirely different: no 3-way handshake, no ACKs, no retransmission
  • VMware uses STT as the tunneling mechanism between vSwitches

Motivation

  • Existing technologies define fixed header sizes and fields
    • Lack of flexibility
    • Insufficient virtual network identifier space (for multi-tenant virtualized networks)
    • Fail to keep pace with the evolution of networked systems

Design Considerations

Control Plane Independence

  • Some existing protocols include a control plane
    • E.g. VXLAN prescribes a multicast learning-based control plane
  • Geneve aims to be a pure tunnel format specification that can serve many control planes by explicitly not selecting any one of them

Data Plane Extensibility

  • Variable length of header and options
    • Endpoints: Need to understand and parse new options
    • Transit Devices: optional participation in Geneve packet processing
  • Should not prevent endpoints and transit devices from using NIC checksum offload

Use of Standard IP Fabrics

  • Tunneling often results in poor ECMP performance, because the fabric has no knowledge of the encapsulated flows
    • Add entropy using the UDP source port (see UDP Header under Frame Format)

Compatibility

  • Fully compatible and interoperable with existing tunneling technologies
  • Designed as a superset rather than a replacement of existing tunneling technologies

Sample Deployment

+---------------------+           +-------+  +------+
| +--+  +-------+---+ |           |Transit|--|Top of|==Physical
| |VM|--|       |   | | +------+ /|Router |  | Rack |==Servers
| +--+  |Virtual|NIC|---|Top of|/ +-------+\/+------+
| +--+  |Switch |   | | | Rack |\ +-------+/\+------+
| |VM|--|       |   | | +------+ \|Transit|  |Uplink|   WAN
| +--+  +-------+---+ |           |Router |--|      |=========>
+---------------------+           +-------+  +------+
       Hypervisor
           ()===================================()
                 Switch-Switch Geneve Tunnels

Frame Format

Overview

  • L2|L3|UDP|Geneve Base|Geneve Option|L2|L3|L4|L5
  • Details can be found in the appendix
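
The layering above implies a fixed per-packet overhead plus the variable options. A minimal sketch of that arithmetic, assuming an untagged outer Ethernet header (14 bytes), an outer IPv4 header without IP options (20 bytes), and no FCS in the count; the function name geneve_overhead is illustrative only:

```python
# Header sizes in bytes. These are the standard minimum sizes; the
# outer IPv4 header is assumed to carry no IP options.
OUTER_ETHERNET = 14
VLAN_TAG = 4
OUTER_IPV4 = 20
OUTER_UDP = 8
GENEVE_BASE = 8
MAX_OPTION_BYTES = 4 * (2**6 - 1)  # 6-bit Option Length, 4-byte units


def geneve_overhead(option_bytes=0, outer_vlan=False):
    """Bytes added in front of the inner frame by Geneve encapsulation."""
    if option_bytes % 4 != 0 or not 0 <= option_bytes <= MAX_OPTION_BYTES:
        raise ValueError("options must be a multiple of 4 bytes, at most 252")
    overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + GENEVE_BASE
    if outer_vlan:
        overhead += VLAN_TAG
    return overhead + option_bytes
```

With no options and no outer VLAN tag this gives 50 bytes; with the maximum 252 bytes of options it gives 302 bytes.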

UDP Header

  • Source Port
    • Selected by the ingress endpoint
    • Should be the same for all packets belonging to a single encapsulated flow
    • Should be calculated using a hash of the encapsulated flow's 5-tuple for even distribution over multiple links
    • Used as a flow identifier rather than as part of a true UDP connection
      • Entire 16-bit range may be used to maximize entropy
  • Destination Port
    • 6081 (assigned by IANA)
  • UDP Checksum
    • May be set to zero (receivers must accept such packets)
    • Computed checksum is recommended to protect the Geneve header and options
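
The source port rule can be sketched as follows. This is an illustration under stated assumptions, not how any particular implementation does it: the function name geneve_source_port is hypothetical, and SHA-256 is used only because it is in the standard library (a real data path would likely use a much cheaper hash).

```python
import hashlib
import ipaddress
import struct

GENEVE_DST_PORT = 6081  # IANA-assigned destination port


def geneve_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Map an encapsulated flow's 5-tuple to a 16-bit outer UDP source port.

    The same 5-tuple always yields the same port, so all packets of one
    flow follow the same ECMP path. The full 16-bit range is used.
    """
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", proto, src_port, dst_port)
    )
    digest = hashlib.sha256(key).digest()
    # Take the first two bytes of the digest as the port number.
    return int.from_bytes(digest[:2], "big")
```

For example, geneve_source_port("10.0.0.1", "10.0.0.2", 6, 49152, 443) returns the same value on every call, while other flows will usually map to other ports.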

Geneve Base Header

  • Option Length (6 bits)
    • Expressed in multiples of 4 bytes
    • Excludes the 8-byte fixed Geneve header
    • Resulting Geneve header size
      • Min: 8 bytes
      • Max: 8+4*(2^6-1) = 260 bytes
  • OAM Frame (1 bit)
    • This packet contains a control message instead of data payload
    • If this bit is set
      • Endpoints must not forward the payload
      • Transits must not attempt to interpret or process it
  • Critical Options Present (1 bit)
    • One or more options have the critical bit set
    • If this bit is set
      • Endpoints must parse the option list and interpret all options that are marked as critical
      • Endpoints must drop the packet if they cannot parse the critical options
      • Transit devices must not drop or modify the packet on the basis of this bit
    • If this bit is not set
      • Endpoints may strip all options based on Option Length and forward the decapsulated packet
  • Protocol Type (16 bits)
    • The protocol of the data unit that follows the Geneve header
      • Uses EtherType values; an Ethernet payload is represented as 0x6558
  • Virtual Network Identifier (VNI) (24 bits)
    • Uniquely identifies a virtual network
      • Represents an L2 segment in many situations
      • May be used as part of the ECMP decision
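
The field layout above can be sketched with Python's struct module. The bit positions follow the Geneve header diagram in the appendix (2-bit version, 6-bit Opt Len, O, C, 6 reserved bits, Protocol Type, 24-bit VNI, 8 reserved bits); the function names and the dictionary keys are illustrative choices, not part of any specification.

```python
import struct

GENEVE_BASE_LEN = 8      # fixed part of the Geneve header, in bytes
ETH_P_TEB = 0x6558       # Protocol Type value for an Ethernet payload


def pack_base_header(vni, opt_len_bytes=0, oam=False, critical=False,
                     protocol=ETH_P_TEB, version=0):
    """Build the 8-byte Geneve base header."""
    if opt_len_bytes % 4 != 0 or not 0 <= opt_len_bytes <= 4 * 63:
        raise ValueError("option length must be a multiple of 4, at most 252")
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    byte0 = (version << 6) | (opt_len_bytes // 4)   # Ver | Opt Len
    byte1 = (int(oam) << 7) | (int(critical) << 6)  # O | C | Rsvd
    # The VNI sits in the top 24 bits of the last word; low 8 bits reserved.
    return struct.pack("!BBHI", byte0, byte1, protocol, vni << 8)


def unpack_base_header(data):
    """Decode the first 8 bytes of a Geneve header into a dict."""
    if len(data) < GENEVE_BASE_LEN:
        raise ValueError("truncated Geneve header")
    byte0, byte1, protocol, vni_rsvd = struct.unpack("!BBHI", data[:8])
    return {
        "version": byte0 >> 6,
        "opt_len_bytes": (byte0 & 0x3F) * 4,
        "oam": bool(byte1 & 0x80),
        "critical": bool(byte1 & 0x40),
        "protocol": protocol,
        "vni": vni_rsvd >> 8,
    }
```

As a usage example, pack_base_header(vni=0x123456, opt_len_bytes=8, critical=True).hex() yields "0240655812345600", and unpack_base_header recovers the same fields from those bytes.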

Geneve Option Header

  • Option Class (16 bits)

  • Type (8 bits)
    • Indicates the format of the data contained in this option
    • The highest bit of the Type field indicates whether this is a critical option
  • Length (5 bits)
    • Expressed in multiples of 4 bytes
    • Excludes the 4-byte option header
    • Geneve option size
      • Min: 4 bytes
      • Max: 4+4*(2^5-1) = 128 bytes
    • Endpoints drop a packet if the total length of its options does not equal the Option Length field in the base header
  • Note
    • The sending endpoint must not assume that options will be processed sequentially by the receiver
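
A minimal sketch of how an endpoint might walk the option list. It assumes the 4-byte option header is laid out as Option Class (16 bits), Type (8 bits), 3 reserved bits, and Length (5 bits), and that buf holds exactly the number of bytes given by Option Length in the base header. The function names, dictionary keys, and the option class and type values used in the example are hypothetical.

```python
import struct

OPTION_HEADER_LEN = 4


def parse_options(buf):
    """Split the options area into a list of option dicts.

    Raises ValueError when the options do not add up exactly to
    len(buf), which is the case where an endpoint drops the packet.
    """
    options = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < OPTION_HEADER_LEN:
            raise ValueError("truncated option header")
        opt_class, opt_type, rsvd_len = struct.unpack_from("!HBB", buf, offset)
        data_len = (rsvd_len & 0x1F) * 4  # low 5 bits, 4-byte units
        end = offset + OPTION_HEADER_LEN + data_len
        if end > len(buf):
            raise ValueError("option overruns Option Length")
        options.append({
            "opt_class": opt_class,
            "opt_type": opt_type,
            "critical": bool(opt_type & 0x80),  # highest bit of Type
            "data": bytes(buf[offset + OPTION_HEADER_LEN:end]),
        })
        offset = end
    return options


def has_unknown_critical(options, understood):
    """True if any critical option is not in the set of understood
    (opt_class, opt_type) pairs, meaning the packet must be dropped."""
    return any(
        opt["critical"] and (opt["opt_class"], opt["opt_type"]) not in understood
        for opt in options
    )
```

Because has_unknown_critical inspects the whole list rather than stopping at a position, it does not depend on the order in which the sender placed the options.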

Appendix: Detailed Frame Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Outer Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Outer Destination MAC Address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address |   Outer Source MAC Address    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Outer Source MAC Address                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Ethertype=C-Tag 802.1Q|  Outer VLAN Tag Information   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Ethertype=0x0800        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Outer IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |Protocol=17 UDP|         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Outer Source IPv4 Address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Outer Destination IPv4 Address              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Outer UDP Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port = xxxx      |       Dest Port = 6081        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           UDP Length          |        UDP Checksum           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Geneve Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Virtual Network Identifier (VNI)       |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Variable Length Options                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Inner Ethernet Header (example payload):
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Inner Destination MAC Address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Inner Destination MAC Address |   Inner Source MAC Address    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Inner Source MAC Address                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Optional Ethertype=C-Tag 802.1Q|  Inner VLAN Tag Information   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Payload:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethertype of Original Payload |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                  Original Ethernet Payload    |
|                                                               |
| (Note that the original Ethernet Frame's FCS is not included) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Frame Check Sequence:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   New FCS (Frame Check Sequence) for Outer Ethernet Frame     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+