Easyman's IT Notes: 2010

Saturday, December 25, 2010

Looking Back on My 2010

There is only one week left in 2010. At the start of the year, my New Year's resolution was "Live a better life, both spiritually and physically." How much of it did I actually achieve? Let's look back!

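I have built a habit of swimming once or twice a week, and I can now swim 1.5 km. I hope to keep up this good habit; it helps a lot with relieving stress. Swimming is also fairly gentle, which suits me well now that I am entering middle age. The poolside is full of people in their 50s and 60s.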
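I found lots of English-language storybooks to read to 馬小尼:
Three Roald Dahl books, from Fantastic Mr. Fox -> Charlie and the Chocolate Factory -> BFG. I started because I had run out of suitable bedtime stories for her; even my wife, listening nearby, found them fascinating. All three books have movie versions, and seeing her delighted reactions while watching the movies gave me a great sense of accomplishment. Most recently I got her the complete Magic Tree House series. Although she still does not quite accept my mixing Chinese and English when telling stories, I think the bilingual Chinese-English edition is well worth it, and I hope it will help her English reading when she is a bit older.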
Certifications passed this year
Fri, 29 Oct 2010 at 09:30
642-515: Securing Networks with ASA Advanced

Sat, 9 Oct 2010 at 10:30
642-524: Securing Networks with ASA Foundation

Sat, 3 Jul 2010 at 11:00
642-504: Securing Networks with Cisco Routers and Switches

Sat, 16 Jan 2010 at 17:00
VCP410: VMware Certified Professional on vSphere 4

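Just one more exam, IPS, and I will complete CCSP, but lately I have been running out of steam. Partly it is because work has gotten busier; the other reason, I think, is that I have started to question how exams relate to career development. Do I really need exams to prove what I have learned and what I can do? Back when I worked at an SI, I got used to relying on exams for a sense of job security and self-affirmation, and it seems I have not fully shaken that mindset. Exams really eat up a lot of time, and the sense of accomplishment from learning does not quite make up for the lost time. Sigh..... and the IE recertification is still waiting for me. Is this what "willing in spirit but lacking the strength" feels like?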
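A Christmas Eve Surprise
This was probably my biggest surprise of the year!
On Friday I drove home from work, stuck in traffic as usual. The phone rang, and 馬小尼 said excitedly: "Daddy~ Mommy made a Christmas dinner~, and I have presents for you and Mommy!" Oh? She is not asking me for a present, she prepared presents for us?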

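When I got home, the note on our front door really amused me. A sheet of white paper was taped to the door, with "Christmas dinner today" written in Zhuyin phonetic symbols. Our neighbor across the hall saw it and said, "Your daughter is so cute!!" Sorry~ what a coincidence that you always get to see her cute, well-behaved side..... :P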

The legendary Christmas dinner. Mommy~ thank you for all your hard work.

The presents for my wife and me: a massage voucher and a do-the-chores voucher. Moved to tears (or is it tired?)~

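Having a happy meal together as a family is the greatest happiness. 馬小尼, once again you have shown me how much you have grown. Keep the surprises coming~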

Tuesday, October 19, 2010

After reading this article, I admit that I underestimated MPF.

It is much more sophisticated than what I was thinking.

Thanks to Petr for the great article.

http://blog.ine.com/2009/04/19/understanding-modular-policy-framework/

Actions are applied in the following sequence within MPF:

QoS input policing. Applies to traffic entering the firewall and enforces a traffic rate. Configured using the command police input under the policy-map.
TCP normalization. TCP and UDP connection limits and timeouts, and TCP sequence number randomization. Performs TCP connection modification and monitoring to enforce security settings. Configured using the command set connection and a pre-configured tcp-map with the advanced TCP parameters.
CSC (if installed). Content security.
Application inspection (multiple types). The core of the stateful firewall. Parses traffic streams and detects application protocols and their commands. Allows enforcing per-application security policies. The command to apply inspection is inspect {protocol-name}. Can be fine-tuned using inspection policy-maps.
IPS (if installed). Intrusion prevention – allows the firewall to work as an inline IPS.
QoS output policing. Applies to traffic leaving the firewall and enforces the specified rate. The command is police output.
QoS interface priority queue. Services traffic using the interface-level low-latency queue. Configured using the command priority. Cannot be applied together with the policing feature.
QoS traffic shaping, hierarchical priority queue. Mutually exclusive with any other interface-level QoS features. Traffic shaping can only be applied under class-default.
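As a rough illustration of the fixed sequence above, the sketch below takes whatever actions one class happens to configure and returns them in the order MPF applies them. The action labels (`police_input`, `tcp_normalization`, and so on) are hypothetical names for this sketch, not ASA keywords; the point is only that the order written in the policy-map does not change the order of application.

```python
# Fixed order in which MPF applies actions, following the list above.
# These labels are hypothetical names for this sketch, not CLI keywords.
MPF_ORDER = [
    "police_input",
    "tcp_normalization",
    "csc",
    "inspect",
    "ips",
    "police_output",
    "priority",
    "shape",
]


def apply_order(configured_actions):
    """Return the configured actions in the order MPF would apply them,
    regardless of the order they were written in the policy-map."""
    unknown = set(configured_actions) - set(MPF_ORDER)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    return sorted(configured_actions, key=MPF_ORDER.index)


if __name__ == "__main__":
    # Written as inspect -> police output -> set connection ...
    print(apply_order(["inspect", "police_output", "tcp_normalization"]))
    # ... but applied as set connection -> inspect -> police output.
```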

Feature	Interface-Level Direction	Global Policy Direction	Flow-aware feature
QoS Input Policing	Ingress	Ingress
TCP Normalization, Connection Limits, ISN randomization	Bidirectional	Ingress	Yes
CSC	Bidirectional	Ingress	Yes
Application Inspection	Bidirectional	Ingress	Yes
IPS	Bidirectional	Ingress	Yes
QoS Output Policing	Egress	Egress
QoS Interface-Level PQ	Egress	Egress
QoS Shaping, Hierarchical PQ	Egress	N/A

Feature Incompatibilities

As you remember, you can apply multiple actions under the same class. Some actions just can’t go together. Here is the list of the limitations:

1) You can’t combine policing and interface-level priority queuing for the same class.
2) You can’t configure shaping in global policy map.
3) You can only shape ALL traffic leaving the interface, i.e. you can only shape under class-default.
4) You cannot configure two inspect actions under the same class, except in a class that matches default-inspection-traffic.
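The four limitations can be read as a per-class validation check. The sketch below encodes them as a simple checker; the action labels (`police`, `priority`, `shape`, `inspect:<protocol>`) and the function name are hypothetical, and the real ASA rejects such combinations at configuration time rather than through anything like this function.

```python
def check_class(actions, class_name, scope, matches_default_inspection_traffic=False):
    """Return the rule violations for one class in a policy-map.

    actions: hypothetical action labels such as "police", "priority",
             "shape", or "inspect:<protocol>".
    scope:   "global" or "interface" (where the policy-map is applied).
    """
    problems = []
    # 1) policing and interface-level priority queuing cannot share a class
    if "police" in actions and "priority" in actions:
        problems.append("police + priority in the same class")
    # 2) shaping is not allowed in a global policy
    if "shape" in actions and scope == "global":
        problems.append("shape in a global policy")
    # 3) shaping is only allowed under class-default
    if "shape" in actions and class_name != "class-default":
        problems.append("shape outside class-default")
    # 4) several inspect actions only in a default-inspection-traffic class
    inspects = [a for a in actions if a.startswith("inspect:")]
    if len(inspects) > 1 and not matches_default_inspection_traffic:
        problems.append("multiple inspect actions in one class")
    return problems


if __name__ == "__main__":
    print(check_class(["police", "priority"], "VOICE", "interface"))
    print(check_class(["inspect:http", "inspect:ftp"], "inspection_default",
                      "global", matches_default_inspection_traffic=True))
```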

Application priorities:

CTIQBE
DNS
FTP
GTP
H323
HTTP
ICMP
ICMP error
ILS
MGCP
NetBIOS
PPTP
Sun RPC
RSH
RTSP
SIP
Skinny
SMTP
SNMP
SQL*Net
TFTP
XDMCP
DCERPC
Instant Messaging

Here is the list of basic points about MPF:

1) Service policies can be applied globally or per interface.

2) A packet flow can match multiple classes.

2.1) If two or more classes specify the same feature, the firewall applies a deterministic procedure to resolve the conflict.

2.2) If the classes specify different features, they are combined, provided that the features can be used together.

3) Many firewall features are aware of stateful traffic flows.

4) The order that the features are applied is fixed and does not depend on the order of classes in the policy-maps.
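The notes only say that conflicts are resolved deterministically. Cisco's ASA configuration guide describes the procedure roughly as: for each feature type, a packet uses only the first class in the policy-map that matches it and configures that feature, while actions for other feature types from later matching classes are still added. The sketch below models that rule; the class names, the flow dictionary, and the action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PolicyClass:
    name: str
    matches: Callable[[dict], bool]   # hypothetical flow predicate
    actions: Dict[str, str]           # feature type -> configured action


def resolve(policy: List[PolicyClass], flow: dict) -> Dict[str, str]:
    """Collect the actions applied to a flow, first match per feature type."""
    chosen: Dict[str, str] = {}
    for pc in policy:                 # classes in policy-map order
        if not pc.matches(flow):
            continue
        for feature, action in pc.actions.items():
            # keep only the first matching class per feature type;
            # features not chosen yet are still added from later classes
            chosen.setdefault(feature, f"{action} (from {pc.name})")
    return chosen


EXAMPLE_POLICY = [
    PolicyClass("WEB", lambda f: f["proto"] == "tcp" and f["dport"] == 80,
                {"conn-limit": "set connection conn-max 100",
                 "inspect": "inspect http"}),
    PolicyClass("ALL-TCP", lambda f: f["proto"] == "tcp",
                {"conn-limit": "set connection conn-max 5000",
                 "police": "police input 1000000"}),
]

if __name__ == "__main__":
    for feature, action in resolve(EXAMPLE_POLICY, {"proto": "tcp", "dport": 80}).items():
        print(feature, "->", action)
```

For a TCP/80 flow, the connection limit and HTTP inspection come from WEB, and policing is added from ALL-TCP; the ALL-TCP connection limit is ignored because WEB already supplied one.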

Wednesday, October 6, 2010

ASA – Command Authorization

There are three ways to do this:

using enable command
using locally defined username and password
using AAA defined username and password with AAA server

Using enable command for authorization

Create a separate enable password for each desired privilege level.

ASA-Roy(config)# enable password level9 level 9
ASA-Roy(config)# enable password level11 level 11

Adjust the commands’ privilege levels.
In this example, level 9 can show access lists and level 11 can configure them.

ASA-Roy(config)# privilege show level 9 mode exec command access-list
ASA-Roy(config)# privilege configure level 11 command access-list

Be careful to adjust the privilege level of the ‘parent’ command as well; otherwise you will not be able to use the command even if it is configured correctly.
For example, you need to allow ‘configure terminal’ at privilege level 11; otherwise you will not be able to enter global configuration mode to issue the ‘access-list’ command.
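The "parent command" pitfall can be modeled as a recursive check: a user may run a command only if its own level allows it and the command that enters its mode is also allowed. The table below mirrors the example above; the data structures, the function name, and the assumption that unlisted commands stay at level 15 are simplifications for this sketch, not how the ASA stores privileges internally.

```python
# Hypothetical privilege table mirroring the example above.
COMMAND_LEVELS = {
    ("exec", "show access-list"): 9,
    ("exec", "configure terminal"): 11,  # the "parent" command, lowered to 11
    ("configure", "access-list"): 11,
}
DEFAULT_LEVEL = 15  # assumption: anything not listed stays at level 15

# To run a command in a mode, you must first be allowed to enter that mode.
MODE_ENTRY = {"configure": ("exec", "configure terminal")}


def allowed(user_level, mode, command, levels=COMMAND_LEVELS):
    """True if a user at user_level may run `command` in `mode`."""
    if user_level < levels.get((mode, command), DEFAULT_LEVEL):
        return False
    parent = MODE_ENTRY.get(mode)
    if parent is not None:
        # the command that enters this mode must also be permitted
        return allowed(user_level, parent[0], parent[1], levels)
    return True


if __name__ == "__main__":
    print(allowed(11, "configure", "access-list"))
    # Forget to lower "configure terminal" and level 11 is locked out:
    no_parent = {k: v for k, v in COMMAND_LEVELS.items()
                 if k != ("exec", "configure terminal")}
    print(allowed(11, "configure", "access-list", no_parent))
```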

Enable command authorization, and make sure you have NOT enabled “enable” authentication through AAA or LOCAL.

ASA-Roy(config)# aaa authorization command LOCAL
ASA-Roy(config)# no aaa authentication enable console LOCAL
or
ASA-Roy(config)# no aaa authentication enable console AAA_Method

If you have, you will get the following error message when you try to issue the ‘enable privilege_level’ command.

ASA-Roy> sh curpriv
Username : admin_asa
Current privilege level : 1
Current Mode/s : P_UNPR
ASA-Roy> enable 9
Enabling to privilege levels is not allowed when configured for
AAA authentication. Use 'enable' only.

Using local user for command authorization

ASA-Roy(config)# aaa authentication enable console LOCAL
ASA-Roy(config)# username level9 password level9 privilege 9
ASA-Roy(config)# username level11 password level11 privilege 11

User Access Verification

Password:
Type help or '?' for a list of available commands.
ASA-Roy> enable
Username: level11
Password: *******
ASA-Roy# sh curpriv
Username : level11
Current privilege level : 11
Current Mode/s : P_PRIV

Using external AAA server for command authorization

aaa authorization command AAA_GROUP LOCAL
ASA-Roy# sh run aaa-
aaa-server AAA_GROUP protocol tacacs+
aaa-server AAA_GROUP (inside) host 1.1.1.1
key *****

ACS Screenshot

Friday, October 1, 2010

ASA – CTP (Cut-Through Proxy) with AAA

In some circumstances, using ACLs to control access is still not enough.

For example, you have two user groups (Finance & HR) and two server groups (Finance and HR). You want the Finance group to access Finance servers but not HR servers and, vice versa, HR users to access only HR servers but not Finance servers. And if the users are in a DHCP environment, where their IP addresses can change, how can you enforce the restriction?

The solution is CTP with AAA. It is like adding an extra lock to the servers, with the username/password as the key. If CTP is enabled, once traffic passes the interface ACL, the ASA prompts the user to authenticate.

CTP - Authentication

CTP supports the Telnet, FTP, and HTTP/HTTPS protocols.
CTP supports multiple concurrent proxy connections per user, which can be limited with the ‘aaa proxy-limit’ command.
The authentication prompt can be customized with the ‘auth-prompt {accept | reject | prompt} prompt_string’ command.
Authentication timeouts can be controlled with the ‘timeout uauth hh:mm:ss [absolute | inactivity]’ command.
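The difference between the two uauth timer styles is easy to miss. The sketch below is a simplified model, assuming an `absolute` timer counts from the moment the user authenticated while an `inactivity` timer counts from the user's last traffic; the class and function names are hypothetical and this is not ASA code.

```python
from dataclasses import dataclass


@dataclass
class UauthEntry:
    user: str
    authenticated_at: float  # seconds, when the user logged in
    last_activity: float     # seconds, last traffic through the proxy


def must_reauthenticate(entry, now, timeout, mode):
    """Simplified reading of 'timeout uauth hh:mm:ss [absolute | inactivity]'.

    absolute:   the cache entry expires `timeout` seconds after login.
    inactivity: the entry expires only after `timeout` seconds with no traffic.
    """
    if mode == "absolute":
        return now - entry.authenticated_at >= timeout
    if mode == "inactivity":
        return now - entry.last_activity >= timeout
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    alice = UauthEntry("alice", authenticated_at=0.0, last_activity=3000.0)
    # One hour after login, with traffic 10 minutes ago:
    print(must_reauthenticate(alice, 3600.0, 3600, "absolute"))
    print(must_reauthenticate(alice, 3600.0, 3600, "inactivity"))
```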
CTP auth in HTTP protocol

Basic Auth (HTTP/HTTPS): Ideal if the destination web server also requests Basic Auth and the ID/password are identical; you only need to enter the ID/password once.
Internal Web (HTTP/HTTPS):

Two ways to configure CTP authentication
1. aaa authentication {include | exclude}
2. aaa authentication match (preferred method)
To control access for unsupported applications
1. virtual telnet
2. virtual http

CTP – Authorization

There are two main problems with CTP authentication:

Users need to access multiple internal devices, but with CTP authentication, the user would have to authenticate to each individual device.
CTP authentication is global: once a user authenticates, he can access all the requested service; in other words, you can’t control who accesses what service.

CTP authorization options

Classic method
- Only supports TACACS+ with ACS.
- Disadvantage: each connection the authenticated user opens incurs an initial delay while the policy lookup occurs.
- Advantage: policy changes on the AAA server take effect immediately.
Downloadable ACLs (newer & preferred)
- The AAA server authenticates the user; if authentication succeeds, ACS sends the name of the ACL to the appliance.
- The appliance checks whether the ACL has already been downloaded, and either uses the cached copy or downloads it from ACS.
- The ACL is used to determine what the user can access; the interface ACL is ignored.
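The downloadable-ACL flow above is essentially a name-keyed cache: only the ACL name travels with each authentication, and the full ACL is fetched once. The sketch below assumes a hypothetical `fetch_from_acs` callable standing in for the download from ACS (it is not a real API) and uses a simple first-match, implicit-deny evaluation on destination addresses only.

```python
class DaclCache:
    """Toy model of the downloadable-ACL flow described above.

    fetch_from_acs is a hypothetical callable standing in for the
    appliance's download from ACS; it is not a real API.
    """

    def __init__(self, fetch_from_acs):
        self._fetch = fetch_from_acs
        self._acls = {}
        self.downloads = 0

    def acl_for(self, acl_name):
        if acl_name not in self._acls:       # not downloaded yet
            self._acls[acl_name] = self._fetch(acl_name)
            self.downloads += 1
        return self._acls[acl_name]           # otherwise reuse the cached copy


def permitted(acl, dst):
    """First-match evaluation with an implicit deny at the end."""
    for action, target in acl:
        if target in (dst, "any"):
            return action == "permit"
    return False


if __name__ == "__main__":
    acs = {"FINANCE_ACL": [("permit", "10.1.1.10"), ("deny", "any")]}
    cache = DaclCache(acs.__getitem__)
    for user in ("alice", "bob"):            # two Finance users log in
        acl = cache.acl_for("FINANCE_ACL")
    print(cache.downloads, permitted(acl, "10.1.1.10"), permitted(acl, "10.2.2.20"))
```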

Reference:

http://www.amazon.com/Cisco-Configuration-Networking-Professionals-Library/dp/0071622691/ref=sr_1_2?ie=UTF8&s=books&qid=1285924140&sr=8-2-spell

Monday, September 27, 2010

ASA – Network Attack Prevention

Threat Detection

Basic threat detection (low performance impact)
Monitors dropped-packet rates and security events. If it detects a threat, the appliance generates a log message with message ID 733100. The security events and dropped-packet rates that the appliance monitors include:

Matches on deny statements in ACLs.
Malformed packets (for example, invalid IP header values or an incorrect header length).
Packets that fail application layer inspection policies defined by the Modular Policy Framework (MPF), or problems inherent in the application inspection process itself. (For example, a URL specified in a policy was seen, causing an HTTP connection to be reset, or a wiz command was executed on an SMTP/ESMTP connection.)
Defined connection limits that have been exceeded, which includes global system values as well as limits you’ve defined with MPF or the static/nat commands.
Unusual ICMP packets or connections.
The combined rate of all security-related packet drops in this list.
An interface became overloaded, causing packet drops.
A scanning attack was detected. (For example, the TCP three-way handshake failed, or the first packet in a TCP connection was not a SYN; see scanning threat detection below.)
An incomplete connection was detected. (For example, the TCP three-way handshake failed, or UDP traffic is only seen in one direction of a connection.)
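At its core, rate-based detection like this compares a count of drop events over a time window against a threshold. The sketch below is a generic sliding-window monitor, loosely modeled on that idea; the window length, threshold, and class name are hypothetical, and the real feature has its own rate intervals and average/burst thresholds.

```python
from collections import deque


class DropRateMonitor:
    """Sliding-window drop-rate check, loosely modeled on basic threat
    detection. Window and threshold values are hypothetical."""

    def __init__(self, window_seconds, max_rate_per_second):
        self.window = window_seconds
        self.max_rate = max_rate_per_second
        self.events = deque()

    def record_drop(self, timestamp):
        """Record one dropped packet; return True if the rate now exceeds
        the threshold (the point where a log message would be raised)."""
        self.events.append(timestamp)
        # forget drops that have slid out of the window
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        rate = len(self.events) / self.window
        return rate > self.max_rate


if __name__ == "__main__":
    mon = DropRateMonitor(window_seconds=10, max_rate_per_second=1.0)
    print([mon.record_drop(t) for t in range(10)])  # 1 drop/s: at threshold
    print(mon.record_drop(9.5))                     # 11 drops in 10 s: alert
```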

Scanning threat detection (high performance impact)

Disabled by default.
Detects scanning attacks and can optionally shun the attacker.
Shunning can also be applied manually and unconditionally; a shun takes precedence over any other policy control (ACLs, inspection, even connection-table checks).

Threat detection statistics (high performance impact)
-Disabled by default.
-Monitors the appliance's threat statistics.

IP Audit

Software-based IPS.
Two signature classes: information and attack.
50+ signatures to detect attacks.

TCP Normalization

Prevents abnormal or unusual TCP packets.
An extension of MPF.
Create a TCP map to define the abnormal criteria.

RPF - Reverse Path Forwarding

RFC 2267
Prevent IP spoofing attacks
Compares the source address in the packet with the routing table to verify where it is coming from.
Drops the packet if it comes from a network that is not associated with the interface it arrived on.
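The RPF check described above boils down to: look up the packet's source address in the routing table and accept the packet only if the best route back points out of the interface the packet arrived on. The sketch below shows that logic with a hypothetical three-route table and a longest-prefix match using Python's `ipaddress` module.

```python
import ipaddress

# Hypothetical routing table: prefix -> egress interface.
ROUTES = {
    "0.0.0.0/0": "outside",
    "10.0.0.0/8": "inside",
    "172.16.1.0/24": "dmz",
}


def lookup(table, address):
    """Longest-prefix match; returns the egress interface or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, iface in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def rpf_ok(table, src, ingress_iface):
    """Accept only if the route back to the source uses the ingress interface."""
    return lookup(table, src) == ingress_iface


if __name__ == "__main__":
    print(rpf_ok(ROUTES, "10.1.2.3", "inside"))    # legitimate inside host
    print(rpf_ok(ROUTES, "10.1.2.3", "outside"))   # spoofed inside address
```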

Fragmentation Limits

Use the fragment command to limit how many fragments can make up a packet.

http://www.amazon.com/Cisco-Configuration-Networking-Professionals-Library/dp/0071622691/ref=sr_1_2?ie=UTF8&s=books&qid=1285924140&sr=8-2-spell

ASA – Failover

Types

Active/Standby
Active/Active (requires multiple context mode)

Hardware, software, and license requirements

Hardware Requirements

The two units in a failover configuration must be the same model, have the same number and types of interfaces, and the same SSMs installed (if any).

If you are using units with different Flash memory sizes in your failover configuration, make sure the unit with the smaller Flash memory has enough space to accommodate the software image files and the configuration files. If it does not, configuration synchronization from the unit with the larger Flash memory to the unit with the smaller Flash memory will fail.

Although it is not required, it is recommended that both units have the same amount of RAM memory installed.

Software Requirements

The two units in a failover configuration must be in the same operating modes (routed or transparent, single or multiple context). They must have the same major (first number) and minor (second number) software version. However, you can use different versions of the software during an upgrade process; for example, you can upgrade one unit from Version 7.0(1) to Version 7.0(2) and have failover remain active. We recommend upgrading both units to the same version to ensure long-term compatibility.
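The version rule above (major and minor must match, the maintenance release may differ) can be sketched as a small parser. The version-string format `major.minor(maintenance)` is taken from the example in the text; any trailing interim-build digits are simply ignored here, which is an assumption of this sketch.

```python
import re

# "7.0(1)" -> major 7, minor 0, maintenance 1; trailing build digits ignored.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)")


def parse(version):
    m = VERSION_RE.match(version)
    if not m:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def failover_compatible(a, b):
    """Major and minor versions must match; maintenance may differ,
    e.g. during a rolling upgrade."""
    return parse(a)[:2] == parse(b)[:2]


if __name__ == "__main__":
    print(failover_compatible("7.0(1)", "7.0(2)"))   # upgrade in progress
    print(failover_compatible("7.0(1)", "7.2(1)"))   # minor mismatch
```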

License requirement (PIX)

3 versions for PIX: UR(Unrestricted), R(Restricted) & FO(Failover)
Valid combinations:
- UR+UR, R+R, UR+FO, R+FO
UR+UR supports A/A and A/S.
UR+FO supports only A/S.

Chassis vs. Stateful Failover

With unit (chassis) failover, the secondary unit synchronizes its configuration with the primary and takes over when the primary fails. All xlates, connections, VPN sessions, etc. are dropped when the primary fails.
With stateful failover, an extra stateful link replicates session data from the primary to the secondary unit, so sessions survive even if the primary unit fails.

Failover Link: Serial vs. LAN-Based Failover (LBF)

Serial: PIX only, using a dedicated Cisco proprietary RS-232 cable clocked at 115 Kbps with DB-15 connectors. The cable defines the primary and secondary ends.
LBF: Introduced in v6.2; uses an Ethernet interface instead of a serial cable. The ASA uses LBF for its failover link, which can be combined with the stateful link.

Failover communications

The state of the appliances: active or standby
Power status (PIX with a serial failover link only)
Failover hello messages
Network link status of the appliances interfaces.
Exchange of MAC addresses used on the appliance interfaces
Configuration of the active unit synchronized with the standby unit
With stateful failover, the following are also synchronized:
- xlate table
- conn table
- VPN sessions (A/S mode only)
- MAC address table (transparent mode only)
- SIP signaling information
- Current date and time

Failover link monitoring

Both the failover link and the data interfaces are monitored by the failover pair.
Failover hellos are sent on the failover link every 15 seconds by default (minimum 200 ms).
The hold time is 45 s (3 hello intervals).
Interface tests are run to determine whether the active unit has failed.
If the active unit or its interfaces fail, the standby unit promotes itself to the active state.
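The hello/hold-time relationship above can be sketched as a simple timer check: the mate is suspected only after no hello has arrived for a full hold time. The 15 s and 45 s values come from the text; the function name and the list of hello timestamps are hypothetical, and on a real pair this is only the trigger for the further tests described in the notes.

```python
def hold_time_expired(hello_times, now, hold_time=45.0):
    """True if no hello from the mate arrived within hold_time seconds.

    hello_times: timestamps (seconds) of hellos received on the failover link.
    The 45 s default follows the notes (3 x 15 s hello interval).
    """
    if not hello_times:
        return True
    return now - max(hello_times) >= hold_time


if __name__ == "__main__":
    hellos = [0, 15, 30]            # the mate goes silent after t=30
    for t in (45, 60, 75):
        print(t, hold_time_expired(hellos, t))
```

With hellos at 0, 15, and 30 seconds, the check stays quiet at 45 and 60 seconds and only trips at 75 seconds, a full 45 s after the last hello.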

Interface Monitoring

If a hello message from the mate is not seen on a monitored interface for half the hold time, the appliance runs interface tests on the suspect interface to determine the problem.
The four tests are:

Link up/down test
Network activity test
ARP test
Broadcast ping test
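The test sequence can be modeled as an early-exit chain. Following Cisco's failover documentation (not stated in these notes), the network tests only run if the link is up, and the interface is considered operational as soon as traffic is received during any test. The probe functions below are hypothetical stubs, not real ASA internals.

```python
from typing import Callable, List, Tuple


def run_interface_tests(link_up: bool,
                        probes: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Simplified failover interface test sequence.

    probes: (name, probe) pairs in order; probe() returns True if any
    traffic was received during that test. Probes are hypothetical stubs.
    """
    if not link_up:
        return "failed: link down"
    for name, probe in probes:
        if probe():
            return f"ok: traffic seen during {name} test"
    return "failed: no traffic in any test"


if __name__ == "__main__":
    probes = [
        ("network activity", lambda: False),
        ("ARP", lambda: True),            # a host answered our ARP requests
        ("broadcast ping", lambda: False),
    ]
    print(run_interface_tests(True, probes))
```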

Saturday, September 25, 2010

ASA – multiple context mode

Licensing

PIX 515 and higher and ASA 5510 and higher support contexts.

Context Uses

active/active failover
ISP, co-location/hosting companies that host services requiring firewall functions
Organizations that need more than one firewall in the same physical location.

Context Restriction

Dynamic routing protocols (unicast & multicast) are unsupported; only static routes are available.
No VPN support, whether IPsec, L2TP, or WebVPN.
Threat detection is unsupported.

System Area

System-wide configuration.
Creates/deletes contexts.
Does not count as a context itself.
Accessed through the admin context.
Uses the admin context to communicate with external devices/services.

Context

Each context has a name, the interfaces allocated to it, and a configuration file that stores the security policies and configuration of the context itself.
By default, the ‘admin’ context is the administrative context used to access the system area.
Any context can be the admin context, but only one at a time (admin-context context_name).

Context chaining

Contexts can be chained by sharing a common physical/VLAN interface.
When interfaces are shared, only MAC addresses and translation rules are used to match a packet to a context.
It is recommended to assign a unique MAC address to each context's interfaces (mac-address auto).
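A minimal sketch of the classification described above, assuming a packet on a shared interface is first matched by destination MAC (when each context has a unique MAC) and otherwise by the context's static translation for the destination IP. The MAC and IP tables are hypothetical example data, and the real classifier has more cases than this.

```python
# Hypothetical data for two contexts sharing the outside interface.
CONTEXT_MACS = {
    "a2c8.0000.0001": "ctxA",
    "a2c8.0000.0002": "ctxB",
}
# Hypothetical static NAT (mapped) addresses owned by each context.
CONTEXT_STATICS = {
    "203.0.113.10": "ctxA",
    "203.0.113.20": "ctxB",
}


def classify(dst_mac, dst_ip):
    """Pick the context for a packet arriving on a shared interface.
    Returns None when nothing matches (the packet would be dropped)."""
    if dst_mac in CONTEXT_MACS:           # unique MAC per context
        return CONTEXT_MACS[dst_mac]
    return CONTEXT_STATICS.get(dst_ip)    # fall back to translation rules


if __name__ == "__main__":
    print(classify("a2c8.0000.0002", "198.51.100.5"))  # by MAC
    print(classify("ffff.ffff.ffff", "203.0.113.10"))  # by static NAT
```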

Managing Resources

The following resources can be defined (limited) per context:

Management connections: ASDM, Telnet & SSH.
Hosts.
MAC addresses.
Xlates in the translation table.
Connections in the state table.
Syslog messages/second.
Application inspections/second.

Thursday, September 23, 2010

Is hard-coding speed & duplex always right? Not necessarily!!

To avoid speed/duplex mismatch problems, many configuration guides recommend fixing the speed/duplex on both the switch side and the end-node side. I had followed this rule for a long time without running into any problems, until that day.....

It started when a branch office needed to deploy video conferencing equipment. Because MPLS WAN bandwidth is limited, video conferencing currently runs over the Internet, so we also purchased a firewall, a router, and a 10M Internet circuit.

FW (e0/0-outside) <-------> (f0/1-LAN ) Router (f0/0-WAN) <----------> ISP BB

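Following the rule of thumb above, I naturally set the speed/duplex of the FW and router interfaces to 100/full. After some simple testing, video conference calls could be set up successfully, so the case was closed quickly.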

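But after the system went live, a VIP user complained about poor quality, so I started troubleshooting. The video conference device's management interface showed packet loss, so I began by checking the Internet path. I first checked for asymmetric routing (since the two sites are in different countries) and asked the ISP to re-route the traffic, but none of it helped; the packet loss persisted.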

Because the packet loss was only one-way, I began to suspect that the ISP was dropping packets (CIR 10M), so I reconfigured QoS on the video conference device and the router, but the result was the same.

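While I was puzzling over this, I suddenly noticed that e0/0 (outside) on the FW (ASA OS 8.0.4) was running at 100/half. I must be getting old: I had forgotten to set the FW side to 100/full. I quickly changed it.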

Yay~ Fixed!? .....................NO, not even close! The link between the router and the FW outside interface actually went down!!

I double-checked the FW and router interface settings, and they were all correct. It felt like I had seen a ghost.

ASA(config-if)# sh run int e0/0
!
interface Ethernet0/0
speed 100
duplex full
nameif outside
security-level 0

Router#sh run int f0/1
interface FastEthernet0/1
description To_Customer_LAN
load-interval 30
duplex full
speed 100

The FW interface was down/down, while the router was up/down. Was I dreaming?

ASA(config-if)# sh int e0/0
Interface Ethernet0/0 "outside", is down, line protocol is down
Hardware is i82546GB rev03, BW 1000 Mbps, DLY 10 usec
        Auto-Duplex(Half-duplex), Auto-Speed(100 Mbps)
        MAC address c84c.7552.15b8, MTU 1500
        IP address 203.117.9.146, subnet mask 255.255.255.240
        42260544 packets input, 32562517068 bytes, 0 no buffer
        Received 103466 broadcasts, 0 runts, 0 giants
        0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
        0 L2 decode drops
        34853064 packets output, 10635547375 bytes, 0 underruns
        0 output errors, 71776 collisions, 2 interface resets
        0 babbles, 86297 late collisions, 437184 deferred
        0 lost carrier, 0 no carrier
        input queue (curr/max packets): hardware (0/17) software (0/0)
        output queue (curr/max packets): hardware (0/0) software (0/0)

SGSIN-B01F09C01-RTI01#sh int f0/1
FastEthernet0/1 is up, line protocol is down
Hardware is MV96340 Ethernet, address is fcfb.fba0.6541 (bia fcfb.fba0.6541)
Description: To_Customer_LAN
Internet address is 203.117.9.145/28
MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,
     reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 100Mb/s, 100BaseTX/FX
ARP type: ARPA, ARP Timeout 04:00:00
Last input 03:19:21, output 00:00:09, output hang never
Last clearing of "show interface" counters never
Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
30 second input rate 0 bits/sec, 0 packets/sec
30 second output rate 0 bits/sec, 0 packets/sec
     34853769 packets input, 1906329303 bytes
     Received 1 broadcasts, 80319 runts, 0 giants, 0 throttles
     163190 input errors, 82871 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog
     0 input packets with dribble condition detected
     42656842 packets output, 2314899508 bytes, 0 underruns
     0 output errors, 0 collisions, 2 interface resets
     0 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier
     0 output buffer failures, 0 output buffers swapped out

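I bounced the interfaces several times to no avail. In the end I simply changed both sides to auto/auto, and to my surprise the link came up. Not believing it, I changed them back to 100/full, and the link went down again.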

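I suppose it is a bug, but this again proves that although troubleshooting has its rules of thumb, in practice you need to stay flexible. Do not assume any setting is guaranteed to be correct, or you may waste a lot of time.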

Monday, August 23, 2010

ASA - transparent mode

Switch vs. ASA

Separate Vlan

If the ASA uses the same switch for its outside and inside connections, separate VLANs must be used for the outside and inside interfaces.
If the ASA uses different switches, the same VLAN ID can be used for the inside and outside interfaces.
The principle is to make sure the ASA is the only path for traffic, so it cannot be bypassed through the switch.

Unknown frame flooding process

-Switch floods the frame to all ports in the same VLAN.
-ASA uses ARP to find the destination MAC; it forwards the frame if the MAC is known and drops it if unknown.

Spanning-Tree

-Switch participates in STP by default to prevent L2 loops.
-ASA does not participate in STP, so L2 loop prevention must be ensured and verified manually.

Level of frame processing

-Switch forwards/filters frames at L2.
-ASA forwards frames at L2 but filters/manipulates them at L2 through L7.

Restriction of TP mode

■    Supports only two interfaces (physical or VLAN).
■    IPsec VPN and WebVPN are not supported.
■    CDP (Cisco Discovery Protocol) and IPv6 packets are dropped.
■    Ethernet frames without a valid EtherType (greater than or equal to 0x600) are dropped;
        exceptions can be made with an EtherType ACL.
■    NAT is not supported before version 8.
■    QoS with LLQ is not supported as a policy.
■    Routing is unsupported.

ARP inspection

ARP packets are allowed through by default.
It solves the problems caused by spoofed ARP replies or rogue gratuitous ARPs.
ARP inspection is enabled on an interface-by-interface basis.
Drops the packet when an ARP reply has an incorrect IP-MAC combination, or a source MAC seen on the wrong interface.
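The drop rules above can be sketched as a comparison against static ARP entries. As Cisco's documentation describes ARP inspection, the IP, MAC, and interface of an ARP packet must all agree with a related static entry, and packets that match no static entry are flooded or dropped according to the flood/no-flood option. The table and function below are hypothetical and only model that decision.

```python
# Hypothetical static ARP entries: (IP, MAC, interface), as would be
# configured with "arp logical_if_name IP_address MAC_address".
STATIC_ARP = [
    ("192.168.1.1", "0011.2233.4455", "outside"),
]


def inspect_arp(ip, mac, iface, flood=True):
    """Return "forward" or "drop" for an ARP packet."""
    related = [e for e in STATIC_ARP if e[0] == ip or e[1] == mac]
    if not related:
        # no static entry involved: the flood / no-flood option decides
        return "forward" if flood else "drop"
    # IP, MAC, and interface must all agree with a static entry
    return "forward" if (ip, mac, iface) in related else "drop"


if __name__ == "__main__":
    print(inspect_arp("192.168.1.1", "0011.2233.4455", "outside"))  # genuine
    print(inspect_arp("192.168.1.1", "dead.beef.0001", "outside"))  # spoofed
```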

Configuration

Enable transparent mode

asa(config)#firewall transparent

asa#show firewall

Configuring management IP

asa(config)# ip address IP_address [subnet_mask] [standby IP_address]

MAC address table manipulation

asa# show mac-address-table

asa(config)# mac-address-table aging-time minutes

asa(config)# mac-address-table static logical_if_name mac_address

asa(config)# mac-learn logical_if_name disable

Ether-type ACL (non-IP traffic)

asa(config)# access-list ACL_ID ethertype {deny | permit} {ipx | bpdu | mpls-unicast | mpls-multicast | any | hex_#_of_protocol} [log]

ARP Inspection

asa(config)# arp-inspection logical_if_name enable [flood | no-flood]

asa(config)# arp logical_if_name IP_address MAC_address

asa# show arp-inspection

asa# show arp

ASA – Order of matching translation policies

1. Existing translation in xlate slot of translation table

2. best-match of NAT 0 (NAT exemption)

3. best-match of static NAT

4. best-match of static PAT

5. Policy NAT

6. General NAT

7. Drop if no match of the above.
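The matching order above can be sketched as "check the existing xlate, then walk the rule categories in a fixed order and stop at the first category with a hit". Two simplifications are assumptions of this sketch: rules match on source address only (real policy NAT also considers the destination), and "best match" is modeled as the longest matching prefix. The category keys and example rules are hypothetical.

```python
import ipaddress

# Rule categories in the order listed above (after the existing-xlate check).
NAT_ORDER = ["nat0", "static_nat", "static_pat", "policy_nat", "general_nat"]

# Hypothetical example rules: category -> [(source_network, translation)].
RULES = {
    "nat0": [("10.1.1.0/24", "exempt (no translation)")],
    "static_nat": [("10.1.2.10/32", "static 203.0.113.10")],
    "general_nat": [("10.0.0.0/8", "PAT to outside interface")],
}


def choose_translation(src, xlates, rules):
    """Pick the translation for a source address, or None (drop)."""
    if src in xlates:                        # 1. existing xlate slot
        return xlates[src]
    addr = ipaddress.ip_address(src)
    for category in NAT_ORDER:               # 2-6. fixed category order
        hits = [
            (ipaddress.ip_network(net), tr)
            for net, tr in rules.get(category, [])
            if addr in ipaddress.ip_network(net)
        ]
        if hits:                             # best match = longest prefix
            return max(hits, key=lambda h: h[0].prefixlen)[1]
    return None                              # 7. no match: drop


if __name__ == "__main__":
    for host in ("10.1.1.5", "10.1.2.10", "10.9.9.9", "192.168.0.1"):
        print(host, "->", choose_translation(host, {}, RULES))
```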

Friday, July 2, 2010

Cisco IOS firewall – Classical Firewall

CCSP v3 SNRS covers the IOS firewall, which falls into two main categories:

Classical Firewall
Zone-Based Policy Firewall

This post covers the Classical Firewall.

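Because Cisco keeps renaming its products (and features), the different terms often confused me while studying, especially when reading the IOS firewall chapter in SNRS vol. 3. The course material covered far too little, so I found this article, which answered many of my questions.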

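The earliest Cisco IOS firewall feature appeared in 11.2P (I did not expect it to be that early), and its early name was CBAC (Context-Based Access Control). Its original purpose was to overcome the limitation that ACLs can only filter packet by packet. As the first word of its full name, "Context", suggests, CBAC filters based on the context of packets (the session/dialogue).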

Through continuous improvement, CBAC evolved from filtering packets by their transport-layer context to deep inspection of whether application-layer traffic complies with the protocol, and to filtering by application-level service. In 12.3(4)T, the ACL Bypass feature was added, further improving performance and the stateful inspection architecture, so Cisco renamed it SPI (Stateful Packet Inspection). Although the two (CBAC and SPI) are often regarded as the same feature, SPI is in fact considerably more advanced than CBAC.

CBAC works as shown in the figure below.

CBAC must work together with ACLs; the following is an excerpt from the original article.

SPI inspects the packet after it passes the inbound ACL of an input interface if ip inspect in is applied, or after the outbound ACL of output interface if ip inspect out is used. Thus, outbound traffic must be permitted by input ACLs facing the source, and outbound ACLs facing the destination.

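The passage above explains that outbound traffic must first be permitted by the ACLs before it is inspected by the corresponding ip inspect in/out. But what about return traffic?
CBAC creates dynamic access control entries (ACEs) and applies them where the return traffic will pass, to match the originally permitted outbound traffic. In other words, CBAC does not maintain a firewall "session table"; instead, it uses dynamic ACEs to allow return packets (similar in principle to IOS reflexive ACLs). Unlike CBAC, SPI maintains a session table that records active sessions. The following excerpt from the original article explains why the ACL Bypass feature improves SPI performance.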

ACL Bypass improves firewall performance for two reasons. SPI is able to maintain a more efficient list to track active sessions, reducing the time required for session setup and verification. Also, return traffic is not subjected to ACLs on the return path, so when return traffic finds a matching entry in the session table, it is shunted past the ACLs in the packet path, reducing the CPU overhead the packet incurs as it moves through the router's processing.
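To make the ACL Bypass idea concrete, here is a toy model: inspected outbound traffic creates a session entry, and return traffic that matches the reversed 5-tuple is accepted from the session table without consulting the inbound ACL. The class name, the 5-tuple representation, and the counter are hypothetical; this illustrates the principle only, not IOS internals.

```python
from typing import Callable, Set, Tuple

Flow = Tuple[str, int, str, int, str]  # src, sport, dst, dport, proto


class SpiSketch:
    """Toy model: return traffic found in the session table skips the
    inbound ACL, as in the ACL Bypass description above."""

    def __init__(self, outbound_acl: Callable[[Flow], bool],
                 inbound_acl: Callable[[Flow], bool]):
        self.outbound_acl = outbound_acl
        self.inbound_acl = inbound_acl
        self.sessions: Set[Flow] = set()
        self.acl_checks = 0              # how often an ACL was evaluated

    def outbound(self, flow: Flow) -> bool:
        self.acl_checks += 1
        if not self.outbound_acl(flow):
            return False
        self.sessions.add(flow)          # inspected traffic creates a session
        return True

    def inbound(self, flow: Flow) -> bool:
        src, sport, dst, dport, proto = flow
        if (dst, dport, src, sport, proto) in self.sessions:
            return True                  # return traffic: ACL bypassed
        self.acl_checks += 1
        return self.inbound_acl(flow)


if __name__ == "__main__":
    fw = SpiSketch(outbound_acl=lambda f: True, inbound_acl=lambda f: False)
    fw.outbound(("10.0.0.5", 40000, "198.51.100.7", 80, "tcp"))
    print(fw.inbound(("198.51.100.7", 80, "10.0.0.5", 40000, "tcp")), fw.acl_checks)
```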

Besides SPI, the upgraded IOS Firewall also provides the following features:

Protection Against Attack (DoS Protection)
Alerts and Audit Trails
Authentication Proxy (HTTP, HTTPS, TELNET & FTP)
Synergy with NAT and PAM (Port-to-Application Mapping)
Application Inspection (appfw for http/pop3/imap/smtp/esmtp/im)
IOS Transparent Firewall (implemented using IRB)

Reference:

http://www.cisco.com/en/US/prod/collateral/vpndevc/ps5708/ps5710/ps1018/product_implementation_design_guide09186a00800fd670.html

Wednesday, June 23, 2010

Thoughts After Reading CCSP v3 SNRS

Phew~ I have finally finished reading SNRS.

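My impression: plenty of screenshots, not enough elaboration. The heavy use of SDM disappointed me a bit, and most of the command explanations come straight from the command reference documents on Cisco's website.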

In my view, a training student guide should be structured to progress from feature concept, feature origin, and design examples (scenarios) to configuration examples and then more detailed explanations of configuration parameters.

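Take ZFW as an example: I saw lots of GUI screenshots and GUI operating instructions, but only a very brief introduction to the underlying C3PL structure, which left me thoroughly confused. In the end I only got a clear picture after finding this article on Cisco's website.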
http://www.cisco.com/en/US/products/sw/secursw/ps1018/products_tech_note09186a00808bc994.shtml

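Overall, it feels like professional-level certifications have gone downhill compared with before, and I am honestly a bit hesitant about whether to take the SNRS exam.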

Thursday, May 20, 2010

Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC

Introduction

The purpose of this document is to present how IP Fragmentation and Path Maximum Transmission Unit Discovery (PMTUD) work and to discuss some scenarios involving the behavior of PMTUD when combined with different combinations of IP tunnels. The current widespread use of IP tunnels in the Internet has brought the problems involving IP Fragmentation and PMTUD to the forefront.

IP Fragmentation and Reassembly

The IP protocol was designed for use on a wide variety of transmission links. Although the maximum length of an IP datagram is 64K, most transmission links enforce a smaller maximum packet length limit, called a MTU. The value of the MTU depends on the type of the transmission link. The design of IP accommodates MTU differences by allowing routers to fragment IP datagrams as necessary. The receiving station is responsible for reassembling the fragments back into the original full size IP datagram.

IP fragmentation involves breaking a datagram into a number of pieces that can be reassembled later. The IP source, destination, identification, total length, and fragment offset fields, along with the "more fragments" and "don't fragment" flags in the IP header, are used for IP fragmentation and reassembly. For more information about the mechanics of IP fragmentation and reassembly, please see RFC 791.

The image below depicts the layout of an IP header.

The identification is 16 bits and is a value assigned by the sender of an IP datagram to aid in reassembling the fragments of a datagram.

The fragment offset is 13 bits and indicates where a fragment belongs in the original IP datagram. This value is a multiple of eight bytes.

In the flags field of the IP header, there are three bits for control flags. It is important to note that the "don't fragment" (DF) bit plays a central role in PMTUD because it determines whether or not a packet is allowed to be fragmented.

Bit 0 is reserved, and is always set to 0. Bit 1 is the DF bit (0 = "may fragment," 1 = "don't fragment"). Bit 2 is the MF bit (0 = "last fragment," 1 = "more fragments").

Value	Bit 0 (Reserved)	Bit 1 (DF)	Bit 2 (MF)
0	0	May fragment	Last fragment
1	0	Do not fragment	More fragments

The graphic below shows an example of fragmentation. If you add up all the lengths of the IP fragments, the value exceeds the original IP datagram length by 60. The reason that the overall length is increased by 60 is because three additional IP headers were created, one for each fragment after the first fragment.

The first fragment has an offset of 0, the length of this fragment is 1500; this includes 20 bytes for the slightly modified original IP header.

The second fragment has an offset of 185 (185 x 8 = 1480), which means that the data portion of this fragment starts 1480 bytes into the original IP datagram. The length of this fragment is 1500; this includes the additional IP header created for this fragment.

The third fragment has an offset of 370 (370 x 8 = 2960), which means that the data portion of this fragment starts 2960 bytes into the original IP datagram. The length of this fragment is 1500; this includes the additional IP header created for this fragment.

The fourth fragment has an offset of 555 (555 x 8 = 4440), which means that the data portion of this fragment starts 4440 bytes into the original IP datagram. The length of this fragment is 700 bytes; this includes the additional IP header created for this fragment.

It is only when the last fragment is received that the size of the original IP datagram can be determined.

The fragment offset in the last fragment (555) gives a data offset of 4440 bytes into the original IP datagram. If you then add the data bytes from the last fragment (680 = 700 - 20), that gives you 5120 bytes, which is the data portion of the original IP datagram. Then, adding 20 bytes for an IP header equals the size of the original IP datagram (4440 + 680 + 20 = 5140).
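The worked example above can be reproduced with a short fragmentation routine. This sketch assumes every fragment carries a plain 20-byte IP header (no options) and that each non-final fragment's data length is rounded down to a multiple of 8, since the offset field counts 8-byte units; the function name is hypothetical.

```python
def fragment(total_length, mtu, header_len=20):
    """Split an IP datagram into fragments.

    Returns a list of (offset_in_8_byte_units, fragment_length, more_fragments).
    Assumes a 20-byte header with no options on every fragment.
    """
    data = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8   # non-final data must be a multiple of 8
    frags = []
    offset = 0
    while offset < data:
        chunk = min(max_data, data - offset)
        more = offset + chunk < data           # MF bit: set on all but the last
        frags.append((offset // 8, chunk + header_len, more))
        offset += chunk
    return frags


if __name__ == "__main__":
    for frag in fragment(5140, 1500):
        print(frag)
```

For the 5140-byte datagram this yields offsets 0, 185, 370, and 555 with lengths 1500, 1500, 1500, and 700, matching the figures in the text; the lengths sum to 5200, which is 60 bytes (three extra headers) more than the original.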

Issues with IP Fragmentation

There are several issues that make IP fragmentation undesirable. There is a small increase in CPU and memory overhead to fragment an IP datagram. This holds true for the sender as well as for a router in the path between a sender and a receiver. Creating fragments simply involves creating fragment headers and copying the original datagram into the fragments. This can be done fairly efficiently because all the information needed to create the fragments is immediately available.

Fragmentation causes more overhead for the receiver when reassembling the fragments because the receiver must allocate memory for the arriving fragments and coalesce them back into one datagram after all of the fragments are received. Reassembly on a host is not considered a problem because the host has the time and memory resources to devote to this task.

But, reassembly is very inefficient on a router whose primary job is to forward packets as quickly as possible. A router is not designed to hold on to packets for any length of time. Also a router doing reassembly chooses the largest buffer available (18K) with which to work because it has no way of knowing the size of the original IP packet until the last fragment is received.

Another fragmentation issue involves handling dropped fragments. If one fragment of an IP datagram is dropped, then the entire original IP datagram must be resent, and it will also be fragmented. You see an example of this with Network File System (NFS). NFS, by default, has a read and write block size of 8192, so an NFS IP/UDP datagram will be approximately 8500 bytes (including NFS, UDP, and IP headers). A sending station connected to an Ethernet (MTU 1500) will have to fragment the 8500-byte datagram into six pieces: five 1500-byte fragments and one 1100-byte fragment. If any of the six fragments is dropped because of a congested link, the complete original datagram will have to be retransmitted, which means that six more fragments will have to be created. If this link drops one in six packets, then the odds are low that any NFS data can be transferred over this link, since at least one IP fragment would be dropped from each 8500-byte original NFS IP datagram.
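The six-way split can be reproduced with a minimal fragmentation sketch in Python. It assumes a 20-byte IP header with no options, and it takes 8500 bytes as the document's approximate NFS datagram size. `fragment_sizes` is a hypothetical helper, not a library call. Every fragment except the last carries a multiple of 8 data bytes, because the offset field counts 8-byte units.

```python
def fragment_sizes(total_length: int, mtu: int, ip_header_len: int = 20) -> list[int]:
    """Return the total length (header included) of each IP fragment."""
    data_left = total_length - ip_header_len
    # Non-final fragments must carry a multiple of 8 data bytes.
    max_data = (mtu - ip_header_len) // 8 * 8
    sizes = []
    while data_left > 0:
        chunk = min(max_data, data_left)
        sizes.append(chunk + ip_header_len)
        data_left -= chunk
    return sizes


print(fragment_sizes(8500, 1500))
```

This prints five 1500-byte fragments followed by one 1100-byte fragment. Each 1500-byte fragment carries 1480 data bytes, and the last one carries the remaining 1080 bytes.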

Firewalls that filter or manipulate packets based on Layer 4 (L4) through Layer 7 (L7) information in the packet may have trouble processing IP fragments correctly. If the IP fragments are out of order, a firewall may block the non-initial fragments because they do not carry the information that would match the packet filter. This would mean that the original IP datagram could not be reassembled by the receiving host. If the firewall is configured to allow non-initial fragments with insufficient information to properly match the filter, then a non-initial fragment attack through the firewall could occur. Also, some network devices (such as Content Switch Engines) direct packets based on L4 through L7 information, and if a packet spans multiple fragments, then the device may have trouble enforcing its policies.

Avoiding IP Fragmentation: What TCP MSS Does and How It Works

The TCP Maximum Segment Size (MSS) defines the maximum amount of data that a host is willing to accept in a single TCP/IP datagram. This TCP/IP datagram may be fragmented at the IP layer. The MSS value is sent as a TCP header option only in TCP SYN segments. Each side of a TCP connection reports its MSS value to the other side. Contrary to popular belief, the MSS value is not negotiated between hosts. The sending host is required to limit the size of data in a single TCP segment to a value less than or equal to the MSS reported by the receiving host.

Originally, MSS meant how big a buffer (of up to 64K) was allocated on a receiving station to be able to store the TCP data contained within a single IP datagram. MSS was the maximum segment (chunk) of data that the TCP receiver was willing to accept. This TCP segment could be as large as 64K (the maximum IP datagram size), and it could be fragmented at the IP layer in order to be transmitted across the network to the receiving host. The receiving host would reassemble the IP datagram before it handed the complete TCP segment to the TCP layer.

Below are a couple of scenarios showing how MSS values are set and used to limit TCP segment sizes, and therefore, IP datagram sizes.

Scenario 1 illustrates the way MSS was first implemented. Host A has a buffer of 16K and Host B a buffer of 8K. They send and receive their MSS values and adjust their send MSS for sending data to each other. Notice that Host A and Host B will have to fragment the IP datagrams that are larger than the interface MTU but still less than the send MSS because the TCP stack could pass 16K or 8K bytes of data down the stack to IP. In Host B's case, packets could be fragmented twice, once to get onto the Token Ring LAN and again to get onto the Ethernet LAN.

Scenario 1

Host A sends its MSS value of 16K to Host B.
Host B receives the 16K MSS value from Host A.
Host B sets its send MSS value to 16K.
Host B sends its MSS value of 8K to Host A.
Host A receives the 8K MSS value from Host B.
Host A sets its send MSS value to 8K.

To help avoid IP fragmentation at the endpoints of the TCP connection, the selection of the MSS value was changed to the minimum of the buffer size and the MTU of the outgoing interface minus 40. MSS numbers are 40 bytes smaller than MTU numbers because MSS is just the TCP data size, which does not include the 20-byte IP header and the 20-byte TCP header. MSS is based on default header sizes; the sender stack must subtract the appropriate values for the IP header and the TCP header depending on what TCP or IP options are being used.

The way MSS now works is that each host will first compare its outgoing interface MTU with its own buffer and choose the lowest value as the MSS to send. The hosts will then compare the MSS size received against their own interface MTU and again choose the lower of the two values.

Scenario 2 illustrates this additional step taken by the sender to avoid fragmentation on the local and remote wires. Notice how the MTU of the outgoing interface is taken into account by each host (before the hosts send each other their MSS values) and how this helps to avoid fragmentation.

Scenario 2

Host A compares its MSS buffer (16K) and its MTU (1500 - 40 = 1460) and uses the lower value as the MSS (1460) to send to Host B.
Host B receives Host A's send MSS (1460) and compares it to the value of its outbound interface MTU - 40 (4422).
Host B sets the lower value (1460) as the MSS for sending IP datagrams to Host A.
Host B compares its MSS buffer (8K) and its MTU (4462-40 = 4422) and uses 4422 as the MSS to send to Host A.
Host A receives Host B's send MSS (4422) and compares it to the value of its outbound interface MTU -40 (1460).
Host A sets the lower value (1460) as the MSS for sending IP datagrams to Host B.

1460 is the value chosen by both hosts as the send MSS for each other. Often the send MSS value will be the same on each end of a TCP connection.
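The two comparisons in Scenario 2 can be sketched in Python as follows. The sketch assumes that "16K" and "8K" mean 16384 and 8192 bytes, and that there are no IP or TCP options, so the overhead is exactly 40 bytes. The function names are hypothetical.

```python
HEADER_OVERHEAD = 40  # 20-byte IP header + 20-byte TCP header, no options


def mss_to_advertise(buffer_size: int, interface_mtu: int) -> int:
    """MSS a host puts in its SYN: the lower of its buffer and MTU - 40."""
    return min(buffer_size, interface_mtu - HEADER_OVERHEAD)


def send_mss(received_mss: int, interface_mtu: int) -> int:
    """MSS a host uses when sending: the lower of the peer's MSS and MTU - 40."""
    return min(received_mss, interface_mtu - HEADER_OVERHEAD)


# Scenario 2: Host A has a 16K buffer on a 1500-byte MTU interface,
# Host B has an 8K buffer on a 4462-byte MTU interface.
a_buffer, a_mtu = 16 * 1024, 1500
b_buffer, b_mtu = 8 * 1024, 4462

a_advertises = mss_to_advertise(a_buffer, a_mtu)
b_advertises = mss_to_advertise(b_buffer, b_mtu)
b_sends_with = send_mss(a_advertises, b_mtu)
a_sends_with = send_mss(b_advertises, a_mtu)
print(a_advertises, b_advertises, a_sends_with, b_sends_with)
```

This prints `1460 4422 1460 1460`. Host B advertises 4422, but Host A's own outgoing MTU still caps its send MSS at 1460.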

In Scenario 2, fragmentation does not occur at the endpoints of a TCP connection because both outgoing interface MTUs are taken into account by the hosts. Packets can still become fragmented in the network between Router A and Router B if they encounter a link with a lower MTU than that of either host's outbound interface.

What Is PMTUD?

TCP MSS as described above takes care of fragmentation at the two endpoints of a TCP connection, but it doesn't handle the case where there is a smaller MTU link in the middle between these two endpoints. PMTUD was developed to avoid fragmentation in the path between the endpoints. It is used to dynamically determine the lowest MTU along the path from a packet's source to its destination.

Note: PMTUD is only supported by TCP. UDP and other protocols do not support it. If PMTUD is enabled on a host, and it almost always is, all TCP/IP packets from the host will have the DF bit set.

When a host sends a full MSS data packet with the DF bit set, PMTUD works by reducing the send MSS value for the connection if it receives information that the packet would require fragmentation. A host usually "remembers" the MTU value for a destination by creating a "host" (/32) entry in its routing table with this MTU value.

If a router tries to forward an IP datagram, with the DF bit set, onto a link that has a lower MTU than the size of the packet, the router will drop the packet and return an Internet Control Message Protocol (ICMP) "Destination Unreachable" message to the source of this IP datagram, with the code indicating "fragmentation needed and DF set" (type 3, code 4). When the source station receives the ICMP message, it will lower the send MSS, and when TCP retransmits the segment, it will use the smaller segment size.
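The feedback loop can be modeled in a simplified Python simulation. The link MTUs are hypothetical. The model assumes that every router returns the next-hop MTU in its ICMP message and that no ICMP is filtered. It also assumes the sender simply resizes its next packet to the reported value.

```python
def discover_path_mtu(first_hop_mtu: int, link_mtus: list[int]) -> tuple[int, int]:
    """Simulate PMTUD for DF-set packets along a path.

    first_hop_mtu is the sender's outgoing interface MTU; link_mtus are the
    MTUs of the following links, in order. A router that cannot forward the
    packet drops it and reports its next-hop MTU (as an ICMP type 3, code 4
    message would), and the sender retransmits at that size.
    Returns (path MTU, number of ICMP messages received).
    """
    packet_size = first_hop_mtu
    icmp_messages = 0
    while True:
        for link_mtu in link_mtus:
            if packet_size > link_mtu:
                packet_size = link_mtu   # sender adopts the reported MTU
                icmp_messages += 1
                break                    # packet was dropped; retransmit
        else:
            return packet_size, icmp_messages


path_mtu, icmp_count = discover_path_mtu(1500, [1500, 1400, 1300, 1500])
print(path_mtu, icmp_count, "send MSS:", path_mtu - 40)
```

In this example the sender needs two ICMP messages, one from each narrower link, before its packets fit the 1300-byte bottleneck.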

Here is an example of an ICMP "fragmentation needed and DF set" message that you might see on a router after turning on the debug ip icmp command:

```
ICMP: dst (10.10.10.10) frag. needed and DF set
unreachable sent to 10.1.1.1
```

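The ICMP header of a "fragmentation needed and DF set" "Destination Unreachable" message has the following layout:

- Type = 3 (8 bits)
- Code = 4 (8 bits)
- Checksum (16 bits)
- An unused field (16 bits, set to 0)
- Next-Hop MTU (16 bits)

These fields are followed by the IP header and the first 64 bits of data of the original datagram.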

Per RFC 1191, a router returning an ICMP message indicating "fragmentation needed and DF set" should include the MTU of that next-hop network in the low-order 16 bits of the ICMP additional header field that is labeled "unused" in the ICMP specification, RFC 792.
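The header layout above can be illustrated with a small Python sketch that builds and parses such a message using the standard `struct` module. In a real message the ICMP body carries the original IP header plus 64 bits of its data. Here the `original_bytes` argument is only a placeholder for that body. The function names are hypothetical.

```python
import struct
from typing import Optional


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_bytes: bytes = b"") -> bytes:
    """ICMP type 3, code 4 with the next-hop MTU in the low-order 16 bits."""
    # Type, code, checksum (0 while computing), unused (0), next-hop MTU.
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    checksum = internet_checksum(header + original_bytes)
    return struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + original_bytes


def parse_next_hop_mtu(message: bytes) -> Optional[int]:
    """Return the next-hop MTU of a frag-needed message, else None."""
    icmp_type, code, _checksum, _unused, mtu = struct.unpack_from("!BBHHH", message)
    if icmp_type == 3 and code == 4:
        return mtu or None  # 0 means the router did not supply an MTU
    return None


print(parse_next_hop_mtu(build_frag_needed(1476)))
```

A parsed value of `None` corresponds to the older routers described next, which leave the field at 0.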

Early implementations of RFC 1191 did not supply the next-hop MTU information. Even when this information was supplied, some hosts ignored it. For this case, RFC 1191 also contains a table that lists the suggested values by which the MTU should be lowered during PMTUD. Hosts use it to arrive more quickly at a reasonable value for the send MSS.
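A host that receives no MTU value can step down through the table. The plateau values in the list below are taken from the plateau column of the RFC 1191 MTU table. The helper name is hypothetical. The rule is to pick the largest plateau strictly below the size of the datagram that was rejected.

```python
# Plateau values from the RFC 1191 MTU table, largest first.
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]


def next_plateau(datagram_length: int) -> int:
    """Largest plateau strictly below the rejected datagram's length.

    Intended for ICMP "frag needed" messages whose next-hop MTU field is 0.
    """
    for plateau in PLATEAUS:
        if plateau < datagram_length:
            return plateau
    return PLATEAUS[-1]  # never go below the official minimum MTU of 68


print(next_plateau(1500))
print(next_plateau(1492))
```

A 1500-byte datagram that is bounced without an MTU hint is retried at 1492. If that also fails, it is retried at 1006, instead of probing one byte at a time.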

PMTUD is done continually on all packets because the path between sender and receiver can change dynamically. Each time a sender receives a "Can't Fragment" ICMP message, it updates the routing information where it stores the path MTU.

Two possible things can happen during PMTUD:

The packet can get all the way to the receiver without being fragmented.

Note: To protect its CPU against DoS attacks, a router throttles the number of ICMP unreachable messages it sends to two per second. Therefore, if you have a network scenario in which you expect the router to respond with more than two ICMP (type = 3, code = 4) messages per second (possibly to different hosts), you would want to disable the throttling of ICMP messages with the no ip icmp rate-limit unreachable [df] interface command.

The sender can get ICMP "Can't Fragment" messages from any (or every) hop along the path to the receiver.
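The effect of the throttling described in the note above can be illustrated with a simplified Python model. It assumes a minimum interval of 0.5 seconds between unreachable messages, which matches "two per second". It is not the IOS implementation, and the class name and timestamps are hypothetical.

```python
from typing import Optional


class UnreachableRateLimiter:
    """Allow at most one ICMP unreachable per `interval` seconds."""

    def __init__(self, interval: float = 0.5) -> None:
        self.interval = interval
        self.last_sent: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            return True
        return False  # suppressed: this sender gets no PMTUD feedback


limiter = UnreachableRateLimiter()
# Times (in seconds) at which oversized DF-set packets arrive at the router.
sent = [limiter.allow(t) for t in (0.0, 0.1, 0.5, 0.7, 1.0)]
print(sent)
```

Two of the five senders in this example get no ICMP message. Their PMTUD stalls until they retransmit, which is why heavy PMTUD traffic may justify relaxing the limit.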

PMTUD is done independently for both directions of a TCP flow. There may be cases where PMTUD in one direction of a flow triggers one of the end stations to lower the send MSS and the other end station keeps the original send MSS because it never sent an IP datagram large enough to trigger PMTUD.

A good example of this is the HTTP connection depicted below in Scenario 3. The TCP client is sending small packets and the server is sending large packets. In this case, only the server's large packets (greater than 576 bytes) will trigger PMTUD. The client's packets are small (less than 576 bytes) and will not trigger PMTUD because they do not require fragmentation to get across the 576 MTU link.

Scenario 3

Scenario 4 shows an asymmetric routing example where one of the paths has a smaller minimum MTU than the other. Asymmetric routing occurs when different paths are taken for sending and receiving data between two endpoints. In this scenario, PMTUD will trigger the lowering of the send MSS only in one direction of a TCP flow. The traffic from the TCP client to the server flows through Router A and Router B, whereas the return traffic coming from the server to the client flows through Router D and Router C. When the TCP server sends packets to the client, PMTUD will trigger the server to lower the send MSS because Router D must fragment the 4092 byte packets before it can send them to Router C.

The client, on the other hand, will never receive an ICMP "Destination Unreachable" message with the code indicating "fragmentation needed and DF set" because Router A does not have to fragment packets when sending to the server through Router B.

Scenario 4

Note: The ip tcp path-mtu-discovery command is used to enable TCP MTU path discovery for TCP connections initiated by routers (BGP and Telnet for example).

Problems with PMTUD

There are three things that can break PMTUD, two of which are uncommon and one of which is common.

A router can drop a packet and not send an ICMP message. (Uncommon)

A router can generate and send an ICMP message but the ICMP message gets blocked by a router or firewall between this router and the sender. (Common)

A router can generate and send an ICMP message, but the sender ignores the message. (Uncommon)

The first and last of the three bullets above are uncommon and are usually the result of an error, but the middle bullet describes a common problem. People who implement ICMP packet filters tend to block all ICMP message types rather than only certain ones. A packet filter can block all ICMP message types except those that are "unreachable" or "time-exceeded." The success or failure of PMTUD hinges upon ICMP unreachable messages getting through to the sender of a TCP/IP packet. ICMP time-exceeded messages are important for other IP issues. An example of such a packet filter, implemented on a router, is shown below.

```
access-list 101 permit icmp any any unreachable
access-list 101 permit icmp any any time-exceeded
access-list 101 deny icmp any any
access-list 101 permit ip any any
```
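The order of the four lines matters because an ACL is evaluated top-down and the first match wins. The small Python model below illustrates that. It only models the protocol and ICMP-type fields that access-list 101 uses, since all of its entries are "any any". The rule representation and function names are hypothetical.

```python
from typing import NamedTuple, Optional

# ICMP type numbers from RFC 792.
UNREACHABLE, TIME_EXCEEDED = 3, 11


class Rule(NamedTuple):
    action: str               # "permit" or "deny"
    protocol: str             # "icmp", or "ip" to match every IP packet
    icmp_type: Optional[int]  # None matches any ICMP type


ACL_101 = [
    Rule("permit", "icmp", UNREACHABLE),
    Rule("permit", "icmp", TIME_EXCEEDED),
    Rule("deny", "icmp", None),
    Rule("permit", "ip", None),
]


def matches(rule: Rule, protocol: str, icmp_type: Optional[int]) -> bool:
    if rule.protocol == "ip":
        return True
    if rule.protocol != protocol:
        return False
    return rule.icmp_type is None or rule.icmp_type == icmp_type


def evaluate(acl: list[Rule], protocol: str, icmp_type: Optional[int] = None) -> str:
    """First-match evaluation, with the implicit deny at the end of an ACL."""
    for rule in acl:
        if matches(rule, protocol, icmp_type):
            return rule.action
    return "deny"


print(evaluate(ACL_101, "icmp", 3))  # frag needed (type 3, code 4)
print(evaluate(ACL_101, "icmp", 8))  # echo request
print(evaluate(ACL_101, "tcp"))
```

The PMTUD message (type 3) is permitted, and echo requests are denied. Non-ICMP traffic falls through to the final permit.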

There are other techniques that can be used to help alleviate the problem of ICMP being completely blocked.

Clear the DF bit on the router and allow fragmentation anyway (This may not be a good idea, though. See Issues with IP Fragmentation for more information).

Manipulate the TCP MSS option value using the interface command ip tcp adjust-mss <500-1460>.

In Scenario 5 below, Router A and Router B are in the same administrative domain. Router C is inaccessible and is blocking ICMP, so PMTUD is broken. A workaround for this situation is to clear the DF bit in both directions on Router B to allow fragmentation. This can be done using policy routing. The syntax to clear the DF bit is available in Cisco IOS® Software Release 12.1(6) and later.

```
interface serial0
 ...
 ip policy route-map clear-df-bit
!
route-map clear-df-bit permit 10
 match ip address 111
 set ip df 0
!
access-list 111 permit tcp any any
```

Another option is to change the TCP MSS option value on SYN packets that traverse the router (available in Cisco IOS 12.2(4)T and later). This reduces the MSS option value in the TCP SYN packet so that it is no larger than the value configured with the ip tcp adjust-mss command (1460 in the example below). As a result, the TCP sender will send segments no larger than this value. The IP packet size will be 40 bytes larger (1500) than the MSS value (1460 bytes) to account for the TCP header (20 bytes) and the IP header (20 bytes).

You can adjust the MSS of TCP SYN packets with the ip tcp adjust-mss command. The following configuration reduces the MSS value in TCP SYN segments to 1460. This command affects traffic both inbound and outbound on interface serial0.

```
interface serial0
 ip tcp adjust-mss 1460
```
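What MSS adjustment does to a SYN can be illustrated by rewriting the MSS option (TCP option kind 2, length 4) inside a TCP options field. This Python sketch is an illustration, not how IOS implements the feature. A real device must also recompute the TCP checksum, which the sketch leaves out. The 1436 ceiling is a hypothetical choice (a GRE tunnel's 1476-byte IP MTU minus 40).

```python
import struct

END_OF_OPTIONS, NOP = 0, 1
MSS_KIND, MSS_LEN = 2, 4  # MSS option: kind 2, length 4


def clamp_mss_option(options: bytes, ceiling: int) -> bytes:
    """Lower the MSS option in a SYN's TCP options to at most `ceiling`."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == END_OF_OPTIONS:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break                   # truncated option; stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                   # malformed option; leave the rest alone
        if kind == MSS_KIND and length == MSS_LEN:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > ceiling:       # only ever lower the value
                struct.pack_into("!H", out, i + 2, ceiling)
        i += length
    return bytes(out)


syn_options = bytes([NOP, NOP, MSS_KIND, MSS_LEN]) + (1460).to_bytes(2, "big")
clamped = clamp_mss_option(syn_options, 1436)
print(int.from_bytes(clamped[4:6], "big"))
```

An advertised MSS that is already below the ceiling is left unchanged. This matches the behavior described above, where the option is reduced, never raised.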

IP fragmentation issues have become more widespread as IP tunnels have become more widely deployed. Tunnels cause more fragmentation because the tunnel encapsulation adds "overhead" to the size of a packet. For example, adding Generic Routing Encapsulation (GRE) adds 24 bytes to a packet, and after this increase the packet may need to be fragmented because it is larger than the outbound MTU. In a later section of this document, you will see examples of the kinds of problems that can arise with tunnels and IP fragmentation.

Common Network Topologies that Need PMTUD

PMTUD is needed in network situations where intermediate links have smaller MTUs than the MTU of the end links. Some common reasons for the existence of these smaller MTU links are:

Token Ring (or FDDI)-connected end hosts with an Ethernet connection between them. The Token Ring (or FDDI) MTUs at the ends are greater than the Ethernet MTU in the middle.

PPPoE (often used with ADSL) needs 8 bytes for its header. This reduces the effective MTU of the Ethernet to 1492 (1500 - 8).

Tunneling protocols like GRE, IPsec, and L2TP also need space for their respective headers and trailers. This also reduces the effective MTU of the outgoing interface.
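The effect of these overheads on a 1500-byte Ethernet link can be tabulated with a short Python sketch. It uses only the per-packet overheads quoted in this document: 8 bytes for PPPoE and 24 bytes for GRE. It leaves out IPsec, whose overhead varies with the configuration. The MSS column assumes 40 bytes of IP and TCP headers with no options.

```python
HEADER_OVERHEAD = 40  # IP (20) + TCP (20), no options

# Per-packet overheads quoted in this document.
OVERHEAD = {"PPPoE": 8, "GRE": 24}


def effective_mtu(link_mtu: int, *encapsulations: str) -> int:
    """MTU left for the passenger IP packet after the given encapsulations."""
    return link_mtu - sum(OVERHEAD[name] for name in encapsulations)


for stack in [(), ("PPPoE",), ("GRE",), ("PPPoE", "GRE")]:
    mtu = effective_mtu(1500, *stack)
    label = " + ".join(stack) or "none"
    print(f"{label:12} MTU {mtu}  MSS {mtu - HEADER_OVERHEAD}")
```

GRE alone leaves 1476 bytes, the tunnel IP MTU used in the examples that follow. Stacking GRE over PPPoE would lower it further, to 1468.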

In the following sections we will study the impact of PMTUD where a tunneling protocol is used somewhere between the two end hosts. Of the three cases above, this case is the most complex, covering all of the issues that you might see in the other cases.

What Is a Tunnel?

A tunnel is a logical interface on a Cisco router that provides a way to encapsulate passenger packets inside a transport protocol. It is an architecture designed to provide the services to implement a point-to-point encapsulation scheme. Tunneling has the following three primary components:

Passenger protocol (AppleTalk, Banyan VINES, CLNS, DECnet, IP, or IPX)

Carrier protocol - One of the following encapsulation protocols:
- GRE - Cisco's multiprotocol carrier protocol. See RFC 2784 and RFC 1701 for more information.
- IP in IP tunnels - See RFC 2003 for more information.

Transport protocol - The protocol used to carry the encapsulated protocol

The packets below illustrate the IP tunneling concepts where GRE is the encapsulation protocol and IP is the transport protocol. The passenger protocol is also IP. In this case, IP is both the transport and the passenger protocol.

Normal packet: IP | TCP | Telnet

Tunnel packet: IP | GRE | IP | TCP | Telnet

In the tunnel packet, the outer IP is the transport protocol, GRE is the encapsulation protocol, and the inner IP is the passenger protocol.

The next example shows the encapsulation of IP and DECnet as passenger protocols with GRE as the carrier. This illustrates the fact that the carrier protocol can encapsulate multiple passenger protocols.

A network administrator might consider tunneling in a situation where there are two discontiguous non-IP networks separated by an IP backbone. If the discontiguous networks are running DECnet, the administrator may not want to connect them together by configuring DECnet in the backbone. The administrator may not want to permit DECnet routing to consume backbone bandwidth because this could interfere with the performance of the IP network.

A viable alternative is to tunnel DECnet over the IP backbone. Tunneling encapsulates the DECnet packets inside IP and sends them across the backbone to the tunnel endpoint. There the encapsulation is removed, and the DECnet packets can be routed to their destination via DECnet.

Encapsulating traffic inside another protocol provides the following advantages:

Connect endpoints that use private addresses (RFC 1918) across a backbone that does not support routing these addresses.

Allow virtual private networks (VPNs) across WANs or the Internet.

Join together discontiguous multiprotocol networks over a single-protocol backbone.

Encrypt traffic over the backbone or Internet.

For the rest of the document we will use IP as the passenger protocol and IP as the transport protocol.

Considerations Regarding Tunnel Interfaces

The following are considerations when tunneling.

Fast switching of GRE tunnels was introduced in Cisco IOS Release 11.1 and CEF switching was introduced in version 12.0. CEF switching for multipoint GRE tunnels was introduced in version 12.2(8)T. Encapsulation and de-capsulation at tunnel endpoints were slow operations in earlier versions of IOS when only process switching was supported.

There are security and topology issues when tunneling packets. Tunnels can bypass access control lists (ACLs) and firewalls. If you tunnel through a firewall, you basically bypass the firewall for whatever passenger protocol you are tunneling. Therefore it is recommended to include firewall functionality at the tunnel endpoints to enforce any policy on the passenger protocols.

Tunneling might create problems with transport protocols that have limited timers (for example, DECnet) because of increased latency.

Tunneling across environments with links of different speeds, such as fast FDDI rings and slow 9600-bps phone lines, may introduce packet reordering problems. Some passenger protocols function poorly in mixed-media networks.

Point-to-point tunnels can use up the bandwidth on a physical link. If you are running routing protocols over multiple point-to-point tunnels, keep in mind that each tunnel interface has a bandwidth and that the physical interface over which the tunnel runs has a bandwidth. For example, you would want to set the tunnel bandwidth to 100 Kb if there were 100 tunnels running over a 10 Mb link. The default bandwidth for a tunnel is 9 Kb.

Routing protocols may prefer a tunnel over a "real" link because the tunnel might deceptively appear to be a one-hop link with the lowest cost path, although it actually involves more hops and is really more costly than another path. This can be mitigated with proper configuration of the routing protocol. You might want to consider running a different routing protocol over the tunnel interface than the routing protocol running on the physical interface.

Problems with recursive routing can be avoided by configuring appropriate static routes to the tunnel destination. A recursive route is when the best path to the "tunnel destination" is through the tunnel itself. This situation will cause the tunnel interface to bounce up and down. You will see the following error when there is a recursive routing problem.
```
%TUN-RECURDOWN Interface Tunnel 0
temporarily disabled due to recursive routing
```

The Router as a PMTUD Participant at the Endpoint of a Tunnel

The router has two different PMTUD roles to play when it is the endpoint of a tunnel.

In the first role the router is the forwarder of a host packet. For PMTUD processing, the router needs to check the DF bit and packet size of the original data packet and take appropriate action when necessary.

The second role comes into play after the router has encapsulated the original IP packet inside the tunnel packet. At this stage, the router is acting more like a host with respect to PMTUD and in regards to the tunnel IP packet.

Let's start by looking at what happens when the router is acting in the first role, a router forwarding host IP packets, with respect to PMTUD. This role comes into play before the router encapsulates the host IP packet inside the tunnel packet.

If the router participates as the forwarder of a host packet it will do the following:

Check whether the DF bit is set.

Check what size packet the tunnel can accommodate.

Fragment (if packet is too large and DF bit is not set), encapsulate fragments and send; or

Drop the packet (if packet is too large and DF bit is set) and send an ICMP message to the sender.

Encapsulate (if packet is not too large) and send.
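The forwarding decisions above can be sketched in Python for a GRE tunnel. The sketch assumes the default tunnel IP MTU (physical MTU minus 24) and 20-byte IP headers without options. It also follows the GRE order described in the examples below: fragment first, then encapsulate each fragment. All names are hypothetical.

```python
GRE_OVERHEAD = 24  # 20-byte outer IP header + 4-byte GRE header
IP_HEADER = 20


def fragment_sizes(total_length: int, mtu: int) -> list[int]:
    """Fragment sizes (header included); non-final data is a multiple of 8."""
    data_left, max_data = total_length - IP_HEADER, (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while data_left > 0:
        chunk = min(max_data, data_left)
        sizes.append(chunk + IP_HEADER)
        data_left -= chunk
    return sizes


def forward_into_tunnel(packet_length: int, df_set: bool, physical_mtu: int = 1500):
    """What the tunnel-source router does with a host packet.

    Returns ("encapsulate", [outgoing sizes]), ("fragment", [outgoing sizes]),
    or ("drop", MTU reported to the sender in an ICMP type 3, code 4 message).
    """
    tunnel_ip_mtu = physical_mtu - GRE_OVERHEAD
    if packet_length <= tunnel_ip_mtu:
        return "encapsulate", [packet_length + GRE_OVERHEAD]
    if df_set:
        return "drop", tunnel_ip_mtu
    fragments = fragment_sizes(packet_length, tunnel_ip_mtu)
    return "fragment", [size + GRE_OVERHEAD for size in fragments]


print(forward_into_tunnel(1500, df_set=False))
print(forward_into_tunnel(1500, df_set=True))
print(forward_into_tunnel(1476, df_set=True))
```

The three calls reproduce the numbers used in Examples 1 and 2:

- Two outgoing packets of 1500 and 68 bytes.
- A drop reporting an MTU of 1476.
- A single 1500-byte GRE packet once the host has resized.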

Generically, there is a choice of encapsulation and then fragmentation (sending two encapsulation fragments) or fragmentation and then encapsulation (sending two encapsulated fragments).

Below are some examples that describe the mechanics of IP packet encapsulation and fragmentation and two scenarios that show the interaction of PMTUD and packets traversing example networks.

The first example below shows what happens to a packet when the router (at the tunnel source) is acting in the role of forwarding router. Remember that for PMTUD processing, the router needs to check the DF bit and packet size of the original data packet and take appropriate action. This example uses GRE encapsulation for the tunnel. As can be seen below, GRE does fragmentation before encapsulation. Later examples show scenarios in which fragmentation is done after encapsulation.

In Example 1, the DF bit is not set (DF = 0) and the GRE tunnel IP MTU is 1476 (1500 - 24).

Example 1

The forwarding router (at the tunnel source) receives a 1500-byte datagram with the DF bit clear (DF = 0) from the sending host. This datagram is composed of a 20-byte IP header plus a 1480 byte TCP payload.

IP | 1480 bytes TCP + data

Because the packet will be too large for the IP MTU after the GRE overhead (24 bytes) is added, the forwarding router breaks the datagram into two fragments. The first is 1476 bytes (20 bytes of IP header + 1456 bytes of IP payload) and the second is 44 bytes (20 bytes of IP header + 24 bytes of IP payload). That way, after the GRE encapsulation is added, neither packet will be larger than the outgoing physical interface MTU.

IP₀ | 1456 bytes TCP + data

IP₁ | 24 bytes data

The forwarding router adds GRE encapsulation, which includes a 4-byte GRE header plus a 20-byte IP header, to each fragment of the original IP datagram. These two IP datagrams now have a length of 1500 and 68 bytes, and these datagrams are seen as individual IP datagrams not as fragments.

IP | GRE | IP₀ | 1456 bytes TCP + data

IP | GRE | IP₁ | 24 bytes data

The tunnel destination router removes the GRE encapsulation from each fragment of the original datagram, leaving two IP fragments of 1476 and 44 bytes. These IP datagram fragments will be forwarded separately by this router to the receiving host.

IP₀ | 1456 bytes TCP + data

IP₁ | 24 bytes data

The receiving host will reassemble these two fragments into the original datagram.

IP | 1480 bytes TCP + data

Scenario 5 depicts the role of the forwarding router in the context of a network topology.

In the following example, the router is acting in the same role of forwarding router but this time the DF bit is set (DF = 1).

Example 2

The forwarding router at the tunnel source receives a 1500-byte datagram with DF = 1 from the sending host.

IP | 1480 bytes TCP + data

Since the DF bit is set, and the datagram size (1500 bytes) is greater than the GRE tunnel IP MTU (1476), the router will drop the datagram and send an "ICMP fragmentation needed but DF bit set" message to the source of the datagram. The ICMP message will alert the sender that the MTU is 1476.

IP | ICMP MTU 1476

The sending host receives the ICMP message, and when it resends the original data, it will use a 1476-byte IP datagram.

IP | 1456 bytes TCP + data

This IP datagram length (1476 bytes) is now equal in value to the GRE tunnel IP MTU so the router adds the GRE encapsulation to the IP datagram.

IP | GRE | IP | 1456 bytes TCP + data

The receiving router (at the tunnel destination) removes the GRE encapsulation of the IP datagram and sends it to the receiving host.

IP | 1456 bytes TCP + data

Now we can look at what happens when the router is acting in the second role as a sending host with respect to PMTUD and in regards to the tunnel IP packet. Recall that this role comes into play after the router has encapsulated the original IP packet inside the tunnel packet.

Note: By default a router doesn't do PMTUD on the GRE tunnel packets that it generates. The tunnel path-mtu-discovery command can be used to turn on PMTUD for GRE-IP tunnel packets.

Below is an example of what happens when the host is sending IP datagrams that are small enough to fit within the IP MTU on the GRE Tunnel interface. The DF bit in this case can be either set or clear (1 or 0). The GRE tunnel interface does not have the tunnel path-mtu-discovery command configured so the router will not be doing PMTUD on the GRE-IP packet.

Example 3

The forwarding router at the tunnel source receives a 1476-byte datagram from the sending host.

IP | 1456 bytes TCP + data

This router encapsulates the 1476-byte IP datagram inside GRE to get a 1500-byte GRE IP datagram. The DF bit in the GRE IP header will be clear (DF = 0). This router then forwards this packet to the tunnel destination.

IP | GRE | IP | 1456 bytes TCP + data

Assume there is a router between the tunnel source and destination with a link MTU of 1400. This router will fragment the tunnel packet since the DF bit is clear (DF = 0). Remember that this example fragments the outermost IP, so the GRE, inner IP, and TCP headers will only show up in the first fragment.

IP₀ | GRE | IP | 1352 bytes TCP + data

IP₁ | 104 bytes data

The tunnel destination router must reassemble the GRE tunnel packet.

IP | GRE | IP | 1456 bytes TCP + data

After the GRE tunnel packet is reassembled, the router removes the GRE IP header and sends the original IP datagram on its way.

IP | 1456 bytes TCP + data
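The fragment sizes in Example 3 follow from fragmenting the outer IP packet. A minimal Python sketch reproduces them, assuming 20-byte IP headers without options and the 1400-byte link from the example. It shows that only the first fragment carries the GRE and inner IP headers. The function and field names are hypothetical.

```python
IP_HEADER, GRE_HEADER = 20, 4


def outer_fragments(inner_packet_length: int, link_mtu: int) -> list[dict]:
    """Fragment a GRE packet at the outer IP layer (DF = 0)."""
    outer_data = GRE_HEADER + inner_packet_length   # GRE header + inner packet
    max_data = (link_mtu - IP_HEADER) // 8 * 8      # multiple of 8 bytes
    fragments, offset = [], 0
    while offset < outer_data:
        chunk = min(max_data, outer_data - offset)
        # Only the first fragment holds the GRE header and inner IP header.
        headers = GRE_HEADER + IP_HEADER if offset == 0 else 0
        fragments.append({
            "total_length": IP_HEADER + chunk,
            "offset_units": offset // 8,            # value of the offset field
            "tcp_and_data": chunk - headers,
        })
        offset += chunk
    return fragments


for frag in outer_fragments(1476, 1400):
    print(frag)
```

The output has two fragments:

- A 1396-byte fragment with 1352 bytes of TCP and data.
- A 124-byte fragment with the remaining 104 bytes, at an offset of 172 eight-byte units.

These are the pieces the tunnel destination must reassemble before it can remove the GRE header.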

The next example shows what happens when the router is acting in the role of a sending host with respect to PMTUD and in regards to the tunnel IP packet. This time the DF bit is set (DF = 1) in the original IP header and we have configured the tunnel path-mtu-discovery command so that the DF bit will be copied from the inner IP header to the outer (GRE + IP) header.

Example 4

The forwarding router at the tunnel source receives a 1476-byte datagram with DF = 1 from the sending host.

IP | 1456 bytes TCP + data

This router encapsulates the 1476-byte IP datagram inside GRE to get a 1500-byte GRE IP datagram. This GRE IP header will have the DF bit set (DF = 1) since the original IP datagram had the DF bit set. This router then forwards this packet to the tunnel destination.

IP | GRE | IP | 1456 bytes TCP + data

Again, assume there is a router between the tunnel source and destination with a link MTU of 1400. This router will not fragment the tunnel packet since the DF bit is set (DF = 1). This router must drop the packet and send an ICMP error message to the tunnel source router, since that is the source IP address on the packet.

IP | ICMP MTU 1400

The forwarding router at the tunnel source receives this ICMP error message, and it will lower the GRE tunnel IP MTU to 1376 (1400 - 24). The next time the sending host retransmits the data in a 1476-byte IP packet, this packet will be too large, and this router will send an ICMP error message to the sender with an MTU value of 1376. When the sending host retransmits the data, it will send it in a 1376-byte IP packet, and this packet will make it through the GRE tunnel to the receiving host.

Scenario 5

This scenario illustrates GRE fragmentation. Remember that you fragment before encapsulation for GRE, then do PMTUD for the data packet, and the DF bit is not copied when the IP packet is encapsulated by GRE. In this scenario, the DF bit is not set. The GRE tunnel interface IP MTU is, by default, 24 bytes less than the physical interface IP MTU, so the GRE interface IP MTU is 1476.

The sender sends a 1500-byte packet (20-byte IP header + 1480 bytes of TCP payload).

Since the MTU of the GRE tunnel is 1476, the 1500-byte packet is broken into two IP fragments of 1476 and 44 bytes, in anticipation of the additional 24 bytes of GRE header.

The 24 bytes of GRE header are added to each IP fragment. Now the fragments are 1500 (1476 + 24) and 68 (44 + 24) bytes each.

The GRE + IP packets containing the two IP fragments are forwarded to the GRE tunnel peer router.

The GRE tunnel peer router removes the GRE headers from the two packets.

This router forwards the two packets to the destination host.

The destination host reassembles the IP fragments back into the original IP datagram.

Scenario 6

This scenario is similar to Scenario 5, but this time the DF bit is set. In Scenario 6, the router is configured to do PMTUD on GRE + IP tunnel packets with the tunnel path-mtu-discovery command, and the DF bit is copied from the original IP header to the GRE IP header. If the router receives an ICMP error for the GRE + IP packet, it reduces the IP MTU on the GRE tunnel interface. Again, remember that the GRE tunnel IP MTU is set to 24 bytes less than the physical interface MTU by default, so the GRE IP MTU here is 1476. Also notice that there is a 1400 MTU link in the GRE tunnel path.

The router receives a 1500-byte packet (20-byte IP header + 1480-byte TCP payload) and drops it, because it is larger than the IP MTU (1476) on the GRE tunnel interface.

The router sends an ICMP error to the sender telling it that the next-hop MTU is 1476. The host will record this information, usually as a host route for the destination in its routing table.

The sending host uses a 1476-byte packet size when it resends the data. The GRE router adds 24 bytes of GRE encapsulation and ships out a 1500-byte packet.

The 1500-byte packet cannot traverse the 1400-byte link, so it is dropped by the intermediate router.

The intermediate router sends an ICMP (type = 3, code = 4) message to the GRE router with a next-hop MTU of 1400. The GRE router reduces this to 1376 (1400 - 24) and sets an internal IP MTU value on the GRE interface. This change can only be seen when using the debug tunnel command; it cannot be seen in the output from the show ip interface tunnel<#> command.

The next time the host resends the 1476-byte packet, the GRE router will drop the packet, since it is larger than the current IP MTU (1376) on the GRE tunnel interface.

The GRE router will send another ICMP message (type 3, code 4) to the sender with a next-hop MTU of 1376, and the host will update its current information with the new value.

The host resends the data again, this time in a smaller 1376-byte packet. GRE adds 24 bytes of encapsulation and forwards it. This time the packet reaches the GRE tunnel peer, where it is decapsulated and sent to the destination host.

Note: If the tunnel path-mtu-discovery command was not configured on the forwarding router in this scenario, and the DF bit was set in the packets forwarded through the GRE tunnel, Host 1 would still succeed in sending TCP/IP packets to Host 2, but they would get fragmented in the middle at the 1400 MTU link. Also the GRE tunnel peer would have to reassemble them before it could decapsulate and forward them on.
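The Scenario 6 exchange can be replayed as a small loop that tracks two values: the host's PMTU for the destination and the GRE tunnel IP MTU. This is a minimal sketch, not router code. It assumes the DF bit is set, tunnel path-mtu-discovery is configured, GRE adds 24 bytes, and each ICMP message is acted on before the next retransmission; the function name `scenario6` and its parameters are hypothetical.

```python
GRE_OVERHEAD = 24  # GRE header + outer IP header, as in the text


def scenario6(first=1500, physical_mtu=1500, middle_link_mtu=1400):
    """Replay GRE PMTUD with DF set; return the list of events."""
    tunnel_ip_mtu = physical_mtu - GRE_OVERHEAD  # 1476 by default
    host_pmtu = first
    log = []
    while True:
        pkt = host_pmtu
        if pkt > tunnel_ip_mtu:
            # DF set: the GRE router drops the packet and sends
            # ICMP type 3, code 4 to the host with its tunnel IP MTU.
            host_pmtu = tunnel_ip_mtu
            log.append(("GRE router->host", host_pmtu))
            continue
        gre_pkt = pkt + GRE_OVERHEAD  # DF is copied to the outer header
        if gre_pkt > middle_link_mtu:
            # The intermediate router drops it and reports its link MTU;
            # the GRE router lowers its internal tunnel IP MTU.
            tunnel_ip_mtu = middle_link_mtu - GRE_OVERHEAD
            log.append(("middle router->GRE router", middle_link_mtu))
            continue
        log.append(("delivered", pkt))
        return log


for event in scenario6():
    print(event)
```

The four events printed match the steps above: the host first learns 1476, the GRE router then learns 1400 (and sets 1376 internally), the host learns 1376, and the 1376-byte packet finally gets through.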

"Pure" IPsec Tunnel Mode

The IP Security (IPsec) Protocol is a standards-based method of providing privacy, integrity, and authenticity to information transferred across IP networks. IPsec provides IP network-layer encryption. IPsec lengthens the IP packet by adding at least one IP header (tunnel mode). The added headers vary in length depending on the IPsec configuration mode, but they do not exceed about 58 bytes per packet (Encapsulating Security Payload (ESP) and ESP authentication (ESPauth)).

IPsec has two modes, tunnel mode and transport mode.

Tunnel mode is the default mode. With tunnel mode, the entire original IP packet is protected (encrypted, authenticated, or both) and encapsulated by the IPsec headers and trailers. Then a new IP header is prepended to the packet, specifying the IPsec endpoints (peers) as the source and destination. Tunnel mode can be used with any unicast IP traffic and must be used if IPsec is protecting traffic from hosts behind the IPsec peers. For example, tunnel mode is used with Virtual Private Networks (VPNs) where hosts on one protected network send packets to hosts on a different protected network via a pair of IPsec peers. With VPNs, the IPsec "tunnel" protects the IP traffic between hosts by encrypting this traffic between the IPsec peer routers.

With transport mode (configured with the subcommand mode transport on the transform definition), only the payload of the original IP packet is protected (encrypted, authenticated, or both). The payload is encapsulated by the IPsec headers and trailers. The original IP headers remain intact, except that the IP protocol field is changed to ESP (50), and the original protocol value is saved in the IPsec trailer to be restored when the packet is decrypted. Transport mode is used only when the IP traffic to be protected is between the IPsec peers themselves, that is, when the source and destination IP addresses on the packet are the same as the IPsec peer addresses. Normally, IPsec transport mode is used only when another tunneling protocol (like GRE) first encapsulates the IP data packet and IPsec then protects the GRE tunnel packets.
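To see where figures such as 1552 and 1496 bytes come from, the ESP packet size can be computed field by field. This sketch rests on assumptions the text does not state: 3DES-CBC (8-byte block, 8-byte IV), a 12-byte HMAC-96 authenticator, and no IP options. Other ciphers and hashes give different sizes. Under these assumptions the function reproduces the 1552-, 1496-, and 120-byte packets used in the scenarios.

```python
IP_HEADER = 20
ESP_HEADER = 8   # SPI (4 bytes) + sequence number (4 bytes)
IV_LEN = 8       # assumption: 3DES-CBC initialization vector
BLOCK = 8        # assumption: 3DES block size
TRAILER = 2      # pad-length byte + next-header byte
ICV = 12         # assumption: HMAC-MD5-96 or HMAC-SHA1-96


def esp_packet_size(ip_packet_len: int, tunnel_mode: bool = True) -> int:
    """IP packet size after ESP encapsulation (sketch under the assumptions above)."""
    if tunnel_mode:
        protected = ip_packet_len              # the whole original packet
    else:
        protected = ip_packet_len - IP_HEADER  # payload only; header is reused
    # Pad so that protected data + pad + 2-byte trailer fills whole cipher blocks.
    pad = -(protected + TRAILER) % BLOCK
    # Either way one IP header leads the result: a new one in tunnel mode,
    # the original one in transport mode.
    return IP_HEADER + ESP_HEADER + IV_LEN + protected + pad + TRAILER + ICV


print(esp_packet_size(1500))                     # 1552
print(esp_packet_size(1442))                     # 1496
print(esp_packet_size(1500, tunnel_mode=False))  # 1536
```

The variable padding is why the real overhead can be a few bytes below the quoted maximum.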

IPsec always does PMTUD for data packets and for its own packets. IPsec configuration commands can modify PMTUD processing for the IPsec IP packet: IPsec can clear, set, or copy the DF bit from the data packet IP header to the IPsec IP header. This is called the "DF Bit Override Functionality" feature.
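The three DF bit override choices amount to a small policy function. In this sketch, `DfPolicy`, `outer_df_bit`, and `oversized_action` are hypothetical names, and copying is treated as the default because that is the behavior Scenario 8 describes. The second function shows why the choice matters: with DF set on the outer header, an oversized packet is dropped and reported via ICMP; with DF clear, it is fragmented after encryption.

```python
from enum import Enum


class DfPolicy(Enum):
    COPY = "copy"    # copy DF from the inner (data) IP header
    CLEAR = "clear"  # always clear DF in the IPsec IP header
    SET = "set"      # always set DF in the IPsec IP header


def outer_df_bit(inner_df: bool, policy: DfPolicy = DfPolicy.COPY) -> bool:
    """DF value written into the outer IPsec IP header."""
    if policy is DfPolicy.CLEAR:
        return False
    if policy is DfPolicy.SET:
        return True
    return inner_df


def oversized_action(inner_df: bool, policy: DfPolicy = DfPolicy.COPY) -> str:
    """What happens when the encrypted packet exceeds the outbound MTU."""
    if outer_df_bit(inner_df, policy):
        return "drop and send ICMP"  # PMTUD path
    return "fragment"                # peer must reassemble before decrypting


print(oversized_action(True))                  # drop and send ICMP
print(oversized_action(True, DfPolicy.CLEAR))  # fragment
```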

Note: You really want to avoid fragmentation after encapsulation when you do hardware encryption with IPsec. Hardware encryption can give you a throughput of about 50 Mbps, depending on the hardware, but if the IPsec packet is fragmented you lose 50 to 90 percent of that throughput. This loss occurs because the fragmented IPsec packets are process-switched for reassembly and then handed to the hardware encryption engine for decryption. It can bring hardware encryption throughput down to the performance level of software encryption (2-10 Mbps).

Scenario 7

This scenario depicts IPsec fragmentation in action. The MTU along the entire path is 1500, and the DF bit is not set.

The router receives a 1500-byte packet (20-byte IP header + 1480 bytes TCP payload) destined for Host 2.

The 1500-byte packet is encrypted by IPsec and 52 bytes of overhead are added (IPsec header, trailer, and additional IP header). Now IPsec needs to send a 1552-byte packet. Since the outbound MTU is 1500, this packet will have to be fragmented.

Two fragments are created out of the IPsec packet. During fragmentation, an additional 20-byte IP header is added for the second fragment, resulting in a 1500-byte fragment and a 72-byte IP fragment.

The IPsec tunnel peer router receives the fragments, strips off the additional IP header and coalesces the IP fragments back into the original IPsec packet. Then IPsec decrypts this packet.

The router then forwards the original 1500-byte data packet to Host 2.
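The 1500/72 split in Scenario 7 follows from the IPv4 fragmentation rules: each fragment carries its own 20-byte IP header, and every fragment except the last must carry a multiple of 8 payload bytes, because fragment offsets are counted in 8-byte units. A minimal sketch, assuming no IP options; `fragment_sizes` is a hypothetical helper.

```python
from typing import List

IP_HEADER = 20  # assumes no IP options


def fragment_sizes(packet_len: int, mtu: int) -> List[int]:
    """Sizes of the IP fragments produced when packet_len exceeds mtu."""
    if packet_len <= mtu:
        return [packet_len]
    payload = packet_len - IP_HEADER
    # Every fragment but the last carries a multiple of 8 payload bytes.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(chunk + IP_HEADER)  # each fragment gets its own IP header
        payload -= chunk
    return sizes


print(fragment_sizes(1552, 1500))  # [1500, 72]
```

The same helper gives 1476 and 44 bytes for a 1500-byte packet sent into a 1476-byte GRE tunnel MTU, which is the first fragmentation in Scenario 9.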

Scenario 8

This scenario is similar to Scenario 6, except that in this case the DF bit is set in the original data packet and there is a link in the path between the IPsec tunnel peers that has a lower MTU than the other links. This scenario demonstrates how the IPsec peer router performs both PMTUD roles, as described in the "The Router as a PMTUD Participant at the Endpoint of a Tunnel" section.

You will see in this scenario how the IPsec PMTU changes to a lower value as the result of the need for fragmentation. Remember that the DF bit is copied from the inner IP header to the outer IP header when IPsec encrypts a packet. The media MTU and PMTU values are stored in the IPsec Security Association (SA). The media MTU is based on the MTU of the outbound router interface and the PMTU is based on the minimum MTU seen on the path between the IPsec peers. Remember that IPsec encapsulates/encrypts the packet before it attempts to fragment it.

The router receives a 1500-byte packet and drops it, because once the IPsec overhead is added the packet would be larger than the PMTU (1500).

The router sends an ICMP message to Host 1 telling it that the next-hop MTU is 1442 (1500 - 58 = 1442). These 58 bytes are the maximum IPsec overhead when using IPsec ESP and ESPauth; the real IPsec overhead may be as much as 7 bytes less than this value. Host 1 records this information, usually as a host route for the destination (Host 2), in its routing table.

Host 1 lowers its PMTU for Host 2 to 1442, so it sends smaller (1442-byte) packets when it retransmits the data to Host 2. The router receives the 1442-byte packet and IPsec adds its encryption overhead (54 bytes for this packet, since the padding varies with packet size), so the resulting IPsec packet is 1496 bytes. Because this packet has the DF bit set in its header, it is dropped by the middle router with the 1400-byte MTU link.

The middle router that dropped the packet sends an ICMP message to the sender of the IPsec packet (the first router) telling it that the next-hop MTU is 1400 bytes. This value is recorded in the IPsec SA PMTU.

The next time Host 1 retransmits the 1442-byte packet (it didn't receive an acknowledgment for it), IPsec will drop it. Again the router drops the packet because the IPsec overhead, when added to the packet, would make it larger than the PMTU (1400).

The router sends an ICMP message to Host 1 telling it that the next-hop MTU is now 1342. (1400 - 58 = 1342). Host 1 will again record this information.

When Host 1 again retransmits the data, it will use the smaller size packet (1342). This packet will not require fragmentation and will make it through the IPsec tunnel to Host 2.
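Scenario 8 can be replayed with the same kind of loop, this time tracking the host PMTU and the PMTU stored in the IPsec SA. This sketch is hypothetical: it charges every packet the worst-case 58-byte overhead quoted above instead of the exact padded size (1442 bytes really becomes 1496, not 1500), which happens to produce the same drop decisions in this scenario.

```python
IPSEC_MAX_OVERHEAD = 58  # worst case for ESP + ESPauth, from the text


def scenario8(first=1500, media_mtu=1500, middle_link_mtu=1400):
    """Replay IPsec PMTUD with DF set; return the list of events."""
    sa_pmtu = media_mtu  # PMTU kept in the IPsec SA starts at the media MTU
    host_pmtu = first
    log = []
    while True:
        pkt = host_pmtu
        esp_pkt = pkt + IPSEC_MAX_OVERHEAD
        if esp_pkt > sa_pmtu:
            # The IPsec router drops it and tells the host PMTU - 58.
            host_pmtu = sa_pmtu - IPSEC_MAX_OVERHEAD
            log.append(("IPsec router->host", host_pmtu))
            continue
        if esp_pkt > middle_link_mtu:
            # The middle router drops it; the value is stored in the SA PMTU.
            sa_pmtu = middle_link_mtu
            log.append(("middle router->IPsec router", sa_pmtu))
            continue
        log.append(("delivered", pkt))
        return log


for event in scenario8():
    print(event)
```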

GRE and IPsec Together

More complex interactions for fragmentation and PMTUD occur when IPsec is used to encrypt GRE tunnels. IPsec and GRE are combined in this manner because IPsec doesn't support IP multicast packets, which means that you cannot run a dynamic routing protocol over the IPsec VPN network. GRE tunnels do support multicast, so a GRE tunnel can first encapsulate the dynamic routing protocol multicast packet in a GRE IP unicast packet, which can then be encrypted by IPsec. When doing this, IPsec is often deployed in transport mode on top of GRE, because the IPsec peers and the GRE tunnel endpoints (the routers) are the same, and transport mode saves 20 bytes of IPsec overhead.

One interesting case is when an IP packet has been split into two fragments and encapsulated by GRE. In this case IPsec will see two independent GRE + IP packets. Often in a default configuration one of these packets will be large enough that it will need to be fragmented after it has been encrypted. The IPsec peer will have to reassemble this packet before decryption. This "double fragmentation" (once before GRE and again after IPsec) on the sending router increases latency and lowers throughput. Also, reassembly is process-switched, so there will be a CPU hit on the receiving router whenever this happens.

This situation can be avoided by setting the "ip mtu" on the GRE tunnel interface low enough to take into account the overhead from both GRE and IPsec (by default the GRE tunnel interface "ip mtu" is set to the outgoing real interface MTU - GRE overhead bytes).

The following table lists the suggested MTU values for each tunnel/mode combination assuming the outgoing physical interface has an MTU of 1500.

Tunnel Combination              Specific MTU Needed    Recommended MTU
GRE + IPsec (Transport mode)    1440 bytes             1400 bytes
GRE + IPsec (Tunnel mode)       1420 bytes             1400 bytes

Note: The MTU value of 1400 is recommended because it covers the most common GRE + IPsec mode combinations. Also, there is no discernible downside to allowing for an extra 20 or 40 bytes of overhead. It is easier to remember and set one value, and this value covers almost all scenarios.
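One way to sanity-check the table is to build a full-size packet for each tunnel "ip mtu", add 24 bytes of GRE, encrypt it, and see whether the result still fits a 1500-byte physical MTU. The ESP sizing here is an assumption (3DES-CBC, HMAC-96, no IP options), not something the table states, and the helper names are hypothetical; the check only shows that the listed values leave enough room while the default 1476 does not.

```python
GRE_OVERHEAD = 24


def esp_packet_size(ip_len: int, tunnel_mode: bool) -> int:
    # Assumed ESP layout: 20 IP + 8 ESP header + 8 IV + data + pad + 2 + 12 ICV.
    protected = ip_len if tunnel_mode else ip_len - 20
    pad = -(protected + 2) % 8
    return 20 + 8 + 8 + protected + pad + 2 + 12


def fits(tunnel_ip_mtu: int, tunnel_mode: bool, physical_mtu: int = 1500) -> bool:
    """True if a full-size packet leaves the router without fragmentation."""
    gre_packet = tunnel_ip_mtu + GRE_OVERHEAD
    return esp_packet_size(gre_packet, tunnel_mode) <= physical_mtu


for label, mtu, tunnel in [("transport", 1440, False), ("tunnel", 1420, True),
                           ("transport", 1400, False), ("tunnel", 1400, True),
                           ("tunnel, default", 1476, True)]:
    print(f"ip mtu {mtu} ({label}): {'fits' if fits(mtu, tunnel) else 'fragments'}")
```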

Scenario 9

IPsec is deployed on top of GRE. The outgoing physical MTU is 1500, the IPsec PMTU is 1500, and the GRE IP MTU is 1476 (1500 - 24 = 1476). Because of this, TCP/IP packets will be fragmented twice, once before GRE and once after IPsec. The packet will be fragmented before GRE encapsulation and one of these GRE packets will be fragmented again after IPsec encryption.

Configuring "ip mtu 1440" (IPsec Transport mode) or "ip mtu 1420" (IPsec Tunnel mode) on the GRE tunnel would remove the possibility of double fragmentation in this scenario.

The router receives a 1500-byte datagram.

Before encapsulation, GRE fragments the 1500-byte packet into two pieces, 1476 (1500 - 24 = 1476) and 44 (24 data + 20 IP header) bytes.

GRE encapsulates the IP fragments, adding 24 bytes to each packet. This results in two GRE + IP packets of 1500 (1476 + 24 = 1500) and 68 (44 + 24) bytes.

IPsec encrypts the two packets, adding 52 bytes (IPsec tunnel mode) of encapsulation overhead to each, giving a 1552-byte and a 120-byte packet.

The 1552-byte IPsec packet is fragmented by the router because it is larger than the outbound MTU (1500). It is split into a 1500-byte packet and a 72-byte packet (52 bytes of "payload" plus an additional 20-byte IP header for the second fragment). The three packets (1500, 72, and 120 bytes) are forwarded to the IPsec + GRE peer.

The receiving router reassembles the two IPsec fragments (1500 bytes and 72 bytes) to get the original 1552-byte IPsec + GRE packet. Nothing needs to be done to the 120-byte IPsec + GRE packet.

IPsec decrypts both 1552-byte and 120-byte IPsec + GRE packets to get 1500-byte and 68-byte GRE packets.

GRE decapsulates the 1500-byte and 68-byte GRE packets to get 1476-byte and 44-byte IP packet fragments. These IP packet fragments are forwarded to the destination host.

Host 2 reassembles these IP fragments to get the original 1500-byte IP datagram.
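Scenario 9's double fragmentation can be traced by chaining the fragmentation rule with the GRE and IPsec overhead. The sketch assumes no IP options and uses the flat 52-byte tunnel-mode overhead given in the text, which is right for both packet sizes here; `scenario9` and `fragment_sizes` are hypothetical names.

```python
IP_HEADER = 20
GRE_OVERHEAD = 24
ESP_TUNNEL_OVERHEAD = 52  # value the text uses for these packet sizes


def fragment_sizes(packet_len, mtu):
    """IPv4 fragment sizes (8-byte aligned payload, 20-byte header each)."""
    if packet_len <= mtu:
        return [packet_len]
    payload = packet_len - IP_HEADER
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(chunk + IP_HEADER)
        payload -= chunk
    return sizes


def scenario9(datagram=1500, physical_mtu=1500):
    gre_ip_mtu = physical_mtu - GRE_OVERHEAD                    # 1476
    first = fragment_sizes(datagram, gre_ip_mtu)                # before GRE
    gre = [f + GRE_OVERHEAD for f in first]                     # GRE + IP packets
    esp = [g + ESP_TUNNEL_OVERHEAD for g in gre]                # after IPsec
    wire = [w for e in esp for w in fragment_sizes(e, physical_mtu)]  # on the wire
    return first, gre, esp, wire


print(scenario9())  # ([1476, 44], [1500, 68], [1552, 120], [1500, 72, 120])
```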

Scenario 10 is similar to Scenario 9, except that there is a lower-MTU link in the tunnel path. This is a "worst case" scenario for the first packet sent from Host 1 to Host 2. After the last step in this scenario, Host 1 sets the correct PMTU for Host 2 and all is well for the TCP connections between Host 1 and Host 2. TCP flows between Host 1 and other hosts (reachable via the IPsec + GRE tunnel) will only have to go through the last three steps of Scenario 10.

In this scenario, the tunnel path-mtu-discovery command is configured on the GRE tunnel and the DF bit is set on TCP/IP packets originating from Host 1.

Scenario 10

The router receives a 1500-byte packet, and GRE drops it: the DF bit is set, so GRE cannot fragment the packet, and after the 24 bytes of GRE overhead are added the packet would exceed the outbound interface "ip mtu".

The router sends an ICMP message to Host 1 letting it know that the next-hop MTU is 1476 (1500 - 24 = 1476).

Host 1 changes its PMTU for Host 2 to 1476 and sends the smaller size when it retransmits the packet. GRE encapsulates it and hands the 1500-byte packet to IPsec. IPsec drops the packet because GRE has copied the DF bit (set) from the inner IP header, and with the IPsec overhead (maximum 38 bytes), the packet is too large to forward out the physical interface.

IPsec sends an ICMP message to GRE indicating that the next-hop MTU is 1462 bytes (since a maximum 38 bytes will be added for encryption and IP overhead). GRE records the value 1438 (1462 - 24) as the "ip mtu" on the tunnel interface.

Note: This change in value is stored internally and cannot be seen in the output of the show ip interface tunnel<#> command. You will only see this change if you use the debug tunnel command.

The next time Host 1 retransmits the 1476-byte packet, GRE drops it.

The router sends an ICMP message to Host 1 indicating that 1438 is the next-hop MTU.

Host 1 lowers the PMTU for Host 2 and retransmits a 1438-byte packet. This time, GRE accepts the packet, encapsulates it, and hands it off to IPsec for encryption. The IPsec packet is forwarded to the intermediate router, where it is dropped because that router's outbound interface MTU is 1400.

The intermediate router sends an ICMP message to IPsec telling it that the next-hop MTU is 1400. This value is recorded by IPsec in the PMTU value of the associated IPsec SA.

When Host 1 retransmits the 1438-byte packet, GRE encapsulates it and hands it to IPsec. IPsec drops the packet because it has changed its own PMTU to 1400.

IPsec sends an ICMP error to GRE indicating that the next-hop MTU is 1362, and GRE records the value 1338 internally.

When Host 1 retransmits the original packet (because it did not receive an acknowledgment), GRE drops it.

The router sends an ICMP message to Host 1 indicating the next-hop MTU is 1338 (1362 - 24 bytes). Host 1 lowers its PMTU for Host 2 to 1338.

Host 1 retransmits a 1338-byte packet and this time it can finally get all the way through to Host 2.
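The Scenario 10 exchange is easiest to check with a loop that tracks the three MTU values involved: the host's PMTU, the GRE tunnel IP MTU, and the IPsec SA PMTU. This is a hypothetical sketch. It charges every packet the 38-byte transport-mode maximum from the text, assumes tunnel path-mtu-discovery is configured and DF is set, and assumes each ICMP message is processed before the next retransmission.

```python
GRE = 24        # GRE overhead in bytes
IPSEC_MAX = 38  # worst-case transport-mode IPsec overhead quoted in the text


def scenario10(first=1500, physical_mtu=1500, middle_link_mtu=1400):
    tunnel_ip_mtu = physical_mtu - GRE  # 1476 by default
    sa_pmtu = physical_mtu              # IPsec SA PMTU starts at the media MTU
    host_pmtu = first
    log = []
    while True:
        pkt = host_pmtu
        if pkt > tunnel_ip_mtu:
            # GRE drops the packet (DF set) and sends ICMP to the host.
            host_pmtu = tunnel_ip_mtu
            log.append(("GRE->host", host_pmtu))
            continue
        esp_pkt = pkt + GRE + IPSEC_MAX
        if esp_pkt > sa_pmtu:
            # IPsec drops it and tells GRE the next-hop MTU (SA PMTU - 38);
            # GRE subtracts its own 24 bytes and stores the result internally.
            tunnel_ip_mtu = sa_pmtu - IPSEC_MAX - GRE
            log.append(("IPsec->GRE", sa_pmtu - IPSEC_MAX))
            continue
        if esp_pkt > middle_link_mtu:
            # The intermediate router drops it; IPsec records 1400 in the SA.
            sa_pmtu = middle_link_mtu
            log.append(("middle->IPsec", sa_pmtu))
            continue
        log.append(("delivered", pkt))
        return log


for event in scenario10():
    print(event)
```

The printed events follow the steps above: 1476, 1462 (GRE stores 1438), 1438, 1400, 1362 (GRE stores 1338), 1338, and finally delivery of the 1338-byte packet.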

More Recommendations

Configuring the tunnel path-mtu-discovery command on a tunnel interface can help GRE and IPsec interaction when they are configured on the same router. Remember that without the tunnel path-mtu-discovery command configured, the DF bit would always be cleared in the GRE IP header. This allows the GRE IP packet to be fragmented even though the encapsulated data IP header had the DF bit set, which normally wouldn't allow the packet to be fragmented.

If the tunnel path-mtu-discovery command is configured on the GRE tunnel interface, the following will happen.

GRE will copy the DF bit from the data IP header to the GRE IP header.

If the DF bit is set in the GRE IP header and the packet will be "too large" after IPsec encryption for the IP MTU on the physical outgoing interface, then IPsec will drop the packet and notify the GRE tunnel to reduce its IP MTU size.

IPsec does PMTUD for its own packets, and if the IPsec PMTU changes (is reduced), IPsec doesn't immediately notify GRE; instead, when the next "too large" packet comes through, the process in the previous step occurs.

GRE's IP MTU is now smaller, so it will drop any data IP packets with the DF bit set that are now too large and send an ICMP message to the sending host.

The tunnel path-mtu-discovery command helps the GRE interface set its IP MTU dynamically, rather than statically with the ip mtu command. It is actually recommended that both commands are used. The ip mtu command is used to provide room for the GRE and IPsec overhead relative to the local physical outgoing interface IP MTU. The tunnel path-mtu-discovery command allows the GRE tunnel IP MTU to be further reduced if there is a lower IP MTU link in the path between the IPsec peers.
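The division of labor between the two commands can be modeled as taking the smaller of the static value and the PMTUD-learned value. This is a hypothetical model of the behavior described above, not IOS internals; `effective_tunnel_mtu` and its parameters are invented for illustration, and the 38-byte figure is the transport-mode maximum from Scenario 10.

```python
from typing import Optional

GRE_OVERHEAD = 24
IPSEC_MAX_OVERHEAD = 38  # transport-mode worst case quoted in Scenario 10


def effective_tunnel_mtu(configured_ip_mtu: int,
                         ipsec_sa_pmtu: Optional[int] = None) -> int:
    """Tunnel IP MTU GRE would enforce under this simplified model."""
    if ipsec_sa_pmtu is None:
        # No PMTUD result available: only the static "ip mtu" applies.
        return configured_ip_mtu
    learned = ipsec_sa_pmtu - IPSEC_MAX_OVERHEAD - GRE_OVERHEAD
    # PMTUD can only lower the value further, never raise it above "ip mtu".
    return min(configured_ip_mtu, learned)


print(effective_tunnel_mtu(1400, 1500))  # 1400: the static value already fits
print(effective_tunnel_mtu(1400, 1400))  # 1338: a 1400-byte link in the path
```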

Below are some of the things you can do if you are having problems with PMTUD in a network where there are GRE + IPsec tunnels configured.

The following list begins with the most desirable solution.

Fix the problem with PMTUD not working, which is usually caused by a router or firewall blocking ICMP.

Use the ip tcp adjust-mss command on the tunnel interfaces so that the router will reduce the TCP MSS value in the TCP SYN packet. This will help the two end hosts (the TCP sender and receiver) to use packets small enough so that PMTUD is not needed.

Use policy routing on the ingress interface of the router and configure a route map to clear the DF bit in the data IP header before it gets to the GRE tunnel interface. This will allow the data IP packet to be fragmented before GRE encapsulation.

Increase the "ip mtu" on the GRE tunnel interface to be equal to the outbound interface MTU. This will allow the data IP packet to be GRE encapsulated without fragmenting it first. The GRE packet will then be IPsec encrypted and then fragmented to go out the physical outbound interface. In this case you would not configure the tunnel path-mtu-discovery command on the GRE tunnel interface. This option can dramatically reduce throughput, because IP packet reassembly on the IPsec peer is done in process-switching mode.
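The MSS adjustment in the second option is simple arithmetic: the largest TCP payload that fits is the tunnel IP MTU minus 40 bytes of IP and TCP headers, so a tunnel with "ip mtu 1400" corresponds to an MSS of 1360 (1400 - 40). The sketch assumes headers without options; `clamp_mss` is a hypothetical helper, not an IOS API.

```python
IP_HEADER = 20
TCP_HEADER = 20  # assumes no IP or TCP options


def clamp_mss(syn_mss: int, tunnel_ip_mtu: int) -> int:
    """MSS left in a TCP SYN crossing the tunnel (sketch)."""
    ceiling = tunnel_ip_mtu - IP_HEADER - TCP_HEADER
    return min(syn_mss, ceiling)  # only ever lowers the advertised MSS


print(clamp_mss(1460, 1400))  # 1360
print(clamp_mss(1300, 1400))  # 1300
```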

Reference:

http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml

http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/IPSec_Over.html
