Design and Implementation of Large Scale URL Filtering
Surachai CHITPINITYON, Kasom KOHT-ARSA, Surasak SANGUANPONG, Anan PHONPHOEM, Pirawat WATANAPONGSE, Chalermpol CHUPAMPUN
Office of Computer Services, Kasetsart University
E-mail: Surachai.Ch@ku.ac.th
APAN, Xi’an, Network Security, 29th August 2007
This work is partially supported by Commission of Higher Education (CHE), UniNET, Thailand
Agenda
• Why Need URL Filtering?
• Filtering Techniques
• TCP Revisited
• Proposed Solution
• Performance Facts
• Current Deployment
• Scalability Planning for 10Gbps
Why Need URL Filtering?
• Access Policy Enforcement
  • Parental control
  • Other websites restricted by policy
• Suspected Harmful Websites (on-demand filtering)
  • Spyware, phishing
  • Websites with embedded scripts intended to attack OS/software vulnerabilities
Pass-Through Web Filtering
[Figure: Client, Gateway, Internet, and Filtering Engine; numbered steps 1 to 4; engine outcomes: Allow, Block, Unknown]
• Traffic must pass through the filtering engine (firewall, proxy, application gateway)
• Processing is queued, which adds delay
• Delay depends on traffic volume and machine performance
Pass-by Web Filtering
[Figure: Client, Gateway, Internet, and Filtering Engine; numbered steps 1 to 3]
• Traffic is captured as it passes by, without queuing
• Zero delay, independent of traffic volume
• Ease of installation (no traffic interruption)
• Non-blocking traffic stream
• No single point of failure
• Scalable
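TCP Connection Establishment & Data Transfer
[Figure: sequence diagram between Client and Server]
• Client to Server: SYN J (client enters SYN_SENT)
• Server to Client: SYN K, ACK J+1 (server enters SYN_RCVD)
• Client to Server: ACK K+1 (connection ESTABLISHED on both sides)
• Client to Server: Data (request)
• Server to Client: Data (reply)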
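TCP Connection Termination
[Figure: sequence diagram between Client and Server]
• Client to Server: FIN L (client enters FIN_WAIT_1)
• Server to Client: ACK L+1 (server enters CLOSE_WAIT, client enters FIN_WAIT_2)
• Server to Client: FIN M (server enters LAST_ACK)
• Client to Server: ACK M+1 (client enters TIME_WAIT, server enters CLOSED)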
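TCP Session Hijacking
[Figure: sequence diagram between Client, Filtering, and Server]
• Client to Server: SYN J
• Server to Client: SYN K, ACK J+1
• Client to Server: ACK K+1
• Client to Server: Data (request)
• Filtering to Client: FIN L (faked FIN by filtering engine)
• Server to Client: Data (reply) (packet will be ignored)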
Proposed Solution
• Pass-by method combined with 2 techniques
  • Session Hijacking: fast sequence number interception
  • Keyword Capturing in application request packets
• URL Processing
  • Designed to handle a list of hundreds of millions of URLs
  • Very fast access to the URL repository
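Session Hijacking Filtering
[Figure: sequence diagrams between Client, Filtering, and Server; labels: Data (request), FIN L (faked FIN), Data (reply), ACK L+1, FIN M, ACK M+1]
• Successful filtering: the faked FIN L is accepted by the client and the connection is closed (ACK L+1, FIN M, ACK M+1)
• Unsuccessful filtering: the faked FIN is ignored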
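The sequence-number interception behind the faked FIN can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: `TcpSegment`, `forge_fin_to_client`, and `client_accepts` are hypothetical names, and the acceptance check is a simplified model of a TCP receiver. The idea is that the captured request already carries the two numbers a server-side FIN needs: the request's ACK number is the next sequence number the client expects from the server, and the request's sequence number plus its payload length is what the server would acknowledge.

```python
from dataclasses import dataclass

SEQ_MOD = 2**32  # TCP sequence numbers wrap at 32 bits


@dataclass
class TcpSegment:
    """Hypothetical, simplified view of a captured TCP segment."""
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    ack: int
    flags: str
    payload_len: int = 0


def forge_fin_to_client(request: TcpSegment) -> TcpSegment:
    """Build a FIN that looks as if the server sent it in answer to `request`."""
    return TcpSegment(
        src=request.dst, dst=request.src,          # swap endpoints
        sport=request.dport, dport=request.sport,
        seq=request.ack,                           # next byte the client expects
        ack=(request.seq + request.payload_len) % SEQ_MOD,
        flags="FA",                                # FIN + ACK
    )


def client_accepts(rcv_nxt: int, segment: TcpSegment) -> bool:
    """Simplified receiver model: accept only the exact next sequence number."""
    return segment.seq == rcv_nxt


# Usage: a captured request with illustrative numbers.
request = TcpSegment("10.0.0.5", "192.0.2.10", 40000, 80,
                     seq=1000, ack=5000, flags="PA", payload_len=120)
fin = forge_fin_to_client(request)
```

In this model the faked FIN wins only if it reaches the client before the real reply: once the client has received reply data, its expected sequence number has moved past `fin.seq` and the FIN is discarded, which corresponds to the "unsuccessful filtering" case.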
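Keyword Capturing
[Figure: Client, Gateway, Internet, and Filtering Engine; numbered steps 1 to 5]
• Steps 1 and 2: the client's GET travels through the gateway toward the Internet and is also seen by the filtering engine
• Step 3: keyword capturing (GET/PUT/POST matching)
• Step 4: black lists search
• Step 5: FIN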
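A minimal sketch of the keyword-capturing step, assuming the engine sees the first payload bytes of each request: match the method keyword, rebuild the URL from the request line and the `Host` header, and look it up in a blacklist. The function names, the plain Python `set` standing in for the URL repository, and the example hostnames are all hypothetical.

```python
from typing import Optional

METHODS = (b"GET", b"PUT", b"POST")  # keywords named on the slide


def extract_url(payload: bytes) -> Optional[str]:
    """Return 'http://host/path' if payload starts a GET/PUT/POST request."""
    head, _, _ = payload.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) != 3 or parts[0] not in METHODS:
        return None
    if not parts[2].startswith(b"HTTP/"):
        return None
    path = parts[1].decode("latin-1")
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("latin-1").lower()
            return "http://" + host + path
    return None  # no Host header: nothing to look up


def should_block(payload: bytes, blacklist: set) -> bool:
    """True when the request's URL is on the blacklist."""
    url = extract_url(payload)
    return url is not None and url in blacklist


# Usage with an illustrative request and blacklist.
packet = b"GET /bad/page.html HTTP/1.1\r\nHost: www.Example.com\r\n\r\n"
blacklist = {"http://www.example.com/bad/page.html"}
blocked = should_block(packet, blacklist)
```

Only a positive match would trigger the faked FIN; anything that is not a recognizable request is left alone, which is consistent with the pass-by design never delaying traffic.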
URL Management Technique
Key design
• URL compression techniques
• In-memory balanced tree of URLs
• Utilizes KSpider’s core architecture (URL Manager Module)
Benefits
• 69% average compression ratio of URL length (currently supports a list of up to 268 million URLs in 8 GB of RAM)
• Almost linear access speed (10 microseconds on average)
KSpider’s Architecture
[Figure: block diagram. Components shown: Online indexer, Other processing, URL Processor, Storage Manager, URL Extractor, Data Streamer, Data Decompressor, Storage, Data Compressor, Stats Collector, URL Filter, URL Manager, Data Collector, URL Buffer Queue, HTTP Data Collector, Parallel DNS, Scheduler, URL Storage Manager, Communicator (to Communicator Cluster), In-memory Storage, On Disk]
URL Compression Technique: Prefix Balance Search Tree
Webscreen List
• http://www.lovely.com/
• http://www.lovely.com
• http://www.lion.com
• http://www.lovely12.com
• http://www.lovely11.net
• http://www.lower13.net
[Figure: search tree over the list; nodes are labelled with numbers (0, 4, 1, 3, 18, 18, 12, 2, 17) and URL suffixes (ion.com, 1.net, 3.net, 12.com)]
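The prefix-sharing idea can be illustrated with front coding over the sorted list: each URL is stored as the number of leading characters it shares with its predecessor plus the remaining suffix. This is a swapped-in, simpler technique for illustration only; it does not reproduce the balanced search tree on the slide, and the function names are hypothetical.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(urls):
    """Encode sorted, de-duplicated URLs as (shared_prefix_length, suffix)."""
    encoded = []
    prev = ""
    for url in sorted(set(urls)):
        k = common_prefix_len(prev, url)
        encoded.append((k, url[k:]))
        prev = url
    return encoded


def front_decode(encoded):
    """Rebuild the sorted URL list from (shared_prefix_length, suffix) pairs."""
    urls = []
    prev = ""
    for k, suffix in encoded:
        prev = prev[:k] + suffix
        urls.append(prev)
    return urls


# The example list from the slide.
webscreen_list = [
    "http://www.lovely.com/",
    "http://www.lovely.com",
    "http://www.lion.com",
    "http://www.lovely12.com",
    "http://www.lovely11.net",
    "http://www.lower13.net",
]
encoded = front_encode(webscreen_list)
stored_chars = sum(len(suffix) for _, suffix in encoded)
original_chars = sum(len(u) for u in set(webscreen_list))
```

For this list the shared-prefix lengths come out as 12, 17, and 18 for several entries, the same values that appear as node labels in the figure, which suggests the tree stores similar prefix offsets. A sequential decode like this one trades lookup speed for simplicity; a tree keeps the savings while allowing direct search.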
Performance
Avg. Search Time: 10 µsec (350 µsec max with 268 million URLs)
Test Record: 268 million URLs with 8 GB
Hijack Activation: under 0.6 msec
Memory Requirement: 34 M URLs/GB
• 69% compression ratio with an average of 26.5 bytes per URL
• Performance collected on a Dell 2900, Intel Xeon 5160 (3 GHz)
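As a quick consistency check, the memory figures above can be related to each other with a few lines of arithmetic. This assumes "8 GB" means 8 × 2^30 bytes and that the whole 8 GB is available to the URL repository; neither is stated on the slide.

```python
GIB = 2**30                      # assumption: 1 GB = 2**30 bytes

max_urls = 268_000_000           # test record from the slide
ram_bytes = 8 * GIB              # 8 GB of RAM

# Total memory footprint per stored URL, including any tree overhead.
bytes_per_url = ram_bytes / max_urls

# URLs that fit in one GB, in millions.
million_urls_per_gb = max_urls / 8 / 1e6

print(f"{bytes_per_url:.1f} bytes per URL")
print(f"{million_urls_per_gb:.1f} M URLs per GB")
```

The result, about 32 bytes per URL and 33.5 M URLs per GB, agrees with the quoted 34 M URLs/GB after rounding.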
Reference Site
• In operation since December 2005
• 8 gigabit links span to 8 gigabit interfaces in 4 machines
• WebScreen Agent: CPU: 2 x Dual Core Opteron 2.4 GHz; RAM: 8 GB; HD: SAS 146 GB
[Figure: CAT Telecom network with two international gateways (Inter. GW); multiple links/interfaces labelled 3 Gbps, 2 Gbps, 2 Gbps, and 1 Gbps; link types labelled EtherChannel and Ethernet]
Collected Statistics
• 4.6 Gbps aggregated traffic
• 1.6 M packets/s incoming
• 64 K packets/s HTTP request packets
• Average dropping rate: 110 requests/s (9.5 M per day)
• Peak dropping rate: 250 requests/s
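The statistics above can be cross-checked with simple arithmetic. The sketch below assumes that every HTTP request packet triggers exactly one URL lookup at the 10 µsec average search time quoted on the Performance slide; that one-to-one mapping is an assumption, not something the slides state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

incoming_pps = 1_600_000     # incoming packets per second
http_request_pps = 64_000    # HTTP request packets per second
avg_drop_rps = 110           # average dropped (blocked) requests per second
avg_search_us = 10           # average search time in microseconds

# 110 requests/s over a full day.
drops_per_day = avg_drop_rps * SECONDS_PER_DAY

# Share of incoming packets that are HTTP requests.
http_share = http_request_pps / incoming_pps

# Share of HTTP requests that end up blocked on average.
drop_share = avg_drop_rps / http_request_pps

# CPU time spent on lookups per second of traffic (assumed one lookup each).
lookup_cpu_seconds = http_request_pps * avg_search_us / 1_000_000

print(drops_per_day, http_share, round(drop_share, 5), lookup_cpu_seconds)
```

The daily total works out to about 9.5 M, matching the slide. Under the stated assumption the lookups cost roughly 0.64 CPU-seconds per second in total, spread across the 4 machines of the deployment.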
Scalability Planning for 10Gbps
Solutions for 10 Gbps Link
• Deploy a traffic distribution device (1x10 Gbps to 10x1 Gbps)
• GigaVUE is currently under test
• Typical servers can handle up to 800 Mbps per 1 Gbps interface
[Figure: UNINET and THAISARN 10G links with mirror ports feeding GigaVUE1 and GigaVUE2 over 10G; 1G links fan out toward the LAN]
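A rough capacity sketch for the plan above, under stated assumptions. `interfaces_needed` applies the 800 Mbps-per-interface figure from the slide. `pick_port` shows one common way a distribution device might spread flows across output ports, a symmetric hash of the connection endpoints; it is purely illustrative and does not describe how GigaVUE actually distributes traffic.

```python
import math
import zlib


def interfaces_needed(link_mbps: int, per_interface_mbps: int) -> int:
    """1 Gbps interfaces required to absorb a fully loaded link."""
    return math.ceil(link_mbps / per_interface_mbps)


def pick_port(src: str, sport: int, dst: str, dport: int, n_ports: int) -> int:
    """Map a flow to an output port; both directions hash to the same port."""
    a, b = sorted([(src, sport), (dst, dport)])
    key = f"{a[0]}:{a[1]}-{b[0]}:{b[1]}".encode()
    return zlib.crc32(key) % n_ports


# A saturated 10 Gbps link against 800 Mbps per interface.
needed = interfaces_needed(10_000, 800)

# Capacity of the planned 10 x 1 Gbps fan-out at 800 Mbps each.
fanout_capacity_mbps = 10 * 800

# Illustrative flow, hashed in both directions.
port_out = pick_port("10.0.0.5", 40000, "192.0.2.10", 80, 10)
port_back = pick_port("192.0.2.10", 80, "10.0.0.5", 40000, 10)
```

On these numbers a 1x10 to 10x1 split covers up to 8 Gbps of sustained load, while a fully saturated 10 Gbps link would need 13 interfaces. Keeping both directions of a connection on one port matters if the engine ever needs to observe the server's side of the handshake as well.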