P2P-SIP: Peer-to-peer Internet telephony using SIP. Kundan Singh and Henning Schulzrinne, Columbia University, New York. April 2005. http://www.cs.columbia.edu/IRT/p2p-sip
Agenda • Introduction • What is P2P? And SIP? Why P2P-SIP? • Architecture • SIP using P2P vs. P2P over SIP; components that can be P2P • Implementation • Choice of P2P (DHT); node join, leave; message routing • Conclusions and future work (33 slides in total)
What is P2P? • Share the resources of individual peers • CPU, disk, bandwidth, information, … [Figure: taxonomy of computer systems. Centralized (mainframes, workstations) vs. distributed; distributed splits into client-server (flat: RPC, HTTP; hierarchical: DNS, mount) and peer-to-peer (pure: Gnutella, Chord; hybrid: Napster, Groove, Kazaa). P2P application areas: file sharing (Napster, Gnutella, Kazaa, Freenet, Overnet), communication and collaboration (Magi, Groove, Skype), distributed computing (SETI@Home, folding@Home). C, S and P mark clients, servers and peers.]
P2P goals • Resource aggregation - CPU, disk, … • Cost sharing/reduction • Improved scalability/reliability • Interoperability - heterogeneous peers • Increased autonomy at the network edge • Anonymity/privacy • Dynamic (join, leave), self organizing • Ad hoc communication and collaboration
P2P file sharing • Napster • Centralized, sophisticated search • Gnutella • Flooding, TTL, unreachable nodes • FastTrack (KaZaA) • Heterogeneous peers • Freenet • Anonymity, caching, replication
P2P goals [revisited] • P2P systems: structured and unstructured • If present => find it • Flooding is not scalable • Blind search is inefficient • Query time, number of messages, network usage, per-node state • Efficient searching: proximity, locality • Data availability: decentralization, scalability, load balancing, fault tolerance • Maintenance: join/leave, repair
Distributed Hash Table (DHT) • Types of search • Central index (Napster) • Distributed index with flooding (Gnutella) • Distributed index with hashing (Chord) • Basic operations find(key), insert(key, value), delete(key), but no search(*)
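The three basic operations can be illustrated with a minimal sketch. The class name `ToyDHT`, the node names, and the modulo placement rule are assumptions made for illustration only; Chord places keys on an identifier circle rather than by modulo. The point is that every operation hashes the exact key to one node, which is why a wildcard search(*) has no natural implementation.

```python
import hashlib


class ToyDHT:
    """Exact-match key/value store spread over a fixed set of nodes.

    Hypothetical illustration: placement is hash(key) mod number of nodes,
    which is simpler than (and different from) Chord's identifier circle.
    """

    def __init__(self, node_names):
        self.names = sorted(node_names)
        self.stores = {name: {} for name in self.names}

    def _node_for(self, key):
        # Hash the full key; only an exact key leads to the right node.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.names[int.from_bytes(digest, "big") % len(self.names)]

    def insert(self, key, value):
        self.stores[self._node_for(key)][key] = value

    def find(self, key):
        return self.stores[self._node_for(key)].get(key)

    def delete(self, key):
        self.stores[self._node_for(key)].pop(key, None)


dht = ToyDHT(["n1", "n2", "n3"])
dht.insert("alice@columbia.edu", "128.59.19.194")
found = dht.find("alice@columbia.edu")   # exact key works
missing = dht.find("ali*")               # a pattern is just another key
```

A pattern such as `ali*` hashes to an unrelated position, so answering it would require asking every node, which is the flooding behavior the DHT is meant to avoid.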
Why P2P-SIP? • Client-server (columbia.edu): Alice's host sends REGISTER alice@columbia.edu => 128.59.19.194; Bob's host sends INVITE alice@columbia.edu and receives Contact: 128.59.19.194 • Client-server => maintenance, configuration, controlled infrastructure • P2P overlay: REGISTER and INVITE alice are resolved inside the overlay to Alice at 128.59.19.194 • P2P => no central server, but search latency
How to combine SIP + P2P? • SIP-using-P2P: replace SIP location service by a P2P protocol • P2P-over-SIP: additionally, implement P2P using SIP messaging [Figure: two panels. SIP-using-P2P: INSERT and FIND on a P2P network map Alice to 128.59.19.194, followed by INVITE sip:alice@128.59.19.194. P2P-over-SIP: REGISTER and INVITE alice travel through the P2P-SIP overlay to Alice at 128.59.19.194.]
SIP-using-P2P • Reuse optimized and well-defined external P2P network • Define P2P location service interface to be used in SIP • Extends to other signaling protocols
P2P-over-SIP • P2P algorithm over SIP without change in semantics • No dependence on external P2P network • Reuse and interoperate with existing components, e.g., voicemail • Built-in NAT/media relays • Message overhead
What else can be P2P? • Rendezvous/signaling • Configuration storage • Media storage • Identity assertion (?) • Gateway (?) • NAT/media relay (find best one)
What is our P2P-SIP? • Unlike server-based SIP architecture • Unlike proprietary Skype architecture • Robust and efficient lookup using DHT • Interoperability • DHT algorithm uses SIP communication • Hybrid architecture • Lookup in SIP+P2P • Unlike file-sharing applications • Data storage, caching, delay, reliability • Disadvantages • Lookup delay and security
Background: DHT (Chord) • Identifier circle • Keys assigned to successor • Evenly distributed keys and nodes • Finger table: $\log N$ entries • $i$th finger points to first node that succeeds $n$ by at least $2^{i-1}$ • Stabilization for join/leave [Figure: identifier circle carrying node and key identifiers 1, 8, 10, 14, 21, 24, 30, 32, 38, 42, 47, 54, 58]
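The finger-table rule can be made concrete with a short sketch. The 6-bit identifier space and the choice of which figure labels are nodes (as opposed to keys) are assumptions for illustration; the slide does not say which is which.

```python
from bisect import bisect_left

M = 6                      # assumed identifier bits: circle of 2**6 = 64 ids
RING = 2 ** M
# Assumed node ids, picked from the labels on the slide's circle.
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 47, 54, 58])


def successor(ident, nodes=NODES):
    """First node whose id is >= ident, wrapping around the circle."""
    i = bisect_left(nodes, ident % RING)
    return nodes[i % len(nodes)]


def finger_table(n, nodes=NODES):
    """finger[i] = successor((n + 2**(i-1)) mod 2**M) for i = 1..M."""
    return [successor(n + 2 ** (i - 1), nodes) for i in range(1, M + 1)]


fingers_of_8 = finger_table(8)
```

With these assumed nodes, node 8 targets ids 9, 10, 12, 16, 24 and 40, so its fingers are 14, 14, 14, 21, 32 and 42: M entries, each covering twice the distance of the previous one. A key such as 10 is stored at `successor(10)`, which is node 14.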
Background: DHT (Chord) • Find • Map key to node • Join, Leave, or Failure • Update the immediate neighbors • Successor and predecessor • Stabilize: eventually propagate the info • Reliability • Log(N) successors; data replication
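The Find operation can be sketched as an iterative walk over finger tables. The ring size and node ids below are assumptions for illustration, and the sketch uses global knowledge of the ring to build each node's fingers, which a real node would instead learn through join and stabilization.

```python
from bisect import bisect_left

M = 6                      # assumed identifier bits
RING = 2 ** M
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 47, 54, 58])  # assumed node ids


def successor(ident):
    """First node whose id is >= ident, wrapping around the circle."""
    return NODES[bisect_left(NODES, ident % RING) % len(NODES)]


def between(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if asked."""
    x, a, b = x % RING, a % RING, b % RING
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # interval wraps past zero


def fingers(n):
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def find(start, key):
    """Return (node responsible for key, nodes visited on the way)."""
    n, path = start, [start]
    while True:
        succ = successor(n + 1)
        if between(key, n, succ, inclusive_right=True):
            return succ, path
        # Jump to the farthest finger that still precedes the key.
        nxt = next((f for f in reversed(fingers(n)) if between(f, n, key)), succ)
        n = nxt
        path.append(n)


owner, hops = find(8, 54)
```

Starting at node 8, the lookup for key 54 visits 8, 42 and 47 before returning node 54 as the owner; each hop at least halves the remaining distance, which is where the logarithmic lookup cost comes from.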
Design alternatives • Use DHT in server farm (servers form the DHT, clients attach to them) • Use DHT for all clients; but some are resource limited • Use DHT among super-nodes • Hierarchy • Dynamically adapt [Figure: three panels: a Chord ring of servers with attached clients, a prefix-routed overlay showing Route(d46a1c) across nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d46a1c, d471f1, and a Chord ring of super-nodes]
Architecture [Figure: block diagram. Components: user interface (buddy list, etc.), user location, DHT (Chord), discover, SIP, ICE, RTP/RTCP, codecs, audio devices. Labeled interactions: on startup; on reset; signup, find buddies; IM, call; signout, transfer; join, find, leave; REGISTER, INVITE, MESSAGE; peer found / detect NAT; multicast REGISTER. Two modes are marked: SIP-over-P2P and P2P-using-SIP.]
Naming and authentication • SIP URI as node and user identifiers • Known node: sip:15@192.2.1.3 • Unknown node: sip:17@example.com • User: sip:alice@columbia.edu • User name is chosen randomly by the system, by the user, or as user’s email • Email the randomly generated password • TTL, security
SIP messages • DHT (Chord) maintenance • Query the node at distance $2^k$ with node id 11; Find(11) gives 15
  REGISTER
  To: <sip:11@example.invalid>
  From: <sip:7@128.59.15.56>

  SIP/2.0 200 OK
  To: <sip:11@example.invalid>
  Contact: <sip:15@128.59.15.48>; predecessor=sip:10@128.59.15.55
• Update my neighbor about me
  REGISTER
  To: <sip:1@128.59.15.60>
  Contact: <sip:7@128.59.15.56>; predecessor=sip:1@128.59.15.60
[Figure: ring segment with nodes 1, 7, 10, 15, 22]
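A node receiving the 200 OK above needs the responsible node and its predecessor out of the Contact header. The following is a minimal sketch of that step using plain string handling; the function names are hypothetical, it is not a general SIP parser, and it only understands the header layout shown on the slide.

```python
RESPONSE = (
    "SIP/2.0 200 OK\r\n"
    "To: <sip:11@example.invalid>\r\n"
    "Contact: <sip:15@128.59.15.48>; predecessor=sip:10@128.59.15.55\r\n"
)


def parse_find_response(text):
    """Return (contact_uri, predecessor_uri) from the first Contact header."""
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() != "contact":
            continue
        uri_part, _, params = value.partition(";")
        contact = uri_part.strip().strip("<>")
        predecessor = None
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key == "predecessor":
                predecessor = val
        return contact, predecessor
    return None, None


def node_id(uri):
    """Node identifier carried in the user part, e.g. sip:15@host -> 15."""
    return int(uri.split(":", 1)[1].split("@", 1)[0])


contact, predecessor = parse_find_response(RESPONSE)
```

Here `contact` is `sip:15@128.59.15.48` and `predecessor` is `sip:10@128.59.15.55`, so the querying node learns that node 15 is responsible for id 11 and that node 10 precedes it.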
SIP messages • User registration
  REGISTER
  To: sip:alice@columbia.edu
  Contact: sip:alice@128.59.19.194:8094
• Call setup and instant messaging
  INVITE sip:bob@example.com
  To: sip:bob@example.com
  From: sip:alice@columbia.edu
Node startup • SIP • REGISTER with SIP registrar • DHT • Discover peers: multicast REGISTER • SLP, bootstrap, host cache • Join DHT using node-key=Hash(ip) • Query its position in DHT • Update its neighbors • Stabilization: repeat periodically • User registers using user-key=Hash(alice@columbia.edu) [Figure: a node sends REGISTER alice@columbia.edu to sipd (with DB) at columbia.edu, detects peers, and joins a DHT with nodes 12, 14, 32, 42, 58; labels REGISTER alice=42 and REGISTER bob=12]
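The two keys used at startup can be sketched as follows. The slide only says Hash(); SHA-1 truncated to a small identifier space is an assumption here, as are the function names and the 6-bit ring, so the actual key values will differ from the alice=42 and bob=12 shown in the figure.

```python
import hashlib
from bisect import bisect_left

M = 6  # assumed identifier bits; a real deployment would use far more


def chord_key(text, bits=M):
    """Map a string (node IP or user URI) onto the identifier circle."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def responsible_node(key, node_ids):
    """Successor of key: the first node id >= key, wrapping around."""
    ids = sorted(node_ids)
    return ids[bisect_left(ids, key) % len(ids)]


node_key = chord_key("128.59.19.194")        # node-key = Hash(ip)
user_key = chord_key("alice@columbia.edu")   # user-key = Hash(user)

ring = [12, 14, 32, 42, 58]                  # node ids from the slide's figure
stored_at = responsible_node(user_key, ring)
```

Because node ids and user keys share one identifier space, the registration for a user simply lands on the successor of the user's key, and the same lookup later finds it.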
Node leaves • Chord reliability • Log(N) successors, replicate keys • Graceful leave • Un-REGISTER • Transfer registrations • Failure • Attached nodes detect and re-REGISTER • New REGISTER goes to new super-nodes • Super-nodes adjust DHT accordingly [Figure: DHT with node 42; messages shown: REGISTER key=42, REGISTER, OPTIONS]
Dialing out (message routing) • Call, instant message, etc.: INVITE sip:hgs10@columbia.edu, MESSAGE sip:alice@yahoo.com • If existing buddy, use cache first • If not found • SIP-based lookup (DNS NAPTR, SRV,…) • P2P lookup • Use DHT to locate: proxy or redirect to next hop [Figure: INVITE key=42 routed through the DHT to node 42; labels: last seen, 302, INVITE]
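The lookup order for an outgoing call can be sketched as a small fallback chain. The function and parameter names are hypothetical, the two lookups are passed in as plain callables standing in for DNS resolution and the DHT find, and trying SIP-based lookup before P2P lookup is an assumption based on the order of the bullets.

```python
def locate(uri, buddy_cache, sip_lookup, p2p_lookup):
    """Return (contact, source) for uri, or (None, None) if nothing answers.

    buddy_cache: dict of last-seen contacts for existing buddies.
    sip_lookup, p2p_lookup: callables taking a URI and returning a contact
    or None; stand-ins for DNS NAPTR/SRV resolution and the DHT find.
    """
    if uri in buddy_cache:
        return buddy_cache[uri], "cache"
    for source, lookup in (("sip", sip_lookup), ("p2p", p2p_lookup)):
        contact = lookup(uri)
        if contact is not None:
            return contact, source
    return None, None


# Illustrative stand-ins, not real resolvers.
cache = {"sip:hgs10@columbia.edu": "sip:hgs10@128.59.16.1"}
dns_table = {"sip:alice@yahoo.com": "sip:alice@proxy.yahoo.example"}
dht_table = {"sip:bob@example.com": "sip:bob@192.0.2.7"}

by_cache = locate("sip:hgs10@columbia.edu", cache, dns_table.get, dht_table.get)
by_dns = locate("sip:alice@yahoo.com", cache, dns_table.get, dht_table.get)
by_dht = locate("sip:bob@example.com", cache, dns_table.get, dht_table.get)
```

Keeping the cache first means calls to known buddies skip the DHT entirely, which matters because the DHT lookup is the slow path.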
Implementation • sippeer: C++, Unix (Linux), Chord • Nodes join and form the DHT • Node failure is detected and DHT updated • Registrations transferred on node shutdown [Figure: DHT rings formed by test nodes, with ids including 1, 9, 11, 15, 19, 25, 26, 29, 30, 31]
Adaptor for existing phones • Use P2P-SIP node as an outbound proxy • ICE for NAT/firewall traversal • STUN/TURN server in the node
Hybrid architecture • Cross register, or • Locate during call setup • DNS, or • P2P-SIP hierarchy
Offline messages • INVITE or MESSAGE fails • Responsible node stores voicemail, instant message. • Delivered using MWI or when online detected • Replicate the message at redundant nodes • Sequence number prevents duplicates • Security: How to avoid spies? • How to recover if all responsible nodes leave?
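Replicating a stored message at several nodes means the recipient may receive it more than once, which is what the sequence number is for. The sketch below shows one way this could work; the class and function names are hypothetical, and identifying a message by the pair (sender, sequence number) is an assumption.

```python
class OfflineStore:
    """Messages held by one responsible node for users who are offline."""

    def __init__(self):
        self.pending = {}  # user -> {(sender, seq): body}

    def store(self, user, sender, seq, body):
        # Storing the same (sender, seq) twice keeps the first copy.
        self.pending.setdefault(user, {}).setdefault((sender, seq), body)

    def deliver(self, user):
        """Hand over and forget everything held for user, in seq order."""
        messages = self.pending.pop(user, {})
        return [(s, q, b) for (s, q), b in sorted(messages.items())]


def merge_replicas(*batches):
    """Combine deliveries from several replicas, dropping duplicates."""
    seen, merged = set(), []
    for batch in batches:
        for sender, seq, body in batch:
            if (sender, seq) not in seen:
                seen.add((sender, seq))
                merged.append((sender, seq, body))
    return merged


primary, replica = OfflineStore(), OfflineStore()
for node in (primary, replica):
    node.store("alice", "bob", 1, "call me back")
replica.store("alice", "bob", 2, "never mind")  # primary missed this one

inbox = merge_replicas(primary.deliver("alice"), replica.deliver("alice"))
```

The recipient ends up with each message once, including the one that only the replica held, so replication improves availability without showing duplicates to the user.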
Conferencing (further study) • One member becomes mixer • Centralized conferencing • What if mixer leaves? • Fully distributed • Many to many signaling and media • Application level multicast • Small number of senders
Evaluation: scalability • #messages depends on • Keep-alive and finger table refresh rate • Call arrival distribution • User registration refresh interval • Node join, leave, failure rates • $M = \{ r_s + r_f (\log N)^2 \} + c \log N + (k/t) \log N + (\log N)^2 / N$ • #nodes = f(capacity, rates) • CPU, memory, bandwidth • Verify by measurement and profiling
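The message-rate expression is easy to evaluate numerically. In the sketch below the meaning of each symbol is an assumption read off the bullet order (r_s keep-alive rate, r_f finger refresh rate, c call rate, k registrations refreshed every t seconds, last term for join/leave/failure), as is the base-2 logarithm; the slide defines none of them.

```python
import math


def messages_per_node(n, r_s, r_f, c, k, t):
    """Evaluate M = {r_s + r_f (log N)^2} + c log N + (k/t) log N + (log N)^2 / N.

    Assumed meanings: r_s keep-alive rate, r_f finger refresh rate,
    c call arrival rate, k user registrations refreshed every t seconds.
    """
    log_n = math.log2(n)            # base 2 is an assumption
    maintenance = r_s + r_f * log_n ** 2
    calls = c * log_n
    registrations = (k / t) * log_n
    churn = log_n ** 2 / n
    return maintenance + calls + registrations + churn


small = messages_per_node(1024, r_s=1.0, r_f=0.1, c=2.0, k=10, t=100)
large = messages_per_node(1024 ** 2, r_s=1.0, r_f=0.1, c=2.0, k=10, t=100)
```

With these illustrative rates, growing the network from about a thousand to about a million nodes only doubles log N, so the per-node load grows slowly; the finger refresh term grows fastest because it scales with the square of log N.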
Evaluation: reliability and call setup latency • User availability depends on • Super-node failure distribution • Node keep-alive and finger refresh rate • User registration refresh rate • Replicate user registration • Measure effect of each • Call setup latency • Same as DHT lookup latency: O(log(N)) • Calls to known locations (“buddies”) are direct • DHT optimization can further reduce latency • User availability and retransmission timers • Measure effect of each
Explosive growth (further study) • Cache replacement at super-nodes • Last seen many days ago • Cap on local disk usage (automatic) • Forcing a node to become super node • Graceful denial of service if overloaded • Switching between flooding, CAN, Chord, … • . . .
More open issues (further study) • Security • Anonymity, encryption, • Attack/DOS-resistant, SPAM-resistant • Malicious node • Protecting voicemails from storage nodes • Optimization • Locality, proximity, media routing • Deployment • SIP-P2P vs P2P-SIP, Intra-net, ISP servers • Motivation • Why should I run as super-node?
Conclusions • P2P useful for VoIP • Scalable, reliable • No configuration • Not as fast as client/server • P2P-SIP • Basic operations easy • Implementation • sippeer: C++, Linux • Interoperates • Some potential issues • Security • Performance http://www.cs.columbia.edu/IRT/p2p-sip
Napster • Centralized index • File names => active holder machines • Sophisticated search • Easy to implement • Ensure correct search • Drawbacks of a centralized index • Lawsuits • Denial of service • Can use server farms [Figure: peers P1 to P5 around server S; one peer asks S where “quit playing games” is, then fetches the file from another peer over FTP]
Gnutella • Flooding • Overlay network • Decentralized • Robust • Not scalable; use TTL, so a query can fail • Cannot ensure correctness [Figure: unstructured overlay of peers P]
KaZaA (FastTrack) • Super-nodes • Election: capacity (bandwidth, storage, CPU) and availability (connection time, public address) • Use heterogeneity of peers • Inherently non-scalable if flooding is used [Figure: peers P attached to super-nodes]
FreeNet • File is cached on reverse search path • Anonymity • Replication, cache • Similar keys on same node • Empirical log(N) lookup • TTL limits search • Only probabilistic guarantee • Transaction state • No remove( ) • Use cache replacement [Figure: a request traversing peers P in numbered steps 1 to 12]
Distributed Hash Tables • Types of search • Central index (Napster) • Distributed index with flooding (Gnutella) • Distributed index with hashing (Chord) • Basic operations find(key), insert(key, value), delete(key), no search(*)
CAN: Content Addressable Network • Divide the space into zones • Each key maps to one point in the d-dimensional space • Each node is responsible for all the keys in its zone [Figure: unit square from (0.0, 0.0) to (1.0, 1.0) divided into zones owned by nodes A, B, C, D, E]
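CAN • Node Z joins • Node X locates (x,y)=(.3,.1) • State = $2d$ • Search = $d \cdot n^{1/d}$ [Figure: unit square with axis ticks 0.0, .25, .5, .75, 1.0 and zones A, B, C, D, E, X, shown before and after node Z joins; the point (x,y) is marked]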
Chord • Identifier circle • Keys assigned to successor • Evenly distributed keys and nodes [Figure: identifier circle carrying node and key identifiers 1, 8, 10, 14, 21, 24, 30, 32, 38, 42, 47, 54, 58]
Chord • Finger table: $\log N$ entries • $i$th finger points to first node that succeeds $n$ by at least $2^{i-1}$ • Stabilization after join/leave [Figure: identifier circle carrying node and key identifiers 1, 8, 10, 14, 21, 24, 30, 32, 38, 42, 47, 54, 58]
Tapestry • ID with base $B = 2^b$ • Route to numerically closest node to the given key • Routing table has O(B) columns, one per digit in node ID • Similar to CIDR, but suffix-based • Example: **4 => *64 => 364 [Figure: overlay of nodes 763, 427, 364, 123, 324, 365, 135, 564]
Pastry • Prefix-based • Route to a node whose ID shares a prefix with the key that is at least one digit longer than the prefix this node shares • Neighbor set, leaf set and routing table [Figure: Route(d46a1c) across nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d46a1c, d471f1]
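The prefix rule can be traced on the node ids from the slide. The sketch below is a simplification: it picks the next hop from a global node list and always takes the smallest available improvement, whereas a real Pastry node consults its own routing table and leaf set. The function names are hypothetical.

```python
# Node ids as they appear on the slide.
NODES = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467c4", "d471f1", "d46a1c"]


def shared_prefix_len(a, b):
    """Number of leading digits a and b have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def route(start, key, nodes=NODES):
    """Hop to nodes sharing a strictly longer prefix with key each time."""
    path, current = [start], start
    while current != key:
        have = shared_prefix_len(current, key)
        better = [n for n in nodes if shared_prefix_len(n, key) > have]
        if not better:
            break  # no node is closer by prefix; stop at current
        # Take the smallest improvement to mimic one-digit-at-a-time routing.
        current = min(better, key=lambda n: shared_prefix_len(n, key))
        path.append(current)
    return path


hops = route("65a1fc", "d46a1c")
```

Starting from 65a1fc, the message for d46a1c passes through d13da3, d4213f and d462ba, matching one more digit of the key at each hop, so the hop count is bounded by the number of digits in the id.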
Other schemes • Distributed TRIE • Viceroy • Kademlia • SkipGraph • Symphony • …
Related work: Skype • From the KaZaA community • Host cache of some super nodes • Bootstrap IP addresses • Auto-detect NAT/firewall settings • STUN and TURN • Protocol among super nodes: ?? • Allows searching a user (e.g., kun*) • History of known buddies • All communication is encrypted • Promote to super node • Based on availability, capacity • Conferencing [Figure: peers P attached to super-nodes]
Reliability and scalability: two-stage architecture for CINEMA • First stage: servers s1, s2, s3 for example.com, with backup ex • Second stage: one master/slave pair per user group: a1, a2 for a*@example.com; b1, b2 for b*@example.com • The first stage maps sip:bob@example.com to sip:bob@b.example.com • DNS SRV records:
  example.com    _sip._udp  SRV 0 40 s1.example.com
                            SRV 0 40 s2.example.com
                            SRV 0 20 s3.example.com
                            SRV 1 0  ex.backup.com
  a.example.com  _sip._udp  SRV 0 0  a1.example.com
                            SRV 1 0  a2.example.com
  b.example.com  _sip._udp  SRV 0 0  b1.example.com
                            SRV 1 0  b2.example.com
• Request-rate = f(#stateless, #groups) • Bottleneck: CPU, memory, bandwidth? • Failover latency: ?