
Commercial Applications of Multi-core at Ericsson



Presentation Transcript


  1. Commercial Applications of Multi-core at Ericsson Hans Nilsson, Hans.R.Nilsson@ericsson.com

  2. Ericsson
     • 1876
     • Customers: mobile and fixed network operators
     • Over 1,000 networks in 140 countries
     • 40 percent of all mobile calls are made through our systems
     • 65507 employees
     • 50% of

  3. Mobile Phones are no fun anymore
     Image caption: The original mobile phone, 1956

  4. A new communications architecture
     Softswitch and IMS, first steps towards an all-IP network
     Diagram labels: Multimedia Service network; IMS and communication enablers; Multi Access Edge Network; Transport network; Fixed access; Mobile access; Users

  5. Modern Telecom
     Diagram labels (endpoints): 3G Phones; IP-phones; Communication clients on PCs; Lots of old phones; Telecom network
     Diagram labels (links): UMTS; SIP, RTP (four links); H.323, RTP; ISUP, PCM; GSM; ?

  6. Modern Telecom (cont'd)
     • Billing: huge data volumes
     • Services: many variations, e.g. Java-based frameworks
     • Session handling: complex FSMs, high degree of concurrency
     • Media processing: wire-speed; network processors, DSPs, ...
     Protocol labels in the diagram: Diameter, SIP, ISUP, H.248, RTP

  7. Different Execution Platforms
     • Highly scalable processing clusters (C++/Erlang/3PP)
     • Compact "system on a blade": tight space and power budget
     • Subracks of dedicated blades (e.g. "Integrated Site", or IS)
     Diagram labels: CPU, DP, DSPs, NP
     http://www.ericsson.com/ericsson/corpinfo/publications/review/2005_01/files/2005014.pdf

  8. Different Execution Platforms
     • Multicore can help reduce footprint
       • E.g. through virtualization
       • Software already written for distributed processing
     Platforms shown: highly scalable processing clusters (C++/Erlang/3PP); compact "system on a blade" (tight space and power budget); subracks of dedicated blades (e.g. "Integrated Site", or IS)
     Diagram labels: CPU, DP, DSPs, NP
     http://www.ericsson.com/ericsson/corpinfo/publications/review/2005_01/files/2005014.pdf

  9. Different Execution Platforms
     • Multicore can help through
       • Packing more oomph in one slot
       • Replacing different special-purpose processors, reducing cost and footprint
     Platforms shown: highly scalable processing clusters (C++/Erlang/3PP); compact "system on a blade" (tight space and power budget); subracks of dedicated blades (e.g. "Integrated Site", or IS)
     Diagram labels: CPU, DP, DSPs, NP
     http://www.ericsson.com/ericsson/corpinfo/publications/review/2005_01/files/2005014.pdf

  10. Different Execution Platforms
     • Multicore can help through
       • Making each "blade system" more compact
       • Better performance/slot => better price/performance
       • (Chaining subracks is always tricky: inter-subrack links are a bottleneck)
     Platforms shown: highly scalable processing clusters (C++/Erlang/3PP); compact "system on a blade" (tight space and power budget); subracks of dedicated blades (e.g. "Integrated Site", or IS)
     Diagram labels: CPU, DP, DSPs, NP
     http://www.ericsson.com/ericsson/corpinfo/publications/review/2005_01/files/2005014.pdf

  11. Different levels of parallelism
     • Virtual container
     • Operating system
     • Application
     • Thread
     Annotation: Difficult(?)
     Diagram labels: OS, App, Thread, Code/DataSegm

  12. Our parallelism
     • Many (≥10^5) independent threads
     • Few supervisors, many workers (instances of the same code)
     • No matrix operations
     • No long loops with inherent parallelism
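
The workload shape on slide 12 (one or a few supervisors, many workers that are all instances of the same code, no loops to parallelise) can be sketched in Python. This is an illustration only, not Ericsson's implementation: the names `worker`, `supervisor` and `run`, the worker count, and the use of asyncio tasks as stand-ins for lightweight processes are all assumptions. asyncio runs on a single core, so the sketch shows the structure of the work, not a multicore speedup.

```python
import asyncio


async def worker(call_id: int) -> int:
    """One independent activity; every worker runs the same code."""
    await asyncio.sleep(0)  # yield once, standing in for waiting on a message
    return call_id


async def supervisor(n_workers: int) -> int:
    """A single supervisor starts many workers and waits for all of them."""
    tasks = [asyncio.create_task(worker(i)) for i in range(n_workers)]
    results = await asyncio.gather(*tasks)
    return len(results)


def run(n_workers: int = 10_000) -> int:
    """Start the (hypothetical) supervisor and return how many workers finished."""
    return asyncio.run(supervisor(n_workers))


if __name__ == "__main__":
    print(run())
```

Because the workers never touch each other's state, nothing in the program has to change if they are spread over more cores; that is the property the slide relies on.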

  13. Erlang program features
     • Erlang programs are already modelled "with annotations" in the form of processes
     • Processes model real-world activities
     • C(++) is not like that

  14. Different levels of parallelism
     • Virtual container
     • Operating system
     • Application
     • Thread
     Annotations: Easy!, Difficult(?)
     Diagram labels: OS, App, Thread, Code/DataSegm

  15. What is left to do then?
     • Analysis of dataflow //
     • pattern matching...
     • Bandwidth?
     • Memory bandwidth?

  16. Case study: The Ericsson TGC

  17. The TGC
     • Telephony Gateway Controller
     • Formerly known as
       • the Hybrid
         • reported (by customer!) to have ISP of 0.999999999
       • the Mediation Logic
     • Runs on AXD 301 and IS
     • Developed in Erlang
     • Multicore version shipped to customer
     Diagram labels: AXE, TGC, GW, GW, GW

  18. The TGC (internal)
     • Process model
       • Servers
       • Workers
     • ~100 servers
     • ~10 workers per call
     • 10,000 workers is perfectly normal
     Diagram labels: AXE, c-link, dispatcher, half-call (×6), p-link (×2), decoder (×3), GW (×2)
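
The figures on slide 18 imply a rough level of concurrency, which a few lines of Python make explicit. The constants are the slide's approximate numbers; the function names are hypothetical and the linear model (every call uses the same number of workers) is an assumption.

```python
WORKERS_PER_CALL = 10  # "~10 workers per call" (approximate slide figure)
SERVERS = 100          # "~100 servers" (approximate slide figure)


def concurrent_calls(workers: int, workers_per_call: int = WORKERS_PER_CALL) -> int:
    """Estimate how many calls a given number of worker processes corresponds to."""
    return workers // workers_per_call


def total_processes(calls: int) -> int:
    """Estimate the process count for a number of calls: fixed servers plus per-call workers."""
    return SERVERS + calls * WORKERS_PER_CALL


if __name__ == "__main__":
    calls = concurrent_calls(10_000)
    print(calls, total_processes(calls))
```

Under these assumptions, 10,000 workers corresponds to roughly 1,000 calls in progress and about 10,100 processes in total.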

  19. Erlang on multicore
     • Erlang programs are meant for distributed processing
     • Erlang processes do not share data
     • Message passing is distribution transparent
     • SMP prototype in '97, serious implementation in '05
     • In mid-'06 we ran a benchmark mimicking call handling (axdmark) with a prototype SMP emulator. Observed speedup/core: 0.95
     • TGC released on multicore in Q2 '07
     Chart: "Big bang" benchmark on Sun Fire T2000; time (s) against simultaneous processes, for 1 scheduler and 16 schedulers
     http://www.franklinmint.fm/blog/archives/000792.html
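
The two properties slide 19 leans on (processes do not share data, and they interact only through messages) can be imitated in Python. This is a rough sketch, not how the Erlang runtime works: `send`, `half_call` and `demo` are hypothetical names, threads with queues stand in for Erlang processes and mailboxes, and `copy.deepcopy` is an assumed way to emulate share-nothing delivery.

```python
import copy
import queue
import threading


def send(mailbox: queue.Queue, message) -> None:
    """Deliver a private copy so sender and receiver never share state."""
    mailbox.put(copy.deepcopy(message))


def half_call(mailbox: queue.Queue, replies: queue.Queue) -> None:
    """Hypothetical worker: handles messages until it is told to stop."""
    while True:
        msg = mailbox.get()
        if msg == "stop":
            break
        msg["handled"] = True  # mutates only the worker's own copy
        send(replies, msg)


def demo():
    """Send one message to a worker and return the sender's and the worker's views."""
    mailbox, replies = queue.Queue(), queue.Queue()
    t = threading.Thread(target=half_call, args=(mailbox, replies))
    t.start()
    original = {"call": 1, "handled": False}
    send(mailbox, original)
    reply = replies.get(timeout=5)
    send(mailbox, "stop")
    t.join(timeout=5)
    return original, reply


if __name__ == "__main__":
    print(demo())
```

The sender's dictionary is unchanged after the worker has processed its copy, so no locks are needed around application data; the only synchronisation point is the mailbox.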

  20. TGC Results (top)
     Tasks:  50 total,   2 running,  48 sleeping,   0 stopped,   0 zombie
     Cpu0 : 62.5% us,  3.7% sy,  0.0% ni, 32.4% id,  0.0% wa,  0.0% hi,  1.3% si
     Cpu1 : 36.1% us,  2.7% sy,  0.0% ni, 60.9% id,  0.0% wa,  0.0% hi,  0.3% si
     Mem:   4092764k total,   459352k used,  3633412k free,     8196k buffers
     Swap:        0k total,        0k used,        0k free,   215796k cached

       PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
      1975 homer     25   0 2295m 192m 2144 S 99.9  4.8 179:40.46 beam.smp
         1 root      16   0   664  244  208 S  0.0  0.0   0:01.50 init
         2 root      RT   0     0    0    0 S  0.0  0.0   0:00.02 migration/0
         3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
         4 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 migration/1
         5 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/1

  21. TGC Results (dtop)
     ppb1_bs13-R3A@blade_ size 2345(131M, cpu% 107, procs 10371, runq 0 15:15:53
     memory[kB]: proc 58223, atom 1768, bin 170, code 29772, ets 39215

     pid          name                      current               msgq  mem  cpu
     <0.5872.491  prfTarg                   (prfPrc:pinf/2)          0 2036   22
     <0.18323.47  (erlang:apply/2)          (gcpServ:recv1/3)        0   17   10
     <0.18436.47  (erlang:apply/2)          (gcpServ:recv1/3)        0   24    5
     <0.1813.0>   sysProc                   (gen_server:loop/6)      0  981    2
     <0.27384.47  (pthTcpNetHandler:init/1) (gen_server:loop/6)      0  587    1
     <0.18350.47  (erlang:apply/2)          (gcpTransportProxy:      0    8    1
     <0.1935.0>   ccpcServer_n              (gen_server:loop/6)      0  587    0
     <0.18526.47  (erlang:apply/2)          (gcpTransportProxy:      0    6    0
     <0.1923.0>   sbm                       (gen_server:loop/6)      0 1719    0
     <0.3603.0>   (erlang:apply/2)          (gcpServ:recv1/3)        0    5    0

  22. TGC results (performance)
     Columns: traffic scenario; IS/GCP (1 slot/board); IS/GEP dual core, one core running (2 slots/board); IS/GEP dual core, two cores running (2 slots/board); AXD CPB5; AXD CPB6
     • POTS-POTS/AGW: X call/sec; 2.3X call/sec (one core used); 4.3X call/sec (OTP R11_3 beta+patches); 0.4X call/sec; 2.1X call/sec
     • ISUP-ISUP/Inter MGW: 3.6X call/sec; 7.7X call/sec (one core used); 13X call/sec (OTP R11_3 beta+patches); 1.55X call/sec; 7.6X call/sec
     • ISUP-ISUP/Intra MGW (four values only): 5.5X call/sec; 26X call/sec; 3.17X call/sec; 14X call/sec
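
The gain from enabling the second core can be read off slide 22 by dividing the two-core rate by the one-core rate. The sketch below assumes that, for the first two scenarios, the value marked "One core used" belongs to the one-core IS/GEP column and the value marked "OTP R11_3 beta+patches" to the two-core column. The Intra MGW row is left out because the transcript gives only four values for it. The names are hypothetical.

```python
# Relative call rates in units of X, as read from slide 22 (column assignment assumed).
ONE_CORE = {"POTS-POTS/AGW": 2.3, "ISUP-ISUP/Inter MGW": 7.7}
TWO_CORES = {"POTS-POTS/AGW": 4.3, "ISUP-ISUP/Inter MGW": 13.0}


def two_core_speedup(scenario: str) -> float:
    """Ratio of the two-core call rate to the one-core call rate on the same board."""
    return TWO_CORES[scenario] / ONE_CORE[scenario]


if __name__ == "__main__":
    for name in ONE_CORE:
        print(f"{name}: {two_core_speedup(name):.2f}x")
```

Under that reading, the second core gives roughly a 1.9x gain for POTS-POTS/AGW and roughly 1.7x for ISUP-ISUP/Inter MGW.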

  23. TGC Experience
     • Porting effort: negligible (for the application)
     • Porting effort: modest (for the platform)
     • Architecture dependency: low (for the application)
     • Results: excellent
     • Future: bright
     • "Funky languages" (Hagersten) can sometimes save the day

  24. Conclusion
     • Still need much more research on low-level parallelism like pattern matching
     • Commercial multi-core applications are not the future.
     • They are NOW (iff you are an Erlang addict)

  25. Questions?
