Communications in Embedded Systems EE202A (Fall 2003): Lecture #6
Reading List for This Lecture • Required • Lahiri, K.; Raghunathan, A.; Lakshminarayana, G. LOTTERYBUS: A new high-performance communication architecture for system-on-chip designs. Proceedings of the 38th ACM Design Automation Conference, 2001. • Recommended • none • Others • none
Real-time Communications • Analogous to real-time computation • Value of communication depends on the time at which the message is delivered to the recipient • Metrics • Throughput • Delay • Delay jitter • Loss rate • Fairness • Comes in hard & soft variety • Deterministic vs. statistical guarantees
Key Problem • Allocation and scheduling of communication resources • Point-to-point link e.g. wire (scheduling at transmitter) • Distributed link e.g. wireless (MAC) or bus (arbiter) • Entire network (routing) • Anywhere there is a shared resource! • Analogous to task scheduling on processors, but with certain crucial differences • Often no preemption or coarse pre-emption • Channels of time-varying quality/capacity
Type of Traffic Sources • Constant bit rate: periodic traffic • Fixed-size packets at periodic intervals • Analogous to periodic tasks with constant computation time in RM model • Variable bit rate: bursty traffic • Fixed-size packets at irregular intervals • Variable-size packets at regular intervals • Traffic characteristics may change as it passes through the communication system
Key Issues • Scheduling • Admission control (schedulability) • Policing (for “isolation”) • Goals: • meet performance and fairness metrics • high resource utilization (as measured by the resource operator) • easy to implement • small work per data item, scaling slowly with # of flows or tasks • easy admission control decisions • Schedulable region: set of all possible combinations of performance bounds that a scheduler can simultaneously meet
Fairness • Intuitively • each connection gets no more than what it wants • the excess, if any, is equally shared • Fairness is intuitively a good idea • Fairness also provides protection • traffic hogs cannot overrun others • automatically builds firewalls around heavy users • the reverse is not true: protection may not lead to fairness • (Figure: three flows A, B, C; half of the excess is transferred to the flow with unsatisfied demand)
Max-min Fairness • Maximize the minimum share of task or flow whose demand is not fully satisfied • Resources are allocated in order of increasing demand, normalized by weight • No task or flow gets a share larger than its demand • Task or flows with unsatisfied demands get resource shared in proportion to their weights
Example • Given • Four flows with demands 4, 2, 10, 4 and weights 2.5, 4, 0.5, and 1, and a resource with capacity C=16 • Steps • Normalize weights so that the smallest is 1: 5, 8, 1, 2 • In each round, give each flow a share proportional to its normalized weight • Round 1: allocation is 5, 8, 1, 2 • Flows 1 & 2 need only 4 and 2, leaving 1 + 6 = 7 units of excess • Allocate this 7 to the flows still in deficit, in proportion to their re-normalized weights • Round 2: allocation is 7*1/3 and 7*2/3 to flows 3 & 4 • Results in 2.666 excess for flow 4 while flow 3 is still short • Allocate this 2.666 to the flows still in deficit • Round 3: allocation is 2.666 for flow 3 • Flow 3 ends with a total of 6, i.e. a deficit of 4
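The rounds above are a weighted water-filling procedure, which can be sketched in a few lines (a minimal Python sketch; the function name and structure are illustrative, not from the lecture):

```python
def max_min_fair(capacity, demands, weights):
    """Weighted max-min fair allocation (water-filling sketch).

    Repeatedly splits the remaining capacity among unsatisfied flows
    in proportion to their weights; any flow whose demand is met
    returns its excess to the pool for the next round.
    """
    n = len(demands)
    alloc = [0.0] * n
    active = set(range(n))            # flows still short of their demand
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[i] for i in active)
        # tentative proportional share for this round
        share = {i: remaining * weights[i] / total_w for i in active}
        remaining = 0.0
        for i in list(active):
            need = demands[i] - alloc[i]
            if share[i] >= need:      # demand satisfied; return the excess
                alloc[i] = demands[i]
                remaining += share[i] - need
                active.remove(i)
            else:
                alloc[i] += share[i]
    return alloc

# The slide's example: C=16, demands 4, 2, 10, 4, weights 2.5, 4, 0.5, 1
print(max_min_fair(16, [4, 2, 10, 4], [2.5, 4, 0.5, 1]))
```

Running this reproduces the slide's result: flows 1, 2, and 4 are fully satisfied and flow 3 ends with 6 of its 10 units.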
Policing • Three criteria: • (Long-term) Average (Sustained) Rate • 100 packets per sec or 6000 packets per min? The crucial aspect is the interval length over which the average is measured • Peak Rate • e.g., 6000 packets per minute average and 1500 packets per second peak • (Max.) Burst Size • Max. number of packets sent consecutively, i.e. over a short period of time
Leaky Bucket Mechanism • Provides a means for limiting input to specified Burst Size and Average Rate. Figure from: Kurose & Ross
Leaky Bucket Mechanism (contd.) • Bucket can hold b tokens; tokens are generated at a rate of r tokens/sec unless the bucket is full • Over an interval of length t, the number of packets that are admitted is less than or equal to (r t + b) • How can one enforce a constraint on peak rate?
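The bucket can be sketched as a simple policer (illustrative Python; the class name and the drop-on-non-conforming policy are assumptions, not from the lecture):

```python
class TokenBucket:
    """Leaky (token) bucket policer sketch: the bucket holds at most
    b tokens, refilled at r tokens/sec; a packet is admitted only if
    a token is available. Over any interval t, admissions <= r*t + b."""

    def __init__(self, r, b):
        self.r = r              # token generation rate (tokens/sec)
        self.b = b              # bucket depth (max burst size)
        self.tokens = b         # start with a full bucket
        self.last = 0.0         # time of the last update

    def admit(self, now):
        # accrue tokens since the last call, capped at the bucket depth
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False            # non-conforming: drop, mark, or queue

# A burst of up to b=3 packets passes; the sustained rate is r=2 pkts/sec
tb = TokenBucket(r=2, b=3)
print([tb.admit(0), tb.admit(0), tb.admit(0), tb.admit(0)])  # -> [True, True, True, False]
```

On the slide's closing question: a common answer is to cascade a second bucket with rate equal to the peak rate and a depth of one packet, so short-term spacing is enforced as well.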
Real-time Communications over a Link • Scenario: applications sharing a link of fixed capacity • Which packet to send when? • FIFO • Priority queuing: preemptive, non-preemptive • Round robin • Weighted fair queuing • EDF • Which packet to discard if buffers at sender are full? • What if senders not at the same place? • Need multiple access mechanism • Need distributed implementation
Fundamental Choices • Number of priority levels • a priority level served only if higher levels don’t need service (multilevel priority with exhaustive service) • Work conserving vs. non-work conserving • never idle when packets await service • why bother with non-work conserving? • Degree of aggregation • cost, amount of state, how much individualization • aggregate to a class • members of class have same performance requirement • no protection within class • Service order within a level • FCFS (bandwidth hogs win, no guarantee on delays) • In order of a service tag (both protection & delay can be ensured)
Non-work-conserving Disciplines • Idea • Delay packets till eligible • Reduces delay-jitter => fewer buffers in the network • e.g. traffic remains smooth as it proceeds through the network • How to choose eligibility time? • rate-jitter regulator: bounds maximum outgoing rate • delay-jitter regulator: compensates for variable delay at the previous hop • Do we really need it? • one can remove delay-jitter at an endpoint instead • but it also reduces expensive switch memory • easy to compute end-to-end performance • sum of per-hop delay and delay jitter leads to tight end-to-end delay and delay-jitter bounds • wastes bandwidth • but can serve background traffic or tasks • increases mean delay • always punishes a misbehaving source • more complex to implement (more state)
The Conservation Law • The sum of the mean delay of each flow or task, weighted by its mean utilization of the resource, is a constant if the scheduler is work-conserving • A work-conserving scheduler can only reallocate delays among the flows or tasks • A non-work-conserving scheduler can only increase this sum
Priority Queuing • Flows classified according to priorities • Preemptive and non-preemptive versions • What can one say about schedulability? Figure from: Kurose & Ross
CAN Bus: Distributed Priority Queuing • Developed during the late '80s for the automotive industry • an ISO-defined serial communication bus • peer-to-peer, multi-master network • defined for the physical and data link layers • 250 Kbaud for basic CAN, 1 Mbaud for full CAN • Messages are sent if • the host computer requests transmission of a message • the channel is idle • the message's priority wins over the messages that other nodes intend to send at the same time
CAN Bus: Real-time Capability via Priority Arbitration • Non-destructive bit-wise arbitration • Each message is assigned a priority; when a higher-priority message collides with a lower-priority one, the higher-priority message continues transmitting undisturbed • The transmitter with lower priority detects the mismatch between the bit it sent and the bit it reads back, and temporarily halts • Another attempt is made to send the message once the bus is released
An Example • Two transmitters A and B are active at the same time; transmitter A has higher priority than B
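The arbitration in this example can be mimicked in a few lines (a sketch assuming 11-bit basic-CAN identifiers, modeling the wired-AND bus with min(); the function name is invented):

```python
def can_arbitrate(ids):
    """Sketch of CAN non-destructive bitwise arbitration.

    Identifiers are transmitted MSB-first; a 0 bit is dominant and a
    1 bit recessive, so the bus level is effectively the AND of all
    transmitted bits. A node that sends recessive but reads dominant
    has lost arbitration and backs off. Lower numeric ID = higher
    priority. (11-bit identifiers assumed, as in basic CAN.)
    """
    contenders = list(ids)
    for bit in range(10, -1, -1):                # MSB first
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = min(sent.values())                 # dominant 0 wins the wire
        # nodes whose bit differs from the bus level drop out
        contenders = [i for i in contenders if sent[i] == bus]
        if len(contenders) == 1:
            break
    return contenders[0]                         # winner keeps transmitting

# Two nodes contend; the lower identifier (higher priority) wins
print(can_arbitrate([0x65, 0x7A]))               # prints 101 (i.e. 0x65)
```

The key property the sketch shows is non-destructiveness: the winning frame is never corrupted, so the highest-priority message incurs no retransmission delay.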
Round Robin • Scan class queues serving one from each class that has a non-empty queue Figure from: Kurose & Ross
Weighted Round Robin • Round-robin is unfair if packets are of different length or weights are not equal • Different weights, fixed packet size • serve more than one packet per visit, after normalizing to obtain integer weights • Different weights, variable size packets • normalize weights by mean packet size • e.g. weights {0.5, 0.75, 1.0}, mean packet sizes {50, 500, 1500} • normalize weights: {0.5/50, 0.75/500, 1.0/1500} = {0.01, 0.0015, 0.000666}, then scale to integers: {60, 9, 4} • Problems • with variable size packets and different weights, need to know mean packet size in advance • fair only over time scales > round time • round time can be large • can lead to long periods of unfairness
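The normalization step above can be made exact with rational arithmetic (a hedged sketch; the helper name is invented, and it assumes weights that are exactly representable as binary floats, like 0.5 and 0.75):

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm   # variadic lcm requires Python 3.9+

def wrr_packet_counts(weights, mean_sizes):
    """Normalize WRR weights by mean packet size, then scale to the
    smallest integers: the number of packets served per round."""
    # packets-per-unit-time share of each flow, kept exact as fractions
    rates = [Fraction(w) / s for w, s in zip(weights, mean_sizes)]
    # scale by the LCM of denominators to get whole numbers
    scale = lcm(*(r.denominator for r in rates))
    counts = [int(r * scale) for r in rates]
    # reduce to the smallest equivalent integers
    g = reduce(gcd, counts)
    return [c // g for c in counts]

# The slide's example reproduces {60, 9, 4}
print(wrr_packet_counts([0.5, 0.75, 1.0], [50, 500, 1500]))  # -> [60, 9, 4]
```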
Generalized Processor Sharing (GPS) • Generalized Round Robin • In any time interval, allocates resource in proportion to the weight among the set of all backlogged connections (i.e. non empty queue) • Serves infinitesimal resource to each • Achieves max-min fairness • Provide a class with a differentiated amount of service over a given period of time • But is non-implementable Figure from: S. Keshav, Cornell
Weighted Fair Queueing (WFQ) • Deals better with variable-size packets and weights • GPS is the fairest discipline • Find the finish time a packet would have had under GPS • Then serve packets in order of their finish times Figure from: Kurose & Ross
WFQ: First Cut • Suppose, in each round, the server served one bit from each active connection (bit-by-bit round robin) • Round number is the number of rounds already completed • can be fractional • If a packet of length p arrives to an empty queue when the round number is R, it will complete service when the round number is R + p => finish number is R + p • independent of the number of other connections! • If a packet arrives to a non-empty queue, and the previous packet has a finish number of f, then the packet’s finish number is f+p • Serve packets in order of finish numbers from: S. Keshav, Cornell
A Catch • A queue may need to be considered non-empty even if it has no packets in it • e.g. packets of length 1 from connections A and B, on a link of speed 1 bit/sec • at time 1, packet from A served, round number = 0.5 • A has no packets in its queue, yet should be considered non-empty, because a packet arriving to it at time 1 should have finish number 1+ p • A connection is active if the last packet served from it, or in its queue, has a finish number greater than the current round number from: S. Keshav, Cornell
WFQ continued • To sum up, assuming we know the current round number R • Finish number of packet of length p • if arriving to active connection = previous finish number + p • if arriving to an inactive connection = R + p • Dealing with weights • replace p by p/w • To implement, we need to know two things: • is connection active? • if not, what is the current round number? • Answer to both questions depends on computing the current round number (why?) from: S. Keshav, Cornell
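The two arrival cases above collapse into a single rule, because an inactive connection's previous finish number is at most the current round number (a minimal sketch; the helper name is invented):

```python
def finish_number(round_no, prev_finish, length, weight=1.0):
    """WFQ finish-number assignment (sketch).

    Arriving to an active connection:   prev_finish + p/w
    Arriving to an inactive connection: round_no    + p/w
    max() merges the two cases, since an inactive connection has
    prev_finish <= round_no.
    """
    return max(round_no, prev_finish) + length / weight

# Three equal-weight flows with packets of 1, 2, 2 at R=0 (no history)
print(finish_number(0, 0, 1), finish_number(0, 0, 2))   # -> 1.0 2.0
```

Packets are then served in increasing order of these numbers; the hard part, as the slide notes, is tracking the round number itself.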
WFQ: computing the round number • Naively: round number = number of rounds of service completed so far • what if a server has not served all connections in a round? • what if new conversations join in halfway through a round? • Redefine round number as a real-valued variable that increases at a rate inversely proportional to the number of currently active connections • replace # of connections by sum of weights • With this change, WFQ emulates GPS instead of bit-by-bit RR from: S. Keshav, Cornell
WFQ and GPS • In GPS, a packet completes service when the round number increases beyond the packet’s finish number • In WFQ, the finish time of a packet is not the same as its finish number • Once assigned, the finish number does not depend on future packet arrivals and departures • The finish number of a packet is independent of the other connections awaiting service • because the rate of increase of the round number varies with the number of active connections
Example • Link of rate 1 unit/sec • Flows A, B, & C of equal weight • Packets of 1, 2, & 2 units arrive at t=0 • Finish numbers = 1, 2, 2 • Packet of size 2 at t=4 on A • Round number R at t=4? • t=0: 3 active connections, dR/dt = 1/3 • t=3: R=1 • 1st packet of A completes at R=1 in GPS emulation • Thus A becomes inactive at t=3 • dR/dt=1/2 in [3,4] • At t=4, R = 1.5 • 2nd packet of A gets F = 1.5+2 = 3.5 • Server becomes idle when the second packet of A finishes service, at R=3.5 • In GPS, B & C finish service simultaneously at t=5.5 • In the remaining 1.5 time units, dR/dt=1 • So, R=3.5 at t=7 • Service order: ABCA (or, ACBA)
WFQ Implementation • On packet arrival: • use source + destination address (or VCI) to classify it and look up finish number of last packet served (or waiting to be served) • Re-compute round number • compute finish number • insert in priority queue sorted by finish numbers • if no space, drop the packet with largest finish number • On service completion • select the packet with the lowest finish number
Analysis • Un-weighted case: • if GPS has served x bits from connection A by time t • WFQ would have served at least x - P bits, where P is the largest possible packet in the network • WFQ could send more than GPS would => absolute fairness bound > P • To reduce bound, choose smallest finish number only among packets that have started service in the corresponding GPS system (WF2Q) • requires a regulator to determine eligible packets
Evaluation • Pros • like GPS, it provides protection • can obtain worst-case end-to-end delay bound • gives users incentive to use intelligent flow control (and also provides rate information implicitly) • Cons • needs per-connection state • iterated deletion is complicated • requires a priority queue
WFQ Performance • Turns out that WFQ also provides performance guarantees • Bandwidth bound • ratio of weights * link capacity • e.g. connections with weights 1, 2, 7; link capacity 10 • connections get at least 1, 2, 7 units of b/w each • End-to-end delay bound • assumes that the connection doesn’t send ‘too much’ (otherwise its packets will be stuck in queues) • more precisely, connection should be leaky-bucket regulated • # bits sent in time [t1, t2] <= r (t2 - t1) + b
WFQ + Leaky Bucket Figure from: Kurose & Ross
Parekh-Gallager Theorem • Let • a connection be allocated weights at each of K WFQ schedulers along its path such that the bandwidth allocated at the k-th scheduler is gk • g = smallest gk • the connection be leaky-bucket regulated such that # bits sent in time [t1, t2] <= r (t2 - t1) + b • the kth scheduler have a rate r(k) • Let the largest packet allowed on the connection be Pc, and the largest packet allowed in the network be Pn • Then the end-to-end delay is bounded by b/g + (K-1) Pc/g + Σk=1..K Pn/r(k)
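The theorem's delay bound, as usually stated, is D <= b/g + (K-1)·Pc/g + Σk Pn/r(k), and is straightforward to evaluate (the numbers in the usage line are hypothetical, chosen only to exercise the formula):

```python
def pg_delay_bound(b, g, K, Pc, Pn, rates):
    """Parekh-Gallager end-to-end delay bound (as usually stated):
    D <= b/g + (K-1)*Pc/g + sum over the K schedulers of Pn/r(k).
    b: bucket depth; g: smallest rate allocated to the connection;
    Pc/Pn: largest packet of the connection / of the network."""
    assert len(rates) == K
    return b / g + (K - 1) * Pc / g + sum(Pn / r for r in rates)

# Hypothetical numbers: b=10 bits, g=2 bits/s, 3 hops each of rate 10 bits/s
print(pg_delay_bound(10, 2, 3, Pc=1, Pn=1, rates=[10, 10, 10]))
```

Note how the bound depends on the cross traffic only through the Pn/r(k) terms, which is why it holds regardless of how other connections behave.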
Significance • Theorem shows that WFQ can provide end-to-end delay bounds • So WFQ provides both fairness and performance guarantees • Bound holds regardless of cross traffic behavior • Can be generalized for networks where schedulers are variants of WFQ, and the link service rate changes over time
Problems • To get a delay bound, need to pick g • the lower the delay bounds, the larger g needs to be • large g => exclusion of more competitors from link • g can be very large, in some cases 80 times the peak rate! • Sources must be leaky-bucket regulated • but choosing leaky-bucket parameters is problematic • WFQ couples delay and bandwidth allocations • low delay requires allocating more bandwidth • wastes bandwidth for low-bandwidth low-delay sources
Rate-controlled Scheduling • A class of disciplines • two components: regulator and scheduler • incoming packets are placed in regulator where they wait to become eligible • then they are put in the scheduler • Regulator shapes the traffic, scheduler provides performance guarantees
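The regulator half can be sketched for the rate-jitter case (illustrative Python; the function name, the fixed packet size, and the FCFS scheduler implied by serving in eligibility order are assumptions):

```python
def rate_controlled_schedule(arrivals, rate):
    """Rate-controlled discipline sketch: a rate-jitter regulator makes
    packet i eligible no earlier than 1/rate after the previous packet
    became eligible, bounding the maximum outgoing rate. The scheduler
    (assumed FCFS here) then serves packets in eligibility order.
    `arrivals` are packet arrival times; unit-size packets assumed."""
    eligible = []
    prev = float("-inf")
    for t in sorted(arrivals):
        # hold back packets that arrive faster than the allowed rate
        e = max(t, prev + 1.0 / rate)
        eligible.append(e)
        prev = e
    return eligible

# A burst of 3 packets at t=0 smoothed to a rate of 2 packets/sec
print(rate_controlled_schedule([0, 0, 0], 2))   # -> [0, 0.5, 1.0]
```

This shows the trade-off on the earlier slide: the burst leaves smooth (less downstream buffering), at the cost of holding packets even though the link may be idle.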
Other Communication Issues in Embedded Systems • RLC effects of interconnects • Energy consumption • Clock synchronization – timing errors • Electro magnetic interference • Electrical Noise • Wiring delays • Unreliable and unpredictable data transfer • Generally broadcast in nature • Bus contention
New Trend: On-Chip Networks • Embedded Systems on chips (SoCs) designed and fabricated in 25-100nm technologies • Several challenges arise from the complexity of designing billion-transistor chips. • Systems on chips will be designed using pre-existing components, such as processors, controllers and memory arrays. • The physical interconnections on chip will be a limiting factor for performance and energy consumption. • The overall design goal (for most SoCs) is to satisfy some quality of service (QoS) metric (performance, reliability) with the least energy consumption. • View a SoC as a micro-network of components
Commercial On-chip Networks • AMBA from ARM • Network from Sonics • CoreConnect from IBM • Palmbus from Palmchip • Virtual Socket Interface (VSI) Alliance
Network - Basic Architecture • Backplane bus • operated on a different clock • uses TDMA for guaranteed services: each service is assigned a time slot • when an IP core is idle it surrenders its time slot, which goes round-robin to contenders • also delivers test vectors from the pins • Agents attached to each IP core support: • bandwidth tuning • latency tuning • tunable clock • data width setting • pipelining • interrupt passing • testing (e.g. JTAG) • Agents gain backbone access from the bus arbiter
Network - Design Environment • Pre-synthesis simulation • uses pre-characterized timing values • Synthesis • soft-wired: configurable post-fabrication, with registers as the configuration elements • hard-wired • Post-synthesis simulation
Network - Application • PMC-Sierra Voice-over-Packet Processor • Fabricated in 0.35, 0.25, 0.18 µm technologies • Network configured to run at 80-250 MHz • Mixed 32/64-bit data • 2, 3, 4 and 5 pipeline stages
Network - Comments • Technical difficulties • IP core ordering for spatial locality • TDMA scheduling • Conventional bus architecture • bus, contention, arbitration, raw data • Future buses • packet based? • novel fabrics (e.g. optical, radio)?
Bottleneck of Electrical Interconnect • RC time delay • IR voltage drop • CV²f power loss • Errors at high data rates • crosstalk, wave reflection, … • Hard to scale down • Alternative solutions?
Optical Interconnects • Extremely high bandwidth • 2.5 Gbps ~ 10Gbps • Two Groups • Wave-guided • Free-space
Optical Interconnects: Free-Space (module from UCSD)