Communications in Embedded Systems EE202A (Fall 2003): Lecture #6
Reading List for This Lecture • Required • Lahiri, K.; Raghunathan, A.; Lakshminarayana, G. LOTTERYBUS: A new high-performance communication architecture for system-on-chip designs. Proceedings of the 38th ACM Design Automation Conference, 2001. • Recommended • none • Others • none
Real-time Communications • Analogous to real-time computation • Value of communication depends on the time at which the message is delivered to the recipient • Metrics • Throughput • Delay • Delay jitter • Loss rate • Fairness • Comes in hard & soft variety • Deterministic vs. statistical guarantees
Key Problem • Allocation and scheduling of communication resources • Point-to-point link e.g. wire (scheduling at transmitter) • Distributed link e.g. wireless (MAC) or bus (arbiter) • Entire network (routing) • Anywhere there is a shared resource! • Analogous to task scheduling on processors, but with certain crucial differences • Often no preemption or coarse pre-emption • Channels of time-varying quality/capacity
Type of Traffic Sources • Constant bit rate: periodic traffic • Fixed-size packets at periodic intervals • Analogous to periodic tasks with constant computation time in RM model • Variable bit rate: bursty traffic • Fixed-size packets at irregular intervals • Variable-size packets at regular intervals • Traffic characteristics may change as it passes through the communication system
Key Issues • Scheduling • Admission control (schedulability) • Policing (for “isolation”) • Goals: • meet performance and fairness metrics • high resource utilization (as measured by the resource operator) • easy to implement • small work per data item, scaling slowly with # of flows or tasks • easy admission control decisions • Schedulable region: set of all possible combinations of performance bounds that a scheduler can simultaneously meet
Fairness • Intuitively • each connection gets no more than what it wants • the excess, if any, is equally shared • Fairness is intuitively a good idea • Fairness also provides protection • traffic hogs cannot overrun others • automatically builds firewalls around heavy users • the reverse is not true: protection may not lead to fairness • (Figure: three flows A, B, C; half of the excess is transferred to the flow with unsatisfied demand)
Max-min Fairness • Maximize the minimum share of task or flow whose demand is not fully satisfied • Resources are allocated in order of increasing demand, normalized by weight • No task or flow gets a share larger than its demand • Task or flows with unsatisfied demands get resource shared in proportion to their weights
Example • Given • Four flows with demands 4, 2, 10, 4 and weights 2.5, 4, 0.5, and 1, and a resource with capacity C=16 • Steps • Normalize weights so that the smallest is 1: 5, 8, 1, 2 • In each round, give each flow a share proportional to its normalized weight • Round 1: allocation is 5, 8, 1, 2 • Flows 1 & 2 need only 4 and 2, leaving 1 + 6 = 7 units of excess • Allocate this 7 to the flows still in deficit, in proportion to their re-normalized weights • Round 2: allocation is 7*1/3 and 7*2/3 to flows 3 & 4 • Results in 2.666 excess for flow 4 while flow 3 is still short • Allocate this 2.666 to the flows still in deficit • Round 3: allocation is 2.666 for flow 3 • Flow 3 ends with a total of 6, i.e. a deficit of 4
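The rounds above are a weighted water-filling procedure, which can be sketched in a few lines (a minimal Python sketch; the function name and structure are illustrative, not from the lecture):

```python
def max_min_fair(capacity, demands, weights):
    """Weighted max-min fair allocation (water-filling sketch).

    Repeatedly splits the remaining capacity among unsatisfied flows
    in proportion to their weights; any flow whose demand is met
    returns its excess to the pool for the next round.
    """
    n = len(demands)
    alloc = [0.0] * n
    active = set(range(n))            # flows still short of their demand
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weights[i] for i in active)
        # tentative proportional share for this round
        share = {i: remaining * weights[i] / total_w for i in active}
        remaining = 0.0
        for i in list(active):
            need = demands[i] - alloc[i]
            if share[i] >= need:      # demand satisfied; return the excess
                alloc[i] = demands[i]
                remaining += share[i] - need
                active.remove(i)
            else:
                alloc[i] += share[i]
    return alloc

# The slide's example: C=16, demands 4, 2, 10, 4, weights 2.5, 4, 0.5, 1
print(max_min_fair(16, [4, 2, 10, 4], [2.5, 4, 0.5, 1]))
```

Running this reproduces the slide's result: flows 1, 2, and 4 are fully satisfied and flow 3 ends with 6 of its 10 units.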
Policing • Three criteria: • (Long-term) Average (Sustained) Rate • 100 packets per sec or 6000 packets per min? The crucial aspect is the interval length over which the average is measured • Peak Rate • e.g., 6000 packets per minute average and 1500 packets per second peak • (Max.) Burst Size • Max. number of packets sent consecutively, i.e. over a short period of time
Leaky Bucket Mechanism • Provides a means for limiting input to specified Burst Size and Average Rate. Figure from: Kurose & Ross
Leaky Bucket Mechanism (contd.) • Bucket can hold b tokens; tokens are generated at a rate of r tokens/sec unless the bucket is full • Over an interval of length t, the number of packets that are admitted is less than or equal to (r t + b) • How can one enforce a constraint on peak rate?
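The bucket can be sketched as a simple policer (illustrative Python; the class name and the drop-on-non-conforming policy are assumptions, not from the lecture):

```python
class TokenBucket:
    """Leaky (token) bucket policer sketch: the bucket holds at most
    b tokens, refilled at r tokens/sec; a packet is admitted only if
    a token is available. Over any interval t, admissions <= r*t + b."""

    def __init__(self, r, b):
        self.r = r              # token generation rate (tokens/sec)
        self.b = b              # bucket depth (max burst size)
        self.tokens = b         # start with a full bucket
        self.last = 0.0         # time of the last update

    def admit(self, now):
        # accrue tokens since the last call, capped at the bucket depth
        self.tokens = min(self.b, self.tokens + self.r * (now - self.last))
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False            # non-conforming: drop, mark, or queue

# A burst of up to b=3 packets passes; the sustained rate is r=2 pkts/sec
tb = TokenBucket(r=2, b=3)
print([tb.admit(0), tb.admit(0), tb.admit(0), tb.admit(0)])  # -> [True, True, True, False]
```

On the slide's closing question: a common answer is to cascade a second bucket with rate equal to the peak rate and a depth of one packet, so short-term spacing is enforced as well.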
Real-time Communications over a Link • Scenario: applications sharing a link of fixed capacity • Which packet to send when? • FIFO • Priority queuing: preemptive, non-preemptive • Round robin • Weighted fair queuing • EDF • Which packet to discard if buffers at sender are full? • What if senders not at the same place? • Need multiple access mechanism • Need distributed implementation
Fundamental Choices • Number of priority levels • a priority level served only if higher levels don’t need service (multilevel priority with exhaustive service) • Work conserving vs. non-work conserving • never idle when packets await service • why bother with non-work conserving? • Degree of aggregation • cost, amount of state, how much individualization • aggregate to a class • members of class have same performance requirement • no protection within class • Service order within a level • FCFS (bandwidth hogs win, no guarantee on delays) • In order of a service tag (both protection & delay can be ensured)
Non-work-conserving Disciplines • Idea • Delay packets till eligible • Reduces delay-jitter => fewer buffers in the network • e.g. traffic remains smooth as it proceeds through the network • How to choose eligibility time? • rate-jitter regulator: bounds maximum outgoing rate • delay-jitter regulator: compensates for variable delay at the previous hop • Do we really need it? • one can remove delay-jitter at an endpoint instead • but it also reduces expensive switch memory • easy to compute end-to-end performance • sum of per-hop delay and delay jitter leads to tight end-to-end delay and delay-jitter bounds • wastes bandwidth • but can serve background traffic or tasks • increases mean delay • always punishes a misbehaving source • more complex to implement (more state)
The Conservation Law • The sum of the mean delay of each flow or task, weighted by its mean utilization of the resource, is a constant if the scheduler is work-conserving • A work-conserving scheduler can only reallocate delays among the flows or tasks • A non-work-conserving scheduler can only increase this sum
Priority Queuing • Flows classified according to priorities • Preemptive and non-preemptive versions • What can one say about schedulability? Figure from: Kurose & Ross
CAN Bus: Distributed Priority Queuing • Developed during the late '80s for the automotive industry • an ISO-defined serial communication bus • peer-to-peer, multi-master network • defined for the physical and data link layers • 250 Kbaud for basic CAN, 1 Mbaud for full CAN • Messages are sent if • the host computer requests transmission of a message • the channel is idle • the message's priority wins over the messages that other nodes intend to send at the same time
CAN Bus: Real-time Capability via Priority Arbitration • Non-destructive bit-wise arbitration • Each message is assigned a priority; when a higher-priority message collides with a lower-priority one, the higher-priority message continues transmitting undisturbed • The transmitter with lower priority detects the mismatch between the bit it sent and the bit it reads back, and temporarily halts • Another attempt is made to send the message once the bus is released
An Example • Two transmitters A and B are active at the same time; transmitter A has higher priority than B
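The arbitration in this example can be mimicked in a few lines (a sketch assuming 11-bit basic-CAN identifiers, modeling the wired-AND bus with min(); the function name is invented):

```python
def can_arbitrate(ids):
    """Sketch of CAN non-destructive bitwise arbitration.

    Identifiers are transmitted MSB-first; a 0 bit is dominant and a
    1 bit recessive, so the bus level is effectively the AND of all
    transmitted bits. A node that sends recessive but reads dominant
    has lost arbitration and backs off. Lower numeric ID = higher
    priority. (11-bit identifiers assumed, as in basic CAN.)
    """
    contenders = list(ids)
    for bit in range(10, -1, -1):                # MSB first
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = min(sent.values())                 # dominant 0 wins the wire
        # nodes whose bit differs from the bus level drop out
        contenders = [i for i in contenders if sent[i] == bus]
        if len(contenders) == 1:
            break
    return contenders[0]                         # winner keeps transmitting

# Two nodes contend; the lower identifier (higher priority) wins
print(can_arbitrate([0x65, 0x7A]))               # prints 101 (i.e. 0x65)
```

The key property the sketch shows is non-destructiveness: the winning frame is never corrupted, so the highest-priority message incurs no retransmission delay.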
Round Robin • Scan class queues serving one from each class that has a non-empty queue Figure from: Kurose & Ross
Weighted Round Robin • Round-robin is unfair if packets are of different length or weights are not equal • Different weights, fixed packet size • serve more than one packet per visit, after normalizing to obtain integer weights • Different weights, variable size packets • normalize weights by mean packet size • e.g. weights {0.5, 0.75, 1.0}, mean packet sizes {50, 500, 1500} • normalize weights: {0.5/50, 0.75/500, 1.0/1500} = {0.01, 0.0015, 0.000666}, then scale to integers: {60, 9, 4} • Problems • with variable size packets and different weights, need to know mean packet size in advance • fair only over time scales > round time • round time can be large • can lead to long periods of unfairness
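The normalization step above can be made exact with rational arithmetic (a hedged sketch; the helper name is invented, and it assumes weights that are exactly representable as binary floats, like 0.5 and 0.75):

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm   # variadic lcm requires Python 3.9+

def wrr_packet_counts(weights, mean_sizes):
    """Normalize WRR weights by mean packet size, then scale to the
    smallest integers: the number of packets served per round."""
    # packets-per-unit-time share of each flow, kept exact as fractions
    rates = [Fraction(w) / s for w, s in zip(weights, mean_sizes)]
    # scale by the LCM of denominators to get whole numbers
    scale = lcm(*(r.denominator for r in rates))
    counts = [int(r * scale) for r in rates]
    # reduce to the smallest equivalent integers
    g = reduce(gcd, counts)
    return [c // g for c in counts]

# The slide's example reproduces {60, 9, 4}
print(wrr_packet_counts([0.5, 0.75, 1.0], [50, 500, 1500]))  # -> [60, 9, 4]
```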
Generalized Processor Sharing (GPS) • Generalized Round Robin • In any time interval, allocates resource in proportion to the weight among the set of all backlogged connections (i.e. non empty queue) • Serves infinitesimal resource to each • Achieves max-min fairness • Provide a class with a differentiated amount of service over a given period of time • But is non-implementable Figure from: S. Keshav, Cornell
Weighted Fair Queueing (WFQ) • Deals better with variable-size packets and weights • GPS is the fairest discipline • Find the finish time a packet would have had under GPS • Then serve packets in order of their finish times Figure from: Kurose & Ross
WFQ: First Cut • Suppose, in each round, the server served one bit from each active connection (bit-by-bit round robin) • Round number is the number of rounds already completed • can be fractional • If a packet of length p arrives to an empty queue when the round number is R, it will complete service when the round number is R + p => finish number is R + p • independent of the number of other connections! • If a packet arrives to a non-empty queue, and the previous packet has a finish number of f, then the packet’s finish number is f+p • Serve packets in order of finish numbers from: S. Keshav, Cornell
A Catch • A queue may need to be considered non-empty even if it has no packets in it • e.g. packets of length 1 from connections A and B, on a link of speed 1 bit/sec • at time 1, packet from A served, round number = 0.5 • A has no packets in its queue, yet should be considered non-empty, because a packet arriving to it at time 1 should have finish number 1+ p • A connection is active if the last packet served from it, or in its queue, has a finish number greater than the current round number from: S. Keshav, Cornell
WFQ continued • To sum up, assuming we know the current round number R • Finish number of packet of length p • if arriving to active connection = previous finish number + p • if arriving to an inactive connection = R + p • Dealing with weights • replace p by p/w • To implement, we need to know two things: • is connection active? • if not, what is the current round number? • Answer to both questions depends on computing the current round number (why?) from: S. Keshav, Cornell
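The two arrival cases above collapse into a single rule, because an inactive connection's previous finish number is at most the current round number (a minimal sketch; the helper name is invented):

```python
def finish_number(round_no, prev_finish, length, weight=1.0):
    """WFQ finish-number assignment (sketch).

    Arriving to an active connection:   prev_finish + p/w
    Arriving to an inactive connection: round_no    + p/w
    max() merges the two cases, since an inactive connection has
    prev_finish <= round_no.
    """
    return max(round_no, prev_finish) + length / weight

# Three equal-weight flows with packets of 1, 2, 2 at R=0 (no history)
print(finish_number(0, 0, 1), finish_number(0, 0, 2))   # -> 1.0 2.0
```

Packets are then served in increasing order of these numbers; the hard part, as the slide notes, is tracking the round number itself.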
WFQ: computing the round number • Naively: round number = number of rounds of service completed so far • what if a server has not served all connections in a round? • what if new conversations join in halfway through a round? • Redefine round number as a real-valued variable that increases at a rate inversely proportional to the number of currently active connections • replace # of connections by sum of weights • With this change, WFQ emulates GPS instead of bit-by-bit RR from: S. Keshav, Cornell
WFQ and GPS • In GPS, a packet completes service when the round number increases beyond the packet’s finish number • In WFQ, the finish time of a packet is not the same as its finish number • Once assigned, the finish number does not depend on future packet arrivals and departures • The finish number of a packet is independent of the other connections awaiting service • because the rate of increase of the round number varies with the number of active connections
Example • Link of rate 1 unit/sec • Flows A, B, & C of equal weight • Packets of 1, 2, & 2 units arrive at t=0 • Finish numbers = 1, 2, 2 • Packet of size 2 at t=4 on A • Round number R at t=4? • t=0: 3 active connections, dR/dt = 1/3 • t=3: R=1 • 1st packet of A completes at R=1 in GPS emulation • Thus A becomes inactive at t=3 • dR/dt=1/2 in [3,4] • At t=4, R = 1.5 • 2nd packet of A gets F = 1.5+2 = 3.5 • Server becomes idle when the second packet of A finishes service, at R=3.5 • In GPS, B & C finish service simultaneously at t=5.5 • In the remaining 1.5 time units, dR/dt=1 • So, R=3.5 at t=7 • Service order: ABCA (or, ACBA)
WFQ Implementation • On packet arrival: • use source + destination address (or VCI) to classify it and look up finish number of last packet served (or waiting to be served) • Re-compute round number • compute finish number • insert in priority queue sorted by finish numbers • if no space, drop the packet with largest finish number • On service completion • select the packet with the lowest finish number
Analysis • Un-weighted case: • if GPS has served x bits from connection A by time t • WFQ would have served at least x - P bits, where P is the largest possible packet in the network • WFQ could send more than GPS would => absolute fairness bound > P • To reduce bound, choose smallest finish number only among packets that have started service in the corresponding GPS system (WF2Q) • requires a regulator to determine eligible packets
Evaluation • Pros • like GPS, it provides protection • can obtain worst-case end-to-end delay bound • gives users incentive to use intelligent flow control (and also provides rate information implicitly) • Cons • needs per-connection state • iterated deletion is complicated • requires a priority queue
WFQ Performance • Turns out that WFQ also provides performance guarantees • Bandwidth bound • ratio of weights * link capacity • e.g. connections with weights 1, 2, 7; link capacity 10 • connections get at least 1, 2, 7 units of b/w each • End-to-end delay bound • assumes that the connection doesn’t send ‘too much’ (otherwise its packets will be stuck in queues) • more precisely, connection should be leaky-bucket regulated • # bits sent in time [t1, t2] <= r (t2 - t1) + b
WFQ + Leaky Bucket Figure from: Kurose & Ross
Parekh-Gallager Theorem • Let • a connection be allocated weights at each of K WFQ schedulers along its path such that the bandwidth allocated at the k-th scheduler is gk • g = smallest gk • the connection be leaky-bucket regulated such that # bits sent in time [t1, t2] <= r (t2 - t1) + b • the kth scheduler have a rate r(k) • Let the largest packet allowed on the connection be Pc, and the largest packet allowed in the network be Pn • Then the end-to-end delay is bounded by b/g + (K-1) Pc/g + Σk=1..K Pn/r(k)
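The theorem's delay bound, as usually stated, is D <= b/g + (K-1)·Pc/g + Σk Pn/r(k), and is straightforward to evaluate (the numbers in the usage line are hypothetical, chosen only to exercise the formula):

```python
def pg_delay_bound(b, g, K, Pc, Pn, rates):
    """Parekh-Gallager end-to-end delay bound (as usually stated):
    D <= b/g + (K-1)*Pc/g + sum over the K schedulers of Pn/r(k).
    b: bucket depth; g: smallest rate allocated to the connection;
    Pc/Pn: largest packet of the connection / of the network."""
    assert len(rates) == K
    return b / g + (K - 1) * Pc / g + sum(Pn / r for r in rates)

# Hypothetical numbers: b=10 bits, g=2 bits/s, 3 hops each of rate 10 bits/s
print(pg_delay_bound(10, 2, 3, Pc=1, Pn=1, rates=[10, 10, 10]))
```

Note how the bound depends on the cross traffic only through the Pn/r(k) terms, which is why it holds regardless of how other connections behave.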
Significance • Theorem shows that WFQ can provide end-to-end delay bounds • So WFQ provides both fairness and performance guarantees • Bound holds regardless of cross traffic behavior • Can be generalized for networks where schedulers are variants of WFQ, and the link service rate changes over time
Problems • To get a delay bound, need to pick g • the lower the delay bounds, the larger g needs to be • large g => exclusion of more competitors from link • g can be very large, in some cases 80 times the peak rate! • Sources must be leaky-bucket regulated • but choosing leaky-bucket parameters is problematic • WFQ couples delay and bandwidth allocations • low delay requires allocating more bandwidth • wastes bandwidth for low-bandwidth low-delay sources
Rate-controlled Scheduling • A class of disciplines • two components: regulator and scheduler • incoming packets are placed in regulator where they wait to become eligible • then they are put in the scheduler • Regulator shapes the traffic, scheduler provides performance guarantees
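The regulator half can be sketched for the rate-jitter case (illustrative Python; the function name, the fixed packet size, and the FCFS scheduler implied by serving in eligibility order are assumptions):

```python
def rate_controlled_schedule(arrivals, rate):
    """Rate-controlled discipline sketch: a rate-jitter regulator makes
    packet i eligible no earlier than 1/rate after the previous packet
    became eligible, bounding the maximum outgoing rate. The scheduler
    (assumed FCFS here) then serves packets in eligibility order.
    `arrivals` are packet arrival times; unit-size packets assumed."""
    eligible = []
    prev = float("-inf")
    for t in sorted(arrivals):
        # hold back packets that arrive faster than the allowed rate
        e = max(t, prev + 1.0 / rate)
        eligible.append(e)
        prev = e
    return eligible

# A burst of 3 packets at t=0 smoothed to a rate of 2 packets/sec
print(rate_controlled_schedule([0, 0, 0], 2))   # -> [0, 0.5, 1.0]
```

This shows the trade-off on the earlier slide: the burst leaves smooth (less downstream buffering), at the cost of holding packets even though the link may be idle.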
Other Communication Issues in Embedded Systems • RLC effects of interconnects • Energy consumption • Clock synchronization – timing errors • Electro magnetic interference • Electrical Noise • Wiring delays • Unreliable and unpredictable data transfer • Generally broadcast in nature • Bus contention
New Trend: On-Chip Networks • Embedded Systems on chips (SoCs) designed and fabricated in 25-100nm technologies • Several challenges arise from the complexity of designing billion-transistor chips. • Systems on chips will be designed using pre-existing components, such as processors, controllers and memory arrays. • The physical interconnections on chip will be a limiting factor for performance and energy consumption. • The overall design goal (for most SoCs) is to satisfy some quality of service (QoS) metric (performance, reliability) with the least energy consumption. • View a SoC as a micro-network of components
Commercial On-chip Networks • AMBA from ARM • Network from Sonics • CoreConnect from IBM • Palmbus from Palmchip • Virtual Socket Interface (VSI) Alliance
Network - Basic Architecture • Backplane bus • operated on a different clock • uses TDMA for guaranteed services: each service is assigned a time slot • when an IP core is idle it surrenders its time slot, which goes round-robin to contenders • also delivers test vectors from the pins • Agents attached to each IP core support: • bandwidth tuning • latency tuning • tunable clock • data width setting • pipelining • interrupt passing • testing (e.g. JTAG) • Agents gain backbone access from the bus arbiter
Network - Design Environment • Pre-synthesis simulation • uses pre-characterized timing values • Synthesis • soft-wired: configurable post-fabrication, with registers as the configuration elements • hard-wired • Post-synthesis simulation
Network - Application • PMC-Sierra Voice-over-Packet Processor • Fabricated in 0.35, 0.25, 0.18 µm technologies • Network configured to run at 80-250 MHz • Mixed 32/64-bit data • 2, 3, 4 and 5 pipeline stages
Network - Comments • Technical difficulties • IP core ordering for spatial locality • TDMA scheduling • Conventional bus architecture • bus, contention, arbitration, raw data • Future buses • packet based? • novel fabrics (e.g. optical, radio)?
Bottleneck of Electrical Interconnect • RC time delay • IR voltage drop • CV²f power loss • Errors at high data rates • crosstalk, wave reflection, … • Hard to scale down • Alternative solutions?
Optical Interconnects • Extremely high bandwidth • 2.5 Gbps ~ 10Gbps • Two Groups • Wave-guided • Free-space
Optical Interconnects: Free-Space (module from UCSD)