Advanced Embedded Systems

Lecture 5: Embedded Systems Hardware

• A classical design information flow for complex ESs is shown in the fig.;
• Hardware for ESs is less standardized than hardware for personal computers;
• However, some hardware components are frequently used in ESs: keys, sensors, microcontrollers, DSPs, LCDs, LEDs, seven-segment displays, serial memories etc.;
• Communication is mostly implemented through serial interfaces: RS232, I2C, CAN etc.

• Fig. shows a classical structure for an ES used in control applications;
• It illustrates the reactive feature of an ES: it reads and monitors the external environment and executes an external operation based on the data read;
• There are variations of this scheme, imposed by different components;
• For example, there are sensors which give digital data (in serial form), information processing units which include A/D converters, and execution elements (actuators) which require digital data (also in serial form);

Sensors
• There are sensors for practically every physical quantity;
• Acceleration sensors: they contain a small mass in the center; when accelerated, the mass is displaced from its initial position and changes the resistance of the tiny wires connected to it;
• Rain sensors: the automotive industry has become an important application area for rain sensors; many cars contain them, commanding the speed of the wipers in accordance with the amount of rain;
• Artificial eyes: application areas: robotics and medicine;
• Medicine: a small camera is attached to glasses; it is connected to a computer which translates the patterns into electrical pulses; these pulses are sent directly to the brain, through electrodes; the resolution obtained (2003) is in the order of 128 x 128 pixels, enabling a blind person to drive a car in controlled areas;
• Robotics: cameras connected to computers;

• Image sensors: there are two types, charge-coupled devices (CCDs) and CMOS sensors; in both cases, arrays of light sensors are used;
• The architecture of CMOS sensor arrays is similar to that of standard memories: individual pixels can be randomly addressed and read at an array boundary;
• CMOS sensors are made in CMOS technology, so they can be integrated on the same chip with the processing unit; such devices are smart sensors;
• CMOS sensors require a single power supply voltage and interfacing is easy, so they are cheap;
• In contrast, CCD sensors are adequate for high-quality, expensive optical applications (video cameras, optical telescopes);
• In CCD technology, charges have to be transferred from one pixel to the next until they can finally be read;
• Images generated with CCDs have a low level of noise, so they are of higher quality than those generated by CMOS sensors, but interfacing is more complex, leading to higher costs;

• Biometric sensors:
• Are used for security, more exactly for authentication;
• Classical password-based authentication is limited;
• Biometric authentication tries to identify a person by scanning parts of the body: face, iris, fingerprint;
• Fingerprint sensors are fabricated in CMOS technology and face recognition can be done with image sensors;
• The hit rate is lower than in password-based authentication;
• Proximity sensors: indicate how close two moving objects are; an application area: cars with proximity sensors which help the driver to park in small places;
• Wireless sensors:
• Include on the same chip the sensor, the processing unit and an interface for wireless communication;
• Are connected in networks;
• Low power consumption is mandatory;
• Many application areas: meteorology, medicine, smart houses, surveillance and tracking etc.

A/D converters
• Information processing units work with digital values; if sensors give analog values, these must be converted using A/D converters;
• First, the analog voltage must be sampled and held;
• In the sample-and-hold circuit, the transistor operates like a switch; each time the switch is closed by the clock, the capacitor is charged to a value equal to the incoming voltage Ve; after the switch opens, the voltage remains essentially the same until the switch is closed again;
• Each of the values stored on the capacitor can be considered an element of a discrete sequence of values Vx, obtained from the analog signal Ve; the values Vx are then converted to digital form;
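The sample-and-hold behaviour described above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_and_hold`, the discrete time steps and the millivolt ramp used as input Ve are assumptions, not part of the slides.

```python
def sample_and_hold(ve, num_steps, steps_per_clock):
    """Return the held voltage Vx at every time step.

    ve              -- function giving the analog input Ve (in mV) at step k
    steps_per_clock -- the switch closes once every `steps_per_clock` steps
    """
    held = None
    vx = []
    for k in range(num_steps):
        if k % steps_per_clock == 0:
            held = ve(k)      # switch closed: capacitor charges to Ve
        vx.append(held)       # switch open: the voltage stays unchanged
    return vx


# Hypothetical input: a ramp rising by 100 mV per step, sampled every 4 steps.
vx = sample_and_hold(lambda k: 100 * k, num_steps=8, steps_per_clock=4)
print(vx)  # [0, 0, 0, 0, 400, 400, 400, 400]
```

Only the values captured at the clock ticks (0 mV and 400 mV here) form the discrete sequence that is passed on to the converter.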

• A/D converters are of two types: independent circuits and converters included in other circuits;
• Flash A/D converter:
• Each comparator has 2 inputs, denoted + and -; if V+ > V-, the output is 1, and 0 otherwise;
• In the A/D converter, all - inputs are connected to the taps of a voltage divider, while the input voltage Vx is applied to the + inputs;
• If the input voltage Vx > Vref, the comparator at the top will generate a 1; the encoder will identify the most significant 1 and will encode the case Vx > Vref as the largest output value;

• If the input voltage Vx < Vref but still > 3/4 Vref, the comparator at the top will generate a 0 and the next comparator will generate a 1; the encoder will encode this case as the second largest value;
• Similarly for the cases 2/4 Vref < Vx < 3/4 Vref, 1/4 Vref < Vx < 2/4 Vref and 0 < Vx < 1/4 Vref, which will be encoded as the third largest, the fourth largest and the smallest values, respectively;
• Advantage: high speed, no clock; it can be used in high-speed video applications;
• Disadvantage: hardware complexity: n - 1 comparators are needed to distinguish between n values;
• Successive approximation:

• It is based on binary search; for that, a successive approximation register is necessary;
• Initially, the most significant bit (m.s.b.) of the successive approximation register is set to 1 and all other bits are set to 0; this digital value is converted to an analog value, corresponding to 0.5 x the maximum input voltage; if Vx > the generated analog value, the m.s.b. is kept at 1, otherwise it is reset to 0;
• The same process is applied to the next bit; it will remain 1 if the input value is within the second or the fourth quarter of the input value range; it will be reset otherwise;
• The same process is applied to all remaining bits of the approximation register;
• Advantage: hardware efficiency: for distinguishing n digital values, log2 n bits are needed in the approximation register and the D/A converter;
• Disadvantage: low speed, since it needs O(log2 n) steps;
• It is appropriate for applications where high-precision conversions at moderate speeds are required; ex.: audio applications;
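The two conversion schemes described above (flash and successive approximation) can be compared with a small behavioural model in Python. It is a sketch under stated assumptions: the function names, the choice of 4 comparators and the ideal (noise-free) D/A conversion are illustrative and do not come from the slides.

```python
def flash_adc(vx, vref, comparators=4):
    """Flash conversion: all comparators work in parallel.

    Comparator k (k = 1..comparators) outputs 1 when vx exceeds the tap
    voltage k/comparators * vref of the voltage divider.
    """
    outputs = [vx > k * vref / comparators for k in range(1, comparators + 1)]
    code = 0
    for k, fired in enumerate(outputs, start=1):
        if fired:
            code = k          # encoder keeps the most significant 1
    return code


def sar_adc(vx, vmax, bits):
    """Successive approximation: binary search, one bit per step."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = trial * vmax / (1 << bits)     # ideal D/A conversion
        if vx > v_dac:
            code = trial                       # keep the bit at 1
    return code


print(flash_adc(3.5, vref=4.0))        # 3: between 3/4 Vref and Vref
print(sar_adc(3.0, vmax=4.0, bits=3))  # 5: binary 101
```

The flash model needs one comparator per threshold but produces its result in a single step, while the successive approximation model reuses one comparison per bit and needs `bits` steps.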

Communication
• Communication takes place over communication media (wireless, wires, optical etc.) through abstract entities called channels;
• Communication requirements:
• Real-time behavior: very important and must be taken into account from the design phase; some low-cost solutions (e.g. Ethernet) are not appropriate;
• Efficiency: communication media can be quite expensive; for ex., point-to-point connections in large buildings are a very expensive solution; the situation is worse if separate wires are provided for control, data and addresses; with separate wires it is almost impossible to add new modules; the weight of the wires must also be considered, for ex. in cars; the most efficient solution is the bus;
• Appropriate bandwidth and communication delay: the bandwidth requirements of ESs vary with the application; high bandwidth means high cost, so it is important to provide only the necessary bandwidth;

• Support for event-driven communication: communication with the external environment can be done by polling the sources or by interrupts; polling is simpler but the delay may be too large; interrupts are appropriate for event-oriented communication but they require specific software, and possibly hardware, support;
• Robustness: reliable communication must be maintained even in harsh conditions: a wide temperature range (-20 °C to +180 °C in cars), proximity to major sources of electromagnetic radiation, mechanical vibrations, major light sources etc.; voltage levels and clock frequency may be affected;
• Fault tolerance: ESs should keep working even after faults occur; classical solutions, such as the restarts used in general-purpose computers, cannot be accepted; retries are frequently used after communication errors, but retries may affect the real-time requirement;
• Maintainability, diagnosability: concerns the possibility to repair ESs within reasonable time frames;
• Privacy: solutions must be found for ensuring the privacy of confidential information;

• Electrical robustness: two signaling schemes are used, single-ended signaling and differential signaling;
• Single-ended signaling:
• Signals are represented by voltages referenced to ground;
• A single ground wire is sufficient for a certain number of signals;
• It is susceptible to external noise (for ex. from a motor which is switched on);
• It is difficult to establish a high-quality common ground between a large number of systems, due to the resistance and inductance of the ground wires;
• Differential signaling:
• Each signal is transferred on two wires, which are twisted;
• If the voltage on the first wire is greater than the voltage on the second wire, the signal is decoded as a logical 1, otherwise as a logical 0;

• Noise is added to the two wires in the same way, so the comparator removes almost all of it; that is why differential signaling allows much longer transmission distances (for example 1200 m in a differential serial interface, compared with 30 m in an RS232 single-ended serial interface);
• The logic value depends only on the polarity of the voltage between the two wires; the magnitude of the voltage can be affected by reflections or by the resistance of the wires, but the decoded value will not be affected;
• Signals do not generate any currents on the ground wires, hence the quality of the ground wires becomes less important;
• No common ground is necessary, hence there is no need to establish high-quality ground wires between a large number of communicating systems; this has a positive effect on cost;
• Differential signaling allows a larger throughput than single-ended signaling;
• Disadvantage: the need for two wires per signal, which substantially increases the number of wires; also, more complex electronics are needed for sending and receiving signals;
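The noise argument above can be checked with a small numerical experiment. The Python sketch below is only a toy model: the voltage levels, the uniform noise of up to ±2 V and all function names are assumptions chosen for illustration.

```python
import random


def send_single_ended(bits, noise, v_high=1.0):
    """One wire per signal, voltage referenced to ground."""
    return [(v_high if b else 0.0) + n for b, n in zip(bits, noise)]


def decode_single_ended(voltages, threshold=0.5):
    return [1 if v > threshold else 0 for v in voltages]


def send_differential(bits, noise, v=0.5):
    """Two wires per signal; the same noise is added to both wires."""
    return [((v if b else -v) + n, (-v if b else v) + n)
            for b, n in zip(bits, noise)]


def decode_differential(pairs):
    # Only the polarity of the voltage between the two wires matters.
    return [1 if a > b else 0 for a, b in pairs]


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(1000)]
noise = [rng.uniform(-2.0, 2.0) for _ in range(1000)]

diff_ok = decode_differential(send_differential(bits, noise)) == bits
single_errors = sum(
    got != sent
    for got, sent in zip(decode_single_ended(send_single_ended(bits, noise)), bits)
)
print(diff_ok, single_errors)
```

In this model the differential decoder recovers every bit because the common noise cancels in the comparison, while the single-ended decoder makes errors whenever the noise pushes the voltage across the threshold.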

• Guaranteeing real-time behavior
• Communication on buses such as Ethernet is affected by collisions, which can compromise real-time behavior;
• Carrier-sense multiple access/collision detect (CSMA/CD) method: if a collision occurs, the systems must stop, wait for some time and retry; the waiting time is chosen randomly and a new collision may occur at the retry; collisions can repeat a number of times, wasting time; this method is not appropriate when real-time constraints exist;
• Carrier-sense multiple access/collision avoidance (CSMA/CA) method: collisions are avoided; priorities are assigned to partners and the communication medium is allocated to partners during arbitration phases; when a system wants to communicate, it must wait for an arbitration phase and indicate its request; if a system with higher priority also wants to communicate, the first system has to withdraw its request and wait for another arbitration phase;
• CSMA/CA guarantees predictable real-time behavior for the system with the highest priority, given an upper bound on the time between arbitration phases; for the other systems, real-time behavior can be guaranteed only if the higher-priority partners do not access the medium continuously;
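The priority-based arbitration of CSMA/CA described above can be sketched as a simple simulation. The node names (`brake`, `engine`, `radio`), the convention that a lower number means a higher priority and the function names are all hypothetical; real buses implement arbitration in hardware.

```python
def arbitrate(waiting):
    """Return the waiting node with the highest priority (lowest number)."""
    if not waiting:
        return None
    return min(waiting, key=waiting.get)


def simulate(new_requests_per_phase):
    """Run one arbitration per phase and return the winner of each phase.

    new_requests_per_phase -- list of dicts {node: priority}; nodes that lose
    an arbitration stay in the waiting set and try again in the next phase.
    """
    waiting = {}
    winners = []
    for new_requests in new_requests_per_phase:
        waiting.update(new_requests)
        winner = arbitrate(waiting)
        winners.append(winner)
        if winner is not None:
            del waiting[winner]   # the winner gets the medium and sends
    return winners


phases = [{"brake": 0, "radio": 2}, {"brake": 0}, {"engine": 1}, {}]
print(simulate(phases))  # ['brake', 'brake', 'engine', 'radio']
```

The highest-priority node is served in the phase in which it asks, whereas `radio` is delayed for three phases; if higher-priority nodes kept requesting in every phase, it would never be served.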

Processing units
• Only a few types of processing units are appropriate for ESs: ASICs, reconfigurable logic and processors;
• The efficiency, measured in operations/Watt, is highest for ASICs, about one order of magnitude lower for reconfigurable logic and about two orders of magnitude lower for processors;
• Flexibility is highest for processors, lower for reconfigurable logic and very low for ASICs; the flexibility of processors comes from their programmability;
• Minimization of power and energy consumption is important;
• Power consumption influences the size of the power supply, the design of the voltage regulators, the dimensions of the interconnections and the cooling;
• Minimizing the energy consumption is especially important in mobile applications, since battery technology is improving only slowly;

• Energy consumption also affects reliability, since the lifetime of electronic circuits decreases at high temperatures;
• The energy for a certain application is closely related to the power required per operation, since energy is the integral of power over time: $E = \int P \, dt$;
• According to this relationship, reducing the power consumption also decreases the energy consumption, but only if the execution time does not grow; this is not always the case;
• In some cases, a slightly increased power consumption may lead to an important reduction in execution time, resulting in a decrease of the energy;
• Application-Specific Integrated Circuits (ASICs)
• Ensure high performance (high speed, energy efficiency) but at a high cost (for the masks of the chips); a price in the order of $10^5$ euros for a mask is quite common;
• Appropriate if the market accepts the costs or if the market is large;
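The point that lower power does not automatically mean lower energy can be illustrated with a short calculation, assuming constant power during execution so that energy is simply power times time. The numbers below (1000 mW for 10 s versus 1200 mW for 6 s) are invented for the example.

```python
def energy_mj(power_mw, time_s):
    """Energy in millijoules for a constant power (mW) over a time (s)."""
    return power_mw * time_s


slow = energy_mj(power_mw=1000, time_s=10)  # lower power, longer run
fast = energy_mj(power_mw=1200, time_s=6)   # 20% more power, shorter run

print(slow, fast)  # 10000 7200
```

Here the configuration with the higher power consumption needs less energy overall, because the reduction in execution time outweighs the increase in power.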

• Reconfigurable logic: represents a compromise between the high costs of ASICs and the low speed and high energy consumption of processors;
• The function it executes can be changed using configuration data;
• Application areas:
• Fast prototyping: in experimental phases;
• Low-volume applications;
• Reconfigurable logic usually includes RAM to store configurations; since RAM is volatile, ROM or Flash memories are necessary for providing the configuration data to the RAM at power-up;
• Field Programmable Gate Arrays (FPGAs) are the most common form of reconfigurable logic; they consist of arrays of processing elements which can be programmed after fabrication;
• Example: the Xilinx Virtex-II:
• It contains up to 112 x 104 configurable logic blocks (CLBs) interconnected through a programmable interconnect structure;
• It also contains up to 1108 input/output connections and special clock processing;
• It also contains 168 18 x 18 bit multipliers and 3024 kbits of RAM;

• Each CLB consists of 4 so-called slices:

• Each slice contains two 16-bit memories, F and G; these memories can be used as look-up tables (LUTs) for implementing all $2^{16}$ boolean functions of 4 variables;
• Using multiplexers (MUXF5, MUXFx), several of these memories can also be combined to create LUTs for up to 8 variables;
• They can also serve as ordinary RAM or as shift registers (SRLs);
• Each slice also includes two output registers and some special logic (ORCY, CY) for additions;
• Configuration data determines the setting of the multiplexers, the clocking of the registers, the content of the RAM and the connections between CLBs;
• Typically, the configuration data is generated from a high-level description of the functionality of the hardware, for ex. in VHDL;
• Integration of reconfigurable logic with processors is possible.
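How a 16-bit memory implements any boolean function of 4 variables can be shown with a short Python model. The function names `lut4` and `config_for` and the bit ordering of the address are assumptions made for this sketch; they do not describe the actual Virtex-II configuration format.

```python
def lut4(config, a, b, c, d):
    """Evaluate a 4-input LUT: the inputs form an address into 16 stored bits."""
    address = (d << 3) | (c << 2) | (b << 1) | a
    return (config >> address) & 1


def config_for(func):
    """Build the 16-bit LUT content for a boolean function of 4 bits."""
    config = 0
    for address in range(16):
        a = address & 1
        b = (address >> 1) & 1
        c = (address >> 2) & 1
        d = (address >> 3) & 1
        if func(a, b, c, d):
            config |= 1 << address
    return config


AND4 = config_for(lambda a, b, c, d: a & b & c & d)
XOR4 = config_for(lambda a, b, c, d: a ^ b ^ c ^ d)

print(hex(AND4), hex(XOR4))    # 0x8000 0x6996
print(lut4(XOR4, 1, 0, 1, 1))  # 1 (odd number of ones)
print(2 ** 16)                 # 65536 possible LUT contents
```

Each of the 65536 possible 16-bit contents corresponds to exactly one boolean function of 4 variables, which is why changing the configuration data changes the implemented logic.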

• Processors
• Key advantage: flexibility;
• Types: microcontrollers, DSPs, microprocessors;
• Main requirement: efficiency, which has different aspects;
• Energy efficiency:
• Architectures have to be optimized for energy efficiency and care must be taken not to lose efficiency in the software generation process; for ex., compilers generating a 50% overhead in terms of number of cycles are not desirable;
• Energy efficiency must be considered at all levels, from the design of the instruction set to the design of the manufacturing process;
• Techniques for making processors energy efficient:
• Gated clocking;
• Dynamic power management;
• Dynamic voltage scaling;
• Gated clocking: parts of the processor are decoupled from the clock during idle periods;

• Dynamic power management: processors have several low-power modes in addition to the standard mode; each low-power mode has a different power consumption and a different transition time into the normal operating mode; the fig. shows an example;
• The higher the power saving in a low-power mode, the smaller the number of operations the processor can perform in that mode;
• Dynamic voltage scaling: the energy consumption of CMOS processors increases quadratically with the supply voltage Vdd;
• The power consumption of CMOS circuits: $P = \alpha \, C_L \, V_{dd}^2 \, f$, where α is the switching activity, CL is the load capacitance, Vdd is the supply voltage and f is the clock frequency;

• The delay of CMOS circuits is described by the relation $\tau = k \, C_L \, \frac{V_{dd}}{(V_{dd} - V_t)^2}$, where k is a constant and Vt is the threshold voltage;
• Vt has an impact on the transistor input voltage required to switch the transistor on; for ex., for a maximum supply voltage of 3.3 V, Vt may be in the order of 0.8 V; consequently, the maximum clock frequency is a function of the supply voltage;
• However, decreasing the supply voltage reduces the power quadratically, while the speed decreases only linearly;
• Ex.: the Crusoe processor has 32 voltage levels, between 1.1 V and 1.6 V, and the clock can be varied between 200 MHz and 700 MHz, in increments of 33 MHz; a transition from one voltage/frequency pair to another requires about 20 ms;
• Code-size efficiency: the capacity of the internal memory is limited and, typically, there is no external memory; the code size must be minimized;
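The effect of dynamic voltage scaling can be estimated numerically. The sketch below assumes the usual CMOS relations P = α·CL·Vdd²·f for power and τ = k·CL·Vdd/(Vdd − Vt)² for delay; the values of α, CL and k are arbitrary placeholders, and pairing 1.1 V with 200 MHz and 1.6 V with 700 MHz is an assumption based on the end points of the Crusoe ranges quoted above.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """P = alpha * C_L * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def gate_delay(k, c_load, vdd, vt):
    """tau = k * C_L * Vdd / (Vdd - Vt)^2, meaningful only for vdd > vt."""
    return k * c_load * vdd / (vdd - vt) ** 2


ALPHA, C_LOAD = 0.5, 1e-9   # hypothetical values; they cancel in the ratios

p_fast = dynamic_power(ALPHA, C_LOAD, vdd=1.6, freq=700e6)
p_slow = dynamic_power(ALPHA, C_LOAD, vdd=1.1, freq=200e6)

power_ratio = p_slow / p_fast
energy_per_cycle_ratio = (p_slow / 200e6) / (p_fast / 700e6)
delay_ratio = gate_delay(1.0, C_LOAD, 2.0, 0.8) / gate_delay(1.0, C_LOAD, 3.3, 0.8)

print(round(power_ratio, 3))             # about 0.135
print(round(energy_per_cycle_ratio, 3))  # about 0.473
print(delay_ratio > 1)                   # True: lower Vdd means slower gates
```

Under these assumptions the slowest setting draws roughly 14% of the power of the fastest one and needs less than half the energy per clock cycle, at the price of longer gate delays.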

• CISC machines are more efficient in code size than RISC machines; RISC machines are faster;
• Compression techniques:
• Reduce both the area of the on-chip memory and the energy needed to fetch the instructions;
• Due to the reduced bandwidth requirements, fetching can also be faster;
• A decoder is necessary between the processor and the instruction memory, for recreating the original instructions on the fly;
• A variation of the compression technique is the use of a second instruction set; ex.: the ARM processors; the original ARM instruction set is 32 bits wide but there is also a 16-bit wide set, called THUMB; during execution, THUMB instructions are dynamically converted into full ARM instructions; the disadvantage lies in the software development cost;

• Run-time efficiency: in order to meet time constraints without high clock frequencies, the architecture can be customized to a certain application domain; ex.: DSPs;
• In digital signal processing, digital filtering is a very frequent operation; the next equation describes a digital filter generating an output sequence y = (y0, y1, …) from an input sequence x = (x0, x1, …): $y_i = \sum_{j=0}^{n-1} x_{i-j} \cdot a_j$;
• A certain output element, yi, corresponds to a weighted average over the last n elements of x and can be computed iteratively using the following equations: $y_{i,j} = y_{i,j-1} + x_{i-j} \cdot a_j$, where $y_{i,-1} = 0$ and $y_i = y_{i,n-1}$;
• DSPs are designed such that each iteration can be encoded as a single instruction;

• Ex.: the internal architecture of a DSP:

• D and P are two memories, accessed through a special address generation unit, AGU; there are separate units for additions and multiplications, each with its own argument registers: AX, AY, AF, MX, MY and MF; the multiplier is connected to a second adder for computing series of multiplications and additions quickly;
• The update of the partial sum is essentially done in a single cycle; for that, the two memories are allocated to hold the two arrays x and a, and address registers are allocated such that the relevant pointers can be easily updated in the AGU; partial sums, yi,j, are stored in MR;
• The pipelined computation involves registers A1, A2, MX and MY, as in the following implementation of the filter:
MR:=0; A1:=1; A2:=n-2; MX:=x[n-1]; MY:=a[0]; for (j=1; j<=n; j++) {MR:=MR + MX * MY; MX:=x[A2]; MY:=a[A1]; A1++; A2--}

• A single instruction encodes the loop body, comprising the following operations:
• Reading two arguments from argument registers MX and MY, multiplying them and adding the product to register MR, which stores the values yi,j;
• Fetching the next elements of arrays a and x from memories P and D and storing them in argument registers MX and MY;
• Updating the pointers to the next arguments, stored in address registers A1 and A2;
• Testing for the end of the loop;
• This way, each iteration requires only one instruction, with several operations done in parallel; this allows relatively low clock frequencies;
• The registers in this architecture perform different functions; they are said to be heterogeneous; heterogeneous register files are a common characteristic of DSPs;
• In order to avoid extra cycles for testing for the end of the loop, zero-overhead loop instructions exist in DSPs; with them, a single instruction or a small number of instructions can be executed a fixed number of times;
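The register-level loop described above can be emulated in Python and checked against a direct evaluation of the filter. The sketch assumes the filter has the usual form y_i = Σ x[i-j]·a[j] for j = 0..n-1 and computes the output for i = n-1, as the register initialisation in the loop suggests; the function names and the guards on the final, unused prefetch are additions made for this illustration.

```python
def fir_output(x, a, i):
    """Direct form: y_i = sum over j of x[i - j] * a[j]."""
    return sum(x[i - j] * a[j] for j in range(len(a)))


def fir_register_loop(x, a):
    """Emulate the DSP loop; variables mirror registers MR, MX, MY, A1, A2."""
    n = len(a)
    MR = 0
    A1 = 1
    A2 = n - 2
    MX = x[n - 1]
    MY = a[0]
    for _ in range(n):
        MR = MR + MX * MY                       # multiply/accumulate
        # Prefetch the next arguments; the last prefetch is never used,
        # so out-of-range addresses are replaced by 0 in this model.
        MX = x[A2] if 0 <= A2 < len(x) else 0
        MY = a[A1] if A1 < n else 0
        A1 += 1                                 # pointer updates (AGU)
        A2 -= 1
    return MR


x = [1, 2, 3]
a = [10, 20, 30]
print(fir_register_loop(x, a), fir_output(x, a, i=2))  # 100 100
```

On a DSP the four statements of the loop body would be issued together as one instruction, whereas this model executes them one after another.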

• Microcontrollers: the classical 8051:
• Is the core of a large family of 8-bit microcontrollers;
• CMOS technology;
• Includes 4 Kbytes of ROM and 128 bytes of RAM;
• Includes an ALU and a boolean processor;
• Has 4 I/O ports which can be used as general-purpose ports but also have alternative functions;
• Can address up to 64 Kbytes of external program memory and up to 64 Kbytes of external data memory;
• Has 2 independent timers;
• Includes a full-duplex serial UART;
• The instruction set is oriented towards real-time applications;
• The interrupt system can manage 5 external and internal sources, with 2 priority levels;
• Low consumption: 16 mA in normal mode, 3.7 mA in Idle mode and 50 μA in Power Down mode;


• DSPs: other application-oriented features:
• Specialized addressing modes: modulo addressing is useful when the addressed elements are in a ring buffer; addresses can be incremented and decremented until the first or last element of the buffer is reached, after which they wrap around to the other end of the buffer;
• Separate address generation units: addresses are stored in dedicated address registers; this supports indirect addressing modes, saving machine instructions, cycles and energy;
• Saturating arithmetic: changes the way overflows and underflows are handled; in standard binary arithmetic, wrap-around is used for the values returned after an overflow or underflow; saturating arithmetic returns the representable value closest to the true result; it is used in video and audio applications;
• Fixed-point arithmetic: floating-point hardware increases the cost and power consumption; consequently, 80% of DSPs do not contain this hardware; however, many of them have support for fixed-point numbers;
• Multiple memory banks or memories: this allows both arguments of an operation to be fetched at the same time;
• Multiply/accumulate instructions: such an instruction performs multiplications followed by additions.
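Two of the features listed above, saturating arithmetic and modulo addressing, are easy to illustrate in Python. The 8-bit signed word width, the buffer base address and the function names are assumptions made for this sketch.

```python
def wrap_add(a, b, bits=8):
    """Standard two's-complement addition: the result wraps around."""
    mask = (1 << bits) - 1
    s = (a + b) & mask
    return s - (1 << bits) if s >= 1 << (bits - 1) else s


def sat_add(a, b, bits=8):
    """Saturating addition: clamp to the closest representable value."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


def modulo_next(addr, base, length):
    """Increment an address inside a ring buffer [base, base + length)."""
    return base + (addr - base + 1) % length


print(wrap_add(100, 100), sat_add(100, 100))      # -56 127
print(wrap_add(-100, -100), sat_add(-100, -100))  # 56 -128
print(hex(modulo_next(0x13, base=0x10, length=4)))  # 0x10
```

For an audio sample, clipping at 127 is far less disturbing than the sign flip to -56 produced by wrap-around, which is why saturation is preferred in such applications.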

• Memories:
• For increasing the run-time and energy efficiency, memory hierarchies should be used; the reason is that large memories require more energy per access and are also slower than small memories;
• The gap between processor and memory speeds is increasing; while the speed of memory is increasing by a factor of about 1.07 per year, processor speed is increasing by a factor of 1.5 - 2 per year;

• Therefore, it is efficient to use small and fast memories as buffers between the main memory and the processor;
• The PC solution: the cache memory; the hardware checks (hit or miss) require additional energy and caches cannot offer predictable real-time performance;
• The alternative solution: small memories can be mapped into the address space:
• They are called scratch pad memories;
• The compiler should allocate frequently used variables and instructions to that address space; no checking is necessary;
• As a result, the energy per access is reduced;

• Fig. compares the energy/access in the case of cache memories and scratch pad memories;
Output devices
• Displays: LEDs, LCDs, seven-segment displays, small touch-screens;
• Electro-mechanical devices: act on the environment through electrical motors, transforming rotation into movement;
• They are generally called actuators and there is a large spectrum of actuators, from very tiny ones, in the μm range (a challenging application area is the human body), to very big ones, capable of moving tons of weight;

• D/A converters:
• The operational amplifier amplifies the voltage difference between its two inputs by a very large factor;
• Due to resistor R1, the resulting output voltage is fed back to input -, reducing the input voltage; the differential voltage between the two inputs is reduced to practically zero and, since input + is connected to gnd, the voltage between input - and gnd is zero;
• The main idea is to generate a current proportional to the value represented by a bit-vector x and to convert this current into a voltage;
• Current I is the sum of the currents through the resistors;
• The current through a resistor is 0 if the corresponding bit of bit-vector x is 0; if the bit is 1, the current corresponds to the weight of that bit, since the resistor values are chosen accordingly;

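• The equation for I is (one of Kirchhoff's laws), assuming an n-bit vector x = (x_{n-1}, ..., x_0), a reference voltage Vref and binary-weighted resistors R, 2R, 4R, ...: $I = \sum_{i=0}^{n-1} x_i \cdot \frac{V_{ref}}{2^{n-1-i} R} = \frac{V_{ref}}{R} \sum_{i=0}^{n-1} x_i \cdot 2^{i-(n-1)}$;
• The other Kirchhoff's law gives (due to the zero voltage at input -): V + R1*I' = 0;
• The current into the input of the operational amplifier is practically 0, so I = I'; hence: V + R1*I = 0;
• From the first and the last equations we obtain: $-V = V_{ref} \cdot \frac{R_1}{R} \sum_{i=0}^{n-1} x_i \cdot 2^{i-(n-1)} = V_{ref} \cdot \frac{R_1}{2^{n-1} R} \cdot nat(x)$;
• nat(x) denotes the natural number represented by bit-vector x;
• The output voltage is proportional to the value represented by x.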
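The D/A conversion principle described above can be checked numerically. The Python sketch below assumes a binary-weighted resistor network (R for the most significant bit, then 2R, 4R, ...) fed from a reference voltage Vref, and an ideal operational amplifier with feedback resistor R1; the component values and function names are illustrative only.

```python
def nat(x_bits):
    """Natural number represented by a bit-vector (most significant bit first)."""
    value = 0
    for bit in x_bits:
        value = (value << 1) | bit
    return value


def dac_output(x_bits, vref, r, r1):
    """Output voltage V of the ideal converter, from V + R1 * I = 0."""
    current = 0.0
    for k, bit in enumerate(x_bits):      # k = 0 is the most significant bit
        if bit:
            current += vref / (r * 2 ** k)  # resistors R, 2R, 4R, ...
    return -r1 * current


# Hypothetical values: Vref = 8 V, R = R1 = 1 kOhm, 4-bit input 1010 (ten).
x = [1, 0, 1, 0]
v = dac_output(x, vref=8.0, r=1000.0, r1=1000.0)
print(nat(x), round(v, 6))  # 10 -10.0
```

With these values each unit of the input number changes the output by 1 V in magnitude, so the output voltage is proportional to the number represented by x (with a negative sign caused by the inverting amplifier).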
