630 likes | 907 Views
Computer Engineering Senior Projects & Research Overview. An informal overview of past & current projects students & my own. by Al Davis. The Engineering Discipline. Role design and build things change the world around us hopefully for the better
E N D
Computer Engineering Senior Projects&Research Overview An informal overview of past & current projects students & my own by Al Davis
The Engineering Discipline • Role • design and build things • change the world around us • hopefully for the better • hence faced with a continuous ethical dilemma • Ultimate requirement • what we build must work • Requisite skills • science: math, physics, chemistry, materials, CS, … • engineering: state of the art, current practice, technology trends, manufacturing, testability, maintenance, life cycle costs, … • art: creative component that is clearly evident in the great engineers
Computer Engineering • Design and build computer systems • inherently involves both software and hardware design skills • System software • compiler, operating system, device drivers, … • as opposed to application specific software • applications are the target system “user” • hence they are used in design evaluation (pre- and post-build) • Hardware: possibly many disciplines and levels • VLSI chip design: analog and digital circuit aspects • CS, EE, physics are the key disciplines • yet cooling is a big issue – enter ME aspects • board design: CS, EE, and manufacturing issues are dominant • system design: balance of HW and SW capabilities
CE Senior Projects at Utah • Logistics • CE program run jointly by SoC and ECE departments • Senior project is capstone project course • team based • students choose their own project • best mechanism to demonstrate your abilities to future employers • CE Senior Project is a year long activity • at least for the last 2.5 years • Spring term of junior year: plan and propose • Summer: get parts and start building (optional) • Fall term of senior year: build and demonstrate • Exit interview feedback • rave reviews for being hard, fun, and instructive
04 Projects • Satellite Tracking station • Weaver – a 802.11 remote control vehicle interface • camera on car: image and commands to base station via wireless • car has autonomous anti-collision capability (infrared) • GPS Hummer • autonomous navigation and anti-collision • some AI in route finding since Hummer remembers obstacles that it saw previously • PCI Coprocessor • efficient acceleration via PCI add-on • Jiggawax • build your own iPod • RVI – remote vehicle interface • control via web or cell phone • control windows, engine, and door locks from RF base station
05 Projects • Carputer • OBDII car data and 802.11g auto-sync to base station • monitor your car or your kids • IR tag • paintball without the mess • Athlete monitor system • real time tracking of position and heart rate to central coaching station • GPS, RF, and HRM on-athlete • Inverted pendulum 2-wheeled robot • Multi-carrier reflectometry • finding faults in aircraft wires without tearing the plane apart • Glider avionics package • using accelerometers, GPS, and strain sensors
Current 06 projects (underway now) • PEN • electronic paper – the only paper you’ll ever buy! • Recipedia • a cook book that talks and listens to you • GPS tracker • use campus ubiquitous wireless to keep track of where things are via your cell phone or computer • OmegaCore • a DVR that knows how to remove commercials for you • NoCPR • bathtub drowning prevention • Tracking Visor • virtual reality on your head Come & watch on Reading Day
Selected Examples • Some images to illustrate previous projects
Satellite Tracking Station Final dual band antenna on the roof of MEB during demo day
2 meter (VHF side) antenna specs – students used an antenna design CAD tool
Autonomous anit-collision system
Glider Avionics Package (note this ended up being done by a single student as a thesis)
Designing an electronic compass is non-trivial especially if you want tilt-compensation
Board Schematic
Power Supply Filters and Registers Board Artwork
Senior Project Synopsis • This was just a peek • Just remember • if you can imagine it you can usually build it • there are some things you just can’t do • like a perpetual motion machine • which violates the laws of physics • all it takes is dedication and time • Huge diversity of both opportunities and problems • You might have noticed the world isn’t perfect • so help fix it!
Personal Research Overview • Past • dataflow, VLSI, asynchronous circuits, parallel computing, high performance architectures (50% academia, 50% industry) • Currently there are 4 projects • Domain specific architectures • target highly constrained embedded systems • will highlight the perception processor today • have also worked in signal processing and cell phone domains • Interconnect driven architecture • w/ Rajeev Balasubramonian & students • RPU design • w/ Erik Brunvand, Pete Shirley, Steve Parker, & students • VLSI wire scaling theory • w/ Stephanie Forrest & Melanie Moses @ UNM
Embedded Computing Characteristics • Historically • narrow application specific focus • typically cheap, low-power, provide just enough compute power • niche filled by small microcontroller/dsp devices • AND often ASIC component(s) • New Pressures • world goes bonkers on mobility and the web • expects ubiquitous information access • expects better and cheaper everything • sensors, microphones & cameras become free • so use lots of them • now we’re talking real computing
New Look for ECS • Sophisticated application suites • not single algorithms – e.g. • 3G and 4G cellular handsets • multiple channels and multiple encoding models • plus the usual DSP stuff • process what is streaming in from the net • includes real time media & web access • process the sensor, microphone, and camera streams • plus network information from the neighborhood • since things are starting to happen in groups • wide range of services • dynamic selection • no single app will do • Rate of algorithmic change is staggering
ECS Economics • Traditional reliance on the ASIC design cycle • lengthy IC design - > 1 year typical • little re-use • IP import works but there are many pitfalls • HDL code synthesize ed inefficiency • Macroblock forces process and layout issues • turning an IC is costly • even when it works the first time • ECS product cycles • lifetime similar to a mayfly • need next improved version “real soon now” • Result • sell monster volumes in a short time or lose
What is Perception Processing ? • Ubiquitous computing needs natural human interfaces • Processor support for perceptual applications • Gesture recognition • Object detection, recognition, tracking • Speech recognition • Biometrics • Applications • Multi-modal human friendly interfaces (our focus) • Intelligent digital assistants • Robotics, unmanned vehicles • Perception prosthetics
Perception Processing Problem consider always on aspect!!
Current Processors Inadequate • Too slow, too much power for embedded space! • 2.4 GHz Pentium 4 ~ 60 Watts • 400 MHz Xscale ~ 800 mW • 10x or more difference in performance but 100x in power • Inadequate memory bandwidth • Sphinx requires 1.2 GB/s memory bandwidth • Xscale delivers 64 MB/s ~ 1/19th • Our methodology • Characterize applications to find the problems • Derive acceleration architecture • History of FPUs is an analogy
The Problem w/ GPP’s • caches & speculation • consume significant area and energy • great when they work – a liability when they don’t • rigid communication model • data moves from memory to registers • register execution unit register • inability to support specialized computational pipelines • ASIC advantage • bottom line • can process anything • but not efficiently in many cases • it’s the von Neumann trap • lots of overhead for almost no work
FaceRec In Action Bobby Evans
Application Structure ANN based Rowley Face Detector • Flesh toning: Soriano et al, Bertran et al • Segmentation: Text book approach • Rowley detector, voter: Henry Rowley, CMU • Viola & Jones’ detector: Published algorithm + Carbonetto, UBC • Eigenfaces: Re-implementation by Colorado State University Neural Net Eye Locator Eigenfaces Face Recognizer Segment Image Flesh tone Viola & Jones Face Detector Image Identity, Coordinates ~200 stage Adaboost
Face Recognition Analysis • Cache • small L1D$ high hit rate • L2$ is useless – most L1 misses pass through • IPC • low even with lots of FP execution units • Why? • load store register & memory ports saturate • multiple large matrix traversals are the critical kernel • several indirect accesses per operation • dominant loop is a SFP inner product • no single cycle accumulate • Implications • restructure the code – loop fusion more temporary reg’s • need architectures which move data well
CMU Sphinx 3.2 Profile • Feature Vector = 13 Mel + 1st and 2nd derivative • 10 ms of speech is compressed into 39 SP floats • iMic possibility
Results similar to FaceRec cache port saturation big difference also memory B/W starved due to language model Speech Analysis (opt)
Simple ASIC Design Example: Matrix Multiply def matrix_multiply(A, B, C): # C is the result matrix for i in range(0, 16): for j in range(0, 16): C[i][j] = inner_product(A, B, i, j) def inner_product(A, B, row, col): sum = 0.0 for i in range(0,16): sum = sum + A[row][i] * B[i][col] return sum
ASIC Accelerator Design: Matrix Multiply Control Pattern def matrix_multiply(A, B, C): # C is the result matrix for i in range(0, 16): for j in range(0, 16): C[i][j] = inner_product(A, B, i, j) def inner_product(A, B, row, col): sum = 0.0 for i in range(0,16): sum = sum + A[row][i] * B[i][col] return sum
ASIC Accelerator Design: Matrix Multiply Access Pattern def matrix_multiply(A, B, C): # C is the result matrix for i in range(0, 16): for j in range(0, 16): C[i][j] = inner_product(A, B, i, j) def inner_product(A, B, row, col): sum = 0.0 for i in range(0,16): sum = sum + A[row][i] * B[i][col] return sum
ASIC Accelerator Design: Matrix Multiply Compute Pattern def matrix_multiply(A, B, C): # C is the result matrix for i in range(0, 16): for j in range(0, 16): C[i][j] = inner_product(A, B, i, j) def inner_product(A, B, row, col): sum = 0.0 for i in range(0,16): sum = sum + A[row][i] * B[i][col] return sum
ASIC Accelerator Design: Matrix Multiply def matrix_multiply(A, B, C): # C is the result matrix for i in range(0, 16): for j in range(0, 16): C[i][j] = inner_product(A, B, i, j) def inner_product(A, B, row, col): sum = 0.0 for i in range(0,16): sum = sum + A[row][i] * B[i][col] return sum