TigerSHARC CLU: Closer look at the XCORRS
M. Smith, University of Calgary, Canada
smithmr@ucalgary.ca
Overview
• Recap GPS correlation
• Look at XCORRS instruction in detail
• This was part of Take home quiz for 5005
• Additional information on the web
  • Xcorrs.asm: assembly code discussed in class
  • Xmain.cpp: demonstrates the use of the xcorrs.asm code
  • XcorrsTest.cpp: demonstrates testing of all the functions being used
  • Additional correlation presentations (not XCORRS) from Analog Devices developers
• In 2005, we pointed out many errors in the TigerSHARC XCORRS explanation. If my figures are not the same as in the manual, then they fixed the manual errors
GPS Positioning Concepts
• For now make 2 assumptions:
  • We know the distance to each satellite
  • We know where each satellite is
• With this information from 2 satellites, you know you are on a “plane of intersection”.
• Require 3 satellites for a 3-D position in this “ideal” scenario
• Require 4 satellites to account for local receiver clock drift. (1)
Determining Time
Signal sent by satellite / Signal received by you
You know the signal sent
Perform correlations till you get a match
• Use the PRN code to determine time
• Use time to determine distance to the satellite
distance = speed of light * time (1)
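The distance formula above can be sketched in a few lines of C++. The helper name `range_from_travel_time` is hypothetical, and the sketch assumes the travel time has already been recovered from the correlation and ignores receiver clock error.

```cpp
#include <cassert>

// Speed of light in vacuum, metres per second (exact SI value).
const double kSpeedOfLight = 299792458.0;

// Convert a measured signal travel time (seconds) into a range (metres).
// Hypothetical helper for illustration; clock drift is not modelled.
double range_from_travel_time(double seconds) {
    assert(seconds >= 0.0);  // a negative travel time is meaningless
    return kSpeedOfLight * seconds;
}
```

A timing error of one microsecond corresponds to roughly 300 m of range error, which is why the correlation has to locate the match precisely.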
The practice
• Suppose we have the vector of in-phase and out-of-phase data gathered over an antenna from a satellite, for example. Gain issues make it x16:
  -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, -16-16j, 16+16j, 16+16j, etc.
• Question: if the original data from the satellite had this form
  -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j, -1-j, 1+j, 1+j,
  how much is the satellite data delayed?
FOR THIS EXAMPLE: 0, 3, 6, 9, 12, etc.
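The question on this slide can be tried out in plain C++ before looking at the hardware. The sketch below is not the class code: the function names are hypothetical, it uses a circular correlation with the textbook complex conjugate on the reference (an assumption, the slides do not state either choice), and it compares squared magnitudes so the result does not depend on the conjugation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Circular correlation at one shift:
// sum over k of received[(k + shift) mod N] * conj(reference[k]).
cplx correlate_at(const std::vector<cplx>& received,
                  const std::vector<cplx>& reference,
                  std::size_t shift) {
    assert(!received.empty() && received.size() == reference.size());
    const std::size_t n = received.size();
    cplx sum(0.0, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        sum += received[(k + shift) % n] * std::conj(reference[k]);
    }
    return sum;
}

// Return every shift whose squared correlation magnitude equals the maximum.
std::vector<std::size_t> candidate_delays(const std::vector<cplx>& received,
                                          const std::vector<cplx>& reference) {
    std::vector<double> power;
    double best = 0.0;
    for (std::size_t s = 0; s < received.size(); ++s) {
        power.push_back(std::norm(correlate_at(received, reference, s)));
        if (power.back() > best) best = power.back();
    }
    std::vector<std::size_t> shifts;
    for (std::size_t s = 0; s < power.size(); ++s) {
        if (power[s] == best) shifts.push_back(s);  // exact: all values are integers
    }
    return shifts;
}

// Build `periods` repetitions of the slide's pattern -1-j, 1+j, 1+j,
// scaled by `gain` (16 for the received data, 1 for the reference).
std::vector<cplx> repeated_pattern(std::size_t periods, double gain) {
    const cplx one_period[3] = {cplx(-1.0, -1.0), cplx(1.0, 1.0), cplx(1.0, 1.0)};
    std::vector<cplx> out;
    for (std::size_t p = 0; p < periods; ++p) {
        for (const cplx& v : one_period) out.push_back(gain * v);
    }
    return out;
}
```

Calling `candidate_delays(repeated_pattern(5, 16.0), repeated_pattern(5, 1.0))` on the 15 samples listed above yields the shifts 0, 3, 6, 9 and 12. Because the example pattern repeats every three samples, the delay is only determined modulo 3.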
Tackle the issue with FIR
• First: modify the correlation function to handle complex values
  • Ignore that issue at the moment: each tap goes from 1 add + 1 multiplication + 2 memory fetches to 3 adds + 4 multiplications + 4 memory fetches
• Imagine 1024 data points + 1024 PRN
• Need to do 1024 FIRs, each of 1024 taps
• We know how to optimize to do 2 taps every cycle (one in X and one in Y)
• Total time is 1024 * 512 cycles, about 1 ms at 500 MHz
• XCORRS can do 8 * 16 taps each cycle in each compute block: 128 times faster
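The cycle counts on this slide can be checked with a short calculation. This is only a sketch with hypothetical helper names; it takes the slide's own figures (2 taps per cycle for the optimized FIR, 8 * 16 taps per cycle in each of the two compute blocks for XCORRS) as given and ignores load, store and setup overhead.

```cpp
#include <cassert>
#include <cstdint>

// Cycles needed for `shifts` correlations of `taps` taps each when the
// hardware completes `taps_per_cycle` taps every cycle.
std::uint64_t cycles_needed(std::uint64_t shifts, std::uint64_t taps,
                            std::uint64_t taps_per_cycle) {
    assert(taps_per_cycle > 0);
    return shifts * taps / taps_per_cycle;
}

// Convert a cycle count to microseconds for a clock given in MHz.
double microseconds(std::uint64_t cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return static_cast<double>(cycles) / clock_mhz;
}
```

With these helpers, `cycles_needed(1024, 1024, 2)` is 524288 cycles, or about 1049 microseconds at 500 MHz, and `cycles_needed(1024, 1024, 2 * 8 * 16)` is 4096 cycles, or about 8.2 microseconds. The ratio of the two cycle counts is 128.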
THEORY: Mathematical definition
Uses registers
  TR -- accumulate
  D -- 8 data?
  C -- 1 coefficient?
And something called CUT, essentially a window operation
  fcut = 0 -- don’t use
2005 Lab 4: Satellite data
Quad fetch brings in 8 complex values, 8 bits each
Pattern here is -1 + 0j, 1 + 0j, 1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j, …
PRN code: 2-bit complex number
Seems strange to have two dummy bits, but it actually makes sense
PRN: -1 + -1j, 1 + j, 1 + j, -1 + -1j, 1 + j, 1 + j, …
+1, -1 are associated with the PSK (more in another lecture)
Problem: BINARY means 1 and 0, so how do we represent 1 and -1?
-1 values are stored as 1’s, +1 values are stored as 0’s (DAMY)
PRN value 0x3 goes in as C15 and C16
0011 -- C15 = -1 - j, C16 = +1 + j
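The bit convention can be made concrete with a small decoder. This is a sketch, not the hardware definition: `decode_prn` and `PrnValue` are hypothetical names, and the choice of which bit in each pair holds the real part is an assumption (the slides only show pairs where both bits are equal).

```cpp
#include <cassert>
#include <cstdint>

// One decoded PRN coefficient: each part is +1 or -1.
struct PrnValue {
    int re;
    int im;
};

// Decode the 2-bit complex value at position `index` within `word`.
// From the slides: a 1 bit stores -1 and a 0 bit stores +1.
// Assumption (not from the slides): the low bit of each pair is the
// real part and the high bit is the imaginary part.
PrnValue decode_prn(std::uint64_t word, unsigned index) {
    assert(index < 32);  // 32 two-bit values fit in a 64-bit word
    const unsigned pair = static_cast<unsigned>((word >> (2 * index)) & 0x3u);
    PrnValue v;
    v.re = (pair & 0x1u) ? -1 : +1;
    v.im = (pair & 0x2u) ? -1 : +1;
    return v;
}
```

For the value 0x3 (binary 0011), `decode_prn(0x3, 0)` gives -1 - j and `decode_prn(0x3, 1)` gives +1 + j, matching the C15 and C16 values on the slide when position 0 is taken to be C15.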
Standard XCORRS instruction
Lower 46 bits of THR1:0
R7:4
TR0, TR1, TR2 … TR15
TR15:0 = XCORRS(R7:4, THR3:0)
Doing 8 complex taps of 16 correlations at each cycle
TR0 += D7 * C22 + D6 * C21 + …    8 taps
TR1 += D7 * C21 + D6 * C20 + …    8 taps
…
TR15 += D7 * C7 + D6 * C6 + …    8 taps
64 taps each cycle, on both X and Y compute blocks if set up properly: 128 taps each cycle. These are “complex taps”, compared to 2 real taps per cycle after Lab 3.
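The table above follows one index rule: for accumulator TRn, the data index runs from 7 down to 0 while the coefficient index runs from 22-n down to 15-n. A plain C++ model of a single step might look like the sketch below. It is a reading of the slide's table, not the manual's definition: `xcorrs_step` is a hypothetical name, values are held as `std::complex<double>` rather than packed bit fields, and the product is written exactly as on the slide, without a conjugate.

```cpp
#include <array>
#include <cassert>
#include <complex>

using cplx = std::complex<double>;

// One XCORRS step as tabulated on the slide (no CUT):
//   TRn += D7*C(22-n) + D6*C(21-n) + ... + D0*C(15-n),  n = 0..15
void xcorrs_step(std::array<cplx, 16>& tr,
                 const std::array<cplx, 8>& d,
                 const std::array<cplx, 23>& c) {
    for (int n = 0; n < 16; ++n) {
        for (int k = 0; k < 8; ++k) {
            const int ci = 22 - n - k;   // coefficient index for this tap
            assert(ci >= 0 && ci < 23);  // stays within C22..C0
            tr[n] += d[7 - k] * c[ci];   // data index runs 7..0
        }
    }
}
```

Each call performs 16 * 8 multiply-accumulates, one inner loop per TR register, and it touches exactly the 23 coefficients C22..C0, which is why the instruction needs 46 bits of 2-bit PRN values.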
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)
Because of offsets, sometimes we must only use “some of the taps”
TR0 += D7 * C22 + D6 * C21 + …    8 taps
TR1 += D7 * C21 + D6 * C20 + …    8 taps
…
TR14 += D7 * C8 + D6 * C7    2 taps
TR15 += D7 * C7    1 tap
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)
TR0 += D7 * C22 + D6 * C21 + …    8 taps
TR1 += D7 * C21 + D6 * C20 + …    7 taps
…
TR7 += D7 * C15    1 tap
TR8 += 0    0 taps
…
TR15 += 0    0 taps
TR15:0 = XCORRS(R7:4, THR3:0) (CUT +7?)
TR0 += 0    0 taps
TR1 += D0 * C14    1 tap
…
TR7 += D6 * C14 + D5 * C13 + …    7 taps
TR8 += D7 * C14 + D6 * C13 + …    8 taps
…
TR15 += D7 * C7 + D6 * C6 + …    8 taps
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -15)
TR0 += D7 * C22 + D6 * C21 + …    8 taps
TR1 += D7 * C21 + D6 * C20 + …    7 taps
…
TR7 += D7 * C15    1 tap
TR8 += 0    0 taps
…
TR15 += 0    0 taps
TR15:0 = XCORRS(R7:4, THR3:0) (CUT -7)
TR0 += D7 * C22 + D6 * C21 + …    8 taps
TR1 += D7 * C21 + D6 * C20 + …    8 taps
…
TR14 += D7 * C8 + D6 * C7    2 taps
TR15 += D7 * C7    1 tap
TR15:0 = XCORRS(R7:4, THR3:0)
TR0 += D7 * C22 + D6 * C21 + …    8 taps
TR1 += D7 * C21 + D6 * C20 + …    8 taps
…
TR15 += D7 * C7 + D6 * C6 + …    8 taps
64 taps each cycle, on both X and Y compute blocks if set up properly: 128 taps each cycle. These are “complex taps”, compared to 2 real taps per cycle after Lab 3.
Problem at this point -- THR3:2 empty
Need to bring in more PRN values
TR15:0 = XCORRS(R7:4, THR3:0) (CUT +15)
TR0 += 0    0 taps
TR1 += D0 * C14    1 tap
…
TR7 += D6 * C14 + D5 * C13 + …    7 taps
TR8 += D7 * C14 + D6 * C13 + …    8 taps
…
TR15 += D7 * C7 + D6 * C6 + …    8 taps
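The tap counts in the CUT -7, CUT -15 and CUT +15 tables are all consistent with one windowing rule on the coefficient index. The sketch below encodes that rule. It is inferred from the slide tables only and is not taken from the manual: the function names are hypothetical, and treating a CUT of 0 as "no window" is an assumption.

```cpp
#include <cassert>

// Is coefficient index `ci` kept for a given CUT value?
// Inferred from the slide tables:
//   CUT = -c keeps coefficients with index >= c,
//   CUT = +c keeps coefficients with index <  c.
// Assumption: CUT = 0 keeps every coefficient.
bool coefficient_kept(int ci, int cut) {
    if (cut < 0) return ci >= -cut;
    if (cut > 0) return ci < cut;
    return true;
}

// Number of taps that accumulator TRn still receives in one step.
// TRn normally pairs D7..D0 with C(22-n)..C(15-n).
int taps_used(int n, int cut) {
    assert(n >= 0 && n < 16);
    int taps = 0;
    for (int k = 0; k < 8; ++k) {
        if (coefficient_kept(22 - n - k, cut)) ++taps;
    }
    return taps;
}
```

Under this rule, `taps_used(15, -7)` is 1 and `taps_used(14, -7)` is 2, `taps_used(7, -15)` is 1 and `taps_used(8, -15)` is 0, and `taps_used(1, 15)` is 1 and `taps_used(8, 15)` is 8, matching the tables for those three CUT values.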
Final Result
Maximum correlation occurs every 3 shifts, which is what we expect
Is it the correct result?
Correlation: result expected
In step: -1 + 0j, 1 + 0j, 1 + 0j, … 16 times with -1 - j, 1 + j, 1 + j, … 16 times
  -1 * -1 + 1 * 1 + 1 * 1 + … = 48 = 0x30 -- Real component
Out of step: -1 + 0j, 1 + 0j, 1 + 0j, … 16 times with 1 + j, 1 + j, -1 - j, … 16 times
  -1 * 1 + 1 * 1 + 1 * -1 + … = -16 = -0x10 = 0xFFF0
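The two expected values can be reproduced with integer arithmetic. Because the data has a zero imaginary part, the real component of each product is just the data's real part times the PRN's real part, so the sketch below works on real parts only. The helper names are hypothetical, and the 16-bit two's-complement view is an assumption chosen to match the 0xFFF0 shown on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Real parts of the repeating pattern -1, 1, 1 over `periods` periods.
std::vector<int> repeat_pattern(std::size_t periods) {
    const int one_period[3] = {-1, 1, 1};
    std::vector<int> out;
    for (std::size_t p = 0; p < periods; ++p) {
        for (int v : one_period) out.push_back(v);
    }
    return out;
}

// Real component of the correlation with the PRN rotated by `shift`.
int real_correlation(const std::vector<int>& data_re,
                     const std::vector<int>& prn_re,
                     std::size_t shift) {
    assert(!data_re.empty() && data_re.size() == prn_re.size());
    const std::size_t n = data_re.size();
    int sum = 0;
    for (std::size_t k = 0; k < n; ++k) {
        sum += data_re[k] * prn_re[(k + shift) % n];
    }
    return sum;
}

// View a result as a 16-bit two's-complement bit pattern.
std::uint16_t as_16bit(int value) {
    return static_cast<std::uint16_t>(value);
}
```

With 16 periods, shift 0 gives 48 (0x30) and shift 1 gives -16, whose 16-bit pattern is 0xFFF0. Shift 3 returns to 48, which is the "every 3 shifts" peak.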
Final Result
1) Now have correlation values for 16 shifts in TR registers: store to external memory. Repeat for all other necessary shifts, then find the maximum
2) Now make parallel in SISD mode
3) Now make parallel in SIMD mode