DDS, A Seismic Processing Architecture
Reproducible research workshop, UBC, Vancouver, 2006
Randall L. Selzler, RSelzler @ Data-Warp.com
Jerry Ehlers, Jerry.Ehlers @ BP.com
Joseph A. Dellinger*, Joseph.Dellinger @ BP.com
DDS ORIGINS: Amoco TRC, early 90’s
DDS began at the Amoco Tulsa Research Center (TRC) at a time of great organizational strain. The job of the TRC was to do research and crunch data, not to write software. Creating software is expensive! Amoco’s solution was an edict that “everyone will use DISCO, or else”.
Else!
But DISCO just wasn’t good enough! And so chaos ensued... We were “mired in seismic processing diversity”. DDS grew up surrounded by:
• USP (Amoco internal, trace-header based)
• SEPlib (ASCII header pointing to data cubes)
• SU (SEGY trace-header based)
• DISCO (proprietary monitor-based system)
... and needed to be compatible with all of these!
Although formally cast as a research group, the TRC in fact also functioned as an “internal contractor” processing shop.
1) To catch on, any software would have to be usable for quick-turnaround research.
2) The ability to process large datasets efficiently and in parallel was also of vital importance. [Terabytes of data, Connection Machines, MPI, OpenMP]
3) The group had accumulated a considerable number and variety of computers. [All “Unix”, but CM5, Cray, Sun, SGI, Linux, Linux clusters, 32 and 64 bit...]
4) Finally, there was an urgent need for software that could accommodate all the various mutant SEGY formats coming into the shop, as well as DISCO, SEPlib, SU, and USP!
and out of the chaos came...
John Etgen was using SEPlib for migration algorithm research on the CM200, a machine that required massively parallel data I/O. He showed SEPlib to Randy Selzler: “I want something that looks like THIS, but can handle the large industrial-strength jobs I need to do!”
And thus DDS was born...
How SEPlib did it

“Header” file:
    ... processing history ...
    esize=4 (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    etc...

Data file:
    regularly sampled cube of IEEE 4-byte floats of dimension n1 x n2 x n3

SEPlib was the system favored by the folks writing programs that worked on large data volumes instead of individual traces.
DDS can look a lot like SEPlib

SEPlib header file                  DDS “dictionary” file
... processing history ...          ... processing history ...
esize=4 (bytes)                     type=float4
data_format=xdr_float               format=fcube
in=data_location                    data= data location
                                    axis= t offset cdp
n1=trace_length                     size.t = trace length
n2=number_traces_per_record         size.offset=number traces per record
n3=number_records                   size.cdp= number records
d1=sample_interval                  delta.t= sample_interval
o1=starting sample                  origin.t= starting sample
label1=seconds                      units.t= seconds
etc...                              etc...
DDS can look a lot like SEPlib

“Dictionary” file:
    type=float4
    format=fcube
    data= data location
    axis= t offset cdp
    size.t = trace length
    size.offset=number traces per record
    size.cdp= number records
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...

Data file:
    regularly sampled cube of IEEE 4-byte floats of dimension size.t x size.offset x size.cdp

(Command-line arguments look a LOT like SEPlib too.)
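To make the dictionary-plus-cube idea concrete, here is a minimal Python sketch that reads definitions shaped like the ones on this slide and works out how many bytes the data cube should occupy. It is not the DDS parser: real dictionaries carry processing history and richer syntax, the choice to keep the last definition seen is this sketch's own, and the file name cube.bin and the sizes are made-up illustration values.

```python
def parse_dictionary(text):
    """Collect 'name= value' definitions; this sketch keeps the last one seen."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not a definition
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def cube_size_bytes(entries, bytes_per_sample=4):
    """Multiply the size of every axis named in 'axis=' by the sample width."""
    count = 1
    for name in entries["axis"].split():
        count *= int(entries["size." + name])
    return count * bytes_per_sample


# Hypothetical dictionary; cube.bin and all numbers are illustration values.
EXAMPLE = """
type= float4
format= fcube
data= cube.bin
axis= t offset cdp
size.t= 1000
size.offset= 96
size.cdp= 24
delta.t= 0.008
origin.t= 0.0
units.t= seconds
"""

entries = parse_dictionary(EXAMPLE)
total_bytes = cube_size_bytes(entries)
print(entries["axis"].split(), total_bytes)
```

Because the axes are looked up by name rather than by position, the same function would work unchanged for a cube with four or five axes.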
DDS’s Generalizations
• N-Dimensional Array of I/O Records
    • Densely populated for random access
    • Sequential access if sparse
• Meaningful Axis Names
    • t, x, y, z, w, kx, ky, kz, cmp, shot, offset, …
• Extensible Axis Attributes
    • Regular grid (size, origin, delta, units, …)
    • Variable grid (grid.z= 1 3 5 7 11, …)
    • Non-numeric (label.attr= Vp Vs rho)

Dictionary:
    …
    axis= t y cmp
    …
    size.t= 1000
    size.y= 96
    size.cmp= 24
    …
    delta.t= 0.008
    units.t= s
    …
    origin.y= 5000
    units.y= m
    …
    format= segy
    data= oak39_@

Binary Data: Card Header, Line Header, Traces…

Great for research! Exotic algorithms and unforeseen domains can be accurately represented and processed as easily as traditional ones.
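The difference between a regular grid and a variable grid can be sketched in a few lines of Python. The attribute names (size, origin, delta, grid) come from the slide; the function itself, and the defaults of origin 0 and delta 1 when an attribute is missing, are assumptions made for illustration rather than documented DDS behavior.

```python
def axis_coordinates(entries, name):
    """Return the coordinate of every sample along one named axis."""
    grid_key = "grid." + name
    if grid_key in entries:
        # Variable grid: coordinates are listed explicitly.
        return [float(v) for v in entries[grid_key].split()]
    # Regular grid: coordinates follow origin + i * delta.
    size = int(entries["size." + name])
    origin = float(entries.get("origin." + name, 0.0))  # assumed default
    delta = float(entries.get("delta." + name, 1.0))    # assumed default
    return [origin + i * delta for i in range(size)]


# Hypothetical definitions, already parsed into a mapping.
entries = {
    "axis": "t z",
    "size.t": "4",
    "delta.t": "0.008",
    "grid.z": "1 3 5 7 11",
}

t_coords = axis_coordinates(entries, "t")  # regular grid
z_coords = axis_coordinates(entries, "z")  # variable grid
print(t_coords, z_coords)
```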
How USP did it
USP (Unix Seismic Processing) was Amoco’s internally home-grown trace-based processing system, beloved of Amoco’s signal processors. USP is similar to SU in concept. USP uses longer trace headers than SU, but they still turned out to not be long enough! USP is still used as much as ever today.

USP-format data file:
    line header (historical: processing history and 3 data dimensions)
    traces:
        element count, trace header, trace samples
        element count, trace header, trace samples
        element count, trace header, trace samples
        ...
SU and USP use fixed-format trace headers defined by include files

    /*
     * hdr.h - SU include file for segy offset array
     */
    static struct {
        char *key;  char *type;  int offs;
    } hdr[] = {
        {  "tracl", "i",  0},
        {  "tracr", "i",  4},
        {   "fldr", "i",  8},
        {  "tracf", "i", 12},
        {     "ep", "i", 16},
        {    "cdp", "i", 20},
        {   "cdpt", "i", 24},
        {   "trid", "h", 28},
        {    "nvs", "h", 30},
        {    "nhs", "h", 32},
        {   "duse", "h", 34},
        { "offset", "i", 36},
        {  "gelev", "i", 40},
        {  "selev", "i", 44},
        { "sdepth", "i", 48},
        {   "gdel", "i", 52},
        { ...
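A fixed-format header means every program has to agree on the same byte offsets. The Python sketch below uses the first dozen entries of the table above to pull fields out of a trace header with the standard struct module; the SU type codes "i" and "h" happen to coincide with struct's codes for 4-byte and 2-byte integers. The 240-byte header length is the standard SEGY trace header size; the big-endian byte order and the field values are assumptions for the example, since SU files are often written in the machine's native order.

```python
import struct

# (key, type code, byte offset), copied from the first entries of hdr.h above.
HDR = [
    ("tracl", "i", 0), ("tracr", "i", 4), ("fldr", "i", 8), ("tracf", "i", 12),
    ("ep", "i", 16), ("cdp", "i", 20), ("cdpt", "i", 24), ("trid", "h", 28),
    ("nvs", "h", 30), ("nhs", "h", 32), ("duse", "h", 34), ("offset", "i", 36),
]


def read_fields(header_bytes, byte_order=">"):
    """Decode each listed field at its fixed offset (byte order is an assumption)."""
    return {
        key: struct.unpack_from(byte_order + code, header_bytes, offs)[0]
        for key, code, offs in HDR
    }


# Build a fake 240-byte trace header with a few made-up values.
header = bytearray(240)
struct.pack_into(">i", header, 20, 1234)   # cdp
struct.pack_into(">h", header, 28, 1)      # trid
struct.pack_into(">i", header, 36, -250)   # offset

fields = read_fields(header)
print(fields["cdp"], fields["trid"], fields["offset"])
```

Any field that is not in the table simply cannot be addressed, which is the limitation the dictionary-based approach is meant to remove.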
DDS also plays well with USP

DDS dictionary file:
    type=float4
    format=usp
    data= data location
    axis= t offset cdp comp
    size.t = trace length
    size.offset=number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...

USP-format data file:
    line header (three dimensions)
    traces:
        element count, trace header, trace samples
        element count, trace header, trace samples
        element count, trace header, trace samples
        ...

DDS knows what USP headers look like!
and SEGY...

DDS dictionary file:
    type=float4ibm
    format=segy
    data= data location
    axis= t offset cdp comp
    size.t = trace length
    size.offset=number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...

SEGY-format data file:
    EBCDIC cards
    binary header
    traces:
        trace header, IBM-format samples
        trace header, IBM-format samples
        trace header, IBM-format samples
        ...

Note DDS only bothers to convert back to SEGY’s archaic IBM floats when writing to disk!
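For readers who have not met IBM floats: a 32-bit IBM single has a sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction with no hidden bit. The Python sketch below decodes such words into ordinary floats. It only illustrates the number format behind type=float4ibm and is not taken from the DDS source; the function names are made up for the example.

```python
import struct


def ibm32_to_float(word):
    """Decode one 32-bit IBM hexadecimal float given as an unsigned integer."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F      # power of 16, biased by 64
    fraction = word & 0x00FFFFFF        # 24-bit fraction, no hidden bit
    return sign * (fraction / float(1 << 24)) * 16.0 ** (exponent - 64)


def decode_ibm_samples(raw):
    """Decode a run of big-endian IBM floats, as stored in SEGY trace samples."""
    count = len(raw) // 4
    words = struct.unpack(">%dI" % count, raw[: 4 * count])
    return [ibm32_to_float(w) for w in words]


samples = decode_ibm_samples(bytes.fromhex("42640000C276A000"))
print(samples)
```

Doing the conversion once at the file boundary, as the slide notes, lets everything in memory work on native floats.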
DDS can speak SU
(Note: input format auto-detected.)

    editd in=minute2.usp \
        3s=16 3e=16 2s=2 2e=32 2i=2 \
        out_format= su \
        out_data= stdout: | \
    supswigp clip=.2 > wiggle.ps
DDS dictionaries can point at dictionaries!

Top-level dictionary:
    type=float4ibm
    format=segy slice.comp
    data= dict.comp1 dict.comp2 dict.comp3
    axis= t offset cdp comp
    size.t = trace length
    size.offset=number traces per record
    size.cdp= number records
    size.comp= number components
    ...

dict.comp1:
    type=float4ibm
    format=segy
    data= data.c1.segy
    axis= t offset cdp
    size.t = trace length
    size.offset=number traces per record
    size.cdp= number records
    ...

dict.comp2:
    type=float4ibm
    format=segy
    data= data.c2.segy
    axis= t offset cdp
    size.t = trace length
    size.offset=number traces per record
    size.cdp= number records
    ...

Each component dictionary points at its own SEGY binary data file: data.c1.segy for dict.comp1 and data.c2.segy for dict.comp2.
DDS plays well with mutant SEGY

    bridge in= Atlantis_EQ.segy \
        in_format=segy \
        out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        \
        map:segy:usp.RcPtXC= "GrpX / 10" \
        map:segy:usp.RcPtYC= "GrpY / 10" \
        map:segy:usp.GrpElv= "Spare.I4[10] / 10" \
        map:segy:usp.CabDep= "Spare.I4[10]" \
        map:segy:usp.DstSgn= "DstSgn / 10" \
        \
        comment="Rec point and line numbers" \
        map:segy:usp.DpPtLn= "Spare.I4[8]" \
        map:segy:usp.DpPtLt= "Spare.I4[9]" \
        \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
    |\
    editd in= stdin: 3e=106 out_data= raw.usp

(Slide annotations on the map lines: straight map, fixed number, arithmetic calculation.)
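The kinds of mapping rule in this command (copy a field, set a constant, compute from a field) can be mimicked in Python with one small function per output field. This is only an illustration of the idea: DDS evaluates the quoted expressions itself, and whether its division truncates to an integer is not stated on the slide, so the plain "/" used here is an assumption, as are the sample header values.

```python
# One rule per USP field, mirroring a few of the map: lines above.
SEGY_TO_USP = {
    "RcComp": lambda h: h["TotalStatic"],               # straight map
    "SrPtEl": lambda h: 15,                             # fixed number
    "SrPtXC": lambda h: h["SrcX"] / 10,                 # arithmetic
    "SrPtYC": lambda h: h["SrcY"] / 10,
    "ShtDep": lambda h: h["SrcDepth"] / 10,
    "StaCor": lambda h: (h["TrcIdCode"] - 1) * 30000,   # calculation
}


def map_header(segy_header, rules=SEGY_TO_USP):
    """Build a USP-style header by applying every rule to one SEGY header."""
    return {usp_name: rule(segy_header) for usp_name, rule in rules.items()}


# Made-up SEGY header values for one trace.
segy_header = {
    "TotalStatic": 3,
    "SrcX": 4567890,
    "SrcY": 1234560,
    "SrcDepth": 70,
    "TrcIdCode": 2,
}

usp_header = map_header(segy_header)
print(usp_header)
```

Keeping the rules in a table rather than in code is what lets the same program cope with each new SEGY variant without being recompiled.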
Data formats and mappings
• This is how DDS differs from SEPlib... The properties of the binary data, and all the elements within the binary data, are looked up in the “dictionary”.
• Even the array of trace samples is just another trace field as far as DDS is concerned.
• DDS knows a few default formats, but can use any format that you can define.
• It can also map to and from any format for which you can define the necessary mappings.
• This has the important side effect of documenting the data format, making future reproducibility possible.
DDS supports generic formats
In fact, besides having a few built-in default formats such as USP, SU, and SEGY that are convenient for geophysicists, there is nothing in the core of DDS that limits it to being a seismic processing system!
Internal data formats
• Programs can define their own internal data formats as well, simply by writing definitions into their own internal dictionary:
    fdds_printf ('MOD_FIELD', ' *+ float MyHeader1, MyHeader2;\n\0')
• DDS will then convert from the format of the data, as documented by its dictionary, to the internal format specified by the program.
• On output, the internal format will be converted back into whatever output format has been requested on the command line. By default, the output format will be the same as the input format.
Leverage Diversity? Interoperate!
Data handling is fundamental…
[Diagram: DDS and non-DDS applications exchanging data through disk files, pipes/sockets, and tape. Box labels: Non-DDS Application; DDS Application; Generic I/O; Generic Read; Generic Write; Any DDS Supported Format; API Emulation; Format and API Emulation With Random Access I/O; Foreign Library; Foreign Format. Annotations: USP Re-link, 1998 Proof of Concept; DISCO Support, 1997-2003.]
Are you scared yet?
• You can probably imagine that all this translating between formats can get very complicated...

    ...
    fmt:SAMPLE_TYPE= typedef float4 SAMPLE_TYPE;
    fmt:USP_ADJUST= typedef enum4 {USP_LINE_PAD \= 0, USP_TRACE_PAD \= 0, USP_HLH_SIZE \= 2236} USP_ADJUST;
    fmt:SEQUENCE= typedef USP_TRACE SEQUENCE;
    alias:fmt:USP_TRACE_PAD= fmt:USP_ADJUST
    alias:fmt:USP_HLH_SIZE= fmt:USP_ADJUST
    alias:fmt:USP_LINE_PAD= fmt:USP_ADJUST
    usp_NumRec= 2056
    ...

But this is still better than having to change or relink your code for every different mutant data format! It also makes it possible to interoperate with historical data formats without too much pain.
DDS scripting as a Rosetta stone

    /apps/global/bin/bridge \
        in= /hpc/dat13/zdsr01/Node/EQ/all.segy \
        in_format=segy out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        comment="Azimuth, Roll Tilt" \
        map:segy:usp.TVPT01= "100 * Spare.F4[11]" \
        map:segy:usp.TVPT02= "100 * Spare.F4[12]" \
        map:segy:usp.TVPT03= "100 * Spare.F4[13]" \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
        comment="Shot Time" \
        map:segy:usp.TVPT15=Date.DateYear \
        map:segy:usp.TVPT16=Date.DateDay \
        map:segy:usp.TVPT17=Date.DateHour \
        map:segy:usp.TVPT18=Date.DateMin \
        map:segy:usp.TVPT19=Date.DateSec \
        ....
In Conclusion: caveats
• Things aren’t so complicated if you use DDS as if it were SEPlib, but then what’s the point?
• Because so much functionality already exists in USP, there has been little motivation to flesh out DDS.
• The external distribution is a subset of the same stuff we use internally. There has been little effort put into improving the “packaging”.
• While there is some documentation, it is somewhat lacking!
In Conclusion: upsides
• The software infrastructure inside BP today is based almost entirely on DDS and USP. It is BP’s infrastructure both for research and for processing. BP’s advanced imaging team in Houston is “BP’s largest contractor”.
• The DDS I/O library was released publicly in 2003 on “freeusp.org”. The core of the USP system was released a year or so earlier on the same web site, along with some ARCO-heritage processing systems.
• By releasing USP and DDS, BP hoped to make it easier to share algorithms with academia and contractors.
• Randy Selzler now wants to create a successor to DDS, but that’s his talk, as the “prophet”, to give...