HAND OUTS: DExT Project, UK Data Archive, September 2007. Exploring text online. DDI record: standard level 1 and 2 of DDI2 used; some fixed vocabulary used for qualitative data types, data formats and data collection methods; file level 3 attributes not used. UKDA DDI record – HTML.
DDI record
• Standard level 1 and 2 of DDI2 used
• Some fixed vocabulary used for qualitative data types, data formats and data collection methods
• File level 3 attributes not used
UKDA DDI record - XML
  <sumDscr>
    <timePrd date="1870-00-00" event="start">1870</timePrd>
    <timePrd date="1973-00-00" event="end">1973</timePrd>
    <collDate date="1969-00-00" event="start">1969</collDate>
    <collDate date="1973-00-00" event="end">1973</collDate>
    <nation>Great Britain</nation>
    <geogUnit>Regions in England, Scotland and Wales</geogUnit>
    <anlyUnit>Individuals; Families/households</anlyUnit>
    <universe level="study">Location of units of observation:</universe>
    <universe level="study">National</universe>
    <universe level="study">Population keywords:</universe>
    <universe level="study">Families</universe>
    <universe level="study">Population:</universe>
    <universe level="study">Men and women born between 1870 and 1908</universe>
    <dataKind>Textual data; Numeric data</dataKind>
    <dataKind>in-depth interview transcripts</dataKind>
  </sumDscr>
</stdyInfo>
<method>
  <dataColl>
    <timeMeth>Cross-sectional (one-time) study</timeMeth>
    <sampProc>Quota sample derived from the occupational census of 1911, clustered and stratified by region and social class</sampProc>
    <deviat>449 (qualitative); 444 (quantitative)</deviat>
    <collMode>Face-to-face interview; Compilation or synthesis of existing material</collMode>
    <weight>No weighting used</weight>
    <cleanOps>A</cleanOps>
  </dataColl>
</method>
<dataAccs>
  <setAvail>
    <accsPlac>ESDS Qualidata, UK Data Archive</accsPlac>
    <collSize>Variables per Case: 191 variables per case <br> </collSize>
  </setAvail>
  <useStmt>
    <specPerm>2003A</specPerm>
    <restrctn>The depositor has specified that registration is required and standard conditions of use apply. The depositor may be informed about usage. See <a href='/orderingdata/termsandConditions.asp'>terms and conditions</a> for further information.</restrctn>
    <contact>Help desk: qualidata@esds.ac.uk</contact>
  </useStmt>
</dataAccs>
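As a rough illustration of how the study-level fields in such a record can be read by software, the sketch below parses an abridged, well-formed copy of the `<sumDscr>` block with Python's standard `xml.etree.ElementTree`. The function name `summarise` and the shape of the returned dictionary are assumptions made for this example; they are not part of DDI or of any UKDA tool.

```python
import xml.etree.ElementTree as ET

# Abridged, well-formed copy of the <sumDscr> block from the record above.
SUM_DSCR = """
<sumDscr>
  <timePrd date="1870-00-00" event="start">1870</timePrd>
  <timePrd date="1973-00-00" event="end">1973</timePrd>
  <collDate date="1969-00-00" event="start">1969</collDate>
  <collDate date="1973-00-00" event="end">1973</collDate>
  <nation>Great Britain</nation>
  <anlyUnit>Individuals; Families/households</anlyUnit>
  <dataKind>Textual data; Numeric data</dataKind>
  <dataKind>in-depth interview transcripts</dataKind>
</sumDscr>
"""


def summarise(xml_text):
    """Hypothetical helper: pull a few coverage fields out of a <sumDscr> element."""
    root = ET.fromstring(xml_text)
    # Key each period/collection date by its "event" attribute (start or end).
    period = {e.get("event"): e.text for e in root.findall("timePrd")}
    collected = {e.get("event"): e.text for e in root.findall("collDate")}
    # <dataKind> repeats, and one element may pack several values separated by ";".
    kinds = [k.strip() for e in root.findall("dataKind") for k in e.text.split(";")]
    return {
        "nation": root.findtext("nation"),
        "period": period,
        "collected": collected,
        "data_kinds": kinds,
    }


summary = summarise(SUM_DSCR)
print(summary)
```

Keying the dates by the `event` attribute keeps the start/end pairing explicit instead of relying on element order.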
More TEI mark-up?
• three basic groups of structural features
• defining idiosyncrasies in transcription
• links to analytic annotation and other data types (e.g. thematic codes, concepts, audio or video links, researcher annotations)
• identifying information such as real names, company names, place names, occupations, temporal information
• we have piloted an NLP system to semi-automate mark-up of named entities
Identifying elements
• identify atomic elements of information in text
• Person names
• Company/Organisation names
• Locations
• Dates
• Times
• Percentages
• Occupations
• Monetary amounts
• example:
• Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Anderson
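To make the idea concrete, here is a minimal sketch of gazetteer-based mark-up applied to the example sentence. It is not the piloted NLP system: the lookup table is hand-built for this one sentence, and the `<entity type="...">` element is a placeholder invented for illustration, not a TEI element. A real system would have to recognise names it has not seen before.

```python
import re

SENTENCE = (
    "Italy's business world was rocked by the announcement last Thursday "
    "that Mr. Verdi would leave his job as vice-president of Music Masters "
    "of Milan, Inc to become operations director of Arthur Anderson"
)

# Hypothetical hand-built gazetteer: phrase -> entity type.
GAZETTEER = {
    "Italy": "location",
    "last Thursday": "date",
    "Mr. Verdi": "person",
    "vice-president": "occupation",
    "Music Masters of Milan, Inc": "organisation",
    "operations director": "occupation",
    "Arthur Anderson": "organisation",
}


def mark_up(text, gazetteer):
    """Wrap every gazetteer phrase found in text in a placeholder <entity> tag."""
    # Try longer phrases first so a long name is not split by a shorter entry.
    phrases = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))

    def tag(match):
        phrase = match.group(0)
        return f'<entity type="{gazetteer[phrase]}">{phrase}</entity>'

    return pattern.sub(tag, text)


marked = mark_up(SENTENCE, GAZETTEER)
print(marked)
```

Stripping the tags from the output gives back the original sentence, so the mark-up stays a layer over the transcript and does not alter it.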
Progress on textual mark-up
• text mining collaboration important
• two bids in for keyword extraction systems to help conceptually index qualitative data
• Hence another need for an annotation schema!
• Some CAQDAS software also employs NLP tools for autocoding
CAQDAS Examples
• Atlas-ti
• HyperResearch
• Max-QDA
• NUD*IST 6
• NVivo 2
• QDA Miner
• QUALRUS
• Weft QDA