Special session on NASS CDS
Objective • To explain the nature and format of the database used for your homework and exam • To learn of issues that are generalizable to other datasets used in injury prevention
NASS CDS • The National Automotive* Sampling System Crashworthiness Data System is owned and maintained by the National Highway Traffic Safety Administration. It has been under development since 1979, but it has had its current structure since 1988. *Formerly "Accident"
Disclaimer • This presentation relies heavily on materials developed by the National Highway Traffic Safety Administration (U.S. Dept. of Transportation). Some of them are available on the agency's website, www.nhtsa.dot.gov, whereas others were presented at special meetings, such as the PowerPoint presentation that follows, developed by Dr. Carra, Director of the National Center for Statistics and Analysis, a division of the agency. For more information, visit www-nrd.nhtsa.dot.gov/departments/nrd-30/ncsa/nass.html or /availinf.html
Case Inclusion Criteria • For a crash to make it into the NASS CDS, it has to: • Be a crash on an open road that generates a police report • Involve at least one passenger car, light truck, van or utility vehicle • Have at least one vehicle towed away from the scene • Among all eligible cases, a probabilistic sample is drawn (next)
Because the system was designed to be representative ONLY of the US as a whole, it is not possible to derive region-, state- or county-level estimates.
The number of cases currently collected (approx. 4500 per year) has been declining because of budget limitations in recent years. The system used to collect more than 6000 cases per year.
Implications of the PROBABILISTIC sampling • Because of this method of selecting cases, if one wants the real distribution of any crash-related characteristic in the US, one must use the “sampling weight” attached to each case. • Sampling weights range from 0 (cases that were collected but are not representative of any crash that year in the US) to almost 58,000, so they vary widely. They are available in the variable ratwgt
(II) • These weights are derived at the end of the year, once all cases have been selected. • STATA has survey commands that allow you to use this “weight” variable in most analyses. SUDAAN is a specialized statistical package with similar capabilities. • The weights are re-evaluated by the agency to account for changes in the number of cases collected per PSU and in the number of PSUs active at any given time.
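As a rough illustration of why the weights matter, the sketch below compares an unweighted and a weighted distribution of one characteristic. Only the variable name ratwgt comes from the slides; the `belted` field, the toy records, and the function name are hypothetical. It gives point estimates only: correct standard errors require survey-aware software such as the STATA or SUDAAN procedures mentioned above.

```python
from collections import defaultdict


def weighted_distribution(records, variable, weight="ratwgt"):
    """Return the weighted share of each value of `variable`.

    `records` is a list of dicts; `weight` names the sampling-weight field.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[rec[variable]] += rec[weight]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: w / grand_total for value, w in totals.items()}


# Hypothetical toy cases: one heavily weighted case, two lightly weighted ones.
cases = [
    {"belted": "yes", "ratwgt": 900.0},
    {"belted": "no", "ratwgt": 50.0},
    {"belted": "no", "ratwgt": 50.0},
]

# Ignoring the weights is the same as giving every case a weight of 1.
unweighted = weighted_distribution([dict(c, ratwgt=1.0) for c in cases], "belted")
weighted = weighted_distribution(cases, "belted")

print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this toy example the unbelted cases make up two thirds of the sampled records but only a tenth of the weighted total, which is the kind of gap the sampling weights are meant to correct.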
Who collects data • Police, through their regular police reports • Crash investigators who are NHTSA employees and are located near the police jurisdictions that are part of the system. • They locate the vehicles, photograph and measure them; visit the crash site, photograph it and collect data; and follow up with victims by interviewing them and/or reading medical records
OUTCOME INFORMATION • Collected in a variety of ways: • Abbreviated Injury Scale (AIS) • Injury Severity Score (ISS) • Maximum severity (dead, treated at hospital, treated at ED and released, etc.) • Work days lost • …
SEVERITY OF CRASH INFORMATION • In-depth crash investigation allows for careful assessment of the energy released during the crash (measured in a variety of ways)
Data files accessibility • 1978-1987 (not quite the same system) • 1988-1996: through NHTSA’s offices in Cambridge, MA • 1997 onward: online
Data structure • For each year, the approximately 400 variables collected are stored in 6 files, each containing the information from specific forms: • Accident file (accident record and accident event record), with about 40 variables • General vehicle file (general vehicle form), with some 200 variables • Exterior vehicle file (exterior vehicle form), some 125 variables
… • Interior vehicle file (interior vehicle form), some 150 variables • Occupant assessment file (occupant assessment form), some 125 variables • Occupant injury file (occupant injury form), some 50 variables The sum of the variables per file exceeds 400 because some variables are duplicated across files
Organizing the data • The hierarchical database can then be manipulated to generate any new database with selected information at whichever unit of analysis one wants (e.g., crash, person) • The files are available in SAS and flat file formats
Linking the data • One can merge files from one year using the identifying information available across files (e.g., psu, case number, record number, version number, accident number, vehicle number, occupant number), and/or • Append years to create a larger dataset
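The linking and appending steps can be sketched as follows, using plain Python dictionaries in place of the SAS or flat files. All field names (`year`, `psu`, `caseno`, `vehno`, `occno`, `n_vehicles`, `bodytype`, `age`) and the toy rows are assumptions made for illustration; the actual identifiers should be taken from the coding manual. The sketch also assumes that case numbers restart each year, so the year is added to the merge key once years are appended.

```python
def index_by(rows, keys):
    """Build a lookup from a tuple of key values to the matching row."""
    return {tuple(row[k] for k in keys): row for row in rows}


def build_occupant_file(accidents, vehicles, occupants):
    """Attach accident-level and vehicle-level fields to each occupant row."""
    acc_keys = ("year", "psu", "caseno")
    veh_keys = ("year", "psu", "caseno", "vehno")
    acc_index = index_by(accidents, acc_keys)
    veh_index = index_by(vehicles, veh_keys)
    merged = []
    for occ in occupants:
        acc = acc_index.get(tuple(occ[k] for k in acc_keys), {})
        veh = veh_index.get(tuple(occ[k] for k in veh_keys), {})
        # Occupant fields come last so they win if a name is duplicated.
        merged.append({**acc, **veh, **occ})
    return merged


# Hypothetical one-year files (lists of rows), appended by concatenation.
accidents = (
    [{"year": 1991, "psu": 2, "caseno": 1, "n_vehicles": 2}]
    + [{"year": 1992, "psu": 2, "caseno": 1, "n_vehicles": 1}]
)
vehicles = (
    [
        {"year": 1991, "psu": 2, "caseno": 1, "vehno": 1, "bodytype": "car"},
        {"year": 1991, "psu": 2, "caseno": 1, "vehno": 2, "bodytype": "van"},
    ]
    + [{"year": 1992, "psu": 2, "caseno": 1, "vehno": 1, "bodytype": "pickup"}]
)
occupants = (
    [
        {"year": 1991, "psu": 2, "caseno": 1, "vehno": 1, "occno": 1, "age": 34},
        {"year": 1991, "psu": 2, "caseno": 1, "vehno": 2, "occno": 1, "age": 51},
    ]
    + [{"year": 1992, "psu": 2, "caseno": 1, "vehno": 1, "occno": 1, "age": 19}]
)

occupant_file = build_occupant_file(accidents, vehicles, occupants)
for row in occupant_file:
    print(row)
```

Each output row is one occupant carrying the fields of the vehicle and the crash that occupant belongs to, which makes the person the unit of analysis.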
For your HWs and Exam • We appended years 1991-2001 • We created an occupant-level file with selected information from accident, (all) vehicle, and occupant files.
How to understand the data • Every year, the agency produces a “Coding Manual”, an 800+ page document that outlines all the operational issues related to the system and the definitions of the variables. You can access those at www-nass.nhtsa.dot.gov/NASS/CDS/DataColl (Note: take a peek if you want, NO NEED TO PRINT THEM) [Cover page of the manual: www-nass.nhtsa.dot.gov/NASS/CDS/DataColl/man1995.pdf]
II • There is also a summary manual that lists all variables ever collected since 1988 and summarizes changes over time [Cover page of the summary manual: www-nass.nhtsa.dot.gov/NASS/Manuals/CDS8896.pdf]
For example: Sex
NASS CDS 1988-1994: Male = 1; Female = 2; Unknown = 9
NASS CDS 1995-2001: Male = 1; Female, not pregnant = 2; Female, 1st trimester pregnant = 3; Female, 2nd trimester pregnant = 4; Female, 3rd trimester pregnant = 5; Female, pregnant, unknown trimester = 6; Unknown = 9
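When years are appended, a coding change like this one has to be reconciled before analysis. A minimal sketch, assuming a simple three-category target variable is wanted; the function name and the output labels are hypothetical, while the numeric codes are the ones listed above:

```python
def harmonize_sex(code, year):
    """Map the year-specific sex code to 'male', 'female' or 'unknown'."""
    if not 1988 <= year <= 2001:
        raise ValueError(f"no coding scheme listed for year {year}")
    if code == 1:
        return "male"
    if code == 9:
        return "unknown"
    if year <= 1994:
        # 1988-1994: a single code for female.
        if code == 2:
            return "female"
    else:
        # 1995-2001: codes 2 to 6 are all female, split by pregnancy status.
        if 2 <= code <= 6:
            return "female"
    raise ValueError(f"code {code} is not valid for year {year}")


print(harmonize_sex(2, 1990))
print(harmonize_sex(4, 1998))
print(harmonize_sex(9, 2001))
```

Raising an error on a code that does not belong to the year's scheme, rather than silently mapping it, is a deliberate choice here: it surfaces records where the year or the code was mis-entered.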
For example: Vehicle curb weight
NASS CDS 1988-1992: in hundreds of pounds, e.g., 136 = 13,600 pounds. Special codes: 000 = less than 50 pounds; 010 = less than 1,050 pounds; 135 = 13,500 pounds or more; 999 = unknown
NASS CDS 1993-2001: in tens of kilograms, e.g., 46 = 460 kilograms. Special codes: 045 = less than 5 kg (until 1995), less than 454 kg since 1996; 610 = 6,100 kg or more; 612 = 6,124 kg or more (since 1996); unknown
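A change of units like this one can be handled with a small conversion step before years are pooled. The sketch below is only an illustration: the function name is hypothetical, the unknown code for 1993-2001 is not given above and therefore has to be supplied by the caller from the coding manual, and the bottom and top special codes are converted at face value even though they really stand for "less than" or "or more".

```python
LB_TO_KG = 0.45359237  # kilograms per pound


def curb_weight_kg(code, year, unknown_codes_metric=frozenset()):
    """Convert a coded curb weight to kilograms, or None if unknown.

    `unknown_codes_metric` holds the unknown code(s) for 1993-2001; the
    default is empty because the value is not listed in the slide.
    """
    if 1988 <= year <= 1992:
        if code == 999:  # unknown, as listed for 1988-1992
            return None
        return code * 100 * LB_TO_KG  # coded in hundreds of pounds
    if 1993 <= year <= 2001:
        if code in unknown_codes_metric:
            return None
        return code * 10.0  # coded in tens of kilograms
    raise ValueError(f"no coding scheme listed for year {year}")


print(curb_weight_kg(30, 1990))   # 3,000 pounds expressed in kg
print(curb_weight_kg(46, 1995))   # the slide's example: 460 kg
print(curb_weight_kg(999, 1990))  # unknown
```

For a real analysis one would probably also keep a flag marking the special bottom and top codes, since those values are bounds rather than measurements.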
BEFORE YOU ANALYZE THE DATA • Understand it • Create the file you need • Clean the variables
Take home message • Know your data • Get the reference manuals • Get a contact person who is very knowledgeable about the data to assist.