The FRB and XML: National Data and International Standards
San Cannon, Federal Reserve Board
IASSIST 2005
Background:
• The Fed is a statistical agency as well as a central bank and regulatory agency.
• Lots of data and information are available on the public website.
• Statistical data are varied: monthly industrial production indexes (non-financial), daily interest and exchange rates (financial), quarterly financial flows for various sectors of the economy, surveys of small businesses and consumers, etc.
The different roles often compete with one another; sometimes the statistical agency role seems secondary.
• Data are not always easy to find.
• Downloads are not customizable.
• Example: extracting a single industrial production series requires two text files, cutting and pasting, and reformatting.
• All-or-nothing approach.
• Complete: yes. User-friendly: no.
Other agencies are making great strides:
• The Bureau of Economic Analysis has wonderful tabling capabilities: www.bea.gov
• The Bureau of Labor Statistics has query screens, series selection screens and frequently requested statistics: www.bls.gov
Taking an extra step: we wanted to build something forward-looking, and XML was identified early on.
• It is the most flexible option and seems to be the trend for the future.
• Financial data are already heading that way: FinXML, FpML (Financial products Markup Language), MDDL (Market Data Definition Language), XBRL (eXtensible Business Reporting Language)
How do we do it?
• Build our own XML definitions:
  • Pro: would fit our data perfectly
  • Con: we'd be the only ones
• Use financial definitions:
  • Pro: lots of others use them
  • Con: we have non-financial data
• Try SDMX (Statistical Data and Metadata eXchange):
  • Pro: designed for time series data
  • Con: new kid on the block
But nothing goes smoothly at first: SDMX is based on 'key families' and codelists, in which every concept is represented by a code that has a corresponding definition in a list.
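To make the codelist idea concrete, here is a minimal Python sketch that assumes a codelist is simply a mapping from codes to definitions, and that a code is only valid if it appears in its list. The codelist IDs and contents below are illustrative and hypothetical; they are not the FRB's actual codelists.

```python
# Hypothetical codelists: each concept maps its permitted codes to definitions.
CODELISTS = {
    "CL_FREQ": {"A": "Annual", "Q": "Quarterly", "M": "Monthly", "D": "Daily"},
    "CL_COUNTRY": {"US": "United States", "GB": "United Kingdom"},
}


def decode(codelist_id: str, code: str) -> str:
    """Look up the definition for a code; reject codes not in the codelist."""
    codelist = CODELISTS[codelist_id]
    if code not in codelist:
        raise KeyError(f"{code!r} is not a valid code in {codelist_id}")
    return codelist[code]


if __name__ == "__main__":
    print(decode("CL_FREQ", "M"))
    print(decode("CL_COUNTRY", "GB"))
```

The strict lookup is the point: every concept in an SDMX key must resolve to a defined code, which is exactly what makes free-form, character-by-character mnemonics hard to fit.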
We think about data differently: the Fed uses mnemonic series names in which each character has meaning and names are hierarchical.
Fitting a square peg into a round hole…
• Data described by a fixed number of concepts are much easier to represent with key family dimensions and attributes:
  Q.SCBA.GB.92 → Freq.Topic.Country.BIS code
  M.HBBA.US.01 → Freq.Topic.Country.BIS code
• Hierarchical relationships and a varying number of concepts make life more difficult; a single key family isn't possible:
  JQI_I02YMF_N.M → Topic_Industry_SA.Freq
  RIFSPPNA2P2D30_N.B → Topic?_SA.Freq
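The contrast on this slide can be sketched in a few lines of Python. The fixed key family uses the dimension order shown on the slide; the mnemonic parser is a hypothetical simplification (underscore-separated body, then a dot and the frequency) and does not capture the character-level hierarchy inside each component.

```python
from typing import Dict, List, Tuple

# Positional key family taken from the slide: Freq.Topic.Country.BIS code
FIXED_KEY_FAMILY = ["FREQ", "TOPIC", "COUNTRY", "BIS_CODE"]


def split_fixed_key(key: str, dimensions: List[str]) -> Dict[str, str]:
    """Map each dot-separated component of a key onto one key family dimension."""
    parts = key.split(".")
    if len(parts) != len(dimensions):
        raise ValueError(
            f"{key!r} has {len(parts)} components; key family expects {len(dimensions)}"
        )
    return dict(zip(dimensions, parts))


def split_fed_mnemonic(name: str) -> Tuple[List[str], str]:
    """Hypothetical parse of a Fed mnemonic: underscore-separated body, '.', frequency."""
    body, freq = name.rsplit(".", 1)
    return body.split("_"), freq


if __name__ == "__main__":
    print(split_fixed_key("Q.SCBA.GB.92", FIXED_KEY_FAMILY))
    # The two mnemonics yield different numbers of components (3 vs 2),
    # so no single positional key family can describe both.
    for name in ("JQI_I02YMF_N.M", "RIFSPPNA2P2D30_N.B"):
        parts, freq = split_fed_mnemonic(name)
        print(name, parts, freq)
```

Even this crude split shows the mismatch: the BIS-style keys always produce four positions, while the two mnemonics produce three and two, and the "Topic?" component would need further decoding character by character.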
SDMX only provides a framework:
• We still needed to build the actual schemas to describe our data within the SDMX metaschema framework.
• Each data release uses its own schema or set of schemas. Each schema is based on a key family used to describe the data.
• Currently, our schemas are tailored to meet our data needs.
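Because each release's schema rests on a key family, one way to picture that build step is generating the key family declaration from a compact description. This is a sketch using Python's standard `xml.etree.ElementTree`, not the Board's actual tooling; the namespace URI is a placeholder (assumption), and the `CL_<concept>` codelist naming is a hypothetical convention.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Placeholder namespace URI for the "structure:" prefix (assumption, not the official SDMX URI).
STRUCT_NS = "urn:example:sdmx-structure"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("structure", STRUCT_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the structure namespace in ElementTree's {uri}tag form."""
    return f"{{{STRUCT_NS}}}{tag}"


def build_key_family(kf_id: str, agency: str, name: str,
                     dimensions: List[str],
                     attributes: List[Tuple[str, str]]) -> str:
    """Serialize a KeyFamily with a frequency dimension plus the given dimensions and attributes."""
    kf = ET.Element(q("KeyFamily"), {"id": kf_id, "agency": agency})
    ET.SubElement(kf, q("Name"), {XML_LANG: "en"}).text = name
    comps = ET.SubElement(kf, q("Components"))
    ET.SubElement(comps, q("FrequencyDimension"), {"concept": "FREQ", "codelist": "CL_FREQ"})
    for concept in dimensions:
        # Hypothetical convention: each dimension draws on a codelist named CL_<concept>.
        ET.SubElement(comps, q("Dimension"), {"concept": concept, "codelist": "CL_" + concept})
    for concept, level in attributes:
        ET.SubElement(comps, q("Attribute"), {
            "concept": concept, "attachmentLevel": level, "assignmentStatus": "Mandatory"})
    return ET.tostring(kf, encoding="unicode")


if __name__ == "__main__":
    print(build_key_family("CP_OUTST", "FRB", "Commercial Paper Outstandings",
                           ["CP_SA", "CP_IND_TYPE"], [("UNIT", "Group")]))
```

Generating the declaration from a small description keeps per-release schemas consistent with one another while still letting each release choose its own dimensions.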
Storage adds further complications:
• We need to store data and metadata in a database so they can be retrieved with queries.
• Native XML databases are still in their infancy.
• We couldn't find many people storing XML-tagged data in relational databases.
So what did we end up with?
• The data model is a hybrid: the tree structure is flattened to fit the codelist setup.
• We store the XML as carefully sliced text in a relational database, and we can build an index structure that lets us respond very efficiently to ad-hoc queries, even for large volumes of data.
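The slides don't describe the actual table layout, so the following is only a hypothetical illustration of the general idea using Python's built-in `sqlite3`: pre-sliced XML fragments stored as text, one row per series, with dimension columns indexed so an ad-hoc query can pick out matching fragments and stitch them into one document. The table and column names, the `DataSet`/`Series` wrapper, and using the first underscore-delimited segment as "topic" are all assumptions; a real SDMX-ML message would also need a header and namespace declarations.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, str, str]  # (series_name, topic, freq, xml_fragment)


def build_store(rows: Iterable[Row]) -> sqlite3.Connection:
    """Store pre-sliced XML fragments as text, one row per series, plus an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE series_xml (
               series_name  TEXT PRIMARY KEY,
               topic        TEXT NOT NULL,
               freq         TEXT NOT NULL,
               xml_fragment TEXT NOT NULL)"""
    )
    # Index the dimension columns so ad-hoc filters avoid full table scans.
    conn.execute("CREATE INDEX idx_series_topic_freq ON series_xml (topic, freq)")
    conn.executemany("INSERT INTO series_xml VALUES (?, ?, ?, ?)", rows)
    return conn


def extract(conn: sqlite3.Connection, topic: str, freq: str) -> str:
    """Answer an ad-hoc query by concatenating matching fragments into one document."""
    cur = conn.execute(
        "SELECT xml_fragment FROM series_xml WHERE topic = ? AND freq = ? ORDER BY series_name",
        (topic, freq),
    )
    return "<DataSet>" + "".join(row[0] for row in cur) + "</DataSet>"


if __name__ == "__main__":
    store = build_store([
        ("JQI_I02YMF_N.M", "JQI", "M", '<Series SERIES_NAME="JQI_I02YMF_N.M"/>'),
        ("RIFSPPNA2P2D30_N.B", "RIFSPPNA2P2D30", "B", '<Series SERIES_NAME="RIFSPPNA2P2D30_N.B"/>'),
    ])
    print(extract(store, "JQI", "M"))
```

Storing ready-made text slices means a query only filters rows and concatenates strings; no XML has to be parsed or re-serialized at request time.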
Looks like this in SDMX-ML:
<structure:KeyFamily id="CP_OUTST" agency="FRB">
  <structure:Name xml:lang="en">Commercial Paper Outstandings</structure:Name>
  <structure:Components>
    <structure:TimeDimension concept="TIME" codelist="CL_TIME">
      <structure:TextFormat/>
    </structure:TimeDimension>
    <structure:FrequencyDimension concept="FREQ" codelist="CL_FREQ"/>
    <structure:Dimension concept="CP_SA" codelist="CL_CP_SA"/>
    <structure:Dimension concept="CP_IND_TYPE" codelist="CL_CP_IND_TYPE"/>
    <structure:Dimension concept="CP_ORIG" codelist="CL_CP_ORIG"/>
    <structure:Dimension concept="CP_OWN" codelist="CL_CP_OWN"/>
    <structure:Dimension concept="CP_NSASC" codelist="CL_CP_NSASC"/>
    <structure:Attribute concept="UNIT" codelist="CL_UNIT" attachmentLevel="Group" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="UNIT_MULT" codelist="CL_UNIT_MULT" attachmentLevel="Group" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="OBS_STATUS" codelist="CL_OBS_STATUS" attachmentLevel="Observation" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="SERIES_NAME" attachmentLevel="Series" assignmentStatus="Mandatory" />
    <structure:Attribute concept="DESCRIPTION" attachmentLevel="Series" assignmentStatus="Conditional" />
  </structure:Components>
</structure:KeyFamily>
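To show how a consumer might read a key family like CP_OUTST, here is a short sketch using Python's standard `xml.etree.ElementTree`. The fragment on the slide never declares the `structure:` prefix, so the sketch supplies a placeholder namespace URI (an assumption, not the official SDMX-ML URI); otherwise the components are copied from the slide.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the undeclared "structure:" prefix (assumption).
NS = {"structure": "urn:example:sdmx-structure"}

KEY_FAMILY_XML = """
<structure:KeyFamily xmlns:structure="urn:example:sdmx-structure" id="CP_OUTST" agency="FRB">
  <structure:Name xml:lang="en">Commercial Paper Outstandings</structure:Name>
  <structure:Components>
    <structure:TimeDimension concept="TIME" codelist="CL_TIME"><structure:TextFormat/></structure:TimeDimension>
    <structure:FrequencyDimension concept="FREQ" codelist="CL_FREQ"/>
    <structure:Dimension concept="CP_SA" codelist="CL_CP_SA"/>
    <structure:Dimension concept="CP_IND_TYPE" codelist="CL_CP_IND_TYPE"/>
    <structure:Dimension concept="CP_ORIG" codelist="CL_CP_ORIG"/>
    <structure:Dimension concept="CP_OWN" codelist="CL_CP_OWN"/>
    <structure:Dimension concept="CP_NSASC" codelist="CL_CP_NSASC"/>
    <structure:Attribute concept="UNIT" codelist="CL_UNIT" attachmentLevel="Group" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="UNIT_MULT" codelist="CL_UNIT_MULT" attachmentLevel="Group" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="OBS_STATUS" codelist="CL_OBS_STATUS" attachmentLevel="Observation" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="SERIES_NAME" attachmentLevel="Series" assignmentStatus="Mandatory"/>
    <structure:Attribute concept="DESCRIPTION" attachmentLevel="Series" assignmentStatus="Conditional"/>
  </structure:Components>
</structure:KeyFamily>
"""


def summarize_key_family(xml_text: str) -> dict:
    """Return the key family's id, name, dimensions and attribute attachment levels."""
    root = ET.fromstring(xml_text.strip())
    comps = root.find("structure:Components", NS)
    return {
        "id": root.get("id"),
        "agency": root.get("agency"),
        "name": root.findtext("structure:Name", namespaces=NS),
        "time": comps.find("structure:TimeDimension", NS).get("concept"),
        "frequency": comps.find("structure:FrequencyDimension", NS).get("concept"),
        # Ordinary dimensions only, in document order.
        "dimensions": [d.get("concept") for d in comps.findall("structure:Dimension", NS)],
        "attributes": {a.get("concept"): a.get("attachmentLevel")
                       for a in comps.findall("structure:Attribute", NS)},
    }


if __name__ == "__main__":
    print(summarize_key_family(KEY_FAMILY_XML))
```

Reading the dimensions back in document order is what a query screen would need in order to offer one selector per concept.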
And the end result?
• The Data Download Project (DDP) is the largest, most complex application on the Board's public website.
• It's also the first production application to deliver customized data extracts in SDMX format.
• And now… Version 1.0!
Next steps…
• Run performance tests and verify server load capabilities.
• Polish the interface, do usability testing and verify compliance with Section 508 regulations.
• Long run: work with other central banks on a common schema framework.
• Release on the unsuspecting public! Target: third quarter 2005
The last slide…
Questions? Comments?
Thank you for your attention!
San Cannon
scannon@frb.gov
(202) 452-3710