230 likes | 429 Views
Student Centered ODS. ETL Processing. Insert. Search for rows not previously in the database within a snapshot type for a specific subject and year Check for duplicates Identify test Lookup metadata Create Unique Test Event identifiers Load data Copy Previous Year Test details.
E N D
Student Centered ODS ETL Processing
Insert • Search for rows not previously in the database within a snapshot type for a specific subject and year • Check for duplicates • Identify test • Lookup metadata • Create Unique Test Event identifiers • Load data • Copy Previous Year Test details
Source Data PADMID SnapshotFg SMFConfig_ID UpdtDt EndDt 20179458T1008 F 134 2008-11-21 09:21:36.117 NULL The current date and time is 11/21/2008 10:15AM when load starts Event ID SMFConfig_ID UnqTstEvent_ID EffDt EndDt 567917 The current date and time is 11/21/2008 2:30PM when load starts 134 438502 2008-11-21 00:00:00.000 NULL Event ID SMFConfig_ID UnqTstEvent_ID EffDt EndDt 567917 134 438502 2008-11-22 00:00:00.000 NULL Event ID SMFConfig_ID UnqTstEvent_ID EffDt EndDt 567917 134 438502 2008-11-22 00:00:00.000 2008-11-22 23:59:59.000 ≠ How the effective date is determined before / after noon In some rare cases, a row is found to be inserted that has already been end dated. In that case it will be added to the database and be effective for one day.
Update • Search for changes to rows already in the ODS from Staging within a snapshot type for a specific subject and year that occurred since the last run of the update ETL • No need to check for duplicates • Identify test • Lookup metadata • Load data • Determine how to apply update • Normally end date prior version of row and insert new row • On some occasions rows that had been previously end dated may be reintroduced to the ODS • Sometimes the only action is to end date the current row • Copy Previous Year Test details
Source Data PADMID SnapshotFg SMFConfig_ID UpdtDt EndDt 20179458T1008 F 134 2008-11-21 09:21:36.117 NULL The current date and time is 12/15/2008 10:15AM when load starts Event ID SMFConfig_ID UnqTstEvent_ID FNm EffDt EndDt In ODS 567917 134 438502 Bob 2008-11-21 00:00:00.000 2008-12-14 23:59:59.000 Change from Staging 627423 134 438502 Robert 2008-12-15 00:00:00.000 NULL The current date and time is 12/15/2008 2:30PM when load starts Event ID SMFConfig_ID UnqTstEvent_ID FNm EffDt EndDt In ODS 567917 134 438502 Bob 2008-11-21 00:00:00.000 2008-12-15 23:59:59.000 Change from Staging 627423 134 438502 Robert 2008-12-16 00:00:00.000 NULL = How the effective date is determined before / after noon
Update Logic • Read updated row from Staging • Read current record from ODS • If row in ODS is end dated • Add new row to ODS • If row in ODS is not end dated • Compare columns Staging <> ODS • If the columns are different • End date ODS row • Add Staging row to ODS • If the columns are the same and the end date in the ODS row is not set and the end date from the Staging record is set • End date ODS row
Delete (soft) • For the subject / year table search for rows in the ODS that are no longer in Staging • The examinee table is not checked for end dated rows, only the subject / year table is used to determine if a delete is needed per Assessment • A Cascaded update is performed end dating the entire test event and all related rows from the following tables: • Event • EventInst • EventInd • BnchLvl • Score • RaterScore • PaperPencilData • CmptrBasedData • EventClsRm • The end date is determined the same way as in the insert and update ETL
The affects of source tables on Student Centered tables Not all tables are loaded for all tests • Score, Benchmark Level and Rater Score are not loaded for Virtual tests • Presently only Writing has Rater Scores (possibly ELPA in the future) • Paper / Pencil tests load to Paper Pencil Data and Event Class Room • Computer based tests load to Computer Based Data • Writing has only total Benchmark Level scores while other subjects have Benchmark Level scores at the category (aka strand) level Some tables are only loaded if there are values present • Only non-blank scores are loaded to the Benchmark Level table • Only non-null scores are loaded to the Score and Rater Score tables • Only non-null institution identifiers are loaded to the Event Institution table
Example of applying updates Row is inserted Update to Subject / Year table << Changes to an Institution and an indicator – end date current rows << Insert new current rows – null end date Update to Subject / Year table << Changes to student demographic data – end date current row << Insert new current row – null end date
Maintenance • Occasionally there may be the need to make corrections or to reload portions of the ODS • After a few years there may also be the need to remove some lower level of detail from the ODS • In any case as maintenance is needed communication will be made to inform clients of what changes are coming and some suggestions on how to deal with those changes
How to store the Extracted ODS Data • It is recommended that the extract layout be used as a guide for the staging database • A data model is available in this format for your use • The model is in power designer 12.5 and available in html format for review
Considerations for your local ODS • The ID column from each table is sufficient for a primary key • Columns ending in _ID are foreign keys from other tables and should be indexed • The SMFConfig_ID can be used for vertical partitioning of the Core Content data • Additional indexes are in the data model and you may want to tune and add more based on your needs
Loading data to your local ODS • If you follow the suggested database design, the process of loading data is simple • On your schedule you will receive data • Rows that match by ID are updated • Rows not found by matching on ID are inserted • The Security tables are replaced on each load in total • That’s all there is to keeping your local ODS up to date.
How to retrieve data from your local ODS • Generally there are two types of queries • Current data • Data as of a date • For current data select where the end date is null • select * from Event where UnqTstEvent_ID = 500 and Enddt is null • For data as of a date a query such as this will work • select * from Event where EffDt <= ‘2008-11-19 10:57:00.000’ and (EndDt is null or EndDt >= ‘2008-11-19 10:57:00.000’) and UnqTstEvent_ID = 500
Formal and Inferred Registration System (FIRS) • FIRS makes it possible for ODE to give the proper data for each of the students served by your clients • Spring and Fall Membership, all state assessments and any student transfers through OSTX since 2004-2005 are used by FIRS to form a chronology of which institutions a student was related to and when • This information is used by the extract process to provide the most complete data possible • In the coming year the new Consolidated ADM data collection will begin providing information to this process as well • The tables FIRS and DistUnqTstEvent provided in your extract are taken from this system
Extract Process • Regions must provide a list of districts in order to receive the extract • The ODE helpdesk will setup the relationships between the Region and their districts • The first extract will be a full • The next extract will change to an incremental automatically • An incremental extract includes changes for continuing students plus full extracts for students new to the client districts • Each time an extract is performed the last extract date in the Region’s configuration is set • The next extract will contain all changes since the last extract date
Extract Process (continued) • Data is extracted into the formats specified by the StudentCenteredExportFormat.xls • One file is produced per table described in the format • The files are in CSV format with text delimited by quotes • Files will only be produced if there are rows qualifying for the extract • The file Manifest.txt contains a list of the files extracted with the count of rows and when it was produced • The CSV files and the Manifest.txt are compressed into a .zip file and placed on ODE’s secure FTP site for pickup by the Region
File Transfer and Scheduling • When the districts the Region serves are communicated to the ODE Helpdesk ODE will also make sure security is setup for connecting to the secure FTP server • Instructions for connection will be provided • The ODE helpdesk can schedule which days the Region will receive files • The same scheduling system is already in use for extracting data from Student Centered Staging
Requested extracts • At times it may be necessary for the Region to receive a full extract to repopulate your local ODS or the Region may have missed some extracts produced previously • Full extracts can be scheduled by the ODE Helpdesk • The number that can be done in one day is limited • Full extracts may have to wait until the weekend • Full extracts will not be provided on a regular basis • For missed extracts, the ODE Helpdesk can set the date as of which the extract will pull information and provide a larger incremental extract • The date as of which the extract will pull information is cleared after the run of the extract • Processing will return to normal automatically
Limiting Access • Each extract will provide a new copy of the security files (FIRS and DistUnqTstEvent) • The FIRS file contains the district related to the student with the end date provided for review if needed • The DistUnqTstEvent relates the district to the specific test events available to that district • By simply joining through this table when providing access to your clients you can restrict access to just the information they are allowed to access Declare @DistInstID int Set @DistInstID = 2082 select e.* from Event e join DistUnqTstEvent d on e.UnqTstEvent_ID = d.UnqTstEvent_ID and d.DistInstID = @DistInstID
Data Model • Follows the same definitions provided by the StudentCenteredExportFormat.xls spreadsheet • Organized around the document as well providing different color coding for the same major subjects as the spreadsheet: • Core Content Data • Score Data • File Processing Control Data • Metadata • Security – plus – • Possible changes • The model show how the tables are related to each other and provides useful information about the data • The model is in Power Designer 12.5; an html version is available to review the model that contains table create statements as well
Pulling it all together • The SMFConfig table is essentially a link to the subject and year that the data belongs with • Most of the metadata is contained in a Table of Tables / Master Codes scheme which houses virtual tables and related code values • Each row in the Table of Tables represents a virtual table • Rows in the Master Codes table relate back to the Table of Tables for rows that represent the values stored in the virtual table of tables • The Ctgry table is used to indicate the score reporting category (aka strand) related to score data • The Ctgry table also contains entries for total scores • Review samples
Student Centered ODS Thanks for coming!