MI School Data May 2012
MI School Data – Functionality Overview • District/School • Summary • Quick Facts • Openings/Closings • School data file • Assessment and Accountability • Dashboard and Report Card • MEAP, MME, MI-Access, and ACT • College Readiness Indicator (ACT scores) • Students not tested report • Assessment revised cut scores • Student • Graduation/Dropout • Non-resident Report • Student Count • Staffing/Financial • Educator Effectiveness • Effectiveness Ratings (Principals only 2010/11) • Evaluation Factors • Postsecondary Reports by High School/District • Enrollment/Credit Accumulation • Remedial Coursework
MI School Data – Current Work • Earliest Priorities: • Migration of Data for Student Success (D4SS) Dynamic Inquiries • Additional dashboard metrics (Best Practices) • K-3 Pupil Teacher Ratio, General Fund Balance, Salaries, Days of Instruction • Additional displays/reports from MSLDS data sources: • Pupil Attendance, Retention in Grade, Pupil Mobility • Usability improvements • “Front Page,” Location Selection, “Sticky Settings” • User Administration Improvements • Early Childhood • More stakeholder discussion required • Additional K-12 • Finance - Source: FID • Staffing - Source: REP • Special Education public reporting and data portrait queries • Top to Bottom Listing of Schools • Postsecondary • Enrollment, Credit Accumulation, & Remediation - User interface • By High School • By Institution of Higher Education • Requirements initiated for additional reports • More stakeholder discussion required • Workforce Reports • Workforce supply/demand study
CEPI Data Quality – Overview “YOUR DATA ARE NOT NECESSARILY WRONG!” The goal of our data quality process is finding ANOMALIES, not ERRORS An ERROR is: “a deviation from accuracy or correctness” An ANOMALY is: “an odd, peculiar or strange condition, situation, quality, etc.” (definitions from Dictionary.com)
CEPI Data Quality – Applications • CEPI has several data collection applications • The Michigan Student Data System (MSDS) • Graduation and Dropout Application (GAD) • Title I Supplemental Education Services (SES) • The Financial Information Database (FID) • The Educational Entity Master (EEM) • The Registry of Educational Personnel (REP) • The School Infrastructure Database (SID) • We will be focusing primarily on the last three databases (REP, SID and EEM)
CEPI Data Quality – Collection Windows • Data are submitted for each of our CEPI Applications during Collection Windows (except the EEM, which is always open for updates) • REP has two collections per year • The End-of-Year (EOY) REP collection is open from April 1 through June 30 • The Fall REP collection is open from September 1 through the first business day in December • The SID collection is once a year from April 1 through June 30
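As a rough illustration of the REP window dates above, the sketch below checks whether a given day falls inside either REP collection. The function names are hypothetical, the end dates are assumed to be inclusive, and "business day" is assumed to mean Monday through Friday with state holidays ignored.

```python
from datetime import date, timedelta


def first_business_day_of_december(year: int) -> date:
    """Return the first weekday (Mon-Fri) in December; holidays are ignored (assumption)."""
    day = date(year, 12, 1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def rep_window_open(today: date) -> bool:
    """True if `today` falls in the EOY or Fall REP collection window (inclusive ends assumed)."""
    in_eoy = date(today.year, 4, 1) <= today <= date(today.year, 6, 30)
    in_fall = date(today.year, 9, 1) <= today <= first_business_day_of_december(today.year)
    return in_eoy or in_fall


print(rep_window_open(date(2012, 12, 3)))  # December 1, 2012 fell on a Saturday
```

The SID window (April 1 through June 30) matches the EOY branch, so the same comparison could be reused for it.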
CEPI Data Quality – Process • The data quality process is similar across the applications in the School Data Quality unit • Data Quality runs are completed at three points in the collection • Before the collection opens (pre) • During the collection (mid) • After the collection closes (post) • Started by checking 10-20 items in EOY 2007 • Expanded to over 300 in the REP collection alone for Fall 2011
CEPI Data Quality – PRE collection • Analyzes data from the PRIOR collection • Prior collection data cannot be modified in the current collection window • Identifies data elements that can be improved upon in the current collection • Each district’s authorized users are informed of the findings via e-mail shortly after the collection period opens • Identifies issues in the data structure and tables of the new collection cycle before they are an issue for the districts
CEPI Data Quality – MID collection • Snapshot of data submissions taken with about one month left in the collection window • Identifies anomalies in the current collection • Each district’s authorized users are informed of the findings via e-mail with time to modify the data before the end of the collection window • Identifies issues in the data structure and tables periodically throughout the collection period
CEPI Data Quality – POST collection • Snapshot of data submissions taken immediately after the close of the collection • Identifies anomalies in the current collection now completed • Analysis is completed in about a week • Each district’s authorized users are informed of the findings via e-mail • Data cleansing period takes place allowing the authorized users to modify their data prior to it being used for reporting
CEPI Data Quality – What are we looking for? • System edit violations or table integrity issues • Data values that are anomalies • Values outside of the expected range, but that might not be ERRORS • Values that don’t match other data • Interactions with other data collections • Issues arising out of the whole of the collection • Comparisons to prior submissions
CEPI Data Quality – System Edits • The system validates each record as it is processed • Ensure required fields are submitted • Ensure that the dependencies with other fields are followed • Most of these system edits are also built into the data quality process • Issues errors and warnings • Errors prevent the record from being saved • Warnings allow the record to be saved, but the data may need to be modified
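The error/warning distinction above might be sketched as follows. This is not CEPI's actual edit logic: the field names, the required-field list, and the two sample rules are hypothetical, chosen only to show one error-level dependency and one warning-level check.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the real applications define their own.
REQUIRED_FIELDS = ("first_name", "last_name", "birth_date")


@dataclass
class EditResult:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def can_save(self) -> bool:
        # Errors block the save; warnings do not.
        return not self.errors


def validate_record(record: dict) -> EditResult:
    result = EditResult()
    # Required-field edits (error level).
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            result.errors.append(f"{name} is required")
    # Hypothetical field dependency (error level).
    if record.get("status") == "terminated" and not record.get("termination_date"):
        result.errors.append("termination_date is required when status is terminated")
    # Hypothetical range check (warning level): record saves, but may need review.
    fte = record.get("fte")
    if fte is not None and fte > 1.0:
        result.warnings.append("fte is greater than 1.0")
    return result


outcome = validate_record(
    {"first_name": "Pat", "last_name": "Lee", "birth_date": "1970-05-01", "fte": 1.5}
)
print(outcome.can_save, outcome.warnings)
```

Because each record is validated on its own, a function shaped like this cannot see the rest of the submission or a prior collection, which is the kind of gap the data quality process covers.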
CEPI Data Quality – System Edits • There are limitations to what the system can validate • Cannot look at the submission as a whole • Cannot look at the prior year’s submission • Cannot have exceptions to the rules • Cannot be as flexible as the data quality process • Several of the items in the Data Quality process have been turned into new system edits
School Infrastructure Database SID Data Quality
SID Data Quality – Basics • Mostly looking for outliers • Issues with Shared Space Entities • Dual Enrollment data in high schools and only in high schools • System Edit Checks
SID Data Quality – Scatter Plots Examine scatter plots of the raw number submitted and the "rate" per student reported
SID Data Quality – Scatter Plots • Identify “outliers” based on different factors • Too high of a number • A building with 4500 incidents of bullying • Too high of a rate • A building with 300 students and 450 incidents of truancy • Some incident types will flag any value reported as an outlier • Homicides • Drive-by shootings
SID Data Quality – Robbery Plot These are the lines indicating the outliers
SID Data Quality – Robbery Plot This line indicates the minimum we want to flag as an anomaly
SID Data Quality – Robbery Plot The five circled points are what have been identified as outliers and feedback will be sent on them
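The outlier rules described for the SID plots (a count that is too high, a rate that is too high, a minimum below which nothing is flagged, and incident types where any value is flagged) could be expressed roughly as below. All thresholds, the incident-type labels, and the function name are hypothetical; the slides do not state the actual cut-offs, and in practice they would likely differ per incident type.

```python
# Incident types where any reported value is flagged (type labels are hypothetical).
ALWAYS_FLAG = {"homicide", "drive_by_shooting"}


def is_outlier(incident_type, incidents, students, *, max_count, max_rate, min_count=1):
    """Flag a building's submission as an anomaly to send feedback on, not as an error."""
    if incidents <= 0:
        return False
    if incident_type in ALWAYS_FLAG:
        return True
    if incidents < min_count:
        # Below the minimum line: too few incidents to flag, whatever the rate.
        return False
    rate = incidents / students if students > 0 else float("inf")
    return incidents > max_count or rate > max_rate


# The two examples from the slides, with made-up thresholds and enrollment.
print(is_outlier("bullying", 4500, 2000, max_count=1000, max_rate=1.0))
print(is_outlier("truancy", 450, 300, max_count=1000, max_rate=1.0))
```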
Registry of Educational Personnel REP Data Quality
REP Data Quality – Starting out • Started looking at data using Excel and Access • Focused on rules that could not be built into the Application • Started with a dozen checks in EOY 2007 • Grew to 25 checks in Fall of 2007 • Continues to grow each collection • Examples: • Suffixes in First or Middle Name • No Title IX Coordinator Submitted • Too many classes taught by a single teacher
REP Data Quality – Name Issues • Data Quality Checks built on name fields: • Titles in name fields • First Name of “Dr. Timothy” • Last name of “Smith, DDS” • Name changes • Incorrectly submitted Suffixes • First names incorporating “To the Estate of” • Names of “Test Data” and other artificial names used for testing purposes
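A minimal sketch of name-field checks like those above, using regular expressions. The title, suffix, and placeholder lists are hypothetical and far from complete; the real checks are not described in enough detail to reproduce.

```python
import re

# Hypothetical word lists; a production check would need a much longer set.
TITLE_RE = re.compile(r"\b(dr|mr|mrs|ms|dds|phd|md)\b", re.IGNORECASE)
SUFFIX_RE = re.compile(r"\b(jr|sr|ii|iii|iv)\b\.?$", re.IGNORECASE)
PLACEHOLDER_PHRASES = ("test data", "to the estate of")


def name_issues(first: str, middle: str, last: str) -> list:
    """Return a list of anomaly descriptions for one staff member's name fields."""
    issues = []
    if any(TITLE_RE.search(part) for part in (first, middle, last)):
        issues.append("title in name field")
    if any(SUFFIX_RE.search(part.strip()) for part in (first, middle)):
        issues.append("suffix in first or middle name")
    full_name = " ".join(p for p in (first, middle, last) if p).lower()
    if any(phrase in full_name for phrase in PLACEHOLDER_PHRASES):
        issues.append("artificial or placeholder name")
    return issues


print(name_issues("Dr. Timothy", "", "Jones"))
print(name_issues("John", "", "Smith, DDS"))
```

Matches are anomalies to review rather than certain errors: a short word list like this will occasionally match a legitimate name.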
REP Data Quality – Date Issues • Data Quality Checks built on date fields: • Teachers that are too young • Staff members that are too old • Staff members that are hired too young • Enforcing the order of dates • Birth Date < Hire Date < Termination Date • Terminated records without a valid termination date • Credential Date issues
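The date checks above could look something like the following sketch. The age limits are hypothetical placeholders (the slides do not give the actual cut-offs), as are the function and parameter names; the date ordering follows the stated rule Birth Date < Hire Date < Termination Date.

```python
from datetime import date

# Hypothetical thresholds; the actual cut-offs are not given in the slides.
MIN_HIRE_AGE = 14
MIN_TEACHER_AGE = 20
MAX_STAFF_AGE = 95


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from `start` to `end`."""
    return end.year - start.year - ((end.month, end.day) < (start.month, start.day))


def date_issues(birth, hire, termination=None, *, status="active", is_teacher=False, as_of):
    """Return anomaly descriptions for one staff record, evaluated as of `as_of`."""
    issues = []
    if not birth < hire:
        issues.append("birth date not before hire date")
    if termination is not None and not hire < termination:
        issues.append("hire date not before termination date")
    if status == "terminated" and termination is None:
        issues.append("terminated record without a termination date")
    if years_between(birth, hire) < MIN_HIRE_AGE:
        issues.append("hired too young")
    age = years_between(birth, as_of)
    if is_teacher and age < MIN_TEACHER_AGE:
        issues.append("teacher too young")
    if age > MAX_STAFF_AGE:
        issues.append("staff member too old")
    return issues


print(date_issues(date(2000, 1, 1), date(2005, 9, 1), is_teacher=True, as_of=date(2011, 10, 1)))
```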
REP Data Quality – Title IX Issues • Data Quality Checks built on Title IX Coordinator submissions: • No Title IX coordinator Submitted • Title IX coordinator submitted with a full FTE • Title IX coordinator submitted with a terminated status and no other staff member assigned to that position • Have developed over time
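Unlike a per-record system edit, the Title IX checks look at a district's submission as a whole. A rough sketch, assuming each staff record is a dict with hypothetical keys `is_title_ix_coordinator`, `status`, and `title_ix_fte`:

```python
def title_ix_issues(staff: list) -> list:
    """Check one district's staff records for Title IX coordinator anomalies."""
    coordinators = [s for s in staff if s.get("is_title_ix_coordinator")]
    if not coordinators:
        return ["no Title IX coordinator submitted"]

    issues = []
    # A full FTE assigned to the coordinator role is flagged for review.
    if any(s.get("title_ix_fte", 0) >= 1.0 for s in coordinators):
        issues.append("Title IX coordinator submitted with a full FTE")
    # Every coordinator is terminated, so nobody currently holds the position.
    if all(s.get("status") == "terminated" for s in coordinators):
        issues.append("Title IX coordinator terminated with no replacement")
    return issues


district_staff = [
    {"name": "A", "is_title_ix_coordinator": True, "status": "terminated", "title_ix_fte": 0.1},
    {"name": "B", "is_title_ix_coordinator": False, "status": "active"},
]
print(title_ix_issues(district_staff))
```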
REP Data Quality – Current State • For Fall 2011: • Over 300 Checks were run • Districts were notified about 48 different issues • 1381 messages were sent out • 1058 different users of 540 districts received data quality feedback
REP Data Quality – Near Future • Data Quality Checks are being added and improved • Looking at improving the following issues: • Grade-Levels of Students submitted in MSDS • Accounting Function Codes and their use in the FID • Data contained in the Michigan Online Educator Certification System (MOECS) • Teacher-Student Data Link (TSDL) related issues
Educational Entity Master EEM Data Quality
EEM Data Quality – Differences • EEM is different from the other collections in that it does not have a window • Data quality is ongoing and periodic • Often checking for data that is not in the correct format • A starting point for using our data profiling tools
EEM Data Quality – Sample Issues • Issues between EEM and other applications • Grades for a student or teacher • Educational Settings • Lead Administrator issues • System edits working • Physical Addresses that do not exist • Data profiling has allowed us to find issues in the contents of the data where they might not be in a consistent form
EEM Data Quality – Profiling Finds • Fields that contain both the descriptive value and the code value in the same field • County records that contain both “Wayne” and “81” referring to the same thing • Leading zeros or spaces in a text field • State entries of “_ _ _ _ MI” • Congressional Districts of “1” “01” and “001” • Zip Code formatting • Zip+four containing the dash or not? • Capitalization inconsistencies
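The formatting inconsistencies found by profiling suggest simple normalization helpers such as the ones sketched below. These are illustrative only and do not represent the profiling tool CEPI uses; the function names are hypothetical, the padding in the state example is assumed to be whitespace, and the choice of a dashed ZIP+4 as the target form is arbitrary.

```python
import re


def normalize_district(value: str) -> str:
    """Collapse "1", "01" and "001" to a single form by dropping leading zeros."""
    return str(int(value.strip()))


def normalize_state(value: str) -> str:
    """Strip padding (assumed to be whitespace) and fix capitalization."""
    return value.strip().upper()


def normalize_zip(value: str):
    """Return a 5-digit ZIP or dashed ZIP+4; None if the digit count is unexpected."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 5:
        return digits
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return None


def mixes_codes_and_names(values) -> bool:
    """True when a column holds both numeric codes and descriptive text."""
    cleaned = [v.strip() for v in values if v.strip()]
    has_codes = any(v.isdigit() for v in cleaned)
    has_names = any(not v.isdigit() for v in cleaned)
    return has_codes and has_names


print({normalize_district(v) for v in ["1", "01", "001"]})
print(normalize_zip("489331234"), normalize_zip("48933-1234"))
print(mixes_codes_and_names(["Wayne", "81"]))
```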
Questions and Answers CEPI Data Quality