280 likes | 441 Views
Program Review Support Tool. Nathan Pellegrin Research Analyst. Goals. Background and purpose of the tool Demonstration Cal-PASS update OLAP Development at Cal-PASS OLAP Success Story: SSPIRE Cube Future development. The Program Review Support Tool. Funded by Hewlett.
E N D
Program ReviewSupport Tool Nathan Pellegrin Research Analyst
Goals • Background and purpose of the tool • Demonstration • Cal-PASS update • OLAP Development at Cal-PASS • OLAP Success Story: SSPIRE Cube • Future development
The Program Review Support Tool • Funded by Hewlett. • Currently being tested by several colleges. • All data is from MIS. • Caveat: figures may not match what is found in locally produced reports due to differences in master data sources and formulae used to derive figures. • Like a “data smorgasbord” and includes • student demographics • course grades • TOP code course hierarchy … the menu will be expanding !
Purpose • Not the product of a mandate or requirement from the Chancellors office. • Not intended to take the place of local tools. • Not intended to drive evaluation activities. • Intended for use by colleges as an optional FREE tool in their program review process. • Obtain feedback from users to scale and improve our data model and OLAP infrastructure.
Cal-PASS Statistics • Over 300,000,000 records • Up to 15 years of data in some regions • Over 7,000 schools, colleges and university members • Over 150 research studies conducted in the last two years • Sixty-six Professional Learning Councils (1,200+ faculty)
Universities (23) • UC . • Davis • Merced • Riverside • San Diego • Santa Barbara • Santa Cruz • CSU . • Channel Islands • Dominguez Hills • Fresno • Long Beach • Los Angeles • Monterey Bay • Pomona • Sacramento • San Bernardino • San Marcos • Stanislaus • San Francisco • San Jose • Sonoma • PRIVATE . • Otis College of Art and Design • National University • University of the Pacific
Changing the Paradigm:OLAP Applications • OLAP = On-Line Analytical Processing • Like Excel pivot tables, except Excel handles only two dimensional data. • Stores pre-computed aggregations of data with B-Tree indexing for delivering fast retrieval times and fast calculation. • Enables users to perform analysis of data quickly with drag-and-drop manipulation of variables and dynamic visualization. • Web-based for easy access – all processing is performed on the server so it does not tie up your work station (zero footprint). • Big time savings! • Ideal for the action research paradigm and design research.
3 Layers of the Application User Interface - Dundas MDX OLAP Cube - SSAS SQL Database(s) – SQL Server
Development Process of the OLAP project is a technical collaboration between IT and Research … • Server Architecture/O.R. – Alex Zakharenkov (IT) • Submission Processes/User Interface – Nick Wade (IT) • Data Model/ETL – Nathan Pellegrin (Research) • Design/Feedback of OLAP cubes - All IT and Research Staff, including Terrence Willett and Mary Kay Patton
Development times • Development of initial Dim Model started in July 2008 … incremental additions/changes congealed into a (basic) model by February, 2009. • Initial development of Program Review, including feedback and changes ≈ 8 weeks. • Dim model ETL execution ≈ 15 hours. • Processing of OLAP cube ≈ 20 min./300K rows . • Initial deployment of UI ≈ 3 weeks. Several changes since then. • UI required tweaks to OLAP cube design.
Development Tools • .NET • SQL Server • storage • Integration Services • Analysis Services • BIDS • Dundas
Analytical • Integrated K-12/CC/Univ • Time-dependent • 2NF (Redundant CK) • Optimized Indexing Dimensional Model Cal-PASS Data Flow ETL Object Repository • Storage • Application Integration • Key-value pairs (KVP) design 0607,01612590000001,0000179441,U,,12,ABARCA,CARLOS,09091988,M,500,,,,,,,,,,,,,,,,,,,01,15,1,,,,,,N,010,Y,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,275,0,0,4,24,0,0,0,0,0,0,,10032006,X,,,,,,,,,,,,,,,,,,,,,,,N,Y,,,8,,,,,,,,,, 0607,01612590000001,0000154281,9107510861,,11,BLACK,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,,,,00,13,1,,,,,,N,000,N,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,302,1,8,6,35,3,15,5,28,4,33,01,10032006,N,340,5,71,12,67,18,90,0,0,7,47,2.5,,,,,,,,,,,N,N,U,72,80,,,,,,,,,, 0607,01612590000001,0000159553,U,,11,BOWIE,EARLISHA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,,,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0,0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,, 0607,01612590000001,0000161233,U,, 0607,01612590000001,0000179441,U,,12,ABARCA,CARLOS,09091988,M,500,,,,,,,,,,,,,,,,,,,01,15,1,,,,,,N,010,Y,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, 9107510861,,11,BLACK,BRITNI,11291990,F,600,,,,,,,,,,,,,,,,,,,00,13,1,,,,,,N,000,N,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,01,10032006,N,302,1,8,6,35,3,15,5,28,4,33,01,10032006,N,340,5,71,12,67,18,90,0,0,7,47,2.5,,,,,,,,,,,N,N,U,72,80,,,,,,,,,, 0607,01612590000001,0000159553,U,,11,BOWIE,EARLISHA,10231988,F,999,,,,,,,,,,,,,,,,,,,00,14,1,,,,,,Y,060,Y,,Y,,,,,,,,,,,,,,,,,,,,,,,Y,,,,,,,,,,,,,,,,,,,01,10032006,N,278,3,23,4,24,4,20,0,0,0,0,,10032006,A,,,,,,,,,,,,,,,,,,,,,,,N,N,,,40,,,,,,,,,, 0607,01612590000001,0000161233,U,, ETL CUSTOM FILES Cal-PASS Submission • Semistructured data • Format/value Validation Universities K-12 CSU UC UC UC CCCCOMIS LA CSU Private
What does a dimensional data model do for Cal-PASS? UNIFY: Data from across segments is integrated into a unified dataset. STANDARDIZE: Table and field names, data types and value coding systems are standardized to be the same for all segments. SIMPLIFY:The number of tables and fields used to store the data is reduced. Granularity of tables are at the units of analysis. Table relationships reflect analytical relationships between entities. IMPROVE DATA QUALITY: Merging data brings data quality issues to light so they can be noted and/or resolved. Establishing primary and foreign key relationships enforces referential integrity. Multiple student identifiers are unified to produce a single “metakey” Missing course CBEDS classifications imputed using machine learning. REDUCE RISK: Without it, in order to produce one metric for all segments separate analytical data processing pathways are required for each segment, which means more maintenance and increased risk of inconsistent results. Using a dimensional model the analytical computations and services are centralized. BOOST PRODUCTIVITY: The simpler, cleaner data model makes it easier to develop cubes with re-usable components, generalized for all segments. Currently, analytical data processing must be developed separately for each segment. Using a dimensional model, only one pathway needs to be developed that applies to all segments.
Cal-PASS Unified Dimensional Data Model (Selected Tables) Organization Student Status Student Course Outcome Course Course Taxonomy Award = Fact Table = Dimension Table Term = Foreign Key Relationship
The Ideal: Centralization of analytical query processing Each statistic can emerge at multiple presentation points, but there is only one logical control point. Presentation & User Engagement OLAP Views and Stored Proc’s Dimensional Model Tables User-defined cohorts; model outputs
Student identifiers from each source system are mapped to a new identifier through transitive closure of all connected values (using a modified version of the Floyd–Warshall algorithm). Name + gender + DOB Local district student id CCCCO SID (SB00) CSIS SSID n1 d1 o1 n2 m1 c1 d2 d3 Each edge represents a record linking two values of different identifiers in submitted student records.
OLAP Success Story: SSPIRE Cube • Funded by Irvine Foundation. • Currently used by nine colleges. • Incorporates MIS data with data submitted by colleges (custom files). • Tracks cohorts of students. • Demonstrate using Merced college (thank you Dr. Duran!)
This is only the beginning… Provide access to K-12 districts and Universities Inter-segmental OLAP Cubes Link non-academic outcomes (Employment Development Department, Child Welfare Data System) “Success at Every Level”
Education Data and Information Act of 2008SB 1298 1. convene a high-level working group to decide the best the governance structure for the comprehensive education data system; 2. directs the State Chief Information Office (CIO), in consultation with educators and education policymakers, to prepare a strategy plan outlining a clear path for technical implementation; and3. requires the various education segments to begin using a common student identifier, so that once a governance structure and technical architecture are in place we can begin linking records from pre-k through the university with relative ease and speed. Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/
Early Childcare WORKING DRAFT NOT EXHAUSTIVE Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ In development Direct data sharing* High level cross-agency systems map of key collections Existing Data sharing through local agencies Planned/ potential CDE Data Systems Non CDE Data Systems National Student Clearinghouse AYP/API CASEMIS CASAS TOPS Pro CSU ERS CCC COMIS CALTIDES** Assessments CPEC CALPADS** CALPASS Prisons, Census Migrant CCTC CASE UC CSS ConApps SACS CDPH EDD Other CDE systems/ units including CDS, Charter schools Other CDE units including Homeless, CALSAFE, Title 3, Private Schools etc. From Franchise tax, benefits system etc. * Does not imply direct data linkages. Only state system linkages shown ** CALPADS is envisioned to replace much of the CBEDS, Language Census, Student National Origin Report and select Consolidated Application data *** CALTIDES is envisioned to collect data primarily from CALPADS and Commission on Teacher Credentialing’s CCTC’s Credential Automation System Enterprise CASE system Source: Interviews with respective agencies, RAND, team analysis
DRAFT Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ High level system profiles of key CDE collections (1/2) System name Description Key identifier Data categories Granularity Data sharing CALPADS California Longitudinal Pupil Achievement Data System. System (under development) for tracking K12 students longitudinally, that will replace CBEDS collections SSID Student demographic, program participation, grade level, enrollment, course enrollment and completion, discipline, and statewide assessment Student Planned include- Assessments, API/AYP, Migrant, ConApps, CALTIDES California Longitudinal Teacher Integrated Data Education System. Iintegrated data system for teacher data based on unique SEID SEID Teacher credentials, authorizations, teacher participation program, alternative routes, participation in Beginning Teacher Support and intern program, SEID, Salary Student Planned include- CALPADS, CCTC CASE CASEMIS California Special Education Management Information System. Integrated data system for special education students on students, services and provider programs SSID Attendance/Enrollment, Disciplinary, Education Agency, Mobility, Special Education, Staffing Data, Student Demographic, Other (services, age, gender, race/ethnicity) Student, School district, School, county, region None at state level Assessments California High School Exit Exam CAHSEE, Standardized Testing and Reporting STAR and CELDT SSID Attendance/Enrollment, Education Agency, Food and Nutrition, Parent Data, Special Education, Student Demographic Student, School District, School, County CASAS, Migrant, AYP/API, CALPADS (planned) API/ AYP Accountability related information based on California's Public Schools Accountability Act of 1999 as well as No Child Left Behind Act of 2001 CDS code AYP/API score by student characteristics School Assessments Source: Respective CDE departments
DRAFT Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ High level system profiles of key CDE collections (2/2) System name Description Key identifier Data categories Granularity Data sharing Migrant Student enrollments in migrant education programs. Includes migrant education forms and a directory of offices providing services Migrant ID, COE number, CDS code Student demographics, educational programs, counseling, health and support services, emergency health, clothing, food, transportation Student Assessments, CALPADS (planned) SACS Standardized Account Code Structure. Offers LEAs with a means of reporting financial information CDS code For every general ledger accounting transaction- information on funds, resources, project year, goal, function, and object. Includes information on Attendance/Enrollment, Education Agency, Fiscal, Transportation School, District CDS, Charter schools ConAPPS Consolidated applications. Includes information on categorical programs e.g., Title I, II, V etc. CDS code Student demographic, Title I, III, V, Part A, Immigrant, LEP, funding model, charter status, Gradespan, participants School, District, County CALPADS (planned) Early Childcare Systems CD-801A,B, CDMIS, Special Education Desired Result System SEDRS, and CD 9600 SSID Child demographics, IEP flag, family identification/case number, household name, type of program, DRDP Desired Result Development Profile, Early Childhood Environment Rating Scale ECERS Student None CASAS TOPSPro Comprehensive Adult Student Assessment Systems. System for tracking Students in Adult Education Programs ADA ID, SSID, CASAS no Student demographics, Agency, instruction level and program, assessment scores, date of entry, reason for exit, class number, attainable goal within program year Student Assessments Source: Respective CDE departments
DRAFT Source: http://www.senatorsimitian.com/legislation/entry/sb_1298_education_information_system/ High level system profiles of non-CDE collections System name Description Key identifier Data categories Granularity Data sharing CPEC California Post Secondary Education Commission. Data system for Higher Ed- post secondary systems Student ID based of SSN Demographic, IEP, grade level, program, Graduation rate, teacher, institution Student CDE, CSU, UC,CCC, prison, census UC CSS Corporate Student System provides information on student enrollment and performance for University of California campuses SSN Student demographic, income, financial aid, education history, assessment Student CDE, CCC, CALPASS CCC COMIS California Community Colleges Management Information System. COMIS data is used to prepare reports for Federal and State reports including Integrated Postsecondary Education Data System (IPEDS) and to track student outcomes SSN, Student ID student demographic, income, financial aid, education history, assessment, teacher, institution Student CALPASS, CPEC, CSU, EDD, National Student Clearinghouse CSU ERS Enrollment Recording System is used by Cal State to track student retention and graduation to support regular term reports, IPEDS, and state budget requests SSN Student demographic, financial aid, education history, assessment Student CPEC, CALPASS, CCC CDPH California Department of Public Health. System use to track CDPH ID Case ID and demographics, clinical and diagnostic data Case None EDD Employment Development Database ID based of SSN Wages, payroll taxes, unemployment tracking, job matching, job training Employee Franchise tax, benefits system, CCC Source: Respective agencies, RAND
Thank you! Have fun …. HAPPY DATA! Data Data