800 likes | 1.07k Views
Chaos To Order. CAIR 2017. Integrating National Clearinghouse Data into a Data Warehouse. Office of Institutional Effectiveness Humboldt State University. Data Warehouse: Strategic Data Repository - SDR. Presenters:
E N D
Chaos To Order CAIR2017 Integrating National Clearinghouse Data into a Data Warehouse
Office of Institutional EffectivenessHumboldt State University Data Warehouse: Strategic Data Repository - SDR Presenters: Ward Headstrom, Data Scientist Ronda Stemach, Data Administrator Michael Le, Research Associateward.headstrom@humboldt.eduronda.stemach@humboldt.edumichael.le@humboldt.edu
National Student Clearinghouse Who are They? 1 CAIR2017
National Student Clearinghouse Who are They? 1 • Nonprofit founded in 1993 • Educational Reporting • Verification and Research Services • 3600+ post-secondary institutions (93%) CAIR2017
National Student Clearinghouse Who are They? 1 How do they provide Value? 2 CAIR2017
National Student Clearinghouse How do they provide Value? 2 CAIR2017
National Student Clearinghouse Who are They? 1 How do they provide Value? 2 What questions can I answer? 3 CAIR2017
National Student Clearinghouse What happens to students admitted to my University but ultimately do not attend? How many of my students who leave before graduating get a degree from another institution? What is the migration of our local high school students who choose to continue to college? What types of degrees do HSU students earn from other institutions? CAIR2017
NSC Summary Report CAIR2017
What happens to students admitted to my University but ultimately do not attend? CAIR2017
Overview of Getting Student Tracker Data • Create and Upload Data Request File to NSC • Download Student Tracker data files • Add student tracker data to database • Aggregate data into a dimension that connects to one or more of our data models CAIR2017
You will need • an FTP account (shared with others on your campus) • an Individual account (assigned by campus admin)
Student Tracker Request File Types Our Request Files • SE - student enrollment • DA - declined admission • both accept names, birthdates, starting dates, and IDs • SE - all students who have left the university ever, using the starting date at the end the student’s last semester (62k). • DA - all undergrads who applied during the last 10 years (177k). CAIR2017
Upload File Format • Fixed-width text file or tab delimited text file (Excel) • Requires a header and footer descriptive row • File layout specifications provided CAIR2017
Uploading the File to FTP CAIR2017
Confirmation Receipt CAIR2017
Warnings? Errors? CAIR2017
File Validation CAIR2017
when data is ready Email Notification CAIR2017
FTP text file to UNIX server Define external text file Copy data to indexed Oracle table Move the raw data into our Oracle database CAIR2017
External table definition create table clearinghouse_deg ( credential varchar(85), lev varchar(2), levname varchar(25) ) organization external ( type oracle_loader default directory ujob_prod_data_dir access parameters ( records delimited by newline skip 1 fields terminated by ' ' ) location ('CREDENTIAL_LEVEL_LOOKUP_TABLE.txt') );
Moving data to Oracle table truncate table clearinghousedeg_sdr; insert into clearinghousedeg_sdr select credential, lev, substr(levname,1,length(levname)-1) levname from clearinghouse_deg / CAIR2017
NSC Detail Data CAIR2017
Challenges posed by detail data • Missing data - students not found • Extra data - questionable enrollment records • Incorrect school codes • Confusing degree information • Many rows per student (average=9, max=216) SE - 306K records, DA - >1M records CAIR2017
NSC Control Report CAIR2017
students not found Missing Data • FERPA blocking or partial blocking • Schools not reporting (122 with over 1,000 students) • No match - proprietary algorithm only has name and birthdate to match on • Old data - NSC founded in 1993 • 372 schools gave degrees without prior enrollment • 193 students received degrees from HSU without enrollment • Reported enrollment data rejected because of duplicate SSN • ?? Some other reason ?? 1.3% of students we reported were not found. 1.5% of degrees we reported were missing (only .2% FERPA)
Extra enrollment records • Enrollment at multiple institutions reported at the same time. L 04-JAN-10 19-MAR-10 UNIVERSITY OF OREGON F 19-JAN-10 14-MAY-10 HUMBOLDT STATE UNIVERSITY • Short term enrollment • Extended education enrollment • Enrollment reported and then withdrawn. • Student exchange - reported by home school CAIR2017
Incorrect School Codes • Same school with different code in NSC data 041271-00 UNIVERSITY OF CALIFORNIA - MERCED 001313-00 UNIVERSITY OF CALIFORNIA-MERCED • Duplicate codes in CSU cross-walk table 001059 C30000040 Tuskegee U 001059 C30000047 Lawson ST Comm Col • Codes missing from the CSU cross-walk table 1 out of 7 schools missing from COSAR 04 8% of applicants go to a school with missing code CAIR2017
Confusing Degree Information • 1,767 credential values in our DA detail data • downloadable file of credentials from NSC contains over 86k different values and “levels” ... ACHELOR BD Bachelors Degree ACHELOR APPLIED SCIENCE BD Bachelors Degree ACHELOR ARTS & SCIENCE BD Bachelors Degree ACHELOR HEALTH SCIENCE BD Bachelors Degree ACHELOR OF APPLIED SCIENC BD Bachelors Degree ACHELOR OF ARCHITECTURE BD Bachelors Degree ... • still over 900 credentials in DA missing from NSC file CAIR2017
Credential levels from NSC LEV LEVNAME COUNT(*) AD Associate Degree 11784 BD Bachelors Degree 54064 CR Credential 452 DP Doctoral-Professional 233 DR Doctoral-Research 111 MD Masters Degree 4088 PC PostBach Certificate 16 PD Postsecondary Diploma 39 UC Certificate 4840 (missing) 929 CAIR2017
My Strategy for getting degree data • put NSC credential file in our data warehouse • create view that outer joins detail data to NSC credentials • make a guess at the degree level from the degree name • use my guess when credential code cannot be looked up CAIR2017
Function Definition create or replace function nscdeg (flevvarchar,fdeg varchar,f2yr varchar) return varchar2 as -- return NSC degree level if it exists or a guess from the degree name if it doesn't result varchar(6); begin if flev is null then return case when fdeg like 'A%' then '3-AA' when fdeg like 'B%' then '4-BA' when fdeg like 'M%' and fdeg not like 'MAT*' then '5-MA' when fdeg like '%DOC%' or fdeg like 'PH%' then '6-PHD' when f2yr='2' and void(fdeg) is not null then '3-AA' when void(fdeg)='N' then '2-CERT' else '0-NONE' end; else select decode(flev,'UC','2-CERT','AD','3-AA','PD','3-AA','BD','4-BA','CR','4-CRED', 'MD','5-MA','DP','6-PHD','DR','6-PHD','4-??') into result from dual; return result; end if; end; / CAIR2017
View Definition create or replace view clearinghouseDA_s1 as -- did not attend clearinghouse records with base as (select rpad(date2term(search_dt)||id_,13) nscda_key, substr(college_code,1,6) colcode, substr(college_code,1,6)||college_state||jc_or_4year||college_typecolinfo, id_ id,id_ emplid, rpad(date2term(search_dt),4) term_app, case when void(begin_dt)='N' then rpad(date2term(begin_dt),4) end term, case when void(grad_dt)='N' then rpad(date2gradterm(grad_dt),4) end term_grad, substr(nscdeg(c.lev,d.degree,d.jc_or_4year),1,6) deg, rank() over (partition by id_,search_dt order by begin_dt) rnk, d.*,c.lev,c.levname from sa.clearinghouseda_sdr d, sa.clearinghousedeg_sdr c where d.degree=c.credential(+)) select b.*,c.*, case when seqno=1 then college_name end firstcol0 from base b,dv_college c where b.colcode=c.college_key(+) / CAIR2017
Maximum Degrees Earned 0-NONE 98,362 2-CERT 56 3-AA 7,104 4-BA 48,460 4-CRED 392 5-MA 3,993 6-PHD 344 CAIR2017
Take Away With 3,600+ schools reporting enrollment and degrees over a period of 25 years and inadequate data validation, there is bound to be bad and missing data. Use NSC data with a grain of salt. It is not completely accurate, but still can provide some useful insights. CAIR2017
Addressing Volume of Data Group data by student and application term A few of the aggregate fields we derive: BA_COL – college where student first earned a Bachelors FIRSTCOL – first college a student attended MAXDEGREE – highest degree earned TERM_BA – term when student earned first bachelors TERM_FIRSTCOL – term when student first attended college CAIR2017