370 likes | 553 Views
Center for African Studies The MasterCard Foundation Scholars Program Database Project Final Presentation. PRESENTED BY TEAM 3 Min Lin Minchao Lin Tian Liu Yifai NG Haochen shan Yigang Wang Guo Yu Ying Yang Ye Zhong. Client Overview. UC Berkeley Center for African Studies (CAS)
E N D
Center for African StudiesThe MasterCard Foundation Scholars ProgramDatabase Project Final Presentation PRESENTED BY TEAM 3 Min Lin Minchao Lin Tian Liu Yifai NG Haochenshan Yigang Wang Guo Yu Ying Yang Ye Zhong
Client Overview • UC Berkeley Center for African Studies (CAS) • The MasterCard Foundation Program • Provide comprehensive support for Education • For economically disadvantaged students from developing countries in Africa • Financial, academic, social, and career counseling • 2012-2020, 25 new students per year • Client Need • A new database help better manage the program • Specialized in tracking financial transactions • Also store data andinformationrelated to the program • Financial, academic, social, career
Project Review DP I DP II • Familiarized with clients and understood their needs • Created 1st version of Simplified EER diagram • Set up schedule for the project • Expanded and Revised EER diagram • Conceptualized 5 queries • Developed Relational Schema DP III Final Presentation • Finalized EER Diagram • Created database in Access and implemented relations into Access • Altered Queries and developed SQLcode • ImplementedQueries into Access • Modified relational Schema • Did Normalization Analysis
Relational Schema 1. Person (PID, Lname, Fname, MI, Birth_Date, Nationality, Gender, Primary_Phone_No, Address) 1a. Alumni (Alumini_PID1, Occupation, Company, Degree, Graduation_Date, Admission_Date) 1b. Student (Student_PID1, SID) 1bb. Pre_Student(Student_PID1, Financial_Background, SAT_Score, TOFEL_Score, High_School_GPA, No_HighSchool_Honor, No_HighSchool_Awards, HID9) 1bc. Student_current_status(SID, Mentor_Employee_PID1, Class_Level, Expected_Graduation_Date, App_Date, College_GPA) 1c. Employee (Employee_PID1, Position) 2. TransitionAccount (TID, Amount, Time, Year, Month, Date, PC_ID3) 2a. Withdrawal (W_TID2) 2aa. Check (Check_TID2, Description, Responsible_Person, Account_No) 2aaa. Account_Detail (Account_No, Routing_No) 2ab. Bearbuy (Bear_TID2 ) 2ac. Cash (C_TID2, Responsible_Person) 2ad. BlueCard (BC_TID2) 2b. Deposit (D_TID2) 3. Personal_Card (PC_ID, Year, Month, Day, Time, Amount, Reimbursed, Employee_PID1) 4. Expense (EID, Amount, Year, Month, Day, Description, PMID5) 4a. InternalExpense (I_EID4) 4ab. Tuition (T_EID4, Semester, Degree_Class) 4ac. CourseMaterial (C_EID4, Course_Name, ISBN) 4b. ExternalExpense (E_EID4, OID5) 4ba. Summer_Winter_Housing (SWH_EID4, Address, Start_Date, Finish_Date) 4baa. Summer_Winter_Housing_Timeline (Start_Date, Finish_Date, Duration) 4bab. Housing_Information(Address, Landlord, Agent,Contact_Phone_No) 4bb. External_Course_Material (EC_EID4, Course_Name, ISBN) 4bc. SEVIS_Fee (S_EID4) 4bd. Travel_Airline_Ticket (TAT_EID4, Flight_No, Airline_Name) 4bda. Ticket_Detail (Flight_No, Airline_Name, Date, Departure_Location, Destination, Class) 4be. Office_Supplies (O_EID4, Supply_Name, Quantity, Lead_Time, Price, Discount) 5. Order (OID, Total_Amount, Description, Product_Name, Quantity, Price, Date,Req_Student_PID1) 5a. OnlineOrder (OOID5, Tracking_No, Discount, OSID) 5b. NormalOrder (NOID5) 6. Item (NOID5, IID, Iname, Description) 7. OnlineShop(OSID7, Website, Shipping_Method) 7a.Website_Email_Address (Website, Supplier_Email_Address ,Description)
Relational Schema (Continue) 8. Course (CID, Semester, Professor, Final_Date, Description, Ctitle) 8a. GE (GE_CID8, Category) 8b. LowerDivision (L_CID8) 8c. UpperDivision (U_CID8) 8d. Sections(CID, Section_Number) 8e. Ctitle (CID, Ctitle) 9.High_School(HID, Name, School_Type, Year_Founded, Relidious_Affiliation, Academic_Calender, Setting, Student_Faculty_Ratio, Total_Enrollment_No, Gender_Ratio, College_Enrollment_Rate) 10.Job (JID, Position, No_Employee, Start_Date, Finish_date, Work_On_Alumini_PID1, Work_on_Student_PID1) 10a.Job_Detail (Position, Job_function, Location, Salary, Major_Req, Skill_Need, Description, Hour_Per_Week, Language_Need, Visa_Req) 10b.Job-Timeline (Start_Date, Finish_date, Duration) 10c. Full_1Time (F_JID10, Degree_Level, Exp_Req, Pre_I_JID10c) 10d. Part_Time (P_JID10, Degree_Level, Exp_Req) 10e. Internship (I_JID10, Class_Level, Referrer) 11.Company (Company_ID, Company_Name, Description, Company_Type, Contact_Phone_No, Size, Industry_Type, Email_Address, Website, City, Street, Zip_Code) 12.Country (Name, No_Student) 13.Event (Event_ID, title, Start_Date, Finish_date, Duration, No_People, Sponsor, City) 13a. CAS_Event (CAS_Event_ID13, Description, Cost) 13b. Other_Event (O_Event_ID13, Description) 13ba. Recruiting_Event. (R_Event_ID13, Industry_Type, Job_Class, Major_Preferred) 14. Grade (Student_PID1, CID8, Grade_Option, Grade, Semester) 15. Survey (Survey_ID, title, date, description) 15a. Alumni_Survey (A_Survey_ID15, Alumini_PID1) 15b. Student_Survey (S_Survey_ID15, Student_PID1) N to N Relationships: 16.Withdrawal_Pay_Expense (W_TID2, EID4) 17.Employee_Check_Inventory (Employee_PID1, OID5) 18.Student_Take_Course (Student_PID1, CID8, Grade, Grade_Option, Semesters) 18a. Ctitle(CID8, Ctitle) 19.Student_Participate_Event (Student_PID1, Event_ID13) 20.Company_Provide_Job (Company_ID11, JID10) 21.Company_Participate_Recuriting_Event (Company_ID11, Recuriting_Event_ID13) 22.Company_Located_Country (Company_ID11, Country_Name12) 23.Event_Located_Country (Event_ID13, Country_Name12) Multivalue: 24.Person_Email (PID1, E-mail_Address) 25.Alumni_Major (Alumni_PID1a, Major) 26.Student_Major (Student_PID1b, Major) 27.Course_Midterm_Date (CID8, Year, Month, Day)
Decomposing to 1NF and 2NF Course (CID, Semester, Professor, Final_Date, Section_Number, Description, Ctitle) To1NF Course (CID, Semester, Professor, Final_Date, Description, Ctitle) Sections(CID, Section_Number) Student_Take_Course Student_PIDCID Grade Grade_Option Semester Ctitle To2NF Student_Take_Course (Student_PID, CID, Grade, Grade_Option, Semesters) CourseTite(CID, Ctitle)
Decomposing to 3NF Check Check_TID2 Description Responsible_PersonAccount_NoRouting_No To3NF Check (Check_TID2, Description, Responsible_Person, Account_No) Account_Detail (Account_No, Routing_No)
Decomposing to BCNF Course CID Semester ProfessorFinal_Date DescriptionCtitle ToBCNF Course (CID, Semester, Professor, Final_Date, Section_Number, Description) Ctitle (CID,Ctitle)
Query 1: Demand Forecasting and EOQ Object Benefits
Query 1: Demand Forecasting - SQL SELECT Order.ProductName AS Product, sum(Order.Quantity) AS SepTotalQuantity FROM [Order] WHERE Order.Date like "8/*/2013” GROUP BY Order.ProductName; Step 1: Extract the Data from Access. Get the order quantity of a specificproduct over a period. Sampleoutput: A 3*3 matrix include all the product’s order quantity for a Specific period of a specific year
Query 1: Demand Forecasting - Process Step 2: Calculate the seasonalfactor and monthlydemandforecast with a calculator program wrote by Java. Step 3: Putthe result backinto SQL to get the Economic order quantity (EOQ) model.
Query 1: Demand Forecasting - Process Part of Java code
Query 1: EOQ - SQL Checkwhether the inventory is stockedout or not SELECT DISTINCT Order.ProductName, IIF(Item.Quantity=0,“Yes”,“No”) AS StockOut, Round(Sqr(2*(OnlineOrder.ShippingFee)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))) AS OptimalOrderQ, IIf(OnlineOrder.ShippingTime>0,Round((OnlineOrder.ShippingTime)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price)),0) AS ReorderPointQuantity, Round(((Sqr(2*(OnlineOrder.ShippingFee)*(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))/10))*30) AS OrderCycleDays, OnlineShop.Website AS Website, Order.Date AS OrderDate, Order.Date+Round(((Sqr(2*(OnlineOrder.ShippingFee) *(OnlineOrder.MonthlyDemand)/(0.1*Order.Price))/10))*30) AS NextOrderDate FROM Order, OnlineOrder, OnlineShop, Item WHERE Order.OID=OnlineOrder.OID AND OnlineOrder.OSID=OnlineShop.OSID AND Item.OID=Order.OID ORDER BY Order.Date; Calculate optimalorderquantity Calculate Reorderpoint Calculate ordercycleand dates Calculate Next order date
Query 2: Academic Performance Object Benefits
Query 2: Academic Performance - SQL Select specific variable and combine with coefficients to obtain result by linear regression model SELECT Student.SID,1.714+0.589*A.Indicator-0.00632*IIf(Student.Degree=“Undergraduate”,1,0)+0.0165*IIf(Student.Gender=“Male”,1,0)+0.000644*Student.SATScore-0.0147*B.NumberEventAttend+0.0528*IIF(Student.Research=“Yes”,1,0) AS ExpGPA From Student, (SELECT Student.SID,Count(internship.JID) AS Indicator FROM Student, Job,Internship Where Student.PID=Job.PID AND Job.JID=Internship.JID Group BY Student.SID Union Select Student.SID,0 From Student Where Student.PID NOT IN(SELECT student.PID From Student,Job,Internship Where Student.PID=Job.PID AND Job.JID=Internship.JID))As A, Determine whether a student have ever attend any internships or not (binary variable) Defined as table A
Query 2: Academic Performance – SQL (Cont.) (SELECT Student.SID,Count(StudentParticipateEvent.EventID) AS NumberEventAttend From Student,StudentParticipateEvent Where Student.PID=StudentParticipateEvent.PID Group BY Student.SID) AS B Where A.SID=Student.SID AND B.SID=Student.SID; Determine how many events a student have attend (numerical value) Defined as table B Step 2: Run linear regression over all the variables, then useAkaike Information Criterion to reduce the model to the most efficient model. Implement with R
Query 3: High School Comparison Object Benefits
Query 3: High School Comparison - SQL SELECT Student.PID, Student.HID, Student.CollegeGPA, Student.SATScore, Student.TOFELScore, Student.[HighSchoolEvents#], Student.[HighSchoolAward#] FROM Student WHERE (((Student.HID)=1)) OR (((Student.HID)=2)); Step 1: Extract the data from Access by SQL Step 2: Calculate the mean of each corresponding category of all admitted students from these two high schools, then use t-test with unequal variances to get the p-value Implement with Excel
Query 3: High School Comparison - Output Step3: Use Holm-Bonferroni method to find out if each difference is significant. Implement with MATLAB. Step4: Sample output from MATLAB [corrected_p, h]=bonf_holm([0.38 0.414 0.0513 0.334 0.257] ,0.5) corrected_p =1.0020 0.7600 0.2565 1.0280 1.0280 h =0 0 1 0 0
Query 4: Category Expense Object Benefits
Query 4: Category Expense - SQL SELECT Student.PID, Sum(Expense.Amount) AS AmountOfSum FROM (OfficeSupply INNER JOIN Expense ON OfficeSupply.EID = Expense.EID) INNER JOIN Student ON Expense.PID = Student.PID GROUP BY Student.PID ORDER BY Student.PID; Step 1: Find the total expense of each student SELECT Count([OfficeSupply Query].PID) AS CountOfPID, Partition([AmountOfSum],0,1100,100) AS Expr1 FROM [OfficeSupply Query] GROUP BY Partition([AmountOfSum],0,1100,100); Step 2: Generate the data for histogram Step 3: Use Report function to generate graphs, and use Access toolbox to generate the Mean, and Standard Deviation of the distribution. Step 4: If, in most cases, the distribution is bell-distributed, we could use 68-95,99.7 rule, aka Three-sigma rule, to set up expense constraints for students.
Query 4: Category Expense - Output Query Results after the first step & The toolbox that could be used to calculate average and variance quickly Results after the second step Count the number of PID to generate a histogram
Query 4: Category Expense - Output Expense Summary Report (Based on Sample Data)
Query 5: Alumni & Employment Object Benefits
Query 5: Alumni & Employment - SQL SELECT Alumni.PID, IIF(Alumni.Degree="Undergraduate",1,0) AS Degree,count(StudentParticipateEvent.EID) AS NumEventAttend, IIF(AlumniMajor.school_of_medicine = "Yes", 1, 0) As Sch_Medicine, IIF(AlumniMajor.school_of_law= "Yes", 1, 0) As Sch_Law, IIF(AlumniMajor.college_of_engineering = "Yes", 1, 0) As Sch_Engi, IIF(AlumniMajor.school_of_optometry = "Yes", 1, 0) As Sch_Opt IIF(AlumniMajor.college_of_natural_resource = "Yes", 1, 0) As Sch_Nat,IIF(AlumniMajor.college_of_letter_science = "Yes", 1, 0) As Sch_Science, IIF(AlumniMajor.school_of_information = "Yes", 1, 0) As Sch_Inf, IIF(AlumniMajor.school_of_social_welfare = "Yes", 1, 0) As Sch_welfare, IIF(AlumniMajor.haas_business_school = "Yes", 1, 0) As Sch_has, Company.c_latitude, Company.c_longitude FROM Alumni, StudentParticipateEvent, AlumniMajor, Company, FullTime WHERE FullTime.JID = Job.JID AND Company.CompanyID=Job.CompanyID AND Job.PID=Alumni.PID AND Alumni.PID=StudentParticipateEvent.PID AND AlumniMajor.PID=Alumni.PID; Step 1: Extract the data from Access by SQL
Query 5: Alumni & Employment - Process • Step 2: Fitting Logistic Regression Model with R • this model will give the result of the probability for predicted variable to be 1 ( which means this person will get employed or not) • Potentially 20 candidate models • Step 3: Model Selection • Cross-Validation • Employment = GPA + Event + Degree • AIC ( Akaike Information Criterion) • Employment = GPA + Event + Degree + School of Information • BIC ( Bayesian Information Criterion) • Employment = GPA • Deviance Selection • Employment = GPA + Event + Degree
Query 5: Alumni & Employment - Process • Step 4: Cut-off Selection • Find a cut of the predicted probability which will let us judge if the predicted value is 1 or 0 • Method: Building Confusion Matrix • Choosing a cut off probability first • Using confusion Matrix to find the best cut off • Base on the result we choose 0.58
Query 5: Alumni & Employment - Process Step 6: Plot
Query 5: Alumni & Employment - Process • Step 7: Creating XML concatenate with KML plot Alumni’s Company on Google Earth • Creating an KML plot for Alumni who graduated from different college • Plot those coordinates on Google earth • Get intuition employment status geographically for each college in UC Berkeley, which will give us an intuition where has higher employment rate for corresponding college’s current student. • Example for School of Medicine
Query 5: Alumni & Employment - Output Step 8: Implement into GoogleEarth
Future Work & Improvements • Create forms & reports • Make our database more user-friendly • Create additional queries: • Other useful Queries • Such as monthly balance check
Q&A Thank you for listening.