Data Management ISU By Sarah Clark
Thesis: Competition to get into a Post-Secondary institution has greatly increased and the change in the school curriculum has, and will continue to, affect the graduates of 2003.
Survey Facts • 54 % are females, 46 % are males • 54 % are in Grade 12, 46 % are in OAC • The majority of students surveyed want to go to University • College is the second highest choice • More females want to go to University than males • Most people applied to only 3 Universities • Of those people, the majority were males - Females typically applied to more than 3 Universities
Continued . . . • 75 % of those getting 90 - 99% are females • 75 % of those who have a 90 - 99% average are Gr. 12’s • 48 % of the students who have an 80 - 89 % average are OAC’s • Everyone who wants to go to college next year has an average of 70 to 79 % • All students with an average higher than 90 %, and most of those with an 80 - 89 % average, want to attend University • Most OAC’s fall into this group
The Survey Sampling Technique: I used a convenience sample, since I asked the students in my classes if they would respond to my survey. Bias: Since I received most of my data through my classes, the survey has some bias. This is because a large number of those students are taking University courses. It is also based on OAC’s and Gr. 12’s only. This could be classified as a sampling bias, since the sample does not represent the whole population and some of its characteristics are underestimated. Therefore, the results lean towards those who are interested in University and away from those who may be working instead of attending a post-secondary institution.
Statistics of one variable • Frequency Distribution Table and Weighted Means • Medians and Modes • Standard Deviations • Z-scores • Percentiles
Frequency Distribution Table (columns for each grade: Marks, Freq., Mid Pt.) GRADE 12: Weighted Mean = 67.5 % OAC: Weighted Mean = 70.8 %
Median and Mode GRADE 12 Median: 257 / 2 = 129th position = 70 - 79 % OAC Median: 229 / 2 = 115th position = 70 - 79 % Mode: Most students are part of the 70 - 79 % range in both grades. In Grade 12 there are 79 students (or 30.8 %) in this range and in OAC there are 63 students (or 27.5 %) with this average. * MMR stats. from January 2003
Standard Deviations Grade 12 = 16.6 OAC = 16.7 This shows that the Grade 12 averages are slightly less spread out than the OAC averages. According to the Normal Distribution curve, about 68 % of my data should lie within one standard deviation of the mean.
Grade 12: 67.5 +/- 16.6 = 50.9 % to 84.1 %
OAC: 70.8 +/- 16.7 = 54.1 % to 87.5 %
Therefore, about 68 % of the students in each grade should fall into these ranges.
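The one-standard-deviation range is simply the mean minus and plus the standard deviation. A minimal sketch, using the OAC figures from the slide; the function name `one_sd_range` is made up for illustration:

```python
def one_sd_range(mean, sd):
    """Return the (low, high) range lying one standard deviation either side of the mean."""
    return round(mean - sd, 1), round(mean + sd, 1)


# OAC figures from the slide: mean 70.8 %, standard deviation 16.7
oac_low, oac_high = one_sd_range(70.8, 16.7)
print(f"OAC: {oac_low} % to {oac_high} %")
```

The same call works for any grade once its mean and standard deviation are known.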
Z-Scores Scenario: A Grade 12 student (“Student A”) and an OAC student (“Student B”) both receive a final average mark of 78 % in Math. The grade averages are 67.5 % (Grade 12) and 70.8 % (OAC), and the standard deviations are 16.6 and 16.7.
Student A: z = (78 - 67.5) / 16.6 = 0.6325
Student B: z = (78 - 70.8) / 16.7 = 0.4311
Therefore, Student A actually had the better score relative to the rest of their grade.
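The comparison can be checked with a few lines of Python. This is only a sketch of the standard z-score formula, z = (x - mean) / standard deviation; the helper name `z_score` is an assumption, and the numbers are the ones given in the scenario:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd


# Student A: Grade 12 (mean 67.5 %, standard deviation 16.6)
student_a = z_score(78, 67.5, 16.6)
# Student B: OAC (mean 70.8 %, standard deviation 16.7)
student_b = z_score(78, 70.8, 16.7)

print(f"Student A: z = {student_a:.4f}")
print(f"Student B: z = {student_b:.4f}")
print("Better relative score:", "Student A" if student_a > student_b else "Student B")
```

A higher z-score means the same 78 % sits further above that grade's average, which is why the two students can be compared even though their grades have different means.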
Percentiles In order to receive the 2002 MMR Scholarship last year a student had to be in the 75th percentile or higher. But due to the Double Cohort this year, students must now be in the 80th percentile or above to get the award for 2003. Scenario: Emma received a score of 30 last year and won the award. Based on the matrix below, would she have still earned it if she was in the Double Cohort this year? MATRIX Scores for Year 2003 (Double Cohort):
Solution:
Percentile = [(# of scores below x) + 0.5 (# of scores equal to x)] / (total # of scores) x 100
= [30 + 0.5 (1)] / 40 x 100
= 30.5 / 40 x 100
= 76.25, rounded up to the 77th percentile
Therefore, Emma is in the 77th percentile based on the scores in the Double Cohort year. Because the cutoff rose to the 80th percentile, she does not qualify for the Scholarship this year.
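The percentile formula can be sketched in Python as follows. The function name `percentile_rank` is hypothetical; the counts (30 scores below Emma's, 1 equal to it, 40 in total) are the ones used in the solution, and the two cutoffs are the 75th and 80th percentiles from the scenario:

```python
def percentile_rank(below, equal, total):
    """Percentile = (scores below x + 0.5 * scores equal to x) / total scores * 100."""
    return 100 * (below + 0.5 * equal) / total


emma = percentile_rank(below=30, equal=1, total=40)
print(f"Emma's percentile value: {emma}")

# Cutoffs from the scenario: 75th percentile in 2002, 80th percentile in 2003
print("Meets the 2002 cutoff:", emma >= 75)
print("Meets the 2003 cutoff:", emma >= 80)
```

Whether the value is rounded up or down, it stays below the 80th percentile, so the conclusion does not depend on the rounding convention.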
Statistics of Two Variables • Correlation Coefficient • Classifying Linear Correlations • Non-Linear Regressions • Cause and Effect • Venn Diagram
Correlation Coefficient The correlation coefficient was calculated as 0.484 in Excel (or by taking the square root of R squared). This number means that there is a moderate and positive linear correlation between the number of hours spent on homework and the student’s avg. mark (it falls between 0.33 and 0.67). Therefore, “Y” tends to increase as “X” increases.
Classifying Linear Correlations (Correlation Coefficient “r”) Negative Linear Correlation: Perfect at -1 • Strong from -1 to -0.67 • Moderate from -0.67 to -0.33 • Weak from -0.33 to 0 Positive Linear Correlation: Weak from 0 to 0.33 • Moderate from 0.33 to 0.67 • Strong from 0.67 to 1 • Perfect at 1
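The classification scale can be written as a small lookup. This is a sketch only: the function name `classify_r` is made up, and two details are assumptions because the scale does not settle them, namely which band the exact boundary values 0.33 and 0.67 belong to (here, the stronger one) and what to call r = 0.

```python
def classify_r(r):
    """Classify a correlation coefficient using the 0.33 / 0.67 bands from the scale."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no linear correlation"  # assumption: the scale gives no label for r = 0
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.67:  # assumption: boundary values go to the stronger band
        strength = "strong"
    elif size >= 0.33:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Homework hours vs. average mark, r = 0.484 as calculated in Excel
print(classify_r(0.484))
```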
Non-Linear Regressions This graph shows the effect of the Double Cohort on the number of applications to University • The curve-of-best-fit shown is a Polynomial Regression • I chose this one because its R-squared value was closest to 1 (which means that it fits the data more closely than the other curves) • With this information we can predict the number of applications for 2004
Cause & Effect • Both graphs have a “Cause and Effect” relationship. • Graph A (Homework Hours vs. Marks) shows this Cause-and-Effect relationship because, generally speaking, the more hours you spend doing homework, the better your mark • Graph B shows a Common-Cause Factor because as the population grows over the years, the more applications will be sent in with the growing number of students. • Outliers: • The Double Cohort (the jump in 2003 in Graph B) can be said to be an outlier or an ‘extraneous variable’ because it does not fit with the rest of the data and may skew it
Venn Diagram 50 students took this survey. The results are shown below. Construct a Venn Diagram to show the relationships. • RESULTS: • 19 students are in Athletics • 7 are on the Student Council • 8 participate in School Clubs • 2 are involved in both the Athletics and the Student Council • 1 student is involved with the Student Council and School Club • 2 are part of the Athletics and a School Club • 1 student does all three
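Solution: Athletics only: 14 • Athletics and Student Council only: 2 • Student Council only: 3 • All three: 1 • Athletics and School Clubs only: 2 • Student Council and School Clubs only: 1 • School Clubs only: 4 • None of the three: 23 This shows that 54 % of the students surveyed are involved in the school in some way. 10 % participate in exactly two activities and only 1 person (2 %) is engaged in all three.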
Combinations Scenario: Queen’s University is selecting 5 people for their President’s Scholarship. They are choosing from an eligible group of 4 Grade 12’s and 5 OAC’s. a) How many ways can they do this? * Assumption: there are no restrictions and order does not matter * 9 C 5 = 126 ways b) How many ways can they do this by choosing at least 3 Gr. 12’s? Ways = (4 C 3 x 5 C 2) + (4 C 4 x 5 C 1) = 40 + 5 = 45
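Both counts can be verified with Python's standard `math.comb`. A minimal sketch, assuming (as the slide does) that order does not matter; the variable names are illustrative:

```python
from math import comb

grade_12s, oacs, winners = 4, 5, 5

# a) No restrictions: choose 5 people from all 9 eligible students
total_ways = comb(grade_12s + oacs, winners)

# b) At least 3 Grade 12's: either 3 Gr. 12's with 2 OAC's, or 4 Gr. 12's with 1 OAC
at_least_three = sum(
    comb(grade_12s, k) * comb(oacs, winners - k)
    for k in range(3, grade_12s + 1)
)

print("a)", total_ways, "ways")
print("b)", at_least_three, "ways")
```

Summing over the separate cases in part b) mirrors the slide's approach of adding the "exactly 3" and "exactly 4" counts.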
Probabilities Scenario: Scott applied to 3 Colleges. The probability of a student like him getting into a College in 2003 is 75 %. What is the probability of him being accepted to at least one? X = {0,1,2,3} * Assumption: this is a “success”/“failure” scenario, and each application is independent of the others *
P(x) = (n C x) (p^x) (q^(n-x)), with n = 3, p = 0.75 and q = 0.25
P(0) = 0.02 P(1) = 0.14 P(2) = 0.42 P(3) = 0.42
P(x > 0) = P(1) + P(2) + P(3) = 98 %
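The binomial calculation can be reproduced as below. This is a sketch under the same assumptions as the slide (3 independent applications, each with a 75 % chance of success); the function name `binomial_probability` is hypothetical:

```python
from math import comb


def binomial_probability(n, x, p):
    """P(x) = (n C x) * p^x * q^(n - x), where q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


n, p = 3, 0.75
probabilities = [binomial_probability(n, x, p) for x in range(n + 1)]
for x, prob in enumerate(probabilities):
    print(f"P({x}) = {prob:.2f}")

# At least one acceptance: add P(1) + P(2) + P(3), or equivalently take 1 - P(0)
at_least_one = sum(probabilities[1:])
print(f"P(x > 0) = {at_least_one:.0%}")
```

Using 1 - P(0) is a handy shortcut for "at least one" questions, since only one term has to be calculated.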
"Graduation" (Conclusion) • There is increased competition, especially due to the New Curriculum (the Double Cohort) Why there may be differences in the marks of the 2 grades: • 11 % of Gr.12’s and 23 % of OAC’s are not involved in anything (in or out of school) • 55 % of OAC’s and 39 % of Gr. 12’s work and/or volunteer for 13 hrs or more per week • Only 4 % of Grade 12’s don’t work or volunteer while this is true for 14 % of the OAC’s • There are more OAC’s (55 %) than Grade 12’s (43 %) who spend 6 or more hours doing leisure activities a week