Occupational Factors Affecting the Income of Canada’s Residents in the 1970s
Group 5: Ben Wright, Bin Ren, Hong Wang, Jake Stamper, James Rogers, Yuejing Wu
Data Source: Census of Canada
• Collected by the Canadian government in 1971
• 102 different occupational categories
• 4 occupational categories had incomplete data
• Each category represents data aggregated over thousands of employees
Definition of variables
• Gender: percentage of women in the occupation
• Years of education: average number of years of education per worker
• Job prestige: rating assigned based on a social survey conducted in the mid-1960s
• Job types:
  • Blue collar (e.g. janitor)
  • Professional (e.g. lawyer)
  • White collar (e.g. insurance agent)
What factors affected the occupational income of Canada’s residents in 1971?
Step 1: Data preparation
• Removal of incomplete observations
  • (4 occupations were not assigned a job type: baby sitters, athletes, newsboys, and farmers)
• Removal of non-descriptive variables
  • (Census code)
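The data preparation step above can be sketched in Python. This is a minimal illustration, not the group's actual code: the field names (`occupation`, `census_code`, `women`, `type`) and every numeric value in `rows` are hypothetical and invented for the example; only the occupation names come from the slides.

```python
# Hypothetical records: field names and all numeric values are invented
# for illustration; they are not taken from the census data.
rows = [
    {"occupation": "janitor", "census_code": 1, "women": 30.0, "type": "bc"},
    {"occupation": "lawyer", "census_code": 2, "women": 5.0, "type": "prof"},
    {"occupation": "insurance agent", "census_code": 3, "women": 15.0, "type": "wc"},
    {"occupation": "athletes", "census_code": 4, "women": 10.0, "type": None},
    {"occupation": "farmers", "census_code": 5, "women": 4.0, "type": None},
]


def prepare(records):
    """Drop records with no job type, then drop the census code field."""
    cleaned = []
    for record in records:
        if record["type"] is None:
            continue  # incomplete observation: occupation was not classified
        # Copy every field except the non-descriptive census code.
        cleaned.append({k: v for k, v in record.items() if k != "census_code"})
    return cleaned


prepared = prepare(rows)
```

The function returns new dictionaries rather than editing `rows` in place, so the raw records stay available for checking what was removed.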
Step 2: Exploratory data analysis
• Professional occupations have higher average income, prestige scores, and years of education than blue collar and white collar jobs
• White collar jobs (on average) employ a larger percentage of women
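A comparison of group averages like the one above can be sketched as follows. The records are hypothetical: the values were invented to mirror the qualitative pattern on the slide and are not census figures, and the field names and the `mean_by_type` helper are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical occupation-level records; every value is invented.
records = [
    {"type": "prof", "income": 12000, "women": 20.0},
    {"type": "prof", "income": 10000, "women": 30.0},
    {"type": "wc", "income": 5000, "women": 60.0},
    {"type": "wc", "income": 4000, "women": 50.0},
    {"type": "bc", "income": 5500, "women": 10.0},
    {"type": "bc", "income": 4500, "women": 20.0},
]


def mean_by_type(rows, field):
    """Average one numeric field within each job type."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row[field])
    return {job_type: sum(vals) / len(vals) for job_type, vals in groups.items()}


income_means = mean_by_type(records, "income")
women_means = mean_by_type(records, "women")
```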
Step 3: Pair-wise scatter plot to see the relationships between variables
[Figure: pair-wise scatter plot matrix. Correlation coefficients shown in the panels: +.57, +.87, -.45, +.70]
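The coefficients in a scatter plot matrix are pairwise Pearson correlations. The sketch below computes them from scratch, assuming that is the statistic shown on the slide. The toy columns are invented and do not reproduce the slide's values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Invented toy columns, used only to exercise the functions.
toy = {
    "income": [3.0, 5.0, 4.0, 8.0, 9.0],
    "prestige": [30.0, 45.0, 50.0, 70.0, 80.0],
    "women": [60.0, 40.0, 55.0, 20.0, 10.0],
}
corr = correlation_matrix(toy)
```

The matrix is symmetric, so each off-diagonal coefficient appears twice, once above and once below the diagonal.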
Step 4: Linear regression
• Model output: R² = 0.9023, F-statistic = 120, p-value < 2.2 × 10⁻¹⁶
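The sketch below shows how R² and the F-statistic reported above are obtained from an ordinary least squares fit. It uses only the standard library and synthetic data: the predictors, the true coefficients (2, 3, -1), and the noise level are assumptions chosen for the demonstration, not the census variables.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """Fit y on the columns of X plus an intercept; return (coefficients, R^2, F)."""
    n, k = len(y), len(X[0])
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = k + 1
    xtx = [[sum(A[i][a] * A[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(A[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * aj for bj, aj in zip(beta, row)) for row in A]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return beta, r2, f_stat


# Synthetic data with known coefficients (assumed values, not census data).
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(50)]
y = [2.0 + 3.0 * x1 - 1.0 * x2 + rng.gauss(0, 0.5) for x1, x2 in X]
beta, r2, f_stat = ols(X, y)
```

Solving the normal equations directly keeps the example short. A statistics library would normally use a more numerically stable decomposition.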
Step 5: Test the validity of linear regression: Normality?
• Data is skewed towards higher incomes
Step 5: Test the validity of linear regression: Heteroskedasticity?
• Variance is not constant
• R² = .90
• Data is heteroskedastic, so a data transformation is needed
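The slides judge heteroskedasticity from a plot. A numerical check is one option, sketched below with the studentized (Koenker) form of the Breusch-Pagan test for a single predictor. That test is swapped in for illustration and is not mentioned in the slides. The data are synthetic, and the helper names are assumptions.

```python
import random


def simple_ols(x, y):
    """Least squares line through (x, y); returns intercept, slope, residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, resid


def r_squared(x, y):
    """Coefficient of determination of the simple regression of y on x."""
    _, _, resid = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - sum(e * e for e in resid) / ss_tot


def koenker_bp(x, y):
    """n * R^2 from regressing squared residuals on x (chi-square, 1 df)."""
    _, _, resid = simple_ols(x, y)
    squared = [e * e for e in resid]
    return len(x) * r_squared(x, squared)


# Synthetic data: noise spread grows with x in one series, stays fixed in the other.
rng = random.Random(1)
x = [rng.uniform(1, 10) for _ in range(400)]
y_hetero = [1.0 + 2.0 * a + rng.gauss(0, 0.5 * a) for a in x]
y_homo = [1.0 + 2.0 * a + rng.gauss(0, 2.0) for a in x]

lm_hetero = koenker_bp(x, y_hetero)
lm_homo = koenker_bp(x, y_homo)
CRITICAL_5PCT_1DF = 3.84  # chi-square critical value, 1 degree of freedom
```

A statistic above the critical value rejects constant variance at the 5% level.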
Step 6: Log transformation (log income)
• Log income approximates a normal distribution
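The effect of the log transformation on a right-skewed variable can be illustrated with sample skewness. The incomes below are synthetic draws from a log-normal distribution with assumed parameters (8.5 and 0.6); they stand in for the census incomes, which are not reproduced in the slides.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed "incomes" (assumed log-normal, not census data).
rng = random.Random(2)
incomes = [rng.lognormvariate(8.5, 0.6) for _ in range(500)]
log_incomes = [math.log(v) for v in incomes]

skew_raw = skewness(incomes)      # positive: long tail towards high incomes
skew_log = skewness(log_incomes)  # much closer to zero after the transform
```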
Results of linear regression on log-transformed income
• Education is not a significant variable and can be removed from the model
Are different models needed for different ranges of variables?
[Figure: two panels, each labeled "Linear relationship"]
• Variables:
  • Women
  • Prestige
  • Type
• A linear model explains the entire range of observations
Outliers affecting the model
• Possible outliers
• The model may not account for a variable which explains these data points
Model disregarding outliers
• The total sum of squared residuals is further reduced by removing outliers
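One way to flag candidate outliers and compare the residual sum of squares before and after removing them is sketched below for a single predictor. The slides do not say how the outliers were chosen, so the rule used here (standardized residual larger than 2 in absolute value) is an assumption. The data are synthetic, with one planted outlier.

```python
import math


def fit_line(x, y):
    """Least squares line; returns residuals, mean of x, and Sxx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return resid, mx, sxx


def standardized_residuals(x, y):
    """Residuals scaled by their estimated standard deviation."""
    resid, mx, sxx = fit_line(x, y)
    n = len(x)
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    out = []
    for a, e in zip(x, resid):
        leverage = 1.0 / n + (a - mx) ** 2 / sxx
        out.append(e / (s * math.sqrt(1.0 - leverage)))
    return out


def sse(x, y):
    """Sum of squared residuals of the fitted line."""
    resid, _, _ = fit_line(x, y)
    return sum(e * e for e in resid)


# Synthetic line with small deterministic noise and one planted outlier.
x = [float(i + 1) for i in range(20)]
y = [2.0 * xi + 1.0 + ((7 * i) % 5 - 2) * 0.3 for i, xi in enumerate(x)]
y[10] += 15.0  # planted outlier (assumption for the demo)

flagged = [i for i, r in enumerate(standardized_residuals(x, y)) if abs(r) > 2]
x_clean = [a for i, a in enumerate(x) if i not in flagged]
y_clean = [b for i, b in enumerate(y) if i not in flagged]
sse_all, sse_clean = sse(x, y), sse(x_clean, y_clean)
```

Refitting on fewer points can never increase the sum of squared residuals, so the size of the drop is more informative than the fact that it dropped.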
Final Model
This means that, regardless of your job type, if you switched from one job to another with the same level of prestige (e.g. 62) but a lower percentage of women (e.g. from 57% to 10%), you could increase your income substantially (~$3,500).
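The comparison above can be written out symbolically. The fitted equation is not reproduced in the text, so the form below is an assumption: a natural-log income model with the three retained variables, where W is the percentage of women, P is prestige, and the coefficients are symbols rather than the group's fitted values.

```latex
% Assumed model form; coefficients are symbolic, not fitted values.
\begin{align*}
\log(\text{income}) &= \beta_0 + \beta_w W + \beta_p P + \gamma_{\text{type}} \\
\log(\text{income}_2) - \log(\text{income}_1) &= \beta_w (W_2 - W_1)
  && \text{(same prestige and job type)} \\
\frac{\text{income}_2}{\text{income}_1} &= e^{\beta_w (W_2 - W_1)} = e^{-47\,\beta_w}
  && (W_1 = 57,\ W_2 = 10)
\end{align*}
```

With a negative coefficient on W, the ratio exceeds 1. The dollar gain is income₁ · (e^(−47·β_w) − 1), so a figure such as ~$3,500 depends on the fitted coefficient and on the starting income.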
Conclusions
• The level of prestige associated with an occupation, more than education, best explains the income it earns
• Occupations which employ a higher percentage of women offer a lower income
• Job type (i.e. blue collar, white collar, or professional) can be used to explain income differences between occupations