Handling large datasets is a major problem in statistical analysis because it can lead to invalid inferences. With the advent of new computational strategies, however, researchers can handle big data more easily, particularly with Hadoop and MapReduce techniques. There is considerable scope for data scientists to handle big data through machine learning and deep learning techniques. Statswork offers statistical services as per the requirements of its customers. When you order statistical services at Statswork, we promise you the following: always on time, outstanding customer support, and high-quality subject matter experts.

Contact Us:
Website: www.statswork.com
Email: info@statswork.com
United Kingdom: 44-1143520021
India: 91-4448137070
WhatsApp: 91-8754446690
Research paper
Analysis and Prediction of Income and Economic Hierarchy on Census Data using Data Analytics
TAGS: Machine Learning, Data Analytics, Statistical Analysis, Statistical Data Analysis, Data Interpretations, Cluster Analysis
SERVICES: Research Planning | Data Collection | Semantic Annotation | Business Analytics | Bio Statistics | Econometrics
Copyright © 2019 Statswork. All rights reserved
INTRODUCTION
The greatest difficulty in the machine learning field is the availability of clean, high-quality datasets. Demographic data plays a major role in the economic growth of a nation. It helps in determining the income growth of the people, how many of them live in urban and rural areas, and how educated the population is.
DATA ANALYTICS IN PREDICTING THE INCOME AND ECONOMIC HIERARCHY ON CENSUS DATA
Example: the data analytics method used to predict income and economic hierarchy on census data obtained from Kaggle (Sharath et al., 2016).
- The dataset covers 3.5 million U.S. households and records their education, work, the transportation they use, their internet usage, etc.
- Before analysing the data, the main prerequisite is that the data must be normalized for statistical analysis.
- Hadoop is used as the first stage for the large dataset, and PIG MapReduce is adopted for normalizing the dataset.
- Later, the statistical analysis is performed and the results are interpreted.
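The map and reduce steps mentioned above can be illustrated with a small, single-machine sketch in Python. It is not the PIG script used in the study, which the slides do not show; the field names and values are hypothetical, and the job shown (finding each column's minimum and maximum, a typical input to normalization) is an assumption about what such a stage might compute.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records; the field names are illustrative, not taken from the census files.
records = [
    {"income": 52000.0, "commute_minutes": 30.0},
    {"income": 18000.0, "commute_minutes": 45.0},
    {"income": 97000.0, "commute_minutes": 10.0},
]

def map_record(record):
    """Map step: emit one (column, (min_candidate, max_candidate)) pair per field."""
    return [(column, (value, value)) for column, value in record.items()]

def reduce_range(left, right):
    """Reduce step: merge two (min, max) pairs into one."""
    return (min(left[0], right[0]), max(left[1], right[1]))

def column_ranges(rows):
    """Group the mapped pairs by column (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        for column, pair in map_record(row):
            grouped[column].append(pair)
    return {column: reduce(reduce_range, pairs) for column, pairs in grouped.items()}

ranges = column_ranges(records)
print(ranges)
```

On a real cluster the map and reduce functions would run in parallel over partitions of the data; here they run in one process only to show the shape of the computation.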
Aim of the study
- Gender distribution against occupation
- Relationship between education and salary
- Economic hierarchy and prediction of classes
- Plotting theoretical versus actual values for Benford's Law
- Mean and median of income using a heat map
Fig. 1: Step-by-step procedure. Stages shown: Hadoop (load huge dataset); PIG for MapReduce; normalized data; data mining / statistical analysis; graphical representation and interpretation; final processed data.
Fig. 2: The importance of normalization before proceeding with the statistical data analysis. Panels: (a) Before Normalization, (b) After Normalization. Occupations shown include Personal Care, Building Cleaning, and Farming, Fishing and Food.
RESULTS & DISCUSSIONS
Normalization is used to reduce the execution time and improve the efficiency of the results. Two levels of normalization are applied for this purpose. Before that, however, the dataset contains a few blank entries, which must be handled to avoid invalid results.
- First level: the actual data is used without any modifications.
- Second level: the actual data is taken as input and then modified with suitable mathematical methods.
It is noted that the percentage of men in the farming and fishing industry increases by 3.8% after normalization, and the percentage of women in that field also increases. This clearly demonstrates the need for, and importance of, normalizing the census data.
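The slides do not name the "suitable mathematical methods" used at the second level, so the following is only a minimal sketch under two stated assumptions: blank entries are represented as None and are dropped, and the modification is min-max scaling to the range [0, 1]. The income values are made up for illustration.

```python
def min_max_normalize(values):
    """Drop blank entries (None), then scale the remaining values to [0, 1].

    Both choices are assumptions; the source does not specify how blanks
    were handled or which normalization formula was applied.
    """
    present = [v for v in values if v is not None]
    if not present:
        return []
    low, high = min(present), max(present)
    if high == low:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in present]
    return [(v - low) / (high - low) for v in present]

# Hypothetical incomes with two blank entries.
incomes = [20000.0, None, 50000.0, 80000.0, None]
normalized = min_max_normalize(incomes)
print(normalized)  # [0.0, 0.5, 1.0]
```

Dropping blanks is the simplest policy; imputing a value such as the column median would be an equally plausible alternative.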
Data Interpretations
Fig. 3: Depicts the first objective of the study, i.e. obtaining the percentage of gender distribution against occupation.
- The percentage of men is higher in the sales field than in the other fields.
- The transportation field has the next-highest percentage of men after sales.
- In addition, the percentage of women is distributed almost equally across all the occupational fields.
- Further, in order to achieve the second objective, the box plot technique is used to identify the relationship between education and salary.
- This helps in understanding income growth at different levels of education.
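A gender distribution of this kind can be computed as follows. This is a sketch on made-up rows, and it assumes one reading of the figure: for each gender, the percentage of that gender's workers found in each occupation. The occupation labels and the "M"/"F" coding are illustrative.

```python
from collections import Counter

# Hypothetical (occupation, sex) rows; not taken from the census data.
people = [
    ("Sales", "M"), ("Sales", "M"), ("Transportation", "M"), ("Farming", "M"),
    ("Sales", "F"), ("Transportation", "F"),
]

def occupation_share_by_sex(rows):
    """Return {sex: {occupation: percent of that sex working in the occupation}}."""
    totals = Counter(sex for _, sex in rows)   # workers per sex
    pairs = Counter(rows)                      # workers per (occupation, sex)
    shares = {}
    for (occupation, sex), count in pairs.items():
        shares.setdefault(sex, {})[occupation] = 100.0 * count / totals[sex]
    return shares

shares = occupation_share_by_sex(people)
print(shares["M"])
print(shares["F"])
```

If the figure instead shows the gender split within each occupation, the same counts apply with the denominator changed to workers per occupation.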
- Usually, a more educated person earns a higher salary. In this graph, however, professional degree holders earn more than doctorate degree holders, which is quite unusual.
- Likewise, one can compare the median and quartiles of each field in the box plot for a better understanding of the level of education and annual salary.
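The median and quartiles that a box plot draws can be computed directly, which makes the comparison between education levels explicit. The sketch below uses Python's standard statistics module; the salary figures (in thousands) are invented for illustration and are not the study's results.

```python
from statistics import quantiles

# Hypothetical annual salaries in thousands, grouped by education level.
salaries_by_education = {
    "Doctorate degree": [60, 70, 80, 90, 100],
    "Professional degree": [70, 85, 100, 115, 130],
}

def box_summary(values):
    """Return the three quartiles that define the box of a box plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3}

summary = {level: box_summary(vals) for level, vals in salaries_by_education.items()}
for level, stats in summary.items():
    print(level, stats)
```

Comparing the "median" entries of two groups is the numerical counterpart of comparing the centre lines of their boxes.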
K-MEANS CLUSTERING
- Cluster analysis methods are useful tools for analysing high-dimensional datasets, and k-means clustering is one of the most versatile of these techniques.
- To obtain the economic hierarchy, k-means clustering is applied to the income variable: the distance between each data value and each cluster centroid is measured, and the resulting clusters are then plotted against the classes.
- Even though the clustering technique is widely used in the literature, the problem of choosing the number of clusters still persists.
- Furthermore, Benford's law, under which a leading digit d occurs with probability log10(1 + 1/d), is discussed by plotting the actual versus the theoretical values.
- Finally, the mean and median of income across the states are depicted using heat maps.
- In addition, the reduction in the running time of the analysis obtained at each level of normalization is discussed and tabulated.
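The assign-and-update loop behind k-means can be sketched for a single income variable as below. This is an illustration under assumptions, not the study's implementation: the incomes are made up, the number of clusters (three) and the starting centroids are chosen by hand, and distance is the absolute difference between a value and a centroid.

```python
def kmeans_1d(values, centroids, max_iterations=100):
    """Lloyd's algorithm in one dimension, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [v for v, label in zip(values, labels) if label == k]
            updated.append(sum(members) / len(members) if members else old)
        if updated == centroids:
            break  # Centroids stopped moving, so the clustering has converged.
        centroids = updated
    return centroids, labels

# Hypothetical incomes in thousands, forming three visible groups.
incomes = [12.0, 15.0, 18.0, 48.0, 52.0, 55.0, 110.0, 120.0, 130.0]
centroids, labels = kmeans_1d(incomes, centroids=[10.0, 50.0, 100.0])
print(centroids)
print(labels)
```

Because the number of clusters must be supplied up front, rerunning the function with different numbers of starting centroids is one simple way to explore the open question of how many clusters to use.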
Summary
Handling large datasets is a major problem in statistical analysis because it can lead to invalid inferences. With the advent of new computational strategies, researchers can handle big data more easily, particularly with Hadoop and MapReduce techniques. There is considerable scope for data scientists to handle big data through machine learning and deep learning techniques. The development of a nation is analysed using the census data collected during certain periods. This helps in understanding the growth of the population and the wealth of the nation, and in identifying what needs to improve for the welfare of the nation and its people.
Work With Us: Freelancer Consultant, Guest Blog Editor
Contact Us
UK: +44-1143520021
INDIA: +91-4448137070
info@statswork.com (hr@workfoster.com)