PROJECT ON STATISTICS. Viju Thomas Sridhar Srikanth A.
STATISTICS PROJECT REPORT
Goal
The goal of this project is to familiarize ourselves with the statistical techniques used in data analysis, so that we can carry out computations on a given data set, draw meaningful conclusions from it, and demonstrate an understanding of the basic concepts of statistics. In this project we examine how cars produced by different auto makers for the global market vary from each other in engine capacity, horse power, mileage, transmission and other attributes.
Data collection
Sources: www.automotoportal.com, www.carfolio.com, www.autocarindia.com
The manufacturers we considered were:
• BMW
• Volvo
• General Motors (Chevrolet)
• Mercedes
• Nissan
• Honda
• Suzuki
• Toyota
• Hyundai
• Ford
• Lexus
ATTRIBUTES CHOSEN
Quantitative Attributes Chosen
• Engine Capacity (cc)
• Brake Horse Power (BHP)
• Mileage (kilometers per liter of fuel)
• Top Speed (kilometers per hour)
Qualitative Attributes Chosen
• Gear Transmission (Automatic/Manual/Both)
• Segment (Sedan/SUV/MUV)
• Fuel Type (Petrol/Diesel/Both)
DATA ANALYSIS
Frequency Distribution of Engine Capacity
The table above represents the frequency distribution of engine capacity, measured in cubic centimeters (cc). The first class runs from 0 to 1200 cc; the classes after it have a width of 600 cc and go up to 6600 cc. The cars are counted against these classes to obtain the frequency distribution.
Measures of Central Tendency
Mean = Σfx/Σf
where f is the frequency and x is the midpoint of the class interval.
Median = L + ((N/2 - F)/f)*I
where:
L = lower limit of the interval containing the median
I = width of the interval containing the median
N = total number of respondents
F = cumulative frequency corresponding to the lower limit
f = number of cases in the interval containing the median
Mode = Lmo + (d1/(d1+d2))*w
where:
Lmo = lower limit of the modal class
d1 = frequency of the modal class minus the frequency of the class directly below it
d2 = frequency of the modal class minus the frequency of the class directly above it
w = width of the modal class interval
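The grouped-data formulas for the mean, median and mode can be sketched in Python as follows. This is a minimal illustration only: the class boundaries and frequencies below are made-up values, not the project's table, and the function names are our own. The median uses the usual grouped-data form L + ((N/2 - F)/f)*I with the symbols defined on this slide.

```python
# Hypothetical classes and frequencies, NOT the project's actual data.
classes = [(0, 1200), (1200, 1800), (1800, 2400), (2400, 3000), (3000, 3600)]
freqs = [2, 5, 8, 12, 6]


def grouped_mean(classes, freqs):
    """Mean = sum(f * x) / sum(f), with x the class midpoint."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_median(classes, freqs):
    """Median = L + ((N/2 - F) / f) * I for the class containing N/2."""
    n = sum(freqs)
    cum = 0  # F: cumulative frequency below the current class
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= n / 2:
            return lo + ((n / 2 - cum) / f) * (hi - lo)
        cum += f


def grouped_mode(classes, freqs):
    """Mode = Lmo + (d1 / (d1 + d2)) * w for the modal class.

    Assumes a single modal class whose frequency exceeds at least one
    neighbour, so that d1 + d2 is not zero.
    """
    i = max(range(len(freqs)), key=freqs.__getitem__)
    lo, hi = classes[i]
    below = freqs[i - 1] if i > 0 else 0
    above = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - below
    d2 = freqs[i] - above
    return lo + d1 / (d1 + d2) * (hi - lo)


print(grouped_mean(classes, freqs))
print(grouped_median(classes, freqs))
print(grouped_mode(classes, freqs))
```

Substituting the real class table for `classes` and `freqs` would give the project's own values.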
Histogram
From the histogram we can infer that the largest number of cars in the collected data belongs to the 4th class, i.e. those with an engine capacity between 2400 cc and 3000 cc.
The frequency polygon constructed helps us to sketch the distribution of the engine capacities of the cars much more clearly.
The ogive shown is constructed using the cumulative frequency. Here we are showing a less-than ogive curve. If we take a point on the curve and trace it down to the x-axis and across to the y-axis, the y-value gives the total number of cars whose engine capacity lies below the corresponding value on the x-axis.
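How the points of a less-than ogive are built, and how a value is read off the curve, can be sketched as below. The frequencies are hypothetical placeholders rather than the project's data, `cars_below` is an illustrative helper, and the sketch assumes the first class starts at 0 and that the curve is a straight line between plotted points.

```python
from itertools import accumulate

# Hypothetical upper class boundaries (cc) and frequencies, NOT the real data.
upper_bounds = [1200, 1800, 2400, 3000, 3600]
freqs = [2, 5, 8, 12, 6]

# Each ogive point pairs an upper boundary with the cumulative frequency.
ogive_points = list(zip(upper_bounds, accumulate(freqs)))


def cars_below(capacity, points):
    """Read the ogive at `capacity` using linear interpolation."""
    prev_x, prev_y = 0, 0  # assumed origin of the curve
    for x, y in points:
        if capacity <= x:
            return prev_y + (y - prev_y) * (capacity - prev_x) / (x - prev_x)
        prev_x, prev_y = x, y
    return prev_y  # beyond the last class: every car lies below


print(ogive_points)
print(cars_below(2700, ogive_points))
```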
Representation of Frequency Distribution of Qualitative Data
When qualitative data has to be represented graphically, a pie chart is a good way to do it, because it gives the reader a clear idea of what percentage of the data under study belongs to each category. Our data set has three qualitative attributes, out of which we have chosen Fuel Type to be represented graphically.
Probability Distribution of Transmission with respect to the Horse power
Find the probability that a selected car has an automatic gear system.
• Total number of cars with an automatic gear system = 22
• Total number of cars = 43
• Therefore, the probability that a selected car has an automatic gear system = 22/43 = 0.5116
• So there is a 51.16% chance that the selected car has an automatic gear system.
Find the probability that a selected car with a manual gear system has a horse power of 175 bhp.
• Total number of cars with a manual gear system = 18
• Manual cars falling in the class with a horse power of 175 bhp = 6
• Hence the probability that a selected car with a manual gear system has a horse power of 175 bhp = 6/18 = 0.3333
• So, given that the selected car has a manual gear system, there is a 33.33% chance that it has a horse power of 175 bhp.
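The two probabilities above are plain ratios of counts, which a short Python check makes explicit. The counts are the ones quoted on this slide; the variable names are our own.

```python
# Counts quoted on the slide.
total_cars = 43
automatic_cars = 22
manual_cars = 18
manual_in_175_class = 6

# Simple probability: automatic cars out of all cars.
p_automatic = automatic_cars / total_cars

# Conditional probability: restrict attention to manual cars only.
p_175_given_manual = manual_in_175_class / manual_cars

print(round(p_automatic, 4))
print(round(p_175_given_manual, 4))
```

The second figure is conditional: its denominator is the 18 manual cars, not all 43 cars.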
Binomial Distribution
Success is defined as picking a car which has mileage above 13 km/l. From the data set we can find the following values.
• Success event: p = 0.348
• Failure event: q = 0.651
Probability of picking 6 cars with mileage above 13 km/l in 10 trials from the data set:
• Number of trials: n = 10
• Random variable: x = 6
• P(X = x) = nCx * p^x * q^(n-x)
• Therefore, P(X = 6) = 0.068
We can say that there is a 6.8% chance that exactly 6 of the 10 cars picked have mileage above 13 km/l.
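The binomial calculation can be reproduced with Python's standard library, as a rough check rather than the method used in the project. The function name `binomial_pmf` is our own, and the sketch takes q as 1 - p. With the rounded p = 0.348 the result lands close to, but not exactly on, the reported 0.068, presumably because the slide used unrounded values.

```python
from math import comb


def binomial_pmf(n, x, p):
    """P(X = x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


# n = 10 trials, x = 6 successes, p = 0.348 as quoted on the slide.
print(binomial_pmf(10, 6, 0.348))
```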
Normal Distribution
Probability that a randomly selected car from the data set will have a top speed of less than 220 km/h:
• Mean of top speed: μ = 204.34
• Standard deviation: σ = 38.70
• x = 220
• z = (x - μ)/σ = (220 - 204.34)/38.70 ≈ 0.405
• P(x <= 220) = 0.6570
So 65.70% of the time a randomly selected car from the data will have a top speed of less than 220 km/h.
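The normal-distribution figure can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch: it assumes top speed is modelled as exactly normal with the mean and standard deviation quoted on this slide.

```python
from statistics import NormalDist

mu = 204.34     # mean top speed from the slide
sigma = 38.70   # standard deviation from the slide
x = 220

z = (x - mu) / sigma                      # standardized value
p_below = NormalDist(mu, sigma).cdf(x)    # P(X <= 220)

print(round(z, 3))
print(round(p_below, 3))
```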
APPLICATION OF CORRELATION
From the graph it can be seen that there is a high degree of positive correlation between engine capacity and horse power.
• The correlation coefficient was found to be 0.91526, which means that as the engine capacity increases the horse power also tends to increase. This conclusion led us to apply regression to these two attributes.
• As a result we obtained the regression equation Y = 13.927X + 16.285
• Here Y represents the engine capacity and X represents the horse power.
• Using this equation we can predict the engine capacity for a given value of horse power.
• E.g.: What will the engine capacity be for a car with a horse power of 600 BHP?
• Y = 13.927X + 16.285
• Here X = 600
• Therefore Y = 13.927*600 + 16.285
• Hence the engine capacity = Y = 8372.485 cc
• The coefficient of determination was found to be R² = 0.8377
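The worked prediction and the coefficient of determination can be verified with a few lines of Python. The slope, intercept and correlation coefficient are the values reported above; `predict_engine_capacity` is an illustrative name. For a simple linear regression with one predictor, R² equals the square of the correlation coefficient.

```python
SLOPE = 13.927      # from the reported regression equation
INTERCEPT = 16.285  # from the reported regression equation
R = 0.91526         # reported correlation coefficient


def predict_engine_capacity(horse_power):
    """Y = 13.927 * X + 16.285, with X in BHP and Y in cc."""
    return SLOPE * horse_power + INTERCEPT


print(round(predict_engine_capacity(600), 3))
print(round(R**2, 4))
```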