Books, Books, and, Yes, More Books. Bryanne Vollmer, Kyle Pollich, Kim Lor
Books and Written English • Oldest written English that can be understood: 500 years old • First “fiction book”: Epic of Gilgamesh (assumed to have been written in 3000 BC) • Longest (conventionally read) novel: “À la recherche du temps perdu” by Marcel Proust (translated into English as Remembrance of Things Past or In Search of Lost Time) • Average length of a word: 5.1 letters • Average number of pages in a book: 360 pages • Originally, books were written to tell stories of daily life • Eventually this led to writing fantasies for entertainment • Books reflect the world in that they draw on imagination as well as facts
Purpose of Our Data Collection • We wanted to find out: • Are parts of speech distributed the same way across books in different bookstores? • χ² test for homogeneity • What is the true average number of pages in books? • If we reject the null hypothesis, what interval captures the true mean? • Student’s t-test • Student’s t-interval • What is the true average length of a word used in books? • If we reject the null hypothesis, what interval captures the true mean? • Student’s t-test • Student’s t-interval
Process of Our Data Collection • Using random number generator on the calculator • In the literature section of the bookstore • Number sections and randomly select one • Number shelves and randomly select one • Number rows and randomly select one • Number books and randomly select one • Take the page numbers within the book and randomly select a page number • On that page number record the first word
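The multistage selection described on this slide can be sketched in a few lines of Python. This is only an illustration: the nested `toy_store` layout, the book titles, and the function name `pick_first_word` are hypothetical stand-ins for a real bookstore, and `random.Random` replaces the calculator's random number generator.

```python
import random

def pick_first_word(store, rng):
    """Multistage random selection: section -> shelf -> row -> book -> page."""
    section = rng.choice(store)      # number sections, pick one
    shelf = rng.choice(section)      # number shelves, pick one
    row = rng.choice(shelf)          # number rows, pick one
    book = rng.choice(row)           # number books, pick one
    page_number = rng.randint(1, len(book["pages"]))  # pick a page number
    first_word = book["pages"][page_number - 1].split()[0]
    return book["title"], page_number, first_word

# Hypothetical toy store: one section, one shelf, one row, two short "books".
toy_store = [[[[
    {"title": "Book A", "pages": ["pelican flies south", "the end"]},
    {"title": "Book B", "pages": ["dishwasher hums quietly"]},
]]]]

rng = random.Random(0)
title, page_number, first_word = pick_first_word(toy_store, rng)
print(title, page_number, first_word)
```

Choosing uniformly at each stage gives every book the same chance of selection only if the sections, shelves, and rows hold equal numbers of books.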
Exploratory Data: Parts of Speech. Percentages for both stores are roughly the same. This suggests that the distribution of parts of speech is similar in the two bookstores.
Homogeneity Test • Conditions • Categorical data • SRS • All expected cell counts are ≥ 5 • Checks • Borders vs. Barnes and Noble • SRS performed • Not all expected cell counts are ≥ 5 • Conditions not met; we will proceed with the test anyway • χ² distribution • χ² test for homogeneity
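The expected-count condition can be checked directly from a two-way table of counts. The sketch below uses a small table of hypothetical counts (two stores, four made-up part-of-speech categories), not the data collected for this project; the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Hypothetical counts: rows are stores, columns are part-of-speech categories.
hypothetical_table = [
    [30, 25, 40, 5],
    [28, 30, 39, 3],
]

expected = expected_counts(hypothetical_table)
smallest_expected = min(min(row) for row in expected)
condition_met = smallest_expected >= 5
statistic = chi_square_statistic(hypothetical_table)
degrees_of_freedom = (len(hypothetical_table) - 1) * (len(hypothetical_table[0]) - 1)
print(smallest_expected, condition_met, round(statistic, 4), degrees_of_freedom)
```

In this made-up table the last category has an expected count below 5 in both rows, which is the same kind of failure noted in the checks on this slide.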
Homogeneity Test • Want to see if the distribution of parts of speech in books is the same in each bookstore • Ho: The distribution of parts of speech is the same at Borders and at Barnes and Noble • Ha: The distribution of parts of speech differs between Borders and Barnes and Noble • Test Statistic: • χ² = 16.3911 • P-Value: • P(χ² > 16.3911 | df = 15) = 0.3565 • Conclusion: • We fail to reject our Ho because our p-value of 0.3565 is greater than α = 0.05. • We do not have sufficient evidence that the distribution of parts of speech differs between Barnes and Noble and Borders.
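For readers without a calculator at hand, the upper-tail probability P(χ² > 16.3911 | df = 15) can be approximated with the standard library alone. This is a minimal sketch, not the method used for the slides: it sums the usual power series for the lower incomplete gamma function, and the helper name `chi2_upper_tail` is made up for this example.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with `df` degrees of freedom > x), for x > 0."""
    a = df / 2.0
    z = x / 2.0
    # Series for the lower incomplete gamma function:
    # gamma(a, z) = z^a * e^(-z) * sum over n >= 0 of z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

p_value = chi2_upper_tail(16.3911, 15)
print(round(p_value, 3))
```

The result should land close to the p-value reported on this slide. A χ² test uses only the upper tail, so the probability is not doubled.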
Exploratory Data: Number of Pages. Unimodal and right-skewed; center at mean: 343.84; range: (130, 850). The majority of the data lies below the published average of 360 pages, which suggests that the average number of pages in a book may be less than 360 pages.
Student’s t-test (number of pages) • Conditions • SRS • Population ≥ 10n • Normal population or n ≥ 30 • Checks • SRS performed • More than 1000 books to sample from • n = 100 ≥ 30 • Conditions met • Student’s t-distribution • Student’s t-test
Student’s t-test (number of pages) • Using a site that gave average lengths of books, we found the average number of pages in a book to be about 360 pages • Ho: μx = 360 pages • Due to our observations: • Ha: μx < 360 pages • Test Statistic: • t = -1.176 • P-value: • P(t < -1.176 | df = 99) = 0.12 • Conclusion: • We fail to reject our Ho because our p-value of 0.12 is greater than α = 0.05. • We do not have sufficient evidence that the average number of pages in a book is less than 360 pages.
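The one-sided p-value for a t statistic can be approximated without a statistics package by integrating the Student's t density numerically. The sketch below uses Simpson's rule; the function names and the step count are choices made for this example, not part of the original analysis.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_lower_tail(t, df, steps=2000):
    """P(T < t), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area between 0 and |t|
    return 0.5 - area if t <= 0 else 0.5 + area

p_pages = t_lower_tail(-1.176, 99)
print(round(p_pages, 2))
```

The same function applies to the word-length test later in the deck: with t = -7.194 and df = 99 the lower-tail area is far below 0.0001.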
Exploratory Data: Length of Word. Unimodal and right-skewed; center at mean: 3.59; range: (1, 11). The majority of the data lies below the published average of 5.1 letters, which suggests that the average word length in a book may be less than 5.1 letters.
Student’s t-test (length of word) • Conditions • SRS • Population ≥ 10n • Normal population or n ≥ 30 • Checks • SRS performed • More than 1000 books to sample from • n = 100 ≥ 30 • Conditions met • Student’s t-distribution • Student’s t-test
Student’s t-test (length of word) • Using a site that gave average word length, we found the average length of a word to be 5.1 letters • Ho: μx = 5.1 letters • Due to our observations: • Ha: μx < 5.1 letters • Test Statistic: • t = -7.194 • P-value: • P(t < -7.194 | df = 99) < 0.0001 • Conclusion: • We reject our Ho because our p-value of less than 0.0001 is less than α = 0.05. • We have sufficient evidence that the average length of a word within a book is less than 5.1 letters.
Student’s t-interval (length of word) • 95% Confidence Interval: • (3.17351, 4.00649) • Conclusion: • We are 95% confident that the true mean word length is between 3.17 and 4.01 letters.
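Because the interval and the t-test come from the same sample, the two can be cross-checked against each other. The sketch below recovers the sample mean and standard error from the interval endpoints and recomputes the test statistic. The critical value `t_star = 1.984` is an assumption (the usual two-sided 95% value for df = 99), and the variable names are illustrative.

```python
lower, upper = 3.17351, 4.00649   # interval endpoints from the slide
mu_0 = 5.1                        # hypothesized mean word length
t_star = 1.984                    # assumption: 95% critical value, df = 99

sample_mean = (lower + upper) / 2         # midpoint of the interval
margin = (upper - lower) / 2              # half-width = t_star * standard error
standard_error = margin / t_star
t_statistic = (sample_mean - mu_0) / standard_error

print(round(sample_mean, 2), round(t_statistic, 2))
```

Under that assumption the midpoint matches the sample mean of 3.59 from the exploratory slide, and the recomputed statistic agrees with the reported t = -7.194 to about two decimal places.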
Application • Based on our tests we can conclude: • We found no evidence that the distribution of parts of speech in books differs between the two bookstores • Our data are consistent with an average book length of about 360 pages • Average word length within books is between 3.17 and 4.01 letters
Possible Bias/Error • New release sections • Featured title/author sections • Some specific genres that fall under fiction were not in the fiction section • Mystery, romance, science fiction • Bookstores ordered differently
Personal Opinions • Collecting the data was annoying • Multiple stage randomization • Over randomization • People gave questionable looks • “Super nifty” • Fun to see random words • Pelican • Dishwasher • Blomkvist (an apparent last name) • However, overall it was interesting to see the similarities between book chains and to apply the facts we found out in real life.
Class activity • Number books you have • Randomly select one with random integer on calculator • Check the number of pages • Randomly select page number with random integer on calculator • Find the first word on that page • Give us the data