Wine Informatics Dr. Bernard Chen Ph.D. University of Central Arkansas
Data science • Data science is a field of study that combines techniques and theories from distinct fields, such as • Data Mining, • Scientific Methods, • Math and Statistics, • Visualization, • Natural Language Processing, • and Domain Knowledge, to discover useful information from domain-related data.
Domain Knowledge in Wine • The quality of a wine is usually assured by wine certification, which is generally assessed by • physicochemical tests, and • sensory tests • Existing data mining research focuses on the physicochemical laboratory tests much more than on the sensory tests.
Domain Knowledge in Wine • It is very interesting to mine useful information from the sensory testing notes to answer questions such as • “What makes a wine a 90+ wine?”, • “What characteristics are shared by 90+ Napa Cabernet Sauvignon wines?”, • “Which groups of wines share similarities?”, • “What characteristics distinguish wines from France from wines from Italy?”
Domain Knowledge in Wine • The success of data science research on wine sensory data relies on consistent reviews from prestigious experts. • Several popular wine magazines provide widely accepted sensory reviews of the wines produced every year, such as Wine Spectator [13], Wine Advocate [14], and Decanter [15].
Wine Spectator Review Example • Kosta Browne Pinot Noir Sonoma Coast 2009 • Ripe and deeply flavored, concentrated and well-structured, this full-bodied red offers a complex mix of black cherry, wild berry and raspberry fruit that's pure and persistent, ending with a pebbly note and firm tannins. Drink now through 2018. 5,818 cases made.
Wine Spectator • Our first dataset is compiled from the list of “Top 100 Wines of 2011” [16] by Wine Spectator, a lifestyle magazine that focuses on wine and wine culture. • Its reviews are direct and to the point.
Our own wine wheel • Based on the “Top 100 Wines of 2011”, we analyzed all one hundred wine reviews and, after adding all necessary categories and subcategories, came up with a total of 547 distinct attributes. • Looking at the finished list, we noticed many cases where groups of attributes were really just permutations of the same thing. • An example would be the following three attributes: FRESHLY-CUT APPLE, RIPE APPLE, and APPLE.
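One way to picture the attribute list is as a lookup table in which several phrasings point to one canonical attribute. The sketch below is only an illustration under assumptions: the `WINE_WHEEL` table, the `extract_attributes` helper, and the phrase-matching approach are hypothetical and not taken from the talk; only the APPLE group comes from the slide, and the other entries are phrases from the review example.

```python
import re

# Hypothetical slice of the attribute list (the talk's full list has 547 entries).
# Each phrase maps to a canonical attribute; only the APPLE group is from the slides.
WINE_WHEEL = {
    "FRESHLY-CUT APPLE": "APPLE",
    "RIPE APPLE": "APPLE",
    "APPLE": "APPLE",
    "BLACK CHERRY": "BLACK CHERRY",
    "FIRM TANNINS": "FIRM TANNINS",
    "FULL-BODIED": "FULL-BODIED",
}

def extract_attributes(review, wheel=WINE_WHEEL):
    """Return the set of canonical attributes whose phrases occur in the review."""
    text = review.upper()
    found = set()
    for phrase, canonical in wheel.items():
        # Word boundaries keep "APPLE" from matching inside "PINEAPPLE".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(canonical)
    return found

review = ("this full-bodied red offers a complex mix of black cherry, "
          "wild berry and raspberry fruit, ending with firm tannins")
attributes = extract_attributes(review)
apple_variants = extract_attributes("ripe apple and freshly-cut apple notes")
```

With this table, any of the three apple phrasings yields the single attribute APPLE, so two reviews that describe the same flavor in different words end up with the same attribute.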
Hierarchical Clustering • Figure: Venn diagram of the clustered data and the corresponding dendrogram. From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt
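As a rough illustration of hierarchical clustering on review attributes, the sketch below merges the two closest clusters repeatedly and records each merge, which is the information a dendrogram draws. The slides do not state the distance measure or linkage used, so Jaccard distance and average linkage are assumptions here, and the wine names and attribute sets are made up for the example.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two attribute sets (0.0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def agglomerate(items):
    """Average-linkage agglomerative clustering.

    items maps a wine name to its set of attributes.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(name,) for name in items]
    merges = []

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(items[x], items[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Made-up attribute sets, for illustration only.
wines = {
    "wine_a": {"BLACK CHERRY", "BLACKBERRY", "FIRM TANNINS"},
    "wine_b": {"BLACK CHERRY", "BLACKBERRY", "FIRM TANNINS", "LONG FINISH"},
    "wine_c": {"APPLE", "CITRUS", "CRISP"},
    "wine_d": {"APPLE", "CITRUS", "MINERAL"},
}
merges = agglomerate(wines)
for left, right, distance in merges:
    print(left, "+", right, "at distance", round(distance, 2))
```

Cutting the merge history at a chosen distance gives flat clusters; in this toy data the two red-fruit wines join first, then the two apple and citrus wines, and the two groups only join at the maximum distance.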
Clustering Results Suggestions • This cluster represents the fruity aspect of New World wines, with powerful notes of blackberry and black cherry and a commanding finish.
Conclusion • In this paper, we discuss wine reviews and how their attributes can play an integral role in grouping different wines together. • We show that, using only the attributes of a wine review, we can group together wines that share a similar world region, monetary value, vintage, type, and varietal.
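A cluster description like the one above can be approximated by listing the attributes that most reviews in the cluster share. This is a minimal sketch, not the authors' procedure: the `common_attributes` helper, the 50% threshold, and the sample cluster are all assumptions made for illustration.

```python
from collections import Counter

def common_attributes(cluster, min_share=0.5):
    """Attributes present in at least min_share of the cluster's reviews,
    most frequent first. cluster is a list of attribute sets."""
    counts = Counter(attr for attrs in cluster for attr in attrs)
    threshold = min_share * len(cluster)
    return [attr for attr, n in counts.most_common() if n >= threshold]

# Made-up cluster of three reviews, for illustration only.
cluster = [
    {"BLACKBERRY", "BLACK CHERRY", "LONG FINISH"},
    {"BLACKBERRY", "BLACK CHERRY"},
    {"BLACKBERRY", "VANILLA"},
]
shared = common_attributes(cluster)
```

Raising `min_share` keeps only the attributes that nearly every wine in the cluster has, which gives a shorter and stricter summary.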
Thanks • Questions?