370 likes | 505 Views
Social Networking Analytics for Calbee (SNAC). Team #6 Bill Cheng Sabina Del Rosso Stephen Hom Omede Firouz Stacy Hsueh Wei Jiang Thoranis Karnasuta. DATABASE. CLIENT. EER/SCHEMA. NORMALIZATION. QUERIES. Client Background: Calbee San Francisco.
E N D
Social Networking Analytics for Calbee (SNAC) Team #6 Bill Cheng Sabina Del Rosso Stephen Hom OmedeFirouz Stacy Hsueh Wei Jiang ThoranisKarnasuta DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES Professor Ken Goldberg. IEOR 115. December 9, 2011.
Client Background: Calbee San Francisco • CALBEE, Inc. is one of the largest snack companies in Japan • Company based on the premise of good health • Calbee, San Francisco is the company’s first US-based flagship store • Founded in early 2011 • Located in Westfield Mall • Active in social media • Website, Facebook, Twitter DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES Image from calbeeshop.com
Current Infrastructure • Currently do not keep track of social media hits on any site • Use Point of Sale for sales data and employee clock-ins DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES Image from http://www.unrealstudio.com
Database Objectives • Handle future expansion into e-commerce • Increase social media marketing in targeted demographics • View effect of promotions on sales and social media to help better cater future promotions • Provide a foundation to maximize profits • Logistic management using integer programming • Data mining and machine learning to predict sales DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
EER Diagram DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Relational Design Schema Relational Design Schema (46 relations) Promotion/Sales/Retail: Relations Numbered 0-9 1. PRODUCT(ProdID, Name, IsSour, IsSweet, IsSalty, IsSavory, ManufCost, RetailPrice) 2. PURCHASE(PurchaseID, ProdID1, PromoID3a, CustID6a, StoreID4a, EmpID5a, Timestamp, ipAddress) 3a. PROMOTION(PromoID, PromoCode, StoreID, StartDate, EndDate, Discount) 3b. PROMOTION_SPREAD_VIA_TWITTER(PromoID3a, TweetID10c) 3c. PROMOTION_SPREAD_VIA_F(PromoID3a, F_CID11c) 3d. PROMOTION_SPREAD_VIA_G+(PromoID3a, G_CID12c) 3e. PROMOTION_SPREAD_VIA_S(PromoID3a, S_DID13c) 3f. PROMOTION_SPREAD_VIA_B(PromoID3a, BPost_ID14a) 3g. PROMOTION_INFO_VIA_W(PromoID3a, url15) 4a. STORE(StoreID,AddressNo,StreetName, City, Country, ZipCode, PhoneNo) 4b. STORE_CARRIES(StoreID4a, ProdID1, Stock) 5a. EMPLOYEE(EmpID, LName, FName, Position, FavProdID1, StoreID4, AddressNo,StreetName, City, State, Country, ZipCode, SSN) 5b. EMPLOYEE_IS_FRIEND(EmpID5a, T_UID10a, F_UID11a, G_UID12a, S_UID13a) 5c. EMPLOYEE_IS_CUSTOMER(EmpID5a, CustID6a) 6a. CUSTOMER(CustID, LName, FName, AddressNo, StreetName, City, State, Country, ZipCode, FavProd1, BirthDate) 6b. CUSTOMER_IS_FRIEND(CustID6a, T_UID10a, F_UID11a, G_UID12a, S_UID13a) 8a. PRODUCT_AD(P_Ad_ID, ProductID1, DateBeginAd, DateEndAd, F_or_G_Ad)8b. STORE_AD(S_Ad_ID, Store_ID, DateBeginAd, DateEndAd, F_or_G_Ad) 8c. F_P_AD_CLICKED(P_Ad, F_UID, Timestamp, ipAddress) 8d. G_P_AD_CLICKED(P_Ad_ID, G_UID, Timestamp, ipAddress) 8e. F_S_AD_CLICKED(S_Ad_ID, F_UID, Timestamp, ipAddress) 8f. G_S_AD_CLICKED(S_Ad_ID, G_UID, Timestamp, ipAddress) DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Relational Design Schema Cont. Social Media: Relations Numbered 10-19 10a. T_USER(T_UID, T_Username, Fname, Lname, City, State, BirthDate, Email) 10b. T_FOLLOWING(T_UID10a, Follower_T_UID10a, DateBeganFollowing) 10c. TWEET(TweetID, T_UID10a, Auth_T_UID10a, TextStr, Timestamp) 11a. F_USER(F_UID, Fname, Lname, City, State, BirthDate, Email) 11b. F_FRIENDS(F_UID11a, Friend_F_UID11a, DateBecameFriends) 11c. F_COMMENT(F_CID, Auth_F_UID11a, On_F_CID11c, TextStr, Timestamp) 11d. F_LIKE(F_CID11c, F_UID11a, Timestamp) 12a. G_USER(G_UID, Fname, Lname, City, State, BirthDate, Email) 12b. G_FRIENDS(G_UID12a, Friend_G_UID12a, DateBecameFriends) 12c. G_COMMENT(G_CID, Auth_G_UID12a, On_G_CID12c, TextStr, Timestamp) 12d. G_LIKE(G_CID12c, G_UID12a, Timestamp) 13a. S_USER(S_UID, Fname, Lname, City, State, BirthDate, Email) 13b. S_FOLLOWING(S_UID13a, Follower_S_UID13a, DateBeganFollowing) 13c. S_DISCOVERY(S_DID, S_UID13a, url, Timestamp) 13d. S_REVIEW(S_DID13c, S_UID13a, TextStr, Like/Dislike, Timestamp) 14a. BLOG_POST(url, BPost_ID, Author_Emp_ID5a, TextStr, Timestamp) 14b. BLOG_COMMENT(BComment_ID, url, BPost_ID14a, TextStr, Timestamp, ipAddress) 14c. ASSOCIATE_IP_T(T_UID10a, Timestamp, ipAddress) 14d. ASSOCIATE_IP_FB(F_UID11a, Timestamp, ipAddress) 14e. ASSOCIATE_IP_G(G_UID12a, Timestamp, ipAddress) 14f. ASSOCIATE_IP_S(S_UID13a, Timestamp, ipAddress) 15. MAIN_WEBSITE(url, link_to_html_file, Timestamp) DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Relational Design Schema Cont. Other Data: Relations Numbered 20-29 20a. GOOGLE_TREND(GT_ID, word, city, country, day, hits) 20b. RELATED_TREND(word, Related_Prod_ID1) DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Access Table Relationships DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Access Table Relationships Cont. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Normalization Analysis: 1NF Removal of a multi-valued attribute (flavor): DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Normalization Analysis: 2NF Removal of a partial FD: {PromoID} {PromoCode, StoreID, StartDate, EndDate, Discount} DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Normalization Analysis: 3NF Removal of a transitive FD: {T_UID} {T_Username, Fname, Lname, City, State, BirthDate, Email} DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Normalization Analysis: BCNF Removal of a FD with a non-superkey attribute on the LHS: {PromoCode} {StartDate, EndDate, Discount} DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 1: Popular Product Stock Find out the most talked about products in a city and their quantities (stock). This will help us determine which products to move around to balance inventories in expectation of sale increases. Data can be exported to a solver to do a shipment problem. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 1: SQL SELECT Product.ProdID, Product.ProdName, (SELECT COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*') AS Hits, Store.City, Store_Carries.StoreID AS Store, Store_Carries.Stock AS Stock FROM Product, Store, Store_Carries WHERE (((Product.ProdID)=[Store_Carries].[ProdID]) AND ((Store.StoreID)=[Store_Carries].[StoreID])) ORDER BY Product.ProdName; DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 1: Output DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 1: Data Analysis • We have a list of stores and their stock of different products • Transportation problem to encourage similar levels of stock • Minimize shipments, shipping costs, etc. • Subject to: No outliers (stores with low stock) Possible shipment constraints Possible traffic constraints Etc. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 1: Data Analysis (AMPL) DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 1: Data Analysis DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 2: Promo Social Networking Consider a promotion. Compare product social network comments in a given city two weeks before, during, and two week after a promotion to judge its effectiveness. Order by the return on the investment. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 2: SQL SELECT Promotion.PromoID, (SELECT COUNT(*) FROM F_Comment, Product WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND Product.ProdID = Promotion.ProdID AND F_Comment.Timestamp < Promotion.StartDate AND F_Comment.Timestamp > Promotion.StartDate - 14) AS HitsBefore, (SELECT COUNT(*) FROM F_Comment, Product WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND Product.ProdID = Promotion.ProdID AND F_Comment.Timestamp < Promotion.EndDate AND F_Comment.Timestamp > Promotion.StartDate) AS HitsDuring, (SELECT COUNT(*) FROM F_Comment, Product WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND Product.ProdID = Promotion.ProdID AND F_Comment.Timestamp < Promotion.EndDate + 14 AND F_Comment.Timestamp > Promotion.EndDate) AS HitsAfter, (SELECT SUM(Promotion.Discount*Product.RetailPrice) FROM Product) AS PromoCost FROM Promotion ORDER BY PromoCost; DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 3: Priority Customers Use friendship data to rate friends by how many recommendations they have made. Determine how many of a person's friends became friends with us after they became friends with us. In this way, we identify possible priority customers of Calbee to target for special advertisements and promotions. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 3: SQL SELECT F.F_UID, ( SELECT COUNT(*) FROM F_Friends AS F2 WHERE F2.F_UID = F.F_UID AND EXISTS( SELECT F3.DateBecameFriends FROM F_FRIENDS F3 WHERE F3.Friend_F_UID = 1 AND F3.F_UID = F2.Friend_F_UID AND F3.DateBecameFriends > F.DateBecameFriends)) AS friendCount FROM F_Friends AS F WHERE F.Friend_F_UID = 1; DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 3: Data Analysis DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 4: Google Trends and Stocks Determine priority stores which don’t stock products that they should, as determined by google trend word popularity. For a given google trend word, find the top 5 cities in which the word is most searched in year 2011. Then, find stores in those cities and which related products they do not stock. This will help us identify how to improve inventory. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 4: SQL SELECT Store.StoreID AS Store, Store.City AS City, Product.ProdID AS Prod FROM Store, Store_Carries, Product WHERE Store.City IN (SELECT TOP 5 Google_Trend.City FROM Google_Trend WHERE Google_Trend.Word = 'test') AND Store_Carries.StoreID = Store.StoreID AND Store_Carries.ProdID = Product.ProdID AND Store_Carries.Stock = 0 AND Product.ProdID IN (SELECT Related_Trend.Related_Prod_ID FROM Related_Trend WHERE Related_Trend.Word = 'test'); DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 5: Social Network and Purchases Gather Social Networking, Google Trend, and Purchase data over time to formulate predictive models. For a given product, find the number of social network hits of a product, the related trend word hits, and the number of purchases in that product for a given city on a given day. In this way, we can use social network 'buzz' and trend data to predict purchases as a function of time and city. Order by product then timestamp. DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 5: SQL SELECT Product.ProdID, Product.ProdName, Purchase.Timestamp, (SELECT COUNT(F_Comment.F_CID) FROM F_Comment WHERE F_Comment.TextStr LIKE '*' + Product.ProdName + '*' AND F_Comment.Timestamp = Purchase.Timestamp) AS SocialNetworkHits, (SELECT SUM(Google_Trend.hits) FROM Google_Trend WHERE Google_Trend.word = Product.ProdName AND Google_Trend.Timestamp = Purchase.Timestamp) AS TrendHits FROM Product, Purchase WHERE Purchase.ProdID = Product.ProdID ORDER BY Purchase.ProdID, Timestamp; DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 5: Output DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 5: Data Analysis DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 5: Data Analysis • Social media networking, Google Trends, and Purchases data used predictively • Group into weekly vectors • Extract significant data using Principle Component Analysis to project onto 2 dimensions. • Cluster data using K-Means See if we can predict future sales using machine learning DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Query 5: Data Analysis DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Login Interface Employees login here: DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Switchboard Allows employees to insert data in forms or run selected query DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Forms: New Employee Enter information on new Calbee employees DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES
Questions? Thank you! DATABASE CLIENT EER/SCHEMA NORMALIZATION QUERIES