Big Data Analysis. Chin-Chih Chang 張欽智 changc@chu.edu.tw Computer Science and Information Engineering Chung Hua University 2014/03/24. Big Data Analysis. What is Big Data? Why is Big Data important? What to do with these data?
Big Data Analysis Chin-Chih Chang 張欽智 changc@chu.edu.tw Computer Science and Information Engineering Chung Hua University 2014/03/24
Big Data Analysis • What is Big Data? • Why is Big Data important? • What to do with these data? • Example: A Recommender System Combining Social Networks for Tourist Attractions
What is Big Data? • Big Data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. • This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data. • Big data in many sectors today ranges from a few dozen terabytes (1 TB = 10^12 bytes) to multiple petabytes (1 PB = 10^15 bytes).
What is Big Data? • Big Data is not just about the size (volume) of data; it also includes data variety and data velocity. Together, these three attributes form the three V’s of Big Data.
Data Types • Structured data: This type describes data that is grouped into a relational schema (e.g., rows and columns within a standard database). The configuration and consistency of the data allow it to answer simple queries and yield usable information, based on an organization's parameters and operational needs.
Data Types • Semi-structured data: This is a form of structured data that does not conform to an explicit and fixed schema. The data is inherently self-describing and contains tags or other markers to enforce hierarchies of records and fields within the data. Examples include weblogs and social media feeds. • Unstructured data: This type of data consists of formats which cannot easily be indexed into relational tables for analysis or querying. Examples include images, audio and video files.
How much data? • “We expect to create 12.6 exabytes of data every day in 2014, so much that 90% of the data in the world today has been created in the last two years alone. • This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. • This data is ‘big data.’”
Big Data is everywhere! • Lots of data is being collected and warehoused • Web data, e-commerce • Purchases at department/grocery stores • Bank/credit card transactions • Social networks • Instant messaging • Internet of Things
Types of Data • Relational Data (Tables/Transaction/Legacy Data) • Text Data (Web) • Semi-structured Data (XML) • Graph Data: Social Network, Semantic Web (RDF), … • Streaming Data: you can only scan the data once
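The single-scan constraint on streaming data can be illustrated with a running mean and variance that never stores the stream. This is a minimal sketch using Welford's online algorithm, a technique chosen here only for illustration and not mentioned in the slides; the names `RunningStats` and `summarize` are hypothetical.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Each value is seen exactly once and is not kept afterwards.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has arrived.
        return self._m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume an iterable once and return its running statistics."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)
    return stats
```

Because `summarize` accepts any iterable, it also works on a generator that reads sensor values or log lines as they arrive, which is the situation the slide describes.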
Why is Big Data important? • Success stories: • Netflix • Movies • Supermarkets • …
What to do with these data? • Aggregation and Statistics • Data warehouse and OLAP • Indexing, Searching, and Querying • Keyword based search • Pattern matching (XML/RDF) • Knowledge discovery • Data Mining • Statistical Modeling
What is Data Mining? • Discovery of useful, possibly unexpected, patterns in data • Non-trivial extraction of implicit, previously unknown and potentially useful information from data • Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Data Mining Tasks • Classification [Predictive] • Clustering [Descriptive] • Association Rule Discovery [Descriptive] • Sequential Pattern Discovery [Descriptive] • Regression [Predictive] • Deviation Detection [Predictive] • Collaborative Filtering [Predictive]
Example: A Recommender System Combining Social Networks for Tourist Attractions
Outline • Abstract • Introduction • Related Work • System Design and Mechanism • System Implementation and Experiments • Experimental Results • Conclusion and Future Work
Abstract • In this paper we present a recommender system combining social networks for tourist attractions. • Three mechanisms are analyzed: • Using similarity among users and their trustability. • Using information collected from social networks. • Combination of similarity and social networks.
Introduction • A recommender system is a system that suggests things which users might be interested in after learning their preferences. • A recommender system can help users cope with the problem of information overload. • Social networks have become a common platform for people to share their thoughts and extend their friendships into a virtual world.
Introduction • There is high potential to enhance recommender systems by incorporating social network information. • But how to use social network information effectively is still a research topic. • A tourist information system is convenient for those who are preparing to travel or are already on the road.
Introduction • Similar information overload could happen in these tourist information systems. • In this paper, we present a tourist information system that combines recommender systems and social networks.
Related Work: Recommender System • A recommender system helps users find the items they prefer faster and more accurately by suggesting suitable items. • There are mainly four approaches to recommendation: content-based filtering, collaborative filtering, knowledge-based approaches, and hybrid approaches.
Related Work: Recommender System • Content-based filtering: The method recommends items that are similar to the ones that the user liked in the past. • Collaborative filtering: The method recommends items that are likely to be used by those who have interests similar to the user's. • Knowledge-based approaches: One example of this type of approach is to ask the user directly about her or his requirements. Items are then recommended based on the criteria the user provides. • Hybrid approaches: The method is a hybrid of the above methods.
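The collaborative filtering idea can be sketched as a similarity-weighted average over other users' ratings. This is a generic textbook-style sketch, assuming cosine similarity over co-rated items; it is not the trustability-based method presented later in these slides, and the function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_scores(target, ratings):
    """Similarity-weighted average rating for items the target has not rated."""
    weighted, weights = {}, {}
    for user, items in ratings.items():
        if user == target:
            continue
        sim = cosine_similarity(ratings[target], items)
        if sim <= 0:
            continue  # ignore users with no overlap or no positive similarity
        for item, score in items.items():
            if item in ratings[target]:
                continue  # only predict for unseen items
            weighted[item] = weighted.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    return {item: weighted[item] / weights[item] for item in weighted}
```

Here `ratings` is assumed to be a dictionary mapping each user name to a dictionary of item ratings; sorting the returned scores in descending order yields the recommendation list.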
Related Work: Recommender System • Comparison of Recommender Techniques
Related Work: Social Network Sites • Social network sites are Web-based services that enable online social networks or relationships. • Social network sites are one type of social media, which is any platform where people can create, share, and exchange their activities, views, interests, experiences, or information.
Related Work: Social Network Sites • Social media have become a part of our daily life. • It is hard not to notice people focused on their mobile devices, using Facebook or LINE wherever they are. • User profiles, friends, and comments are three key components of social network sites.
Related Work: Social Network Sites • The number of social network users has been growing drastically. • There are approximately 800 million users on Facebook. Some even call it the Facebook country. • A social recommendation utilizes a user's social network and related information for recommendation.
Related Work: Social Network Sites • A common technique for social recommendations is collaborative filtering. • It is based on two assumptions: people who are socially associated are more likely to share common interests, and users can be easily influenced by the friends they trust.
Related Work: Tourist Information Systems • A tourist information system is a system that provides travel guides, maps, and information on accommodation and transportation. • A system that can recommend tourist attractions will be very helpful to any tourist.
System Design and Mechanism • In our design we aim to build a tourist information system that lets users access attraction information from an interface device such as an information kiosk. • The system is associated with Facebook. • Whenever an interface device is equipped with an RFID reader, users can log into the system with an RFID card, without typing an account name and password.
System Design and Mechanism • The interactions between users and the attraction information website are shown as follows.
System Design and Mechanism • The system operation is shown as follows:
System Design and Mechanism: System Operation • A Facebook App interface is available to users. • Users can access the Facebook App to share, like, comment on, and rate the attractions using their Facebook account. First-time users need to choose their interests and can register their RFID cards. • A Web server and a database management system (DBMS) run on a server machine. • Users who have registered an RFID card can log into the system directly with it.
Personalized Social Recommendation (PSR) • Acquire users’ appraisals of each attraction and their activities on the social network site. • Use collaborative filtering or keep track of activities on the social network site for recommendation. • Calculate the score for each attraction. • Rank attractions based on the score. • If scores are the same, check the appraisal time: the more recent evaluation obtains the higher rank. • Recommend the attractions with the top 3 scores and show the top-1 attraction on the main page.
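The ranking and tie-breaking steps of PSR can be sketched as a single sort. This is a minimal sketch, assuming each attraction already has a score and the time of its most recent appraisal; the names `ScoredAttraction` and `recommend` are hypothetical and do not come from the system described in the slides.

```python
from datetime import datetime
from typing import NamedTuple


class ScoredAttraction(NamedTuple):
    name: str
    score: float
    appraised_at: datetime  # time of the most recent appraisal (assumed field)


def recommend(attractions, top_n=3):
    """Rank by score (descending); break ties by the more recent appraisal."""
    ranked = sorted(
        attractions,
        key=lambda a: (a.score, a.appraised_at),
        reverse=True,
    )
    top = ranked[:top_n]
    main_page = top[0] if top else None  # the top-1 attraction for the main page
    return top, main_page
```

Sorting on the tuple `(score, appraised_at)` in reverse order places higher scores first and, among equal scores, the more recently appraised attraction first, which matches the tie-breaking rule stated above.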
Recommendation Methods: Collaborative Filtering • First calculate the average appraisal of the kth attraction from all users. • Then evaluate the difference between the appraisal of the kth attraction from user j and this mean value. • The average trustability of user j is calculated using Equation (1).
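The first two steps can be sketched as follows. This is a sketch under stated assumptions: appraisals are held in a nested dictionary (user, then attraction), and the difference is taken as a signed value, since the slide does not say whether it is signed or absolute. How the differences feed into Equation (1) is not covered here, and the function names are hypothetical.

```python
def mean_appraisals(appraisals):
    """Average appraisal of each attraction over all users who rated it."""
    totals, counts = {}, {}
    for user_ratings in appraisals.values():
        for attraction, value in user_ratings.items():
            totals[attraction] = totals.get(attraction, 0.0) + value
            counts[attraction] = counts.get(attraction, 0) + 1
    return {k: totals[k] / counts[k] for k in totals}


def differences_from_mean(appraisals, user):
    """Difference between one user's appraisals and the per-attraction mean.

    Assumption: signed difference (user value minus mean).
    """
    means = mean_appraisals(appraisals)
    return {k: value - means[k] for k, value in appraisals[user].items()}
```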
Recommendation Methods: Collaborative Filtering • (1) • k indicates the kth attraction; • m indicates the total number of appraisals that user j gave; • C is a constant that controls the degree of difference; its default value is 1.5. The larger C is, the lower the trustability.
Recommendation Methods: Collaborative Filtering • Trustability of users
Recommendation Methods: Collaborative Filtering • Similarity matrix among users
Recommendation Methods: Collaborative Filtering • (2) • S_{i,j} is the similarity between user i and user j for an attraction; • Equation (2) gives the average similarity between user i and user j; • n is the number of attractions that both user i and user j recommend.
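One plausible reading of Equation (2), given the symbol definitions on this slide, is a plain arithmetic mean of the per-attraction similarities over the n attractions that both users recommend. This is an assumption: the slide text gives only the symbol definitions, and the bar and the superscript (k) are notation introduced only for this sketch.

```latex
\bar{S}_{i,j} = \frac{1}{n} \sum_{k=1}^{n} S_{i,j}^{(k)}
```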
Recommendation Methods: Collaborative Filtering • Average similarity between User 0 and other users
Recommendation Methods: Collaborative Filtering • (3) • Equation (3) gives the average appraisal weighting of user i for each user j. • Average appraisal weighting of User 0 for each user j
Recommendation Methods: Social Network Activities • A user’s preference is evaluated using Equation (4). • The normalization of a user’s rating is calculated using Equation (5), where R_i is the appraisal of the kth attraction from user i. • P = R + S + L + I (4) • (5)
Recommendation Methods: CF plus Social Recommendation • We then combine Equation (3) for collaborative filtering and Equation (4) for social recommendation into Equation (6), where each method is given the weight 0.5. • T = 0.5R + 0.5P (6)
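Equation (6) is a weighted sum of the two scores and can be sketched as a small helper. The function name `combined_score` and the adjustable `cf_weight` parameter are hypothetical; the slides fix both weights at 0.5. The sketch also assumes that R and P have already been brought to comparable scales.

```python
def combined_score(cf_score, social_score, cf_weight=0.5):
    """Equation (6): T = 0.5R + 0.5P when cf_weight is left at 0.5.

    cf_score is R (collaborative filtering) and social_score is P
    (social network activities). The adjustable weight is an
    illustrative extension, not part of the slides.
    """
    if not 0.0 <= cf_weight <= 1.0:
        raise ValueError("cf_weight must lie in [0, 1]")
    return cf_weight * cf_score + (1.0 - cf_weight) * social_score
```

With the default weight, the combined score is the midpoint of the two inputs; exposing the weight makes it easy to compare the three mechanisms from the abstract (weight 1.0 for collaborative filtering only, 0.0 for social network information only, 0.5 for the combination).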
System Implementation and Experiments • Development environment