Big Data Analytics (20MCA352) Lecture #2 Module 1: Big Data and Analytics
Example Applications • Mailbox – Response Modeling Analytics • Banks/Loans – Behavioral Scoring Model • Mobile/Telephone Service Provider – Calling Behavior • Facebook, Twitter – Social Media Analytics • Loyalty Card – Market Basket Analysis • Debit/Credit Card – Fraud Detection Model
Big Data in Retail • To stay competitive, retailers must make better buying decisions, offer relevant discounts, convince customers to hop on new trends, and remember their customers’ birthdays, all while keeping the business running behind the scenes. • Big data in retail is essential to target and retain customers, streamline operations, optimize the supply chain, improve business decisions, and ultimately, save money.
4 Big Data Benefits for Retail Big data analysis can predict emerging trends, target the right customer at the right time, decrease marketing costs, and increase the quality of customer service. Common benefits of using big data in retail include: • Maintain a 360-degree view of each customer: Create the kind of personal engagement that customers have come to expect by knowing each individual, at scale. • Optimize pricing: Get the most value out of upcoming trends and know when, and by how much, to decrease off-trend product prices. • Streamline back-office operations: Imagine maintaining perfect stock levels throughout the year and gathering data from registered products in real time. • Enhance customer service: Unlock the customer service data hiding in recorded calls, in-store security footage, and social media comments.
Analytics Process Model • Step 1: As a first step, a thorough definition of the business problem to be addressed is needed. • Step 2: Next, all source data that could be of potential interest need to be identified. The golden rule here is: the more data, the better! • Step 3: Once we move to the analytics step, an analytical model is estimated on the preprocessed and transformed data.
Analytics Process Model • Step 4: Once the results are obtained, they are interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which are called analytical models resulting from applying analytics. • Step 5: Finally, once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine).
Sources of Big Data • Machines • People • Organizations
Sources of Data • Transactions are the first important source of data. • Transactional data consist of structured, low‐level, detailed information capturing the key characteristics of a customer transaction (e.g., purchase, claim, cash transfer, credit card payment).
Sources of Data • Unstructured data embedded in text documents (e.g., emails, web pages, claim forms) or multimedia content can also be interesting to analyze. • However, these sources typically require extensive preprocessing before they can be successfully included in an analytical exercise.
Sources of Data • Another important source of data is qualitative, expert‐based data. • An expert is a person with a substantial amount of subject matter expertise within a particular setting (e.g., credit portfolio manager, brand manager).
Sources of Data • Data poolers are becoming more and more important in the industry. • Popular examples are Dun & Bradstreet, Bureau van Dijk, and Thomson Reuters. • The core business of these companies is to gather data in a particular setting (e.g., credit risk, marketing), build models with it, and sell the output of these models (e.g., scores), possibly together with the underlying raw data, to interested customers.
Sources of Data • Publicly available data can be included in the analytical exercise. • A first important example is macroeconomic data about gross domestic product (GDP), inflation, unemployment, and so on.
Sampling • Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points to identify patterns and trends in the larger data set being examined.
Sampling • A key requirement for a good sample is that it should be representative of the future customers on which the analytical model will be run. • Hence, the timing aspect becomes important because customers of today are more similar to customers of tomorrow than customers of yesterday.
Sampling • Choosing the optimal time window for the sample involves a trade-off: more data gives a more robust model, while more recent data is likely to be more representative. • The sample should also be taken from an average business period. • Sampling bias should be avoided as much as possible.
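The time-window idea can be illustrated with a minimal sketch: restrict the data to a chosen window, then draw a simple random sample from it. The record layout (`customer_id`, `date`), the function name `sample_recent`, and the toy data are assumptions made for illustration only; they do not come from the lecture.

```python
import random
from datetime import date

def sample_recent(records, start, end, n, seed=0):
    """Draw a simple random sample of up to n records dated within [start, end]."""
    # Keep only records that fall inside the chosen time window.
    window = [r for r in records if start <= r["date"] <= end]
    # A seeded generator makes the sample reproducible.
    rng = random.Random(seed)
    return rng.sample(window, min(n, len(window)))

# Hypothetical toy data: one customer record per month of 2023.
records = [{"customer_id": m, "date": date(2023, m, 1)} for m in range(1, 13)]

# Sample 3 customers from the second half of the year only.
sample = sample_recent(records, date(2023, 7, 1), date(2023, 12, 31), n=3)
```

Widening the window gives more candidate records; narrowing it keeps the sample closer to today's customers.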
Credit Scoring Example • Assume one wants to build an application scorecard to score mortgage applications. • The future population then consists of all customers who come to the bank and apply for a mortgage—the so‐called through‐the‐door (TTD) population. • One then needs a subset of the historical TTD population to build an analytical model.
Credit Scoring Example • Bureau-based inference: a sample of past customers is given to the credit bureau to determine their target label (good or bad payer). • Some customers were offered credit but decided not to take it (even though they may have been classified as good by the old scorecard). • To be representative, these customers should also be included in the development sample.
Stratified Sampling • In stratified sampling, a sample is taken according to predefined strata. • Consider, for example, a churn prediction or fraud detection context in which data sets are typically very skewed (e.g., 99 percent nonchurners and 1 percent churners). • When stratifying according to the target churn indicator, the sample will contain the same percentages of churners and nonchurners as the original data.
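Stratifying on the churn indicator can be sketched as follows, using only the Python standard library. The function name `stratified_sample`, the field names, and the toy data set (990 nonchurners, 10 churners) are hypothetical and chosen only to mirror the 99/1 percent example above.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every stratum defined by rows[key]."""
    rng = random.Random(seed)
    # Group the rows by stratum (here: churner vs. nonchurner).
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for members in strata.values():
        # Per-stratum sample size, rounded to a whole number of rows.
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical skewed data set: ids 0-9 are churners, the other 990 are not.
rows = [{"id": i, "churn": i < 10} for i in range(1000)]

sample = stratified_sample(rows, "churn", fraction=0.1)
churners = sum(r["churn"] for r in sample)
print(len(sample), churners)  # 100 1
```

Because each stratum size is rounded to a whole number of rows, the proportions match the original data exactly only when the stratum sizes divide evenly, as they do in this toy case.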
Types of Data Elements • Continuous: defined on an interval that can be limited or unlimited. • Categorical (three kinds): • Nominal: takes on a limited set of values with no meaningful ordering in between. • Ordinal: takes on a limited set of values with a meaningful ordering in between. • Binary: takes on only two values.
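One way to make these distinctions concrete is to tag each field of a data set with its element type. The sketch below is illustrative only: the field names, the `ElementType` enum, and the rating scale are assumptions, not part of the lecture. It shows that an ordinal element needs an explicit value ordering, whereas a nominal one has none.

```python
from enum import Enum

class ElementType(Enum):
    CONTINUOUS = "continuous"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    BINARY = "binary"

# Hypothetical customer fields tagged with their element type.
SCHEMA = {
    "income": ElementType.CONTINUOUS,       # any value on an interval
    "marital_status": ElementType.NOMINAL,  # limited values, no ordering
    "credit_rating": ElementType.ORDINAL,   # limited values, ordered
    "is_churner": ElementType.BINARY,       # exactly two values
}

# An ordinal element carries an explicit ordering of its values.
RATING_ORDER = ["low", "medium", "high"]

def compare_ordinal(a, b, order):
    """Return a negative, zero, or positive number if a ranks below, equal to, or above b."""
    return order.index(a) - order.index(b)

categorical = [name for name, t in SCHEMA.items() if t is not ElementType.CONTINUOUS]
```

Here `compare_ordinal("low", "high", RATING_ORDER)` is negative because "low" ranks below "high"; no such comparison is meaningful for `marital_status`.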