Explore the various applications of Slow Intelligence Systems (SIS) including social influence analysis, product and service optimization, topic and trend detection, and high dimensional feature selection. Discover how these systems can be used to analyze social networks, predict influential nodes, personalize products and services, detect and track hot topics, and select the most relevant features.
Outline • Application: Social Influence Analysis • Application: Product & Service Optimization • Application: Topic/Trend Detection • Application: High Dimensional Feature Selection • Discussion
Application to Social Influence Analysis In large social networks, nodes (users, entities) are influenced by others for many different reasons. How to model diffusion processes over a social network, and how to predict which nodes will influence which others, have recently been active research topics, and researchers have proposed many algorithms. Our objective in applying SIS technology is to utilize these algorithms and evolutionarily select the best one, with the most appropriate parameters, for social influence analysis.
The Social Influence Analysis SIS System The input data stream is first processed by the Pre-Processor. The Enumerator then invokes the super-component that creates the various social influence analysis algorithms, such as Linear Threshold (LIM), Susceptible-Infective-Susceptible (SIS), Susceptible-Infective-Recovered (SIR), and Independent Cascade. The Tester collects and presents the test results.
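As a sketch of how the Enumerator and Tester stages might fit together, the toy code below enumerates two simplified diffusion models over a small hand-made graph and averages their simulated spread. The graph, parameter values, and scoring are illustrative assumptions, not the system's actual implementation.

```python
# Toy Enumerator/Tester: enumerate candidate diffusion models and
# collect Monte-Carlo spread estimates for each one.
import random

graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [0]}  # toy adjacency list

def independent_cascade(graph, seeds, p=0.3, rng=random.Random(0)):
    """One run of the Independent Cascade model: each newly active node
    gets a single chance to activate each neighbor with probability p."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr in graph.get(node, []):
            if nbr not in active and rng.random() < p:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)

def sir(graph, seeds, beta=0.4, rng=random.Random(0)):
    """One run of a Susceptible-Infective-Recovered process: infective
    nodes infect susceptible neighbors with rate beta, then recover."""
    infected, recovered = set(seeds), set()
    while infected:
        node = infected.pop()
        recovered.add(node)
        for nbr in graph.get(node, []):
            if nbr not in recovered and nbr not in infected and rng.random() < beta:
                infected.add(nbr)
    return len(recovered)

# Enumerator: candidate (model, parameters); Tester: average spread over runs.
candidates = {"IC(p=0.3)": lambda: independent_cascade(graph, {0}),
              "SIR(beta=0.4)": lambda: sir(graph, {0})}
results = {name: sum(run() for _ in range(100)) / 100
           for name, run in candidates.items()}
```

The `results` dictionary plays the role of the Tester's report: one average spread value per enumerated model.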
LIM results for concept 1 and concept 3 with two combinations of parameters on the Plurk dataset
LIM results for concept 1 and concept 3 with two combinations of parameters on the Facebook dataset
The SIA/SIS System The Timing Controller restarts the social influence analysis cycle with a different SIA super-component, such as the Heat Diffusion algorithms, or with a different pre-processor. The Eliminator eliminates the inferior SIA algorithms, and the Concentrator selects the optimal one.
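A toy sketch of the Eliminator and Concentrator on hypothetical Tester scores (the score values and the below-average elimination rule are invented for illustration):

```python
# Hypothetical per-algorithm test scores produced by an earlier Tester stage.
scores = {"LIM": 0.71, "SIS": 0.42, "SIR": 0.66, "IC": 0.58}

mean = sum(scores.values()) / len(scores)
survivors = {k: v for k, v in scores.items() if v >= mean}  # Eliminator drops below-average
best = max(survivors, key=survivors.get)                    # Concentrator keeps the optimum
```

With these invented scores, the Eliminator drops "SIS" and "IC", and the Concentrator selects "LIM".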
Outline • Application: Social Influence Analysis • Application: Product & Service Optimization • Application: Topic/Trend Detection • Application: High Dimensional Feature Selection • Discussion
SIS Application to Product Configuration Production of personalized or custom-tailored goods or services to meet consumers' diverse and changing needs
Figure 6 - Ontological Filter and the Slow Intelligence System
A Scenario • A customer would like to buy a personal computer to play video games and surf the internet. • He knows that he needs an operating system, a web browser, and an antivirus package. • In particular, he prefers a Microsoft Windows operating system. He lives in the United States, prefers a desktop, and prefers low-cost components.
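One way this scenario could be sketched in code is as a preference filter over a small hypothetical component catalog. The component records, the preference fields, and the cheapest-match rule below are all assumptions for illustration, not the ontological filter's actual design.

```python
# Toy catalog: one dict per candidate component (illustrative data).
components = [
    {"type": "os", "name": "Windows 11", "vendor": "Microsoft", "price": 139},
    {"type": "os", "name": "Ubuntu", "vendor": "Canonical", "price": 0},
    {"type": "browser", "name": "Firefox", "vendor": "Mozilla", "price": 0},
    {"type": "antivirus", "name": "Defender", "vendor": "Microsoft", "price": 0},
]
# The customer's stated preference: a Microsoft operating system.
preferences = {"os": {"vendor": "Microsoft"}}

def configure(components, preferences):
    """Per component type, keep the cheapest candidate matching the
    user's preferences (the 'low cost' preference from the scenario)."""
    config = {}
    for part in components:
        wanted = preferences.get(part["type"], {})
        if all(part.get(k) == v for k, v in wanted.items()):
            best = config.get(part["type"])
            if best is None or part["price"] < best["price"]:
                config[part["type"]] = part
    return config

config = configure(components, preferences)
```

With this toy data, the filter keeps Windows 11 for the OS (the Microsoft preference) and the free browser and antivirus candidates.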
Outline • Application: Social Influence Analysis • Application: Product & Service Optimization • Application: Topic/Trend Detection • Application: High Dimensional Feature Selection • Discussion
Topic Detection and Tracking (TDT) System Overview • Detect current hot topics and predict future hot topics based on data collected from the internet • The TDT system is composed of: • Crawler & Extractor: • Collects the latest data from the internet for the user's needs • Restricts the range of data collection from web data (focused crawler) • Topic Extractor: • Discovers current hot topics from a set of text documents • Topic Detector: • Predicts hot topics
Topic/Trend Detection System • Crawler & Extractor [Diagram: the Web Crawler takes the user's keywords of interest and collects HTML documents from social media; the Information Extractor extracts articles and metadata (title, author, content, etc.) from the semi-structured web content; the resulting text documents are stored in a web data DB and passed to the Topic Extractor.]
Focused Crawler: Classification • Taxonomy Creation: e.g., Yahoo!, Open Directory Project • Example Collection: collect URLs while browsing • Taxonomy Selection and Refinement: the system proposes the most common classes; the user marks classes as GOOD or changes the trees • Interactive Exploration: the system proposes URLs found in a small neighborhood of the examples; the user examines and includes some of these examples • Training: integrate the refinements into the statistical class model (classifier-specific action)
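As a minimal sketch of the classification step, a toy relevance score can stand in for the trained statistical classifier: only pages judged on-topic contribute their out-links to the crawl frontier. The keyword set and threshold are assumptions for illustration.

```python
# Toy stand-in for the focused crawler's page classifier: score a page
# by its overlap with terms assumed to describe the target class.
TARGET_TERMS = {"topic", "detection", "tracking", "crawler"}  # assumed class terms

def relevance(text, terms=TARGET_TERMS):
    """Fraction of the target terms that appear in the page text."""
    words = set(text.lower().split())
    return len(words & terms) / len(terms)

def should_enqueue(text, threshold=0.25):
    """Only on-topic pages have their out-links added to the frontier."""
    return relevance(text) >= threshold

page = "Topic detection and tracking with a focused crawler"
should_enqueue(page)  # all four target terms match, so the page is followed
```

A real focused crawler would replace `relevance` with the statistical class model trained in the last step above.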
Focused Crawler: Distillation • Distillation: identify relevant hubs by running a topic distillation algorithm; raise the visit priorities of hubs and their immediate neighbors • Feedback: report the most popular sites and resources; the user marks results as useful or useless; send the feedback to the classifier and distiller
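One common way to implement the hub-identification part of topic distillation is a few iterations of a HITS-style hub/authority computation; the sketch below uses a toy link graph (an assumption) and would feed the resulting hub scores back into crawl priorities.

```python
# Toy link graph: h1 and h2 are candidate hubs, a1 and a2 are content pages.
links = {"h1": ["a1", "a2"], "h2": ["a2"], "a1": [], "a2": []}

def hits(links, iters=20):
    """A few HITS iterations: authority = sum of in-linking hub scores,
    hub = sum of out-linked authority scores, normalized each round."""
    auth = {n: 1.0 for n in links}
    hub = {n: 1.0 for n in links}
    for _ in range(iters):
        for n in links:
            auth[n] = sum(hub[m] for m in links if n in links[m])
        for n in links:
            hub[n] = sum(auth[m] for m in links[n])
        norm_a = sum(auth.values()) or 1.0
        norm_h = sum(hub.values()) or 1.0
        auth = {n: v / norm_a for n, v in auth.items()}
        hub = {n: v / norm_h for n, v in hub.items()}
    return hub, auth

hub, auth = hits(links)
# h1 links to both authorities, so it ends up with the highest hub score,
# and its visit priority (plus its neighbors') would be raised.
```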
Extractor Given a web page: • Build the HTML tag tree • Mine data regions (mining data records directly is hard) • Identify the data records within each data region • Learn the structure of a general data record (a data record can contain optional fields) • Extract the data
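The tag-tree and record-extraction steps can be sketched with the standard-library HTML parser. The page snippet and the choice of `<li>` elements as the data records are illustrative assumptions; a real extractor would first mine the data regions rather than hard-code the record tag.

```python
# Minimal sketch: walk the HTML tag tree and collect each <li> as a record.
from html.parser import HTMLParser

class RecordExtractor(HTMLParser):
    """Collects the text content of each <li> element as one data record."""
    def __init__(self):
        super().__init__()
        self.records, self._buf, self._in_li = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            self.records.append("".join(self._buf).strip())
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)

html = "<ul><li>Record A</li><li>Record B</li></ul>"  # toy data region
parser = RecordExtractor()
parser.feed(html)
parser.records  # the two extracted data records
```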
TDT Petri Net Simulation: Topic Detection and Tracking
Crawler
Extractor
Extract data
Slow Intelligence Steps (in blue): • Accept the user request • Send the request data to TDT • The Enumerator generates combinations • The Eliminator selects the best method to fit our needs • Evaluate the combinations • Use the Concentrator to highlight the selected results • Send the result to TDT • Generate the instructions for the server • The Dispatcher gets the instructions • Decide where to send the instructions • Send the instructions to the server • End of simulation run
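The listed simulation steps might be sketched as a simple sequential pipeline. The stage functions, method names, and scoring rule below are invented placeholders for the Petri net transitions, not the simulator itself.

```python
# Toy walk through the Slow Intelligence steps for one TDT request.
def score(combo, request):
    """Assumed fitness measure for a (method, profile) combination."""
    method, profile = combo
    return len(method) + (1 if profile == "thorough" else 0)

def run_tdt_cycle(request, methods):
    combos = [(m, p) for m in methods for p in ("fast", "thorough")]  # Enumerator
    scored = [(combo, score(combo, request)) for combo in combos]     # evaluate
    survivors = [c for c in scored if c[1] > 0]                       # Eliminator
    best = max(survivors, key=lambda c: c[1])                         # Concentrator
    return {"request": request, "selected": best[0]}                  # to Dispatcher

out = run_tdt_cycle("hot topics", ["lda", "clustering"])
# selects ("clustering", "thorough"): longest method name plus the thorough bonus
```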
Outline • Application: Social Influence Analysis • Application: Product & Service Optimization • Application: Topic/Trend Detection • Application: High Dimensional Feature Selection • Discussion
Introduction High-dimensional feature selection is a hot topic in statistics and machine learning. We model the relationship between one response y and p associated features x1, …, xp, based on a sample of size n.
Math formulation Let y = (y1, …, yn) be a vector of responses and x1, …, xn ∈ R^p be their associated covariate vectors, where p ≫ n. When yi ∈ {−1, +1} for the classification problem, we assume a logistic model: P(yi = 1 | xi) = 1 / (1 + exp(−(β0 + xiᵀβ))). We estimate the regression coefficients β and the bias β0 by minimizing the loss function: L(β, β0) = (1/n) Σi log(1 + exp(−yi (β0 + xiᵀβ))).
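The logistic loss can be evaluated numerically as below. Note the exact formula was lost from the slide, so this sketch assumes the standard averaged logistic loss with labels coded yi ∈ {−1, +1}.

```python
# Averaged logistic loss: L = (1/n) * sum_i log(1 + exp(-y_i*(b0 + x_i . beta)))
import math

def logistic_loss(beta0, beta, X, y):
    """X: list of covariate vectors; y: labels in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (beta0 + sum(b * x for b, x in zip(beta, xi)))
        total += math.log(1.0 + math.exp(-margin))
    return total / len(y)

X = [[1.0, 0.0], [0.0, 1.0]]
y = [+1, -1]
logistic_loss(0.0, [2.0, -2.0], X, y)  # both margins equal +2, loss = log(1 + e^-2)
```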
Application Supervised learning: the gene selection problem in bioinformatics • One wants to eliminate the irrelevant genes (features) to obtain a robust classifier • One wants to know which genes are the most critical factors for the disease • Each sample's data consists of p gene expression levels • There are n samples (patients or healthy subjects) • The important genes are selected
Challenges • Dimensionality grows rapidly with interactions of the features • Portfolio selection and network modeling: 2,000 stocks involve over 2 million unknown parameters in the covariance matrix • Protein-protein interaction: the sample size may be in the order of thousands, but the number of features can be in the order of millions • Goal: construct effective methods to learn the relationships between features and responses in high dimension for scientific purposes
Feature Selection Approach • Main SIS procedure • main_Enumerator • main_Eliminator • main_Adaptor • main_Propagator • main_Concentrator • timing controller • Sub procedure • sub_enumerator • sub_concentrator • knowledge base
Main Enumerator • Enumerate the p features • Among these features, some are relevant to the responses while others are not.
Main Eliminator • Apply the Pearson correlation between each feature xj and the response y, rank the values from high to low, and eliminate the lowest-ranked features • d is a pre-defined constant • S is the selected set of top-d features
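The correlation-screening step can be sketched in pure Python as below; the `pearson` helper, the column-wise data layout, and the toy data are illustrative.

```python
# Main Eliminator sketch: rank features by |Pearson correlation| with
# the response and keep the top d of them.
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((z - mb) ** 2 for z in b))
    return cov / (sa * sb)

def screen(X_cols, y, d):
    """Return the indices of the d features most correlated with y."""
    ranked = sorted(range(len(X_cols)),
                    key=lambda j: abs(pearson(X_cols[j], y)), reverse=True)
    return ranked[:d]

X_cols = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 6, 8]]  # features stored as columns
y = [1.1, 2.0, 3.2, 3.9]
screen(X_cols, y, 2)  # features 0 and 2 track y almost perfectly
```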
Sub Enumerator • Enumerate all feature selection algorithms in the knowledge base by applying them to the feature set S, and for each algorithm select the top k features from S as a set Fa • Knowledge base: stores the existing candidate algorithms • We add L1-regularized regression, elastic-net regularized regression, and forward stepwise regression; in principle, any feature selection algorithm can be put into the knowledge base
Sub Concentrator • For each selected feature set Fa, we compute the loss function L(Fa) and choose the best algorithm, the one with the minimum loss • The sub-system then selects the k features chosen from S by that algorithm • We denote this feature set F*
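The knowledge-base pattern behind the Sub Enumerator and Sub Concentrator can be sketched as below. The two toy selectors stand in for the L1-regularized, elastic-net, and forward-stepwise methods named on the slide, and the correlation-based surrogate loss is an assumption.

```python
# Knowledge base maps algorithm names to selector functions; each selector
# returns its top-k feature indices, and the lowest-loss selection wins.
import statistics

def corr(a, b):
    """Pearson correlation via the statistics module (0 if degenerate)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    den = (statistics.pstdev(a) * statistics.pstdev(b) * len(a)) or 1.0
    return num / den

knowledge_base = {
    "by_correlation": lambda X, y, k: sorted(
        range(len(X)), key=lambda j: -abs(corr(X[j], y)))[:k],
    "by_variance": lambda X, y, k: sorted(
        range(len(X)), key=lambda j: -statistics.pvariance(X[j]))[:k],
}

def surrogate_loss(selected, X, y):
    """Assumed criterion: lower when the selected features track y well."""
    return -sum(abs(corr(X[j], y)) for j in selected)

def sub_select(X, y, k):
    picks = {name: algo(X, y, k) for name, algo in knowledge_base.items()}  # enumerate
    best = min(picks, key=lambda name: surrogate_loss(picks[name], X, y))   # concentrate
    return best, picks[best]

X = [[1, 2, 3, 4], [9, 1, 8, 2], [2, 4, 6, 8]]
y = [1.0, 2.0, 3.0, 4.0]
sub_select(X, y, 2)  # "by_correlation" wins, selecting features 0 and 2
```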
Main Adaptor • For every other feature xj among the total p features, we add it to F* and compute the loss function L(F* ∪ {xj})
Main Concentrator • Rank all features xj by their loss L(F* ∪ {xj}) from low to high, and select the top m features with the smallest loss
Main Propagator • Add these top m features to F* to form the new feature set for the next cycle
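The Adaptor, Concentrator, and Propagator steps together form a forward-addition loop, which might be sketched as below. The residual-based loss is an assumed surrogate, deliberately crude, standing in for the logistic loss from the formulation.

```python
# Forward-addition sketch: try adding each remaining feature to the current
# set, rank candidates by the resulting loss, and keep the m best additions.
def residual_loss(selected, X, y):
    """Assumed loss: squared error of a one-coefficient fit on the sum of
    the selected feature columns (a toy surrogate, not the real criterion)."""
    n = len(y)
    s = [sum(X[j][i] for j in selected) for i in range(n)] if selected else [0.0] * n
    denom = sum(v * v for v in s) or 1.0
    beta = sum(v * t for v, t in zip(s, y)) / denom
    return sum((t - beta * v) ** 2 for v, t in zip(s, y))

def adapt_and_propagate(current, X, y, m):
    remaining = [j for j in range(len(X)) if j not in current]
    # Adaptor: score each candidate addition; Concentrator: rank by loss.
    scored = sorted(remaining, key=lambda j: residual_loss(current + [j], X, y))
    return current + scored[:m]  # Propagator: grow the feature set

X = [[1, 2, 3, 4], [0, 0, 0, 1], [1, 1, 1, 1]]  # features as columns
y = [1.0, 2.0, 3.0, 4.0]
adapt_and_propagate([], X, y, 1)  # feature 0 alone already fits y exactly
```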