Crowdsearcher
CROWDSEARCHING (And Beyond)
Stefano Ceri
Politecnico di Milano, Dipartimento di Elettronica, Informazione e BioIngegneria
Crowd-based Applications
• Emerging crowd-based applications:
  • opinion mining
  • localized information gathering
  • marketing campaigns
  • expert response gathering
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) is in charge of providing answers
  • the system organizes a response collection campaign
• Crowd-based applications include both crowdsourcing and crowdsearching
The "system" is a wide concept
• Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms
• Our approach: a coordination engine that keeps overall control of application deployment and execution
[Figure labels: CrowdSearcher, API, Access]
A simple example of crowdsearching
Example: Find your job (social invitation)
• Selected data items can be transferred to the crowd question
Find your job (response submission)
CrowdSearcher results (in the loop)
Deployment alternatives
• Multi-platform deployment
• Native
Deployment: search on a social network
• Multi-platform deployment
THE MODEL AND THE PROCESS
CrowdSearcher
• Combines a conceptual framework, a specification paradigm, and a reactive execution control environment
• Supports designing, deploying, and monitoring applications on top of crowd-based systems
  • Design is top-down and platform-independent
  • Deployment turns declarative specifications into platform-specific implementations, which include social networks and crowdsourcing platforms
  • Monitoring provides reactive control, which guarantees applications' adaptation and interoperability
• Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013)
The Design Process
• A simple task design and deployment process, based on specific data structures
  • created using model-driven transformations
  • driven by the task specification
• Task Specification: task operations, objects, and performers
• Task Planning: work distribution
• Control Specification: task control policies
DEMO!
Valuable ideas: 1. Operation types
• In a Task, performers are required to execute logical operations on input objects
  • e.g. Locate the faces of the people appearing in the following 5 images
• CrowdSearcher offers pre-defined operation types:
  • Like: ask a performer to express a preference (true/false)
    • e.g. Do you like this picture?
  • Comment: ask a performer to write a description / summary / evaluation
    • e.g. Can you summarize the following text using your own words?
  • Tag: ask a performer to annotate an object with a set of tags
    • e.g. How would you label the following image?
  • Classify: ask a performer to classify an object within a closed set of alternatives
    • e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
  • Add: ask a performer to add a new object conforming to the specified schema
    • e.g. Can you list the names and addresses of good restaurants near Politecnico di Milano?
  • Modify: ask a performer to verify/modify the content of one or more input objects
    • e.g. Is this wine from Cinque Terre? If not, where does it come from?
  • Order: ask a performer to order the input objects
    • e.g. Order the following books according to your taste
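
The seven operation types above map naturally onto an enumeration. The following is a minimal Python sketch, not CrowdSearcher's actual data model: the `Operation` record, its `choices` field, and the `accepts` shape checks are hypothetical names introduced here only to illustrate how a closed-set Classify differs from a free-form operation.

```python
from dataclasses import dataclass
from enum import Enum


class OperationType(Enum):
    """The seven pre-defined operation types listed on the slide."""
    LIKE = "like"
    COMMENT = "comment"
    TAG = "tag"
    CLASSIFY = "classify"
    ADD = "add"
    MODIFY = "modify"
    ORDER = "order"


@dataclass(frozen=True)
class Operation:
    """Hypothetical record pairing an operation type with its prompt."""
    op_type: OperationType
    prompt: str
    choices: tuple = ()  # assumption: only used by CLASSIFY (closed set)

    def accepts(self, answer) -> bool:
        """Check the shape of an answer; these checks are illustrative only."""
        if self.op_type is OperationType.LIKE:
            return isinstance(answer, bool)
        if self.op_type is OperationType.CLASSIFY:
            return answer in self.choices
        if self.op_type is OperationType.TAG:
            return isinstance(answer, (list, set, tuple)) and all(
                isinstance(tag, str) for tag in answer
            )
        # Remaining types are free-form in this sketch.
        return answer is not None


classify_tweet = Operation(
    OperationType.CLASSIFY,
    "Would you classify this tweet as pro-right, pro-left, or neutral?",
    choices=("pro-right", "pro-left", "neutral"),
)
```

Here `classify_tweet.accepts("neutral")` holds, while an answer outside the closed set is rejected.
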
2. Platform-independent Meta-Model
3. Reactive Crowdsourcing
• A conceptual framework for controlling the execution of crowd-based computations, based on:
  • Control Marts
  • Active Rules
• Classical forms of control:
  • Majority control (to close object computations)
  • Quality control (to check that quality constraints are met)
  • Spam detection (to detect / eliminate some performers)
  • Multi-platform adaptation (to change the deployment platform)
  • Social adaptation (to change the community of performers)
Why Active Rules?
• Ease of use: control is easily expressible
  • Simple formalism, simple computation
• Power: arbitrarily complex controls are supported
  • Extensibility mechanisms
• Automation: active rules can be system-generated
  • Well-defined semantics
• Flexibility: changes have a localized impact on the rule set
  • Control isolation
• Known formal properties, descending from known theory
  • Termination, confluence
4. Control Mart
• Data structure for controlling application execution, inspired by data marts (for data warehousing); its content is automatically built from task specification and planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object
Auxiliary Structures
• Object: tracking object responses
• Performer: tracking performer behavior (e.g. spammers)
• Task: tracking task status
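
One way to picture the control mart and its auxiliary structures is as a fact table surrounded by per-dimension control rows. The sketch below is an assumption-laden illustration in Python: only `MicroTaskObjectExecution`, `ObjectControl`, `ClassifiedParty`, `oID` and `#Eval` appear in the slides; every other class, field, and the `plan` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MicroTaskObjectExecution:
    """Central entity: one performer's execution of a task on one object."""
    task_id: str
    performer_id: str
    object_id: str
    classified_party: Optional[str] = None  # filled in when the answer arrives


@dataclass
class ObjectControl:
    """Tracks the responses collected for one object."""
    object_id: str
    evals: int = 0        # stands in for the #Eval counter used in the rules
    closed: bool = False  # hypothetical flag


@dataclass
class PerformerControl:
    """Tracks performer behavior (e.g. to spot spammers); fields are assumed."""
    performer_id: str
    executions: int = 0
    spammer: bool = False


@dataclass
class TaskControl:
    """Tracks task status; the status values are assumed."""
    task_id: str
    status: str = "open"


@dataclass
class ControlMart:
    executions: list = field(default_factory=list)
    objects: dict = field(default_factory=dict)
    performers: dict = field(default_factory=dict)
    tasks: dict = field(default_factory=dict)

    def plan(self, task_id, performer_id, object_id):
        """Register a planned execution and create missing control rows."""
        self.tasks.setdefault(task_id, TaskControl(task_id))
        self.performers.setdefault(performer_id, PerformerControl(performer_id))
        self.objects.setdefault(object_id, ObjectControl(object_id))
        row = MicroTaskObjectExecution(task_id, performer_id, object_id)
        self.executions.append(row)
        return row


mart = ControlMart()
first = mart.plan("politicians", "p1", "o1")
mart.plan("politicians", "p2", "o1")
```

Planning two executions on the same object yields two fact rows but a single `ObjectControl` row, which is what lets rules aggregate responses per object.
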
Active Rules Language
• Active rules are expressed on the previous data structures
• Event-Condition-Action paradigm
• Events: data updates / timer
  • ROW-level granularity
  • OLD: before state of a row
  • NEW: after state of a row
• Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
• Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan)
  e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
  c: NEW.ClassifiedParty == 'Republican'
  a: SET ObjectControl[oID == NEW.oID].#Eval += 1
Rule Example 1
  e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
  c: NEW.ClassifiedParty == 'Republican'
  a: SET ObjectControl[oID == NEW.oID].#Eval += 1
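
To make the Event-Condition-Action reading of Rule Example 1 concrete, here is a small Python sketch of a row-level rule engine. It is not the CrowdSearcher rule engine: `Rule`, `Engine`, and `count_republican` are hypothetical names, rows are plain dictionaries, and actions in this sketch do not themselves raise further events (so there is no cascading).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: tuple         # (table name, attribute name) whose update fires the rule
    condition: Callable  # predicate over (old_row, new_row)
    action: Callable     # side effect over (engine, old_row, new_row)


class Engine:
    def __init__(self):
        self.rules = []
        self.object_control = {}  # oID -> {"Eval": int}

    def update(self, table, row, attribute, value):
        """Apply a row-level update, then fire every matching rule."""
        old = dict(row)  # OLD: state of the row before the update
        row[attribute] = value
        new = dict(row)  # NEW: state of the row after the update
        for rule in self.rules:
            if rule.event == (table, attribute) and rule.condition(old, new):
                rule.action(self, old, new)


def count_republican(engine, old, new):
    """a: SET ObjectControl[oID == NEW.oID].#Eval += 1"""
    control = engine.object_control.setdefault(new["oID"], {"Eval": 0})
    control["Eval"] += 1


engine = Engine()
engine.rules.append(Rule(
    # e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
    event=("MicroTaskObjectExecution", "ClassifiedParty"),
    # c: NEW.ClassifiedParty == 'Republican'
    condition=lambda old, new: new["ClassifiedParty"] == "Republican",
    action=count_republican,
))

row_a = {"oID": "o1", "ClassifiedParty": None}
row_b = {"oID": "o1", "ClassifiedParty": None}
engine.update("MicroTaskObjectExecution", row_a, "ClassifiedParty", "Republican")
engine.update("MicroTaskObjectExecution", row_b, "ClassifiedParty", "Democrat")
```

Only the first update satisfies the condition, so the counter for object `o1` is incremented once.
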
5. Rule Programming Best Practices
• We define three classes of rules:
  • Control rules: modify the control tables
  • Result rules: modify the dimension tables (object, performer, task)
  • Execution rules: modify the execution table, either directly or through re-planning
• With control and result rules only: top-to-bottom, left-to-right evaluation; termination is guaranteed
• With execution rules: termination must be proven (the rule precedence graph has cycles)
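
The termination remark can be illustrated with a cycle check on a precedence graph. The Python sketch below is only an illustration under assumptions: nodes are table groups, and an edge A -> B means "a rule triggered by updates on A writes B". The two example graphs are a hypothetical reading of the three rule classes, not graphs taken from the slides.

```python
def has_cycle(graph):
    """Return True if a directed graph (dict: node -> list of nodes) has a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for successor in graph.get(node, ()):
            state = colour.get(successor, WHITE)
            if state == GREY:  # back edge: the path loops onto itself
                return True
            if state == WHITE and visit(successor):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in list(graph))


# Assumed reading: control rules react to execution updates and write control
# tables; result rules react to control updates and write dimension tables.
control_and_result = {
    "execution": ["control"],
    "control": ["dimension"],
    "dimension": [],
}

# Assumed reading: execution rules react to control updates and write the
# execution table again, which closes a loop.
with_execution_rules = {
    "execution": ["control"],
    "control": ["dimension", "execution"],
    "dimension": [],
}
```

An acyclic graph is the easy case; when `has_cycle` reports a loop, as for `with_execution_rules`, termination has to be argued separately for the rules on that loop.
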
6. Dealing with interoperability
• Adaptation is any change of allocation of the application to crowd-based systems or to their performers.
• Migration is the moving of the application from a given system to a different one (migration is a special case of adaptation).
• Cross-Platform Interoperability: applications change the underlying social network or crowdsourcing platform, e.g., from Facebook to Twitter or to AMT.
• Cross-Community Interoperability: applications change the performers' community, e.g., from the students to the professors of a university.
Adaptation options
• Adaptation may require:
  • Re-planning: the process of generating new micro-tasks.
  • Re-invitation: the process of generating new invitation messages for existing or re-planned micro-tasks, with the aim of getting new performers for them.
• Adaptation occurs at different levels of granularity:
  • Task granularity: re-planning or re-invitation occurs for the whole task.
  • Object granularity: re-planning or re-invitation is focused on one (or a few) objects, for instance objects on which it is harder to reach agreement among performers with a majority-based decision mechanism.
EXPERIMENTS
Politician Affiliation
• Given the picture and name of a politician, specify his/her political affiliation
• No time limit
• Performers are encouraged to look up online
• 2 sets of rules:
  • Majority Evaluation
  • Spammer Detection
Movie Scenes
• Users select the screenshot timeframe and whether it is a spoiler or not
• 20 still images each from 16 popular movies
• Each micro-task consists of evaluating one image
• Results are accepted, and the corresponding request is closed, when an agreement between 5 performers is reached on both the temporal category and the spoiler option, independently of the number of executions.
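
The closing condition for the Movie Scenes experiment can be sketched as a small check. This Python sketch assumes one possible reading of the slide: the two dimensions are counted independently, so the request closes once some timeframe value and some spoiler value have each collected 5 votes. The function name and the answer encoding are hypothetical.

```python
from collections import Counter


def closing_answer(answers, agreement=5):
    """Return the agreed (timeframe, is_spoiler) pair, or None if not yet agreed.

    answers: list of (timeframe, is_spoiler) tuples, one per execution.
    Assumption: agreement is checked per dimension, not on the joint pair.
    """
    if not answers:
        return None
    timeframe, timeframe_votes = Counter(t for t, _ in answers).most_common(1)[0]
    spoiler, spoiler_votes = Counter(s for _, s in answers).most_common(1)[0]
    if timeframe_votes >= agreement and spoiler_votes >= agreement:
        return timeframe, spoiler
    return None


answers = [("beginning", False)] * 4 + [("end", False)]
still_open = closing_answer(answers)            # only 4 votes for "beginning"
closed = closing_answer(answers + [("beginning", True)])
```

The check depends only on vote counts, not on how many executions were needed to reach them, which matches the "independently of the number of executions" condition.
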
Professors' images
• 16 professors within two research groups in our department (DB and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each micro-task consisted of evaluating 5 images regarding a professor
• Results are accepted (and thus the corresponding object is closed) when enough agreement on the class of the image is reached
• Closed objects are removed from new executions
SINGLE PLATFORM
Query Type
• Engagement depends on the difficulty of the task
• Like vs. Add tasks:
Comparison of Execution Platforms
• Facebook vs. Doodle
Posting Time
• Facebook vs. Doodle
Majority Evaluation 1/3
• 30 objects; object redundancy = 9
• Final object classification as simple majority after 7 evaluations
Majority Evaluation 2/3
• Final object classification as total majority after 3 evaluations
• Otherwise, re-plan 4 additional evaluations, then simple majority at 7
Majority Evaluation 3/3
• Final object classification as total majority after 3 evaluations
• Otherwise, simple majority at 5 or at 7 (with re-plan)
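
The second policy (total majority at 3, otherwise re-plan 4 more and take the simple majority at 7) can be written as a decision function. This is a hedged Python sketch: the function name and return convention are hypothetical, "total majority" is read as unanimity, and "simple majority" is read as the most frequent label, with ties going to the label seen first.

```python
from collections import Counter


def decide(votes):
    """Decide what to do with an object given the votes collected so far.

    Returns ("close", label), ("replan", extra_evaluations) or ("wait", None).
    """
    count = len(votes)
    if count < 3:
        return ("wait", None)
    if count == 3:
        if len(set(votes)) == 1:   # total majority: all three agree
            return ("close", votes[0])
        return ("replan", 4)       # ask for 4 additional evaluations
    if count < 7:
        return ("wait", None)
    label, _ = Counter(votes).most_common(1)[0]  # simple majority at 7
    return ("close", label)


unanimous = decide(["Republican", "Republican", "Republican"])
split = decide(["Republican", "Democrat", "Republican"])
final = decide(["Republican", "Democrat", "Republican",
                "Republican", "Democrat", "Republican", "Democrat"])
```

Easy objects close after 3 evaluations, so the extra 4 evaluations are spent only on objects where performers disagree.
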
Spammer Detection 1/2
• New rule for spammer detection without ground truth
• Performer correctness measured against the final majority
• Spammer if > 50% wrong classifications
Spammer Detection 1/2
• New rule for spammer detection without ground truth
• Performer correctness measured against the current majority
• Spammer if > 50% wrong classifications
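
The spammer rule can be sketched without ground truth by comparing each performer's answers with the per-object majority. The Python sketch below is illustrative only: the function name and the tuple encoding are hypothetical, and ties in the majority go to the label seen first. Calling it on all executions corresponds to the "final majority" variant; calling it on the executions received so far corresponds to the "current majority" variant.

```python
from collections import Counter, defaultdict


def detect_spammers(executions, threshold=0.5):
    """Return the performers whose share of wrong answers exceeds the threshold.

    executions: list of (performer_id, object_id, label) tuples.
    An answer counts as wrong when it differs from the object's majority label.
    """
    votes = defaultdict(list)
    for _, object_id, label in executions:
        votes[object_id].append(label)
    majority = {
        object_id: Counter(labels).most_common(1)[0][0]
        for object_id, labels in votes.items()
    }

    total, wrong = Counter(), Counter()
    for performer_id, object_id, label in executions:
        total[performer_id] += 1
        if label != majority[object_id]:
            wrong[performer_id] += 1
    return {p for p in total if wrong[p] / total[p] > threshold}


executions = []
for performer in ("a", "b", "c"):
    executions += [(performer, "o1", "R"), (performer, "o2", "D"), (performer, "o3", "R")]
executions += [("s", "o1", "D"), ("s", "o2", "R"), ("s", "o3", "D")]
spammers = detect_spammers(executions)
```

In this toy data, performer `s` disagrees with the majority on every object and is the only one flagged.
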