CROWDSEARCHING (And Beyond)

Crowdsearcher CROWDSEARCHING (And Beyond) Stefano Ceri, Politecnico di Milano, Dipartimento di Elettronica, Informazione e BioIngegneria

Crowdsearcher Crowd-based Applications
• Emerging crowd-based applications:
  • opinion mining
  • localized information gathering
  • marketing campaigns
  • expert response gathering
• General structure:
  • the requestor poses some questions
  • a wide set of responders (typically unknown to the requestor) is in charge of providing answers
  • the system organizes a response collection campaign
• Include crowdsourcing and crowdsearching

Crowdsearcher The “system” is a wide concept
• Crowd-based applications may use social networks and Q&A websites in addition to crowdsourcing platforms
• Our approach: a coordination engine which keeps overall control of application deployment and execution
(Diagram labels: CrowdSearcher, API, Access)

Crowdsearcher A simple example of crowdsearching

Crowdsearcher Example: Find your job (social invitation). Selected data items can be transferred to the crowd question.

Crowdsearcher Find your job (response submission)

Crowdsearcher CrowdSearcher results (in the loop)

Crowdsearcher Deployment alternatives • Multi-platform deployment Native

Crowdsearcher Deployment: search on a social network • Multi-platform deployment

Crowdsearcher Deployment: search on the social network • Multi-platform deployment

Crowdsearcher THE MODEL AND THE PROCESS

Crowdsearcher CrowdSearcher
• Combines a conceptual framework, a specification paradigm and a reactive execution control environment
• Supports designing, deploying, and monitoring applications on top of crowd-based systems
  • Design is top-down, platform-independent
  • Deployment turns declarative specifications into platform-specific implementations which include social networks and crowdsourcing platforms
  • Monitoring provides reactive control, which guarantees applications’ adaptation and interoperability
• Developed in the context of Search Computing (SeCo, ERC Advanced Grant, 2008-2013)

Crowdsearcher The Design Process
• A simple task design and deployment process, based on specific data structures
  • created using model-driven transformations
  • driven by the task specification
• Task Specification: task operations, objects, and performers
• Task Planning: work distribution
• Control Specification: task control policies

Crowdsearcher DEMO !

Crowdsearcher Valuable ideas: 1. Operation types
• In a Task, performers are required to execute logical operations on input objects
  • e.g. Locate the faces of the people appearing in the following 5 images
• CrowdSearcher offers pre-defined operation types:
  • Like: Ask a performer to express a preference (true/false)
    • e.g. Do you like this picture?
  • Comment: Ask a performer to write a description / summary / evaluation
    • e.g. Can you summarize the following text using your own words?
  • Tag: Ask a performer to annotate an object with a set of tags
    • e.g. How would you label the following image?
  • Classify: Ask a performer to classify an object within a closed set of alternatives
    • e.g. Would you classify this tweet as pro-right, pro-left, or neutral?
  • Add: Ask a performer to add a new object conforming to the specified schema
    • e.g. Can you list the name and address of good restaurants near Politecnico di Milano?
  • Modify: Ask a performer to verify/modify the content of one or more input objects
    • e.g. Is this wine from Cinque Terre? If not, where does it come from?
  • Order: Ask a performer to order the input objects
    • e.g. Order the following books according to your taste
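
The operation types above can be pictured as a small enumeration plus a per-type check on the shape of a performer's answer. The following is only an illustrative sketch: the names `OperationType`, `Operation`, `choices` and `validate` are assumptions made for this example and are not taken from CrowdSearcher itself.

```python
from dataclasses import dataclass
from enum import Enum


class OperationType(Enum):
    """The seven pre-defined operation types named on the slide."""
    LIKE = "like"
    COMMENT = "comment"
    TAG = "tag"
    CLASSIFY = "classify"
    ADD = "add"
    MODIFY = "modify"
    ORDER = "order"


@dataclass
class Operation:
    """One logical operation a performer executes on input objects (hypothetical model)."""
    kind: OperationType
    prompt: str
    choices: tuple = ()  # only used by CLASSIFY: the closed set of alternatives

    def validate(self, answer):
        """Return True if `answer` has the shape this operation type expects."""
        if self.kind is OperationType.LIKE:
            return isinstance(answer, bool)
        if self.kind is OperationType.CLASSIFY:
            return answer in self.choices
        if self.kind is OperationType.TAG:
            return isinstance(answer, (list, set)) and all(isinstance(t, str) for t in answer)
        if self.kind is OperationType.COMMENT:
            return isinstance(answer, str) and answer.strip() != ""
        if self.kind is OperationType.ORDER:
            return isinstance(answer, list)
        # ADD and MODIFY return objects conforming to a schema; modelled here as a dict
        return isinstance(answer, dict)


classify_tweet = Operation(
    OperationType.CLASSIFY,
    "Would you classify this tweet as pro-right, pro-left, or neutral?",
    choices=("pro-right", "pro-left", "neutral"),
)
```

Keeping the closed set of alternatives on the operation itself lets a Classify answer be checked without any knowledge of the platform that collected it.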

Crowdsearcher 2. Platform-independent Meta-Model

Crowdsearcher 3. Reactive Crowdsourcing
• A conceptual framework for controlling the execution of crowd-based computations. Based on:
  • Control Marts
  • Active Rules
• Classical forms of controls:
  • Majority control (to close object computations)
  • Quality control (to check that quality constraints are met)
  • Spam detection (to detect / eliminate some performers)
  • Multi-platform adaptation (to change the deployment platform)
  • Social adaptation (to change the community of performers)

Crowdsearcher Why Active Rules?
• Ease of Use: control is easily expressible
  • Simple formalism, simple computation
• Power: arbitrarily complex controls are supported
  • Extensibility mechanisms
• Automation: active rules can be system-generated
  • Well-defined semantics
• Flexibility: localized impact of changes on the rule set
  • Control isolation
• Known formal properties descending from known theory
  • Termination, confluence

Crowdsearcher 4. Control Mart
• Data structure for controlling application execution, inspired by data marts (for data warehousing); content is automatically built from task specification & planning
• Central entity: MicroTask Object Execution
• Dimensions: Task / Operations, Performer, Object

Crowdsearcher Auxiliary Structures
• Object: tracking object responses
• Performer: tracking performer behavior (e.g. spammers)
• Task: tracking task status
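
As a rough illustration of the control mart and its auxiliary structures, the sketch below models the central MicroTask Object Execution entity with one key per dimension, plus one tracking record each for objects, performers and tasks. All class and field names (`ControlMart`, `evaluations`, `spammer`, `status`, `record`) are assumptions for this example. In CrowdSearcher the auxiliary structures are kept up to date by active rules; the direct `record` method here is a simplification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MicroTaskObjectExecution:
    """Central entity: one performer's execution of a task operation on one object."""
    task_id: str        # Task / Operations dimension
    performer_id: str   # Performer dimension
    object_id: str      # Object dimension
    answer: Optional[str] = None  # filled in when the performer responds


@dataclass
class ObjectControl:
    """Tracks the responses collected for one object."""
    object_id: str
    evaluations: int = 0
    closed: bool = False


@dataclass
class PerformerControl:
    """Tracks one performer's behavior (e.g. whether flagged as a spammer)."""
    performer_id: str
    executions: int = 0
    spammer: bool = False


@dataclass
class TaskControl:
    """Tracks the status of one task."""
    task_id: str
    status: str = "open"


@dataclass
class ControlMart:
    executions: list = field(default_factory=list)
    objects: dict = field(default_factory=dict)
    performers: dict = field(default_factory=dict)
    tasks: dict = field(default_factory=dict)

    def record(self, execution):
        """Store an execution and update the three auxiliary structures."""
        self.executions.append(execution)
        obj = self.objects.setdefault(execution.object_id, ObjectControl(execution.object_id))
        obj.evaluations += 1
        perf = self.performers.setdefault(
            execution.performer_id, PerformerControl(execution.performer_id))
        perf.executions += 1
        self.tasks.setdefault(execution.task_id, TaskControl(execution.task_id))


mart = ControlMart()
mart.record(MicroTaskObjectExecution("t1", "p1", "o1", "Republican"))
mart.record(MicroTaskObjectExecution("t1", "p2", "o1", "Democrat"))
```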

Crowdsearcher Active Rules Language
• Active rules are expressed on the previous data structures
• Event-Condition-Action paradigm
• Events: data updates / timer
  • ROW-level granularity
  • OLD: before state of a row
  • NEW: after state of a row
• Condition: a predicate that must be satisfied (e.g. conditions on control mart attributes)
• Actions: updates on data structures (e.g. change attribute value, create new instances), special functions (e.g. replan)
e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1

Crowdsearcher Rule Example 1
e: UPDATE FOR μTaskObjectExecution[ClassifiedParty]
c: NEW.ClassifiedParty == 'Republican'
a: SET ObjectControl[oID == NEW.oID].#Eval += 1
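
To make the Event-Condition-Action reading of Rule Example 1 concrete, here is a minimal sketch of a row-level rule engine. It is not the CrowdSearcher rule language or runtime: `Rule`, `Engine`, `update` and the dictionary-based rows are hypothetical stand-ins, and the `#Eval` counter is stored under the key `"Eval"`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Event-Condition-Action rule with row-level granularity."""
    table: str                                   # event: table whose updates are watched
    attribute: str                               # event: attribute that must change
    condition: Callable[[dict, dict], bool]      # predicate over the (OLD, NEW) row states
    action: Callable[[dict, dict, dict], None]   # receives (db, OLD, NEW) and updates db


class Engine:
    def __init__(self):
        self.db = {"ObjectControl": {}}  # oID -> control row
        self.rules = []

    def update(self, table, old, new):
        """Signal a row update and fire every rule whose event and condition match."""
        for rule in self.rules:
            changed = old.get(rule.attribute) != new.get(rule.attribute)
            if rule.table == table and changed and rule.condition(old, new):
                rule.action(self.db, old, new)


def count_evaluation(db, old, new):
    """a: SET ObjectControl[oID == NEW.oID].#Eval += 1"""
    row = db["ObjectControl"].setdefault(new["oID"], {"Eval": 0})
    row["Eval"] += 1


engine = Engine()
engine.rules.append(Rule(
    table="MicroTaskObjectExecution",            # e: UPDATE FOR ...[ClassifiedParty]
    attribute="ClassifiedParty",
    condition=lambda old, new: new["ClassifiedParty"] == "Republican",  # c:
    action=count_evaluation,
))

# One matching update and one that fails the condition.
engine.update("MicroTaskObjectExecution",
              {"oID": 7, "ClassifiedParty": None},
              {"oID": 7, "ClassifiedParty": "Republican"})
engine.update("MicroTaskObjectExecution",
              {"oID": 7, "ClassifiedParty": None},
              {"oID": 7, "ClassifiedParty": "Democrat"})
```

Only the first update satisfies the condition, so the counter for object 7 is incremented once.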

Crowdsearcher 5. Rule Programming Best Practices
• We define three classes of rules
  • Control rules: modifying the control tables
  • Result rules: modifying the dimension tables (object, performer, task)
  • Execution rules: modifying the execution table, either directly or through re-planning
• With control and result rules only: top-to-bottom, left-to-right evaluation; guaranteed termination
• With execution rules: termination must be proven (rule precedence graph has cycles)
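
The point about cycles in the rule precedence graph can be illustrated with a small check. The sketch below assumes one plausible reading of that graph: rule A points to rule B when A's action writes a table that B's event watches. An acyclic graph of this kind is a sufficient condition for termination. The table names and the `build_graph` / `has_cycle` helpers are hypothetical.

```python
def build_graph(rules):
    """rules: name -> (table the rule listens on, set of tables its action writes)."""
    return {
        a: [b for b, (listens, _) in rules.items() if listens in writes]
        for a, (_, writes) in rules.items()
    }


def has_cycle(graph):
    """Depth-first search with three colours; a GREY successor means a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in graph}

    def visit(n):
        colour[n] = GREY
        for m in graph[n]:
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Control rules react to executions and write control tables;
# result rules react to control tables and write dimension tables.
two_classes = {
    "control_rule": ("execution", {"control"}),
    "result_rule": ("control", {"dimension"}),
}

# Execution rules write the execution table again, closing a loop.
three_classes = dict(two_classes)
three_classes["execution_rule"] = ("control", {"execution"})
```

With only control and result rules the graph is acyclic; adding an execution rule creates the loop control_rule → execution_rule → control_rule, which is why termination then has to be argued separately.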

Crowdsearcher 6. Dealing with interoperability
• Adaptation is any change of allocation of the application to crowd-based systems or to their performers.
• Migration is the move of the application from a given system to a different one. (Migration is a special case of adaptation.)
• Cross-Platform Interoperability: applications change the underlying social network or crowdsourcing platforms, e.g., from Facebook to Twitter or to AMT.
• Cross-Community Interoperability: applications change the performers' community, e.g., from the students to the professors of a university.

Crowdsearcher Adaptation options
Adaptation may require:
• Re-planning: the process of generating new micro-tasks.
• Re-invitation: the process of generating new invitation messages for existing or re-planned micro-tasks, with the aim of getting new performers for them.
Adaptation occurs at different levels of granularity:
• Task granularity: re-planning or re-invitation occurs for the whole task
• Object granularity: re-planning or re-invitation is focused on one (or a few) objects (for instance, objects on which it is harder to achieve an agreement among performers with a majority-based decision mechanism).

Crowdsearcher EXPERIMENTS

Crowdsearcher Politician Affiliation
• Given the picture and name of a politician, specify his/her political affiliation
• No time limit
• Performers are encouraged to look up information online
• 2 sets of rules:
  • Majority Evaluation
  • Spammer Detection

Crowdsearcher Movie Scenes
• Users can select the screenshot timeframe and whether it is a spoiler or not
• 20 still images each from 16 popular movies
• Each micro-task consists of evaluating one image
• Results are accepted, and the corresponding request is closed, when an agreement between 5 performers is reached on both the temporal category and the spoiler option, independently of the number of executions.
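
The closing condition for the Movie Scenes experiment could be checked along the following lines. This sketch assumes that the two dimensions are counted separately, i.e. an image is closed once some temporal category has 5 votes and some spoiler value has 5 votes; the slide does not say whether the same 5 performers must agree on both. The category labels and the function name `closed_result` are made up for the example.

```python
from collections import Counter

AGREEMENT = 5  # number of concurring performers required, as stated on the slide


def closed_result(answers, threshold=AGREEMENT):
    """answers: list of (temporal_category, is_spoiler) pairs, one per execution.

    Returns (category, is_spoiler) once both dimensions have a value chosen by
    at least `threshold` performers, otherwise None.
    """
    if not answers:
        return None
    category, category_votes = Counter(a[0] for a in answers).most_common(1)[0]
    spoiler, spoiler_votes = Counter(a[1] for a in answers).most_common(1)[0]
    if category_votes >= threshold and spoiler_votes >= threshold:
        return category, spoiler
    return None


# Four concurring answers plus one dissenting answer: not enough yet.
answers = [("beginning", False)] * 4 + [("end", True)]
```

Because the check ignores how many executions were needed, an image with early consensus closes after 5 answers while a contested one stays open, matching "independently of the number of executions".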

Crowdsearcher Professors’ images
• 16 professors within two research groups in our department (DB and AI groups)
• The top 50 images returned by the Google Image API for each query
• Each micro-task consisted of evaluating 5 images regarding a professor.
• Results are accepted (and thus the corresponding object is closed) when enough agreement on the class of the image is reached
• Closed objects are removed from new executions.

Crowdsearcher SINGLE PLATFORM

Crowdsearcher Query Type • Engagement depends on the difficulty of the task • Like vs. Add tasks:

Crowdsearcher Comparison of Execution Platforms • Facebook vs. Doodle

Crowdsearcher Posting Time • Facebook vs. Doodle

Crowdsearcher Majority Evaluation (1/3): 30 objects; object redundancy = 9; final object classification by simple majority after 7 evaluations

Crowdsearcher Majority Evaluation (2/3): final object classification by total majority after 3 evaluations; otherwise, re-plan 4 additional evaluations, then simple majority at 7

Crowdsearcher Majority Evaluation (3/3): final object classification by total majority after 3 evaluations; otherwise, simple majority at 5 or at 7 (with re-plan)
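
The second majority strategy (total majority at 3, otherwise re-plan 4 more evaluations and take a simple majority at 7) can be sketched as a decision function over the votes collected so far. Two assumptions are made here: "total majority" means all votes agree, and "simple majority" means more than half of the votes. The function names and the returned status strings are hypothetical.

```python
from collections import Counter


def simple_majority(votes):
    """Return the class chosen by more than half of the votes, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def decide(votes):
    """Return one of ('wait', None), ('closed', label), ('replan', n), ('undecided', None)."""
    n = len(votes)
    if n < 3:
        return ("wait", None)
    if n == 3:
        if len(set(votes)) == 1:      # total majority: all three evaluations agree
            return ("closed", votes[0])
        return ("replan", 4)          # ask for 4 additional evaluations
    if n < 7:
        return ("wait", None)         # re-planned evaluations still coming in
    label = simple_majority(votes[:7])
    return ("closed", label) if label is not None else ("undecided", None)
```

For a two-class task, 7 votes always yield a simple majority, so the "undecided" outcome only arises with three or more classes.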

Crowdsearcher Spammer Detection (1/2): new rule for spammer detection without ground truth; performer correctness measured against the final majority; spammer if > 50% wrong classifications

Crowdsearcher Spammer Detection (2/2): new rule for spammer detection without ground truth; performer correctness measured against the current majority; spammer if > 50% wrong classifications
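
A minimal sketch of spammer detection without ground truth, in the "final majority" variant: the majority label of each object is computed over all collected executions, and a performer is flagged when more than 50% of his or her classifications disagree with it. The tuple layout and the function names are assumptions; ties between labels are broken arbitrarily by `Counter.most_common`.

```python
from collections import Counter, defaultdict


def majority_labels(executions):
    """executions: list of (performer, object, label). Returns object -> majority label."""
    votes = defaultdict(list)
    for _, obj, label in executions:
        votes[obj].append(label)
    return {obj: Counter(labels).most_common(1)[0][0] for obj, labels in votes.items()}


def spammers(executions, threshold=0.5):
    """Return the performers whose share of wrong classifications exceeds `threshold`."""
    majority = majority_labels(executions)
    wrong = Counter()
    total = Counter()
    for performer, obj, label in executions:
        total[performer] += 1
        if label != majority[obj]:
            wrong[performer] += 1
    return {p for p in total if wrong[p] / total[p] > threshold}


executions = [
    ("p1", "o1", "R"), ("p2", "o1", "R"), ("p3", "o1", "D"),
    ("p1", "o2", "D"), ("p2", "o2", "D"), ("p3", "o2", "R"),
    ("p1", "o3", "R"), ("p2", "o3", "R"), ("p3", "o3", "D"),
]
```

The "current majority" variant would call the same check after every new execution, using only the votes available at that moment.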
