CS 178H Introduction to Computer Science Research

What is CS Research? • Discovery of new knowledge of computing through mathematical analysis and experimental evaluation of algorithms and computer software.

Epistemology (definitions from Wikipedia) • Epistemology (from Greek επιστήμη, episteme, "knowledge" + λόγος, "logos") or theory of knowledge is the branch of philosophy concerned with the nature and scope (limitations) of knowledge. It addresses the questions: • "What is knowledge?" • "How is knowledge acquired?" • "What do people know?" • "How do we know what we know?"

Rationalism • Rationalism is "any view appealing to reason as a source of knowledge or justification" (Lacey 286). In more technical terms it is a method or a theory "in which the criterion of the truth is not sensory but intellectual and deductive" (Bourke 263). • Originated with Socrates (469 BC–399 BC) and Plato (428/427 BC – 348/347 BC).

Empiricism • Empiricism is a theory of knowledge which asserts that knowledge arises from experience. Empiricism emphasizes the role of experience and evidence, especially sensory perception, in the formation of ideas. • Originated with Aristotle (384 BC – 322 BC).

Rationalism in CS (Theoretical CS) • Programs are formal mathematical objects. • Therefore, important properties of algorithms/software can be proven mathematically. • Termination • Correctness (satisfies a formal specification) • Computational Complexity (time and space requirements)

Theoretical CS Research • Algorithm Design and Analysis • Design a new (more efficient) algorithm for some well-defined problem (e.g. sorting, longest-common-subsequence) • Mathematically prove the correctness and improved complexity of the new algorithm. • Theoretical Analysis • Form a mathematical conjecture about a computational problem (e.g. graph isomorphism is NP-complete) • Mathematically prove the conjecture as a theorem.

Limits of Rationalism in CS • Sometimes software is too complex to analyze theoretically. • Sometimes correctness cannot be characterized formally and depends on natural or human behavior. • Protein folding • Handwriting/speech recognition • Sometimes software behavior on real data depends on unknown natural properties of this data. • Locality affecting paging performance

Empiricism in CS (Experimental CS) • Behavior of software can be studied experimentally. • Anecdotal evidence (running a few sample cases) is insufficient. • Collect data (e.g. accuracy, run-time) by running programs many times on large, real-world benchmark collections. • Verify hypotheses about behavior using controlled experiments. • Statistically analyze results for significance.

Scientific Method (steps from Wikipedia) • 1) Define the question • 2) Gather information and resources (observe) • 3) Form hypothesis • 4) Perform experiment and collect data • 5) Analyze data • 6) Interpret data and draw conclusions that serve as a starting point for new hypothesis • 7) Publish results • 8) Retest (frequently done by other scientists)

1) Define the question • Example from My Research: Search Query Disambiguation from Short Sessions • Can a web search engine disambiguate queries? • [Figure: search box containing the ambiguous query "scrubs"]

2) Gather information and resources • Obtained web search session data from Microsoft • Find instances of ambiguous queries • Find contextual clues that might help disambiguate queries

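Context can Aid Disambiguation • [Figure: two example search sessions that both end with the query "scrubs". Session A: "98.7 fm" (clicked www.star987.com), "kroq" (clicked www.kroq.com), then "scrubs" (click unknown). Session B: "huntsville hospital" (clicked www.huntsvillehospital.com), "ebay.com" (clicked www.ebay.com), then "scrubs" (click unknown). Candidate results for "scrubs": scrubs.com and scrubs-tv.com.]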

3) Form Hypothesis • Previous queries and clicks in a session can help disambiguate queries by relating them to previous sessions involving the same query (where we know what result was clicked).

4) Perform Experiment and Collect Data • Build system that uses prior context and previous session data to predict clicked results for new user. • Reorder results from existing search engine based on predicted probability of clicking on a result. • Should reduce number of results user needs to examine before finding a relevant one. • Test on unseen data and compare predictions to actual results clicked.
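The reordering step can be illustrated with a minimal Python sketch. It assumes that some model has already produced a click probability for each result; the function names `rerank` and `results_scanned_before`, the URLs, and the probabilities are all hypothetical, and the MLN that would produce the probabilities is not shown.

```python
def rerank(results, click_prob):
    """Order results by predicted click probability, highest first.

    sorted() is stable, so results with equal probability keep the
    order the existing search engine gave them.
    """
    return sorted(results, key=lambda url: -click_prob.get(url, 0.0))


def results_scanned_before(ranking, clicked):
    """Number of results a user scanning from the top sees before the clicked one."""
    return ranking.index(clicked)


# Made-up example: engine order and predicted probabilities are illustrative only.
engine_order = ["scrubs.com", "scrubs-tv.com", "hospitallink.com"]
predicted = {"scrubs.com": 0.2, "scrubs-tv.com": 0.7, "hospitallink.com": 0.1}

reranked = rerank(engine_order, predicted)
print(reranked)
print(results_scanned_before(engine_order, "scrubs-tv.com"))
print(results_scanned_before(reranked, "scrubs-tv.com"))
```

If the user actually clicked scrubs-tv.com, the reranked list lets them find it after scanning fewer results than the engine order would.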

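Using Relational Information with a Markov Logic Network (MLN) • [Figure: a new session ("huntsville hospital" → huntsvillehospital.org, "ebay" → ebay.com, "scrubs" → ???) linked to previous sessions that contain the same query, e.g. "huntsville school" ... "scrubs" → scrubs.com, and ... hospitallink.com, "scrubs" → scrubs-tv.com, … ebay.com.]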

Controlled Experiment • Performance of experimental system must be compared to some baseline or control. • Controls are necessary to demonstrate the system is improving over some naïve method (strawman) or current best system for a problem. • For example, in the old joke, someone claims that they are snapping their fingers "to keep the tigers away", and justifies this behavior by saying "see, it's working!" While this "experiment" does not falsify the hypothesis "snapping fingers keeps the tigers away", it does not really support the hypothesis either: not snapping your fingers keeps the tigers away just as well (Wikipedia: Experiment).

Control for Query Disambiguation • Simple control is to order results from search engine randomly. • Another baseline is to just use ordering from existing (non-personalized) search engine.
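Both controls are simple to state in code. The following is a minimal Python sketch; the function names and the fixed seed are assumptions made here for illustration, chosen so that the random control is reproducible across runs.

```python
import random


def random_baseline(results, seed=0):
    """Control 1: return the engine's results in a random order.

    A seeded generator keeps the experiment repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(results)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    return shuffled


def engine_baseline(results):
    """Control 2: keep the (non-personalized) engine ordering unchanged."""
    return list(results)


# Hypothetical result list for the query "scrubs".
engine_order = ["scrubs.com", "scrubs-tv.com", "hospitallink.com", "ebay.com"]
print(random_baseline(engine_order))
print(engine_baseline(engine_order))
```

Every control returns a ranking in the same form as the experimental system, so the same performance metric can be applied to all of them.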

Performance Metrics • Need quantitative measure of system’s performance (runtime or accuracy). • Compare quantitative performance of experimental system to baseline control system. • To measure the accuracy of an ordering of web search results, we use AUC-ROC (area under the ROC curve) • Percentage of irrelevant results not seen by the user before finding a relevant result (when scanning results from the top)
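For a ranked list with binary relevance labels, AUC-ROC equals the fraction of (relevant, irrelevant) pairs in which the relevant result is ranked higher. With a single relevant (clicked) result this reduces to the fraction of irrelevant results ranked below it. A minimal Python sketch, with the function name `auc_roc` and the example ids chosen here for illustration:

```python
def auc_roc(ranking, relevant):
    """AUC-ROC of a ranking (best first) given the set of relevant result ids.

    Counts, for each relevant result, how many irrelevant results are
    ranked below it, then divides by the total number of
    (relevant, irrelevant) pairs.
    """
    n_rel = sum(1 for r in ranking if r in relevant)
    n_irr = len(ranking) - n_rel
    if n_rel == 0 or n_irr == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant result")

    correct_pairs = 0
    irrelevant_below = n_irr  # irrelevant results not yet passed while scanning down
    for r in ranking:
        if r in relevant:
            correct_pairs += irrelevant_below
        else:
            irrelevant_below -= 1
    return correct_pairs / (n_rel * n_irr)


ranking = ["a", "b", "c", "d"]
print(auc_roc(ranking, {"a"}))  # relevant result at the top
print(auc_roc(ranking, {"b"}))  # one irrelevant result seen first
print(auc_roc(ranking, {"d"}))  # relevant result at the bottom
```

A perfect ranking scores 1.0, the worst possible ranking scores 0.0, and a random ordering scores 0.5 on average.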

5) Analyze Data • Do results support the hypothesis? • Are differences statistically significant? • Use a statistical test to determine if observed differences are unlikely to be due only to random variation, i.e. the p-value (the probability of seeing a difference at least this large if the null hypothesis were true) is < .05.
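One way to carry out such a test is an exact paired permutation (sign-flip) test, sketched below in Python. The slides do not say which test was actually used, so this is a stand-in chosen for illustration; the function name and the scores are made up. Under the null hypothesis the two systems are interchangeable, so the sign of each per-query difference is equally likely to be positive or negative.

```python
from itertools import product


def paired_permutation_p_value(system_scores, baseline_scores):
    """Two-sided exact sign-flip test on paired scores.

    Enumerates all 2**n sign assignments of the per-item differences and
    returns the fraction whose absolute total is at least as large as the
    observed one. Only practical for small n.
    """
    diffs = [s - b for s, b in zip(system_scores, baseline_scores)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(sign * d for sign, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Made-up per-query AUC scores for six test queries.
system = [0.90, 0.80, 0.85, 0.70, 0.95, 0.88]
baseline = [0.60, 0.50, 0.65, 0.55, 0.70, 0.58]

p = paired_permutation_p_value(system, baseline)
print(p, "significant at .05" if p < 0.05 else "not significant at .05")
```

Because the system wins on all six made-up queries, only the all-positive and all-negative sign assignments are as extreme as the observed data, giving p = 2/64.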

Results (AUC-ROC) • [Results chart not recovered] • * indicates a statistically significant improvement over the previous result

6) Interpret data and draw conclusions that serve as a starting point for new hypothesis • Is random ordering the best baseline to compare to? • What if we just order results based on popularity (i.e. how many people clicked on a particular result after submitting a given ambiguous query)?
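The popularity baseline can be sketched in a few lines of Python: count how often each result was clicked for the query in previous sessions, then order the results by that count. The log format (a list of (query, clicked URL) pairs), the function names, and the data are assumptions made here for illustration.

```python
from collections import Counter


def click_counts(log, query):
    """Count clicks per URL for one query in a log of (query, clicked_url) pairs."""
    return Counter(url for q, url in log if q == query)


def popularity_baseline(results, counts):
    """Order results by how many earlier users clicked them, most popular first.

    A Counter returns 0 for URLs that were never clicked; ties keep the
    engine order because sorted() is stable.
    """
    return sorted(results, key=lambda url: -counts[url])


# Made-up training log.
log = [
    ("scrubs", "scrubs-tv.com"),
    ("scrubs", "scrubs.com"),
    ("scrubs", "scrubs-tv.com"),
    ("ebay", "ebay.com"),
]
engine_order = ["scrubs.com", "scrubs-tv.com", "hospitallink.com"]

counts = click_counts(log, "scrubs")
print(popularity_baseline(engine_order, counts))
```

This baseline ignores the session context entirely, so any improvement over it can be attributed to the contextual information rather than to popularity alone.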

New Baseline Results

Refine System • Develop MLN that incorporates popularity information. • Rerun experiment to obtain results for revised version and verify the hypothesis that it performs better than the popularity baseline.

Results for Revised System

7) Publish Results • Paper submitted to the international data mining conference. • KDD-09: Paris, June 28 – July 1, 2009
