Explore effective paper organization and storytelling techniques to highlight surprising results and engage readers. Understand the scope and audience to optimize impact. Mathematics and algorithms are key topics.
Outline • The scope of a paper • Storytelling • Paper Organization • Mathematics • Algorithms
The scope of a paper • Which results are the most surprising? • What is the one result that other researchers might adopt in their work? • Does it make sense to explain the new algorithms first, followed by description of the previous algorithms in terms of how they differ from the new work? • Or is the contribution of the new work more obvious if the old approaches are described first, to set the context? • What is the key background work that has to be discussed? • Who is the readership? For example, are you writing for specialists in your area, your examiners, or a general computer science audience?
The scope of a paper • Example: an investigation of external sorting in database systems. • A large relation (tens of millions of records, constituting several gigabytes) must be sorted on a field specified in a query. • Costs include processing time for sorting and merging, transfer time to and from disk, and temporary space requirements. • The balance between these costs is governed by available in-memory buffer space, as large blocks are expensive to sort but cheap to merge.
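The trade-off on this slide can be made concrete with a minimal sketch of an external merge sort. It is an illustration only, not the method of the study being described: the function name `external_sort`, the `buffer_size` parameter, and the one-integer-per-line temporary file format are all assumptions made for the example.

```python
import heapq
import os
import tempfile


def external_sort(records, buffer_size):
    """Yield integers from `records` in sorted order, holding at most
    `buffer_size` of them in memory during the run-creation phase."""
    run_paths = []
    buffer = []

    def flush():
        # Sort one buffer-sized run in memory and write it to a temporary file.
        buffer.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{x}\n" for x in buffer)
        run_paths.append(path)
        buffer.clear()

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= buffer_size:
            flush()
    if buffer:
        flush()

    # Merge phase: stream every run back from disk and merge lazily.
    files = [open(p) for p in run_paths]
    try:
        runs = [map(int, f) for f in files]
        yield from heapq.merge(*runs)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


result = list(external_sort([5, 3, 9, 1, 4, 8, 2], buffer_size=3))
```

A larger `buffer_size` means fewer, longer runs: each run costs more to sort in memory, but there are fewer runs to merge.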
The content of a paper is determined by the readership. • A paper on machine learning for computer vision may have entirely different implications for the two fields, and thus different aspects of the results might be emphasized. • An expert on vision cannot be assumed to have any experience with machine learning.
The publication venue determines the scope of the paper • Is there a page limit? • Are there specific conventions to be observed? • Are the other papers in that venue primarily theoretical or experimental? • What prior knowledge or background is a reader likely to have? • Do the editors require that your code be available online?
Telling a story • A paper is a sequence of concepts, building from a foundation of knowledge assumed to be common to all readers up to new ideas and results. • There are several common ways of structuring the body of a paper, including as a chain, by specificity, by example, and by complexity.
Telling a story • Structure as a chain • Example: compression for fast external sorting • The problem statement consists of an explanation of external sorting and an argument that disk access costs are a crucial bottleneck. • The review explains standard compression methods and why they cannot be integrated into external sorting. • The new solution is the compression method developed in the research. • The demonstration is a series of graphs and tables based on experiments that compare the cost of sorting with and without compression.
Telling a story • Structure by specificity • an explanation of a retrieval system. • Such systems generally have several components: • Parsing, indexing, query, … • Structure by example • Structure by complexity
Organization • Describe the work in the context of accepted scientific knowledge. • State the idea that is being investigated, often as a theory or hypothesis. • Explain what is new about the idea, what is being evaluated, or what contribution the paper is making. • Justify the theory, by methods such as proof or experiment.
Organization • Title and author • Abstract • Introduction • Body • Literature review • Conclusions • Bibliography • Appendices
Body • Introduction-Methods-Results-Discussion • Use of fixed headings may prohibit development of a complex explanation in stages • Example: "Compression for fast external sorting" • 1. Introduction • 2. External sorting • 3. Compression techniques for database systems • 4. Sorting with compression • 5. Experimental setup • 6. Results and discussion • 7. Conclusions
Literature review • A literature review, or survey, is used to compare the new results to similar previously published results, to describe existing knowledge, and to explain how it is extended by the new results. • In many papers the literature review material is not gathered into a single section, but is discussed where it is used.
From draft to submission • Brainstorm • Write down in point form what has been learnt, what has been achieved, and what the results are • Prepare a skeleton, choosing results to emphasize and discarding material that on reflection seems irrelevant • Choose the section titles before writing any text • When the structure is complete, each section can be sketched in perhaps 20 to 200 words
From draft to submission • When the body and the closing summary are complete, the introduction usually needs substantial revision • With a reasonably thorough draft completed, it is time to review the paper content and contribution • For a novice writer who doesn't know where to begin, a good starting point is imitation
Mathematical Clarity • Mathematics gives solidity to abstract concepts. • There are well-established conventions of presentation for mathematics and mathematical concepts. • In mathematical writing it is essential to be precise.
Clarity • X An inverted list for a given term is a sequence of pairs, where the first element in each pair is a document identifier and the second is the frequency of the term in the document to which the identifier corresponds. • √ An inverted list for a term t is a sequence of pairs of the form (d, f), where each d is a document identifier and f is the frequency of t in d.
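The precise definition maps directly onto a data structure. The following is a minimal sketch, assuming documents are given as a dictionary from integer identifier to text and that terms are whitespace-separated lowercase words; the name `build_inverted_lists` and the sample documents are invented for illustration.

```python
from collections import Counter, defaultdict


def build_inverted_lists(documents):
    """Map each term t to a list of (d, f) pairs, where d is a document
    identifier and f is the frequency of t in d, ordered by d."""
    index = defaultdict(list)
    for d in sorted(documents):
        counts = Counter(documents[d].lower().split())
        for t, f in counts.items():
            index[t].append((d, f))
    return dict(index)


docs = {1: "the cat sat", 2: "the cat saw the dog", 3: "a dog"}
index = build_inverted_lists(docs)
print(index["the"])  # [(1, 1), (2, 2)]
```

Because the definition names t, d, and f, the code can reuse the same symbols, which is harder to do with the vaguer wording.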
Mathematical terms • Normal, usual • Definite, strict, proper, all, some: avoid "definite", "strict", and "proper" in their non-mathematical meanings, and be careful with "all" and "some". • Intractable: an algorithm or problem is "intractable" only if it is NP-hard. • Formula, equation: a "formula" is not necessarily an "equation"; the latter involves an equality. • Equivalent, similar • Average, mean: "average" is used loosely to mean typical. Only use it in the formal sense (of arithmetic mean) if it is clear to the reader that the formal sense is intended. Otherwise use "mean" or even "arithmetic mean".
Theorems • The details of the proof may not be important to the reader and can often be omitted. • A common mistake is to unnecessarily include mechanical algebraic transformations. • Theorems, definitions, lemmas, and propositions should be numbered.
State the main theorem first, then state and prove the lemmas before giving the main proof. • Explain the structure of long proofs before getting to the detail, and explain how each part of the proof relates to the structure.
Readability (1) • Mathematics should not be used at the start of a sentence. • Give the type of each variable every time it is used, so that the reader doesn't have to remember as many details. • X The values are represented as a list of numbers L. • √ The values are represented as a list L of numbers.
Readability (2) • Consider breaking down expressions to make them more readable, especially if doing so enlarges small symbols. • Mathematical expressions should not run together.
Notation • Ensure that the symbols you use will be correctly understood by, and familiar to, the reader. • The symbols ∽ and ≈ are both used to mean approximately equal to. • The symbol ≌ means congruent to, not approximately equal to. • Use ≤, not <=, for less than or equal to.
Ranges and sequences • Ranges of real numbers: • [a,b], [a,b), … • Ranges of integers: • It is common practice to use an ellipsis to describe a sequence of integers; thus m,...,n represents all integers between m and n inclusive.
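These conventions differ from those of many programming languages, which is one reason to state them explicitly. As a small sketch (the helper names `integers` and `in_half_open` are invented for illustration), Python's built-in `range` excludes its upper bound, so the inclusive sequence m,...,n needs an adjustment:

```python
def integers(m, n):
    """Return the sequence m, ..., n with both endpoints included."""
    return list(range(m, n + 1))  # range() excludes its stop value


def in_half_open(x, a, b):
    """Return True if the real number x lies in the interval [a, b)."""
    return a <= x < b


print(integers(3, 7))  # [3, 4, 5, 6, 7]
```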
Alphabets • Use of characters from the Greek alphabet to denote variables and quantities can add clarity to mathematical writing. • Some mathematical symbols and characters from other alphabets have a superficial resemblance to more familiar symbols.
Numbers • In technical writing, numbers should usually be written as figures, not spelt out. • The common exceptions are • approximate numbers • numbers up to twenty, unless they are literal values or part of an expression of measurement • Numbers at the start of a sentence, although it is generally better to recast the sentence so that the number is elsewhere • Percentages should always be in figures
Numbers • X 1024 computers were linked into the ring. • X Partial compilation gave a 4-fold improvement. • X The increase was over five per cent. • X The method requires 2 passes. • √ There were 1024 computers linked into the ring. • √ Partial compilation gave a four-fold improvement. • √ The increase was over 5 per cent. • √ The increase was over 5%.
Numbers, Percentages • X There were between four and 32 processors in each machine. • √ There were between 4 and 32 processors in each machine. • X There were 14 512-Kb sets. • √ There were fourteen 512-Kb sets. • Avoid the phrase "orders of magnitude". • X The new algorithm is at least two orders of magnitude faster. • √ The new algorithm is at least a hundred times faster.
Numbers, Percentages • In the "orders of magnitude" example, is the unit of magnitude binary or decimal? It would be better to be explicit. "There are 10 kinds of people in the world, those that understand binary and those that don't." • X The likelihood of failure is 2:1. • √ The likelihood of failure is one in three. • √ The likelihood of failure is about 30%.
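The ambiguity in "orders of magnitude" is easy to quantify. The sketch below simply evaluates the two readings; the function name `magnitude_factor` is made up for the example.

```python
def magnitude_factor(orders, base):
    """Return the multiplicative factor implied by `orders` orders of
    magnitude when one order of magnitude is a factor of `base`."""
    return base ** orders


decimal_reading = magnitude_factor(2, 10)  # 100
binary_reading = magnitude_factor(2, 2)    # 4
```

The two readings of "two orders of magnitude" differ by a factor of 25, which is why an explicit "a hundred times faster" is the safer wording.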
Units of measurement The larger units, especially "Pb", "Eb", "Zb", and "Yb", are unfamiliar to most readers and should be written in full at least once, preferably with an explanation.
Presentation of algorithms • You must demonstrate that the algorithm is a worthwhile contribution • show that it is correct (given appropriate input, it terminates with appropriate results) • show, by proof, experiment, or both, that it meets some claimed performance bound.
Presentation of algorithms • The steps that make up the algorithm. • The input and output, and the internal data structures used by the algorithm. • The scope of application of the algorithm and its limitations. • The properties that will allow demonstration of correctness, such as preconditions, post-conditions, and loop invariants. • A demonstration of correctness. • A complexity analysis, for both space and time requirements. • Experiments confirming the theoretical results.
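As an illustration of stating the properties that support a correctness argument, here is a sketch of binary search with its precondition, loop invariant, and postconditions written out. Binary search is chosen only as a familiar example; it does not come from the slides.

```python
def binary_search(a, key):
    """Return an index i with a[i] == key, or -1 if key is absent.

    Precondition: a is sorted in non-decreasing order.
    """
    # Checking the precondition costs O(n); it is here for illustration only.
    assert all(a[i] <= a[i + 1] for i in range(len(a) - 1)), "precondition"
    lo, hi = 0, len(a)
    while lo < hi:
        # Loop invariant: if key occurs in a, it occurs in a[lo:hi].
        mid = (lo + hi) // 2
        if a[mid] < key:
            lo = mid + 1
        elif a[mid] > key:
            hi = mid
        else:
            return mid  # Postcondition: a[mid] == key
    return -1  # Postcondition: a[lo:hi] is empty, so key is not in a
```

Termination follows because the interval a[lo:hi] shrinks on every iteration; the invariant together with the empty interval at exit gives the second postcondition.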
Formalisms • Common formalisms for presenting algorithms: • the list style, in which the algorithm is broken down into a series of numbered or named steps and loops • pseudocode, in which the algorithm is presented as if written in a block-structured language • A better option is to use what might be called prosecode: number each step, never break a loop over several steps, use sub-numbering for the parts of a step, and include explanatory text.
Prosecode • WeightedEdit(s, t) compares two strings s and t, of lengths ks and kt respectively, to determine the edit distance: the minimum cost in insertions, deletions….
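The slide gives only the opening of the prosecode, so the following is a sketch of the standard dynamic-programming approach to weighted edit distance, not a reconstruction of the original algorithm. The function name `weighted_edit` and the cost parameters `ins_cost`, `del_cost`, and `rep_cost` are assumptions introduced for the example.

```python
def weighted_edit(s, t, ins_cost=1, del_cost=1, rep_cost=1):
    """Return the minimum total cost of insertions, deletions, and
    replacements needed to convert string s into string t."""
    ks, kt = len(s), len(t)
    # F[i][j] is the minimum cost of converting s[:i] into t[:j].
    F = [[0] * (kt + 1) for _ in range(ks + 1)]
    for i in range(1, ks + 1):
        F[i][0] = F[i - 1][0] + del_cost  # delete every character of s[:i]
    for j in range(1, kt + 1):
        F[0][j] = F[0][j - 1] + ins_cost  # insert every character of t[:j]
    for i in range(1, ks + 1):
        for j in range(1, kt + 1):
            same = s[i - 1] == t[j - 1]
            F[i][j] = min(
                F[i - 1][j] + del_cost,
                F[i][j - 1] + ins_cost,
                F[i - 1][j - 1] + (0 if same else rep_cost),
            )
    return F[ks][kt]


print(weighted_edit("kitten", "sitting"))  # 3
```

With unit costs this is the ordinary edit distance; raising `rep_cost` above `ins_cost + del_cost` makes a deletion followed by an insertion cheaper than a replacement.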
Notation • Mathematical notation is preferable to programming notation for presentation of algorithms. • Use "xᵢ" rather than "x[i]". • Mathematics provides many handy conventions and symbols that can be used in description of algorithms, including set notation, subscripts, and superscripts.
Environment of algorithms • the data structures on which it operates • input and output data types • factors such as properties of the underlying operating system and hardware. • Describe data structures carefully. • Use, say, a simple mathematical notation to unambiguously specify the structure. • √ Each element is a triple (string, length, positions) in which positions is a set of byte offsets at which string has been observed.
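A specification this precise translates almost mechanically into code. The sketch below is one possible rendering; the class name `Element` and the helper `make_element` are hypothetical, and the choice of `bytes` for the string field is an assumption based on the mention of byte offsets.

```python
from typing import FrozenSet, NamedTuple


class Element(NamedTuple):
    """A triple (string, length, positions)."""
    string: bytes
    length: int
    positions: FrozenSet[int]  # byte offsets at which string was observed


def make_element(string, data):
    """Build the Element for `string` by scanning `data` for every
    occurrence, including overlapping ones."""
    positions = set()
    start = data.find(string)
    while start != -1:
        positions.add(start)
        start = data.find(string, start + 1)
    return Element(string, len(string), frozenset(positions))


element = make_element(b"ab", b"abcabab")
```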
Performance of algorithms • Basis of evaluation • Processing time • Memory and disk requirements • Disk and network traffic • Power and Energy Consumption
Performance of algorithms • Basis of evaluation. The basis of evaluation should be made explicit. • Processing time. Time (or speed) over some given input is one of the principal resources used by algorithms. • Memory and disk requirements • Disk and network traffic • Applicability. Algorithms can be compared not only with regard to their resource requirements, but with regard to functionality.
Asymptotic complexity • Big-O notation • A function f(n) is said to be O(g(n)), that is, g(n) is an upper bound of f(n), if for some constants c and k we have f(n) ≤ c · g(n) for all n > k.
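A worked instance may help. Take f(n) = 3n + 10 and g(n) = n, chosen purely for illustration: with c = 4 and k = 10 the inequality 3n + 10 ≤ 4n holds exactly when n ≥ 10, so f(n) is O(n). The sketch below spot-checks such witness constants over a finite range; this is a sanity check, not a proof, and the helper name `holds_up_to` is invented.

```python
def holds_up_to(f, g, c, k, limit):
    """Check that f(n) <= c * g(n) for every integer n with k < n <= limit."""
    return all(f(n) <= c * g(n) for n in range(k + 1, limit + 1))


def f(n):
    return 3 * n + 10


def g(n):
    return n


good_witness = holds_up_to(f, g, c=4, k=10, limit=10_000)  # True
bad_witness = holds_up_to(f, g, c=3, k=10, limit=10_000)   # False
```

The second call fails because 3n + 10 ≤ 3n is false for every n, which shows that the choice of c matters even though the definition only requires that some suitable c and k exist.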
Asymptotic complexity • If f(n) is O(g(n)) and g(n) is O(f(n)), then f(n) is Θ(g(n)). • A certain algorithm might require O(n log n) comparisons and O(n) disk accesses. In principle the complexity of the algorithm is O(n log n), but, given that a disk access may require 5 milliseconds and a comparison less than a nanosecond, in practice the cost of the disk accesses might well dominate for any possible application.
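The arithmetic behind this claim can be sketched as follows. The constants are assumptions for illustration: exactly n log₂ n comparisons at 1 nanosecond each (an upper bound on the slide's "less than a nanosecond") and exactly n disk accesses at 5 milliseconds each, ignoring the hidden constant factors.

```python
import math

DISK_ACCESS_S = 5e-3  # 5 milliseconds per disk access (from the slide)
COMPARISON_S = 1e-9   # 1 nanosecond per comparison (assumed upper bound)


def costs(n):
    """Return (comparison_time, disk_time) in seconds for input size n."""
    comparison_time = n * math.log2(n) * COMPARISON_S
    disk_time = n * DISK_ACCESS_S
    return comparison_time, disk_time


comparison_time, disk_time = costs(10**9)
```

For a billion records this gives roughly 30 seconds of comparisons against 5 million seconds of disk access. Under these assumptions the comparisons only overtake the disk accesses once log₂ n exceeds about 5 million, far beyond any realistic input size.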
the logic of asymptotic claims • Amdahl's law states that the lower bound for the time taken for an algorithm to complete is determined by the part of the algorithm that is inherently sequential. • it has been claimed that Amdahl's law was broken by, for a certain algorithm, increasing both the size of the input data and the number of processors. • These changes had minimal impact on the sequential part of the algorithm
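Amdahl's law is usually written as speedup = 1 / (s + (1 − s) / p), where s is the sequential fraction of the work and p is the number of processors. The sketch below uses that standard form; the function name and the sample fractions are chosen for illustration.

```python
def amdahl_speedup(sequential_fraction, processors):
    """Return the speedup predicted by Amdahl's law for a workload whose
    inherently sequential part is `sequential_fraction` of the total."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)


fixed_problem = amdahl_speedup(0.1, 1000)    # stays below 1 / 0.1 = 10
larger_problem = amdahl_speedup(0.01, 1000)  # smaller sequential fraction
```

This shows why the claim on the slide does not break the law: enlarging the input while leaving the sequential part almost unchanged reduces s, which raises the bound 1/s instead of exceeding it.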
Sometimes a formal analysis is inappropriate or only a minor consideration. • Analytical results often say nothing about constant factors or behavior in practice, where CPU and cache can interact in unpredictable ways.