Dissociated Web. Craig S. Kaplan @ PoCSci ’02
Dissociated Press n. … Here is a short example of word-based Dissociated Press applied to an earlier version of this Jargon File: wart: n. A small, crocky feature that sticks out of an array (C has no checks for this). This is relatively benign and easy to spot if the phrase is bent so as to be not worth paying attention to the medium in question.
Do this for web pages!
Applications
• HTML/XML research, esp. LDAP, XSL/T, XX-TRC, ISO/GWM, etc.
• Something to work on in the middle of the night before you give a keynote that you haven’t prepared for, when you really should be sleeping or, god forbid, working on your dissertation
• There aren’t enough web pages yet
• And now, a lamp:
Implementation Start with a random web page
Implementation
• First idea: binary Markov chain… …sucks.
[Figures: output with 6 bits of context; output with 2 bits of context]
Implementation
• Second idea: symbolic Markov chain…
• 1 symbol of context = Old English
• 5 symbols of context = web wacko
• 8 symbols of context = already above average!
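A minimal sketch of what a chain with k symbols of context might look like. The slides do not show the actual implementation, so the names `build_model` and `generate` and the dictionary-of-lists representation are assumptions made for illustration. The same code would run over a list of bits, which is roughly the "first idea".

```python
import random
from collections import defaultdict


def build_model(symbols, k):
    """Map each length-k context to the list of symbols seen right after it."""
    model = defaultdict(list)
    for i in range(len(symbols) - k):
        context = tuple(symbols[i:i + k])
        model[context].append(symbols[i + k])
    return model


def generate(model, seed, length, rng=None):
    """Extend `seed` (k symbols long) by sampling successors from the model."""
    rng = rng or random.Random()
    k = len(seed)
    out = list(seed)
    while len(out) < length:
        # The context is the last k symbols emitted so far.
        successors = model.get(tuple(out[len(out) - k:]))
        if not successors:  # context was never followed by anything in the input
            break
        out.append(rng.choice(successors))
    return out


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog and the quiet brown cow"
    model = build_model(text, 3)
    print("".join(generate(model, text[:3], 60)))
```

Storing every observed successor, duplicates included, means `rng.choice` samples in proportion to how often each symbol followed the context. Larger k copies longer runs of the source verbatim, which is presumably why 8 symbols of context starts to look like a real page.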
Implementation
• A better idea: tree-structured Markov chains (Markov trees)
• HTML tags form a tree
• Each tag contains a list of children
• Markov model generates lists of children
• Use a different model for every “vertical context” (suffix of path in the tag tree)
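The Markov tree idea can be sketched roughly as follows. This is one possible reading of the bullets, not the talk's actual code: the `(tag, children)` tuple representation, the `START`/`END` sentinels, the choice of one sibling of horizontal context, and the `depth`, `max_depth` and `max_children` parameters are all assumptions. Text content and attributes are ignored.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"  # sentinels marking the edges of a child list


def train(tree, depth=2, model=None, path=()):
    """Record child-list transitions, keyed by the last `depth` tags of the path."""
    if model is None:
        model = defaultdict(lambda: defaultdict(list))
    tag, children = tree
    path = path + (tag,)
    vertical = path[-depth:]  # the "vertical context" for this node's children
    prev = START
    for child in children:
        model[vertical][prev].append(child[0])
        prev = child[0]
        train(child, depth, model, path)
    model[vertical][prev].append(END)
    return model


def generate(model, tag, depth=2, path=(), rng=None, max_depth=8, max_children=20):
    """Grow a random tree: each child list is a Markov chain over sibling tags."""
    rng = rng or random.Random()
    path = path + (tag,)
    children = []
    chain = model.get(path[-depth:])
    if chain is not None and len(path) < max_depth:
        prev = START
        while len(children) < max_children:
            nxt = rng.choice(chain[prev])
            if nxt == END:
                break
            children.append(
                generate(model, nxt, depth, path, rng, max_depth, max_children))
            prev = nxt
    return (tag, children)


if __name__ == "__main__":
    page = ("html", [("head", [("title", [])]),
                     ("body", [("p", []), ("ul", [("li", []), ("li", [])]), ("p", [])])])
    print(generate(train(page), "html"))
```

Keying the sibling chain on a suffix of the path means a `p` under `body` and a `p` under `li` can learn different child lists, while pages that share the suffix pool their statistics. The `max_depth` and `max_children` limits are only there to keep the sketch from growing without bound on recursive structures such as nested lists.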
For more “information”: http://www.cs.washington.edu/homes/csk/disweb/