My Website Was Lost, But Now It’s Found
Frank McCown
CS 110 – Intro to Computer Science
April 23, 2007
Frank McCown
• Education
  • Ph.D. in Computer Science – Old Dominion Univ. (2007 expected)
  • M.S. in Computer Science – Univ of Arkansas in Little Rock (2002)
  • B.S. in Computer Science – Harding University (1996)
• Work Experience
  • 1997-2004 – Instructor of CS at Harding University (Searcy, AR)
  • 1996-1997 – Software Eng for Lockheed Martin (Denver, CO)
  • 1995 – Software Engineer Intern for Auto-trol (Denver, CO)
• Honors
  • 2007 – Outstanding Graduate Research Assistant
  • 2006 – College of Sciences Dissertation Fellowship
  • 2005 – Outstanding Graduate Assistant
  • 2004 – Dominion Scholar
Industry vs. Academia
A 2000 survey by The Scientist magazine asked its readers: "Overall, which environment do you prefer?"
[Chart: No preference / Academia / Industry]
73% of survey respondents had held research positions in industry and academia.
http://www.the-scientist.com/2001/4/16/28/2/
Industry vs. Academia
• Movement
• Academia → Industry is common
• Industry → Academia is very uncommon
• Flexibility
• Schedule
• Focus
• Compensation
Research Interests
• Digital preservation
  • Will we be able to see our websites 20 years from now?
• Web crawling
  • How can search engines and web archives duplicate/download our websites more efficiently and effectively?
• Search engines
  • How much and what content do commercial search engines index and cache?
  • How synchronized are search engine APIs with what the general user sees?
Black hat: http://img.webpronews.com/securitypronews/110705blackhat.jpg
Virus image: http://polarboing.com/images/topics/misc/story.computer.virus_1137794805.jpg
Hard drive: http://www.datarecoveryspecialist.com/images/head-crash-2.jpg
Warrick
• First developed in fall of 2005
• Available for download at http://www.cs.odu.edu/~fmccown/warrick/
• www2006.org – first lost website reconstructed (Nov 2005)
• DCkickball.org – first website someone else reconstructed without our help (late Jan 2006)
• www.iclnet.org – first website we reconstructed for someone else (mid Mar 2006)
• Internet Archive officially endorses Warrick (mid Mar 2006)
Warrick-related Publications
• Frank McCown, Norou Diawara, and Michael L. Nelson. Factors Affecting Website Reconstruction from the Web Infrastructure. JCDL 2007. June 2007. Vancouver, British Columbia, Canada.
• Catherine C. Marshall, Frank McCown, and Michael L. Nelson. Evaluating Personal Archiving Strategies for Internet-based Information. IS&T Archiving 2007. May 2007. Arlington, Virginia, USA.
• Frank McCown and Michael L. Nelson. Characterization of Search Engine Caches. IS&T Archiving 2007. May 2007. Arlington, Virginia, USA.
• Frank McCown, Joan A. Smith, Michael L. Nelson, and Johan Bollen. Lazy Preservation: Reconstructing Websites by Crawling the Crawlers. WIDM 2006. November 2006. Arlington, Virginia, USA.
• Frank McCown and Michael L. Nelson. Evaluation of Crawling Policies for a Web-Repository Crawler. HYPERTEXT 2006. August 2006. Odense, Denmark.
Search Engine APIs
• Frank McCown and Michael L. Nelson. Poster: Search Engines and Their Public Interfaces: Which APIs are the Most Synchronized? WWW 2007.
• Frank McCown and Michael L. Nelson. Agreeing to Disagree: Search Engines and their Public Interfaces. JCDL 2007.
Thank You
Questions?