MyLifeBits: Personal archive issues Archiving persons & things, past & future in cyberspace not cardboard boxes Imaging Science & Technology Conference on Archiving San Antonio, TX. Gordon Bell http://research.microsoft.com/~gbell http://www.mylifebits.com Microsoft Bay Area Research Center
MyLifeBits: Personal archive issues. Archiving persons & things, past & future, in cyberspace not cardboard boxes. Imaging Science & Technology Conference on Archiving, San Antonio, TX. Gordon Bell http://research.microsoft.com/~gbell http://www.mylifebits.com Microsoft Bay Area Research Center, 22 April 2004
Overview • “Just digitize”-- have “meta-data” sans the morass • Archive.org: “Access to all human knowledge” • Challenge: Archiving Corporate & Personal Lives • California Finder; e.g. Apple Collection • Einstein, Allen Newell, Joshua Lederberg • What small fraction of their lives? • ChM: Collecting companies, computers, & people • MyLifeBits: Realizing Bush’s Memex o(1TB/life) • Dear “appy” and other problems
Some aspects…bottom line • Storage is free: Just move “it*” there with meta-data; many others are doing it. But will anyone ever find “it”? • Projects: archive.org, million book project, prof. orgs. • Born Digital; LofC & Google; library & institutional capture • When will “born digital” help archiving? What is needed? • Distributed scanning & meta-data creation at ChM • Finding aids; authored web sites for boxes of paper. Moving beyond computerizing “card catalogs” • How do you segment & Dublin Core-ize a paper archive? • Value beyond year, title, author, genre? Algorithms needed! • Automatic Dublin Core-ization is critical to scale! • Many issues: IP, longevity aka “dear appy”, privacy, … because of the ubiquity of the technology & we can (* “it” = our cyber content)
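As a rough illustration of what "Dublin Core-izing" a single scanned segment could involve, here is a minimal sketch in Python. It assumes the standard Dublin Core element namespace and uses only the standard library; the function name `dublin_core_record`, the `record` wrapper element, and the sample values are hypothetical and not taken from the talk.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set (dc:title, dc:creator, ...).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a flat XML record with one dc:* element per (name, value) pair.

    The <record> wrapper is an illustrative choice, not a standard container.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical metadata for one document segment from a paper archive.
xml_text = dublin_core_record({
    "title": "Diary note on the Nobel Prize announcement",
    "creator": "Joshua Lederberg",
    "date": "1958-10-26",
    "type": "Text",
})
print(xml_text)
```

The hard part the slide points at is not emitting such a record but filling in its fields automatically for millions of segments; the sketch only shows the target format.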
The “dear appy” problem Dear Appy, How committed are you? Please come back to me. Forever yours truly, Lost and forgotten data • Who’s responsible? • Media: the 8-track cassette, 8” floppy problem • Platform, file, and maybe a database • Encodings: evolving, incompatible format standards for legacy data that disregard ancestors • App: evolving and/or disappearing apps
By Gordon Bell http://research.microsoft.com/~gbell Dear Appy, How committed are you? Signed, Lost and Forgotten Data Dear Appy, I'm having trouble with long-term commitment -- not on my end, heaven knows, but from the apps that created me and with whom I like to associate. Over time, these pesky apps evolve and they simply don't recognize the data that they once helped create! But, we data progeny -- and there are lots of us -- feel that they, as our creators, should be responsible for eternal support. But the little problem with recognition isn't the worst of it – sometimes the apps even disappear altogether. I ask you, is it expecting too much for 20-something year old data like me to be interpretable by my app (e.g. Acrobat, DB2, Draw, Eudora, Office, Quicken, or RealNetworks), or am I just associating with irresponsible apps? If things continue on their current path, it seems I will be completely gone and un-interpretable within 20 to 50 years! My apps will move to other platforms, or evolve to be more Internet- or Next-Big-Thing-centric...
LofC book library: o(18M books) in o(50 TBytes), or on 150 drives. Our 2004 home media centers are 8 TBytes!
Book picture Capturing content from the physical world
Why preserve an “original” reprint? “Xerox” copy? Or laser printer output? At $25/cu.ft./yr. for 2500 pages ($0.01/page per year)
Archive.org • “Universal access to all human knowledge” • Started as archiving the internet. • Includes 20 WW TV channels • Book scanning as part of million book project • Bookmobile “print anything on demand” • 100s of rock n’ roll bands • 20K accessible movies & video lectures
Archive.org: $2K/terabyte/year, $0.0004/page/year
Million book project US/English:1M, France:200 K, Japan:300 K $10 to scan, $100 to buy, scan, match with a catalog, and endow for future format changes Scanner picture Courtesy Brewster Kahle Archive.org
Rent’s cheap, but it’s hard to get there • Good news: it costs very little to live in cyberspace • Cyberspace: $0.0004/page/yr. You can spend 10-100x! • Physical: $0.01/page/yr. $25/cu.ft./yr. for 2500 pages, provided you don’t access it! • Bad news: it costs too much to move to cyberspace • Books at $10/book or <$0.10/page • Docs $0.10/page scan + $0.10 meta-data (manual); • $250-500/ft. • Cross-over… x years depending on interest rate, etc. • Obvious solutions: • Capture all new material before it gets into physical space • Automatically create meta-data (time, title, genre, author) • Just “Google”
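The "cross-over… x years" point can be made concrete with a small break-even calculation. The sketch below is only one way to frame it: it takes the per-page figures from the slide ($0.01/page/yr physical, $0.0004/page/yr in cyberspace, $0.10 to $0.20/page to move) and assumes simple annual discounting at a fixed interest rate. The function name `break_even_years` and the 200-year cutoff are assumptions for illustration.

```python
def break_even_years(move_cost, rate, physical=0.01, cyber=0.0004, max_years=200):
    """Return the first year in which discounted storage savings cover the
    one-time cost of moving a page to cyberspace, or None if they never do
    within max_years.

    move_cost: one-time cost per page (scan, plus meta-data if any), in dollars
    rate:      annual interest (discount) rate, e.g. 0.05 for 5%
    """
    saving = physical - cyber          # dollars saved per page per year
    present_value = 0.0
    for year in range(1, max_years + 1):
        # Discount each year's saving back to today before adding it up.
        present_value += saving / (1.0 + rate) ** year
        if present_value >= move_cost:
            return year
    return None


# Scan-only cost versus scan plus manual meta-data, at two interest rates.
for cost in (0.10, 0.20):
    for rate in (0.0, 0.05):
        print(cost, rate, break_even_years(cost, rate))
```

Under these assumptions, the manual meta-data cost matters as much as the scan itself: doubling the per-page cost can push the cross-over out indefinitely once interest is taken into account, which is the case for automatic meta-data creation.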
Archiving persons and things… • www.oac.cdlib.org for o(1K) corporations, people, places, things. • List of finders, usually -> paper boxes! • E.g. Apple collection at Stanford points to 600’ or say $1K/ft. • www.AlbertEinstein.org Einstein’s papers, etc. • diva.library.cmu.edu/Newell/ for Allen Newell • profiles.nlm.nih.gov/ Nobel Prize winners, Lederberg • www.ComputerHistory.org computing artifacts • www.MyLifeBits.com project to capture entire life
[Figure: number of document segments]
:f X/diary.nobel This is a transcript of JL diary note for October 26, 1958 announcement of Nobel Prize I was not keeping a diary in those days but this particular event led me to make notes on it just at the time. Joshua Lederberg. handwritten letter transcribed Mon Oct 5 13:02:49 EDT 1998 Sunday, 26 Oct. (1958) About 11:00 this a.m. I had gone to the lab to clean up the grant applications * I’ve been working on (to the essential exclusion of my lab work in recent weeks!). I’d gotten up rather early, had some coffee for breakfast and left, while Esther was having hers. Last night: an Australian party at home -- the Crawfords from Melbourne (history); Phyllis Rowntree; Maggie Blackwood and the Leslie Osborns (Psychiatry here). I was to work at the lab until about 12:30, then pick up Phyllis and Margaret for lunch and then see Phyllis off to her plane: --> Columbus-->Denver--> SFO-->Sydney. At 11:30 + or there was a call from a Mr. Lindquist of the “Tijding...” newspaper in Stockholm -- the New York correspondent. He explained his call to my astonishment that Beadle, Tatum, and I were to be the co-recipients of the Nobel prize in medicine this year. I was rather incredulous: he insisted the AP was quoting the rumors and he was quite sure it would be announced Thursday. It’s no surprise, of course, that Beadle should be honored this way and it is a perceptive courtesy for Tatum but I am still quite astonished (as I was for the NAS last year) to be added on. I just had the impression that this kind of dignification in biology should go to the venerables and veterans and it is a bit of a shock to be classed that What a mixed list it is! The “distinction” works out to the cash and to the public fuss that somehow has grown up around it. 1908 was Ehrlich Metchnikov; Muller was 1946 and to think of it did NP give him such a fuss!?? ? have to think about scheduling trip to STO in December -- by jet? I suppose just have to concede that all our plans will be upset.
4 lines deleted, family private
Lederberg genre or artifact types: Abstracts; Agendas; Announcements (m); Application forms; Articles (m); Autobiographies (m); Bibliographies (m); Biographies (m); Brochures (m); Certificates (m); Correspondence (m); Diaries (m); Drafts (documents); Drawings (m); Electronic images (m); Essays (m); Eulogies; Excerpts; Grant proposals; Interviews (m); Invitations; Laboratory notebooks (m); Laboratory notes; Lecture notes; Lectures (m); Legal documents (m); Legislative records; Lists; Manifestoes; Memoirs (m); Minutes; Monographs (m); Narratives; Newsletters; Newspaper columns (m); Notebooks (m); Notes; Obituaries; Official reports; Oral histories (m); Petitions; Photographic prints (m); Press releases (m); Procedures; Proceedings (m); Programs (m); Proposals (m); Questionnaires; Reminiscences; Reports (m); Resolutions; Resumes; Reviews (m); School records; Speeches (m); Summaries; Tables (documents); Technical reports (m); Transcripts (m); Typescripts; Video recordings (m)
Email as a carrier for many document types Any personal info Calendar, contact Clipping… biographical Correspondence (all) Diary, log, scrapbook Financial, forms, legal Photo, music, video Property Recommendation … Personal library Professional Plan, project, proposal Computer source code Correspondence Org chart Presentation & speeches Ad, announcement, cards (many kinds), certificate, ephemera & memorabilia, instruction, What we don’t know about Lederberg!
More aspects of personal archives will exacerbate content capture • Many new media…besides email • In effect, email is conversation. This adds tremendous noise for retrieval! • Who owns a person’s life? Another person? A company? E.g. VAX Strategy? • Tablets to come will enhance notebook capture • ACM CARPE: Continuous Archival and Retrieval of Personal Experiences
Computer History Museum • 1401 Shoreline, Mountain View
Archiving computing artifacts • Charles Babbage Institute …Smithsonian is similar • 135 collections 8K cu.ft. (20 M pages; 2 TB) • 160 oral histories (30 MB/hr = 6000 MB) • 150 K photos (@1 MB, 150 GB) • Computer History Museum • 6 K physical objects: world’s best artifact collection • 10 K photos • 2 K videos (<1 TB); including recent DV taped interviews • 12 M pages books, manuals, brochures, papers, (1.2 TB) • ?? of executable source & object codes • 200 volunteers & many more world-wide Amateurs versus professionals.
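The page and terabyte figures above follow from two rules of thumb that can be checked in a few lines. The sketch below assumes 2500 pages per cubic foot (the figure used on the cost slide) and roughly 100 KB per scanned page, which is the ratio implied by "20 M pages; 2 TB" and "12 M pages … 1.2 TB". The constant and function names are hypothetical, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
PAGES_PER_CUBIC_FOOT = 2500    # from the physical-storage cost figure
BYTES_PER_PAGE = 100_000       # assumption implied by 20 M pages ~ 2 TB
BYTES_PER_TERABYTE = 10 ** 12  # decimal terabyte


def pages_in(cubic_feet):
    """Estimate the page count of a paper collection from its shelf volume."""
    return cubic_feet * PAGES_PER_CUBIC_FOOT


def terabytes_for(pages):
    """Estimate the scanned size of a page count, in terabytes."""
    return pages * BYTES_PER_PAGE / BYTES_PER_TERABYTE


# Charles Babbage Institute: 8K cu.ft. of paper.
cbi_pages = pages_in(8_000)
print(cbi_pages, terabytes_for(cbi_pages))

# Computer History Museum: 12 M pages of books, manuals, brochures, papers.
print(terabytes_for(12_000_000))
```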
Computer History Museum: Artifact Collecting… the world is bits. Artifact (“the machine”) Dormant or operating Hardware or software Project, people, plan Timeline of project Plan, schedule Specification, manuals Design Organization Communication Articles, books Interviews, talks, etc. Business aspects Plan, sales, marketing Ads, brochures, etc. Competitors Use User experience Video about its use Accessibility Raw bits, finding aid Interpreted story Exhibit
Memex: As We May Think, Vannevar Bush, 1945 “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility” • Full-text search, text & audio annotations, and hyperlinks
The guinea pig • Gordon Bell is digitizing his life • Has now scanned virtually all: • Books written (and read when possible) • Personal documents (correspondence including memos and email, bills, legal documents, papers written, …) • Photos • Posters, paintings, photo of things (artifacts, …medals, plaques) • Home movies and videos • CD collection • And, of course, all PC files • Now recording: phone, radio, TV (movies), web pages… conversations and meetings to come • Paperless throughout 2002. 12” scanned, 12’ discarded. • Only 30 GB!!!
Wearable & interactive jewellery LEDs flash according to sensor type triggered
MyLifeBits organization: time and space Archival (time) Working Timeline/ Context(space) Personal (some $s) GB Co.(angel, etc.) Professional ACM, etc., … @Microsoft.com, New co’s.
MyLifeBits Software [diagram]: Radio capture tool; Telephone capture tool; PocketPC transfer tool; PocketRadio player; TV capture tool; Radio EPG tool; TV EPG download tool; MAPI interface; Legacy email client; Browser tool; Internet files; Legacy applications; MyLifeBits Shell; IM capture; Voice annotation tool; Text annotation tool; Import files; MyLifeBits store database
Value of media depends on annotations • “It’s just bits until it is annotated”
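To illustrate why annotations carry the retrieval value, here is a toy store in Python using the standard `sqlite3` module. This is a hypothetical two-table schema invented for the example, not the actual MyLifeBits store database; the table names, column names, helper functions, and sample paths are all assumptions.

```python
import sqlite3


def open_store():
    """Create an in-memory store with captured items and their annotations."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            kind TEXT,          -- e.g. photo, scan, email
            path TEXT,          -- where the raw bits live
            captured_at TEXT    -- ISO date of capture
        );
        CREATE TABLE annotation (
            item_id INTEGER REFERENCES item(id),
            body TEXT           -- free-text note attached to the item
        );
    """)
    return db


def add_item(db, kind, path, captured_at):
    cur = db.execute(
        "INSERT INTO item (kind, path, captured_at) VALUES (?, ?, ?)",
        (kind, path, captured_at),
    )
    return cur.lastrowid


def annotate(db, item_id, body):
    db.execute("INSERT INTO annotation (item_id, body) VALUES (?, ?)", (item_id, body))


def search(db, term):
    """Return paths of items whose annotations contain the term."""
    rows = db.execute(
        "SELECT DISTINCT item.path FROM item "
        "JOIN annotation ON annotation.item_id = item.id "
        "WHERE annotation.body LIKE ? ORDER BY item.path",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


db = open_store()
photo = add_item(db, "photo", "photos/0001.jpg", "2004-04-22")
scan = add_item(db, "scan", "scans/memo-17.tif", "2002-06-01")
annotate(db, photo, "Conference on Archiving, San Antonio")

print(search(db, "archiving"))  # only the annotated photo can be found
print(search(db, "memo"))       # the unannotated scan is invisible to search
```

In this sketch the scan exists in the store but cannot be found by any query, which is the sense in which unannotated media is "just bits".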