
The Lifecycle of “Transcripts”



Presentation Transcript


  1. The Lifecycle of “Transcripts”
     • Day 1: Interview taped; Harry enters everything but the URL in the database.
     • Day 12 to 24:
       • Still photos for header image produced
       • DVCAM sent to UCTV
       • URL created (usually before the transcript) -> “dummy page”
       • Audio tape sent to transcriber off campus
     • ~ 14 days after audio sent: Transcript done.
     • ~ 14 days after transcript done: Light edit, then posted.
     • ~ 30 to 60 days after interview taped: UCTV broadcast.
     THE SECRET LIFE OF THE TRANSCRIPT
     • Contractor transcribes audio -> raw Word file
     • Harry annotates (adds outline and chunks the text)
     • Letitia edits the HTML version (Dreamweaver) and posts
     • Additional changes made by interviewee (up to 4 weeks after initial post)

  2. A Typical Transcript: Dec. 7, 2000: James Fallows. Outline + Linked pages

  3. What’s in a Transcript?
     STATES:
     • Transcripts in a state of being:
       • “Dummy pages” (transcripts in “gestation”)
       • Transcript exists but not posted (e.g. when the interviewee hasn’t okayed it yet)
       • Transcript not produced yet (e.g. audio not sent to transcriber, due to time and/or funding)
       • Transcripts posted
       • Transcripts edited after being posted
     • Also:
       • Transcripts in limbo (not scheduled for transcription; postponed)
         • With dummy page (e.g. 2005: Sen – see next page)
         • Without dummy page (e.g. 1987: Luttwak)
     FORMS:
     • Full-text form (1 piece, no outline or chunking):
       • 2003: Terkel http://globetrotter.berkeley.edu/people3/Terkel/terkel-con0.html
       • 1983: Thompson http://globetrotter.berkeley.edu/conversations/Thompson/thompson-con0.html
     • Regular form (chunked into pieces according to an index)
     EXAMPLES:
     • 2004: Engelhardt: page 1 has 2 embedded pics http://globetrotter.berkeley.edu/people4/Engelhardt/engelhardt-con0.html
     • 2002: Lustick: see photos (whole page!); also Norman Myers and Wendy Ewall http://globetrotter.berkeley.edu/people2/Lustick/lustick-con0.html
     • 1999: Shinoda & Iwashita: use of different colors in the interview http://globetrotter.berkeley.edu/conversations/Shinoda_Iwashita/shinoda_iwashita0.html

  4. More on Transcripts “in limbo” ("T" = transcript posted, "V" = video link present).

     T  V  Year   Interviewee       Status
     -  -  ----   ---------------   ------
     n  y  2005   Sen               transcript exists but hasn’t been posted yet
     n  y  2004   Joffe             no time for producing transcript yet
     n  y  1990   Howard            historical - no time for producing transcript yet
     n  y  1989   Zumwalt           ditto
     n  y  1988   Lewis             ditto
     n  y  1986   Atherton          ditto
     n  y  1985   Fraser            ditto
     n  y  1984   Warnke            ditto
     n  y  1984   Carrington        ditto

     Andriessen through Campo were put up as is because of a grant from the EU and the particulars are:

     T  V  Year   Interviewee       Status
     -  -  ----   ---------------   ------
     n  n  2004   Wales             not done
     n  n  2003   Zhou              not done
     n  n  1999   *Krenzler         not done
     n  n  1998   *Donnelly         not done
     n  n  1992   Andriessen        not done
     n  n  1990   Lord Colesshill   not done
     n  n  1990   Hanrieder         worth doing, not yet done
     n  n  1988   Tornudd           worth doing, not yet done
     n  n  1988   a Campo           worth doing, not yet done

  5. Transcript Typology
     TYPES of transcripts
     REGULAR Transcripts:
     • 1 transcript - 1 video (e.g. 2000: Leon Panetta)
     IRREGULAR Transcripts:
     • 1 transcript - 2 videos (e.g. 2003: Josef Joffe)
     • 1 transcript - 1 video - 1 translated transcript (e.g. 1999: Alice Karekezi - French; 2002: Massimo D’Alema - Italian)
     • 2 transcripts - 1 video (e.g. 1999: Mark Danner)
     • 0 transcripts - 1 video (e.g. 1987: Edward Luttwak)
     • 1 transcript - 0 videos (e.g. 2002: Henri Peretz)
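The six cases in this typology can be written down as a small lookup table. The following is only an illustrative sketch: the function name `classify` and the tuple layout (transcripts, videos, translated transcripts) are assumptions made for this example, not part of the original workflow.

```python
# Hypothetical encoding of the six cases on this slide.
# Key: (number of transcripts, number of videos, number of translated transcripts)
CASES = {
    (1, 1, 0): "regular: 1 transcript, 1 video",
    (1, 2, 0): "irregular: 1 transcript, 2 videos",
    (1, 1, 1): "irregular: 1 transcript, 1 video, 1 translated transcript",
    (2, 1, 0): "irregular: 2 transcripts, 1 video",
    (0, 1, 0): "irregular: 0 transcripts, 1 video",
    (1, 0, 0): "irregular: 1 transcript, 0 videos",
}


def classify(transcripts: int, videos: int, translations: int = 0) -> str:
    """Return the typology label, or 'unlisted' for a combination not on the slide."""
    return CASES.get((transcripts, videos, translations), "unlisted")
```

For instance, `classify(1, 2)` corresponds to the Joffe case and `classify(0, 1)` to the Luttwak case.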

  6. “Smart Harvesting of Transcripts” Algorithm
     • For each chronological listing page (chron & chron2):
       • Extract all semantically valid showID:transcriptURL pairs -> video-link prefix name + transcript URL, based on “transcript type” and “transcript and video-link dates” (see the 6 cases shown on the previous slide)
       • For each pair, crawl the main transcript URL, using the transcript outline to inline the linked pages (tables and images are filtered out)
       • Save each “blended” .html page to a separate file
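The crawl-and-blend step might look roughly like the sketch below. It is not the original harvester: the names `TextAndLinks`, `parse`, and `blend` are hypothetical, the `fetch` callable (URL in, HTML string out) is an assumption so the example runs without network access, and the rule of following only links in the same directory as the outline page is a guess at how linked transcript pages could be told apart from other links. The sketch also keeps only text, whereas the slide describes saving blended .html pages.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class TextAndLinks(HTMLParser):
    """Collect text and link targets, ignoring anything inside <table>.
    Images are dropped implicitly because <img> carries no text."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._table_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_depth += 1
        elif tag == "a" and self._table_depth == 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth > 0:
            self._table_depth -= 1

    def handle_data(self, data):
        if self._table_depth == 0 and data.strip():
            self.text.append(data.strip())


def parse(html):
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links


def blend(outline_url, fetch):
    """Inline the pages linked from the outline page into one text blob."""
    outline_text, links = parse(fetch(outline_url))
    parts = [" ".join(outline_text)]
    base_dir = outline_url.rsplit("/", 1)[0]
    seen = {outline_url}
    for href in links:
        url, _fragment = urldefrag(urljoin(outline_url, href))
        # Assumption: transcript chunks live next to the outline page.
        if url in seen or url.rsplit("/", 1)[0] != base_dir:
            continue
        seen.add(url)
        page_text, _ = parse(fetch(url))
        parts.append(" ".join(page_text))
    return "\n\n".join(parts)
```

Passing `fetch` in as a parameter keeps the blending logic separate from the retrieval mechanism, so it can be exercised with canned pages before pointing it at a live site.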

  7. Example Using “chron2.html”

  8. 1. Extract “showID:transcriptURL” Pairs
     Retrieve webpage: http://globetrotter.berkeley.edu/conversations/chron2.html ...
     Retrieve all the URLs and parse them...
     11291:http://globetrotter.berkeley.edu/people/Fallows/fallows-con0.html
     6223:http://globetrotter.berkeley.edu/people/Haas/haas-con0.html
     6796:http://globetrotter.berkeley.edu/people/Hoffman/hoffman-con0.html
     9159:http://globetrotter.berkeley.edu/conversations/Haglund/haglund-con0.html
     6233:http://globetrotter.berkeley.edu/people/Herman/herman-con0.html
     7790:http://globetrotter.berkeley.edu/conversations/Stark/stark-con0.html
     7984:http://globetrotter.berkeley.edu/people/Heyman/heyman-con0.html
     7133:http://globetrotter.berkeley.edu/people/Panetta/panetta-con0.html
     …
     7128:http://globetrotter.berkeley.edu/people/Joffe/joffe-con0.html
     7129:http://globetrotter.berkeley.edu/people/Joffe/joffe-con0.html
     …
     9143:http://globetrotter.berkeley.edu/people/Kreisler/kreisler-con0.html
     6013:http://globetrotter.berkeley.edu/people/Jacobs/jacobs-con0.html
     9178:http://globetrotter.berkeley.edu/people/Tarnoff/tarnoff-con0.html
     7782:http://globetrotter.berkeley.edu/people/Karekezi/karekezi-con.e0.html
     …
     4946:http://globetrotter.berkeley.edu/conversations/Patten/patten99-con0.html
     4944:http://globetrotter.berkeley.edu/conversations/Podhoretz/podhoretz-con0.html
     8042:http://globetrotter.berkeley.edu/people/Danner/danner-con1.00.html
     …
     9169:http://globetrotter.berkeley.edu/conversations/BeilinHusseini/
     8038:http://globetrotter.berkeley.edu/people/Berdahl/berdahl-con0.html
     7062:http://globetrotter.berkeley.edu/people/Ellsberg/ellsberg98-0.html
     …
     7134:http://globetrotter.berkeley.edu/Peress/peress-con0.html
     …
     7900:http://globetrotter.berkeley.edu/conversations/Patten/patten0.html
     …
     9146:http://globetrotter.berkeley.edu/conversations/Zumwalt/zumwalt-con0.html
     …
     11290:
     …
     9148:http://globetrotter.berkeley.edu/conversations/Atherton/atherton-con0.html
     9164:http://globetrotter.berkeley.edu/conversations/Fraser/fraser-con0.html
     9162:http://globetrotter.berkeley.edu/conversations/Carrington/carrington-con0.html
     9150:http://globetrotter.berkeley.edu/conversations/Warnke/warnke-con0.html
     …
     9165:http://globetrotter.berkeley.edu/conversations/Pauling/pauling-con0.html
     9144:http://globetrotter.berkeley.edu/conversations/Habib/habib0.html
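Lines in this showID:transcriptURL format can be read back with a few lines of Python. The sketch below is an assumption about how one might consume the listing, not the tool that produced it; `parse_pairs` and `videos_per_transcript` are hypothetical names. An entry with an empty URL (such as 11290 above) is kept with `None`, and elision lines ("…") are skipped.

```python
from collections import defaultdict


def parse_pairs(lines):
    """Turn 'showID:transcriptURL' lines into (int, url-or-None) tuples."""
    pairs = []
    for line in lines:
        # Split on the first colon only; the URL itself contains colons.
        show_id, sep, url = line.strip().partition(":")
        if not sep or not show_id.isdigit():
            continue  # elision markers or anything else that is not a pair
        pairs.append((int(show_id), url or None))
    return pairs


def videos_per_transcript(pairs):
    """Group show IDs by transcript URL to spot '1 transcript - 2 videos' cases."""
    grouped = defaultdict(list)
    for show_id, url in pairs:
        if url is not None:
            grouped[url].append(show_id)
    return dict(grouped)
```

Grouping by URL makes the Joffe entry stand out: show IDs 7128 and 7129 both point at the same transcript page.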

  9. 2. Resulting Blended & Filtered Transcript e.g.: James Fallows – showID=11291

  10. 3. List of Archival “Blended” .html transcripts
