Build the NY Times Subject Headings and Topics in the Cloud. Dr. Brand Niemann Director and Senior Data Scientist Semantic Community July 4, 2011. Preface.
Preface • For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, they began to publish this vocabulary as linked open data. The New York Times also uses approximately 30,000 tags to power their Times Topics Pages. It is their intention to publish all of these tags as linked open data. • Today AOL Government publishes both of those together as linked open data in Spotfire so our readers can more readily browse, search, and download these invaluable data sets!
data.nytimes.com These can be screen scraped into Excel! People is a 14 MB RDF file! See next slide http://data.nytimes.com/
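As an alternative to screen scraping, a file of that size could be read directly. The following is only a minimal sketch: it assumes the download is RDF/XML in which each record carries an `rdf:about` URI and a `skos:prefLabel` child, which should be checked against the real file. The sample document, its URIs, and its labels are invented placeholders, and `iter_labels` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Invented placeholder data standing in for the real 14 MB file.
SAMPLE_RDF = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://data.nytimes.com/example-id-1">
    <skos:prefLabel xml:lang="en">Example, Person One</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://data.nytimes.com/example-id-2">
    <skos:prefLabel xml:lang="en">Example, Person Two</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
"""


def iter_labels(source):
    """Yield (uri, label) for every element that has rdf:about and a prefLabel child.

    Assumption: records are not nested inside one another.
    """
    about_attr = f"{{{RDF_NS}}}about"
    label_tag = f"{{{SKOS_NS}}}prefLabel"
    # iterparse reads the file incrementally instead of loading it in one call.
    for _event, elem in ET.iterparse(source, events=("end",)):
        uri = elem.get(about_attr)
        if uri is None:
            continue
        label = elem.findtext(label_tag)
        if label:
            yield uri, label.strip()
        # Drop the record's children and attributes once it has been read.
        elem.clear()


people = list(iter_labels(io.BytesIO(SAMPLE_RDF)))
for uri, label in people:
    print(label, uri)
```

For a real download, an open file object (for example `open(path, "rb")`, with `path` pointing at the saved RDF file) would replace the `io.BytesIO` sample, and the resulting pairs could be written to a CSV for Excel or Spotfire.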
Build Your Own NYT Linked Data Application • March 30, 2010, 1:21 PM Build Your Own NYT Linked Data Application By EVAN SANDHAUS • That’s It?: • So there you have it — all it takes to build a simple linked data application with New York Times Linked Open Data. But remember: this post just focuses on the highlights. We encourage you to take a closer look at the code and dig into some of the more advanced features we didn’t discuss. We hope that you share our excitement about the possibilities of linked data, and we look forward to seeing what you create! http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/
Alumni in the News Opens and Closes Snippet http://select.nytimes.com//2005/10/15/business/15nocera.html http://topics.nytimes.com/top/reference/timestopics/people/l/frank_lorenzo/index.html http://data.nytimes.com/schools/schools.html
“Who Went Where” Code 833 lines of code! http://data.nytimes.com/code/schools.html
Subject Headings See next slide http://data.nytimes.com/home/a.html
Subject Headings See next slide http://data.nytimes.com/86075200336035840002
Using Our Linked Data http://data.nytimes.com/home/about.html
Times Topics The New York Times uses approximately 30,000 tags to power our Times Topics Pages. It is our intention to publish all of these tags as linked open data. See next page http://topics.nytimes.com/topics/reference/timestopics/index.html
Times Topics See next page http://topics.nytimes.com/topics/reference/timestopics/all/a/index.html
Times Topics http://topics.nytimes.com/top/news/business/companies/a-m-castle-and-company/index.html
Spotfire • Describe the chart, how it’s made: • The Spotfire chart was made by screen scraping the NY Times Subject Headings and Topics into an Excel spreadsheet and importing it into Spotfire. The author decided to place the two listings side by side, as Tufte suggests, to facilitate comparisons. The author also decided to re-create the summary table of Subject Heading categories to see how much change had occurred between January 13, 2010, and July 4, 2011 (very little). • How it succeeds or falls short: • This single Spotfire chart makes the two lists at the NY Times sortable (click on column headers), searchable (use Filters and facets), and downloadable (click on the down arrow in the table header in the Spotfire Web Player). • Add any tips for improving: • The NY Times Topics need URLs (25,389). The author will find a way to automate that task and will soon finish adding the URLs for NY Times reporters by hand.
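One way the URL task might be automated is to collect the link targets from each Times Topics index page instead of copying only the visible text. The sketch below is illustrative rather than the author's method: it assumes each topic appears as an ordinary `<a href="...">` link, and the sample HTML and its labels are invented around two URLs shown on earlier slides. `TopicLinkParser`, `topic_links`, and `to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class TopicLinkParser(HTMLParser):
    """Collect (label, absolute URL) pairs from anchor tags that have an href."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            label = " ".join("".join(self._text).split())
            if label:
                self.links.append((label, self._href))
            self._href = None


def topic_links(html, base_url):
    """Return every labelled link found in an HTML string."""
    parser = TopicLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def to_csv(rows):
    """Serialize (topic, url) rows as CSV text that Excel or Spotfire can import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["topic", "url"])
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample markup; a real index page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li><a href="/top/news/business/companies/a-m-castle-and-company/index.html">A. M. Castle &amp; Company</a></li>
  <li><a href="/top/reference/timestopics/people/l/frank_lorenzo/index.html">Lorenzo, Frank</a></li>
  <li><a>No link here</a></li>
</ul>
"""

rows = topic_links(SAMPLE_HTML, "http://topics.nytimes.com/")
print(to_csv(rows))
```

A real page would also contain navigation links, so the rows would likely need filtering (for example, by URL prefix) before being joined to the existing topic list.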
Spotfire PC Desktop Spotfire