News research in a multilingual Europe • News Division SLA • Wil Roestenburg, Head Information & Documentation, PCM Landelijke Dagbladen (PCM National Dailies) • Presentation SLA DNWS / wr
News research in a multilingual Europe • Some facts about PCM / Dutch newspapers • Some facts about PCM news libraries (I&D) • Dutch content in LNX (LexisNexis) • Dealing with a multilingual web • Language technology and experiments in multilingual searching
News research in a multilingual Europe, part 1: Some facts about PCM / Dutch newspapers
Some facts about PCM Publishers • PCM Uitgevers (PCM Publishers) • 4 national dailies (combined circulation >1,041,000) • 3 regional newspapers • Database FactLANE (until 2003) • Books • Education • www.pcmuitgevers.nl (English profile available as PDF)
PCM Publishers organizational chart
PCM: 4 national dailies • Algemeen Dagblad: started 1946; daily circulation 320,303 • NRC Handelsblad: started 1828; daily circulation 268,486 • Trouw: started 30 January 1943; daily circulation 123,788 • de Volkskrant: since 1921; daily circulation 328,931 • Het Parool: founded during WWII; sold 2003
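The four circulation figures can be checked against the combined total of more than 1,041,000 given for PCM's national dailies. A minimal Python check, assuming that total covers only the four papers still owned (Het Parool, sold in 2003, is left out):

```python
# Daily circulation of PCM's four national dailies, as listed on the slide.
# Assumption: Het Parool (sold in 2003) is not part of the ">1,041,000" total.
circulation = {
    "Algemeen Dagblad": 320_303,
    "NRC Handelsblad": 268_486,
    "Trouw": 123_788,
    "de Volkskrant": 328_931,
}

total = sum(circulation.values())
print(f"{total:,}")  # 1,041,508
```

The result, 1,041,508, is consistent with the ">1,041,000 circ." figure for the four dailies.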
PCM in the Dutch market • Market shares of the main Dutch newspaper companies: • Telegraaf: 31% • PCM: 30% • Wegener: 28% • NDC: 7% • Others (<60,000): 4% • (Source: Oplagen Dagbladen 2002)
News research in a multilingual Europe, part 2: Some facts about PCM news libraries (I&D)
PCM Information & Documentation • Sector Information & Documentation (I&D) • 4 news libraries (front offices): Amsterdam, Rotterdam • 1 dept. I&D Data (back office for content management, technology & support): Amsterdam • 2 project managers / coordinators / staff • 1 head I&D • Total: 58 FTE (64 employees)
PCM Information & Documentation • Sector Information & Documentation (I&D) • Common back office I&D Data • Central databases for text / images: EDDA (DC4) / FRS (Verity) • Common contracts with vendors, training, CE, etc. • Common project management / F&A / HRM • Common intranet (ID-net) • Common agreements on exploitation / data sales • Database FactLANE (spin-off of EDDA) • FactLANE (1988–2002): Library, Alert, Data delivery • FactLANE was sold to LexisNexis (Dec. 2002)
News research in a multilingual Europe, part 3: Dutch content in LNX
Judy Vezmar (LNX) on Dutch content • Source: "The European Legal Information Market: An Interview with Judy Vezmar, CEO, LexisNexis Butterworths Tolley", by Marydee Ojala, Editor (Vol. 27, No. 2, March/April 2003) • "The Netherlands is another strategic market. If you look at Dutch content, you would say, well, who cares other than people in the Netherlands, but, in fact, you have South Africa and other countries around the world where the Dutch language is understood. There'll be announcements in that area shortly that I can't really talk about today." • [On December 16, 2002, LexisNexis Group announced the acquisition of FactLANE, a Dutch online news service, from PCM Uitgevers. FactLANE adds significant, unique content to the LexisNexis business intelligence product, including NRC Handelsblad and de Volkskrant.]
News research in a multilingual Europe, part 4: Dealing with a multilingual web
The Multilingual Challenges - 1 • Is the online world on the threshold of a “multilingual revolution”? • Within 5 years the number of internet users will double, from 500 million to 1 billion • Average growth of almost 275,000 new internet users per day • More than 80% have not mastered English at all! • The online world is a multilingual world: linguistic diversity may affect our daily work
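Because the slide speaks of an average, the per-day figure is simply the total increase divided by the number of days, whatever the shape of the growth curve. A back-of-the-envelope Python check, assuming 365-day years (this is only the arithmetic, not the method behind the original projection):

```python
# Projection from the slide: 500 million internet users doubling to 1 billion in 5 years.
start_users = 500_000_000
end_users = 1_000_000_000
days = 5 * 365  # assumption: 365-day years, leap days ignored

# Average daily growth = total increase / number of days.
new_users_per_day = (end_users - start_users) / days
print(round(new_users_per_day))  # 273973
```

About 274,000 new users per day, which matches "almost 275,000".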
The multilingual reality in Europe
The multilingual online population - 1 • Source: http://www.global-reach.biz/globstats/evol.html
The multilingual online population - 2 • March 2003: English 35.2%, Chinese 11.9%, Japanese 10.3%, Spanish 8.1% • Sept. 2002: English 36.5%, Chinese 10.9%, Japanese 9.7%, Spanish 7.2%
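Comparing the two snapshots shows the direction of change over those six months: English lost share while Chinese, Japanese and Spanish gained. A small Python sketch computing the change in percentage points from the figures on the slide (rounded to one decimal to hide floating-point noise):

```python
# Online population by language (percent), as listed on the slide.
sept_2002 = {"English": 36.5, "Chinese": 10.9, "Japanese": 9.7, "Spanish": 7.2}
march_2003 = {"English": 35.2, "Chinese": 11.9, "Japanese": 10.3, "Spanish": 8.1}

# Change in percentage points between the two snapshots.
change = {lang: round(march_2003[lang] - sept_2002[lang], 1) for lang in sept_2002}
print(change)
# {'English': -1.3, 'Chinese': 1.0, 'Japanese': 0.6, 'Spanish': 0.9}
```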
The Multilingual Challenges - 2 • World population: 6 billion • Chart: Internet usage 1998, English / non-English (total 147 million) • Chart: Internet usage 2002, English / non-English (total 619 million) • Chart values: English 57%, non-English 43%
The Multilingual Challenges - 3 • Chart of Web content by language: English 68.4%, Japanese 5.9%, German 5.8%, Chinese 3.9%, French 3.0%, Spanish 2.4%, Russian 1.9%, Italian 1.6%, Portuguese 1.4%, Korean 1.3%, Other 4.6% • Total non-English: 31.8% • Total Web pages: 313 B • Source: Vilaweb.com, as quoted by eMarketer
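The language shares can be cross-checked against the stated non-English total. In this Python sketch the ten non-English entries add up to exactly 31.8%, while all eleven entries together come to 100.2%; the extra 0.2 points presumably come from rounding of the individual figures in the source.

```python
# Web content by language (percent of pages), as listed on the slide.
shares = {
    "English": 68.4, "Japanese": 5.9, "German": 5.8, "Chinese": 3.9,
    "French": 3.0, "Spanish": 2.4, "Russian": 1.9, "Italian": 1.6,
    "Portuguese": 1.4, "Korean": 1.3, "Other": 4.6,
}

# Round to one decimal to hide floating-point noise in the sums.
non_english = round(sum(v for lang, v in shares.items() if lang != "English"), 1)
overall = round(sum(shares.values()), 1)
print(non_english, overall)  # 31.8 100.2
```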
News research in a multilingual Europe, part 5: Language technology and experiments in multilingual searching
The Multilingual Challenges - 4 • How do we deal with foreign content? • Language technology and (global) news research: experiments in multilingual / cross-lingual searching and retrieval (e.g. www.newsallovertheworld.nl) • Language technology and information management: automatic indexing and categorization (Irion / Seybold Report, Vol. 2, No. 6, June 17, 2002); “Automatic Enhanced Searching”; automatic translation: machine translation (MT), translation memory (TM), concepts & semantic network
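Concept-based cross-lingual searching can be pictured as follows: each query word is mapped to a language-neutral concept, and the search is then run on every word linked to that concept in the other languages. The Python sketch below is a toy illustration only; the `CONCEPTS` table and the `expand_query` function are hypothetical and do not describe how any of the systems mentioned here actually work.

```python
from __future__ import annotations

# Hypothetical concept table: each language-neutral concept links the words that
# express it in several languages (a toy stand-in for a multilingual semantic network).
CONCEPTS = {
    "BICYCLE": {
        "nl": {"fiets", "rijwiel"},
        "de": {"fahrrad"},
        "fr": {"bicyclette", "vélo"},
        "en": {"bicycle", "bike"},
    },
    "NEWSPAPER": {
        "nl": {"krant", "dagblad"},
        "de": {"zeitung"},
        "fr": {"journal"},
        "en": {"newspaper"},
    },
}

# Reverse index: word -> the concepts it belongs to.
WORD_TO_CONCEPTS: dict[str, set[str]] = {}
for concept, words_by_lang in CONCEPTS.items():
    for words in words_by_lang.values():
        for word in words:
            WORD_TO_CONCEPTS.setdefault(word, set()).add(concept)


def expand_query(terms: list[str]) -> set[str]:
    """Return the lower-cased query terms plus every word sharing a concept with them."""
    expanded: set[str] = set()
    for term in terms:
        term = term.lower()
        expanded.add(term)  # always keep the original word, even if it is unknown
        for concept in WORD_TO_CONCEPTS.get(term, set()):
            for words in CONCEPTS[concept].values():
                expanded |= words
    return expanded


print(sorted(expand_query(["Fiets"])))
# ['bicycle', 'bicyclette', 'bike', 'fahrrad', 'fiets', 'rijwiel', 'vélo']
```

Expansion raises recall (an article that only says "vélo" is still found for "fiets"), but it can cost precision when a word belongs to several concepts, since a flat table like this cannot tell which sense of an ambiguous word was meant.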
www.newsallovertheworld.nl • Started as a joke • Multilingual search engine “21” • Crawls and indexes 50-100 newspapers (growing to 250-300) • Search with: “all the words”, “best phrase” • “Fahrrad = fiets = bicyclette = bicycle = ?? …” • Search on compounds • Automatic translation (semantic network WordNet)
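Searching on compounds matters for Dutch and German, where words such as "fietspad" (cycle path) or "Fahrradweg" are written as one word. A common approach is dictionary-based decompounding: split the word into known parts, allowing an optional linking "s" as in "stadsbestuur" (stad + s + bestuur). The Python sketch below assumes a tiny hypothetical lexicon (`LEXICON`) and a hypothetical `split_compound` helper; it is not the method used by www.newsallovertheworld.nl.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy lexicon; a real system needs a full dictionary per language.
LEXICON = {"fiets", "pad", "fahrrad", "weg", "krant", "kranten", "artikel", "stad", "bestuur"}
LINKING = ("", "s")  # optional linking element between parts, e.g. stad-s-bestuur


def split_compound(word: str, min_part: int = 3) -> list[str] | None:
    """Split a compound into lexicon words; return None if no full split exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple[str, ...] | None:
        if start == len(word):
            return ()  # whole word consumed
        # Try the longest candidate part first, backtracking if the rest cannot be split.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            for link in LINKING:
                if word.startswith(link, end):
                    rest = split_from(end + len(link))
                    if rest is not None:
                        return (part,) + rest
        return None

    parts = split_from(0)
    return list(parts) if parts is not None else None


for compound in ["fietspad", "Fahrradweg", "krantenartikel", "stadsbestuur"]:
    print(compound, "->", split_compound(compound))
```

Trying the longest leading part first means that, when several splits are possible, the one with the longest first word is returned; backtracking ensures a split is only accepted when the whole word is covered. The parts can then be searched individually or translated word by word.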
Experiment: multilingual (ml) search engine
Examples
THANK YOU – DANK U WEL – GRACIAS