
Web-based Information Architecture



Presentation Transcript


  1. Web-based Information Architecture http://net.pku.edu.cn/~wbia Huang Lian'en (黄连恩) hle@net.pku.edu.cn School of Information Engineering, Peking University 09/10/2013

  2. Outline of today's lecture: What is WBIA? WBIA course content. WBIA course schedule.

  3. What is WBIA? Web-based Information Architecture

  4. WBIA is not… Web Information Architecture: how to build large, complex Web sites and organize their information effectively. Network Architecture: a design blueprint for a complete computer communication network, i.e. the framework and technical foundation for designing, building, and managing communication networks, e.g. OSI, TCP/IP. Semantic Web: "The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." [1]

  5. WBIA is Web Information ???

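  6. What is information? “School of Information Science and Technology” “IT” “Information Age” “Informatization” “Information disclosure” “Too little information” “Information not flowing freely” “Information superhighway” • Public trading information for the SME board of the Shenzhen stock market, September 9 • How is government information on land resources being disclosed online? The Ministry of Land and Resources will inspect and publish the results • Multiple channels to make housing information flow more freely • ……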

  7. History of “Information”. Latin origin: a representation implanted in the mind -> idea. Language and coding: hide information in messages and then decode them, e.g. Morse code. Mathematics: in his work on channel transmission, Shannon defined the amount of information carried by a message as the negative base-2 logarithm of its probability of occurring at the source, -log2 p, measured in 'bits'. Logic and linguistics: a communication-oriented sense of information, involving semantic meaning and knowledge. Society: information as something that is contained in the message used to inform. “information is the tennis ball of communication”
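Shannon's measure can be made concrete in a few lines of Python. This sketch computes -log2 p for a message that occurs with probability p at the source; the function name is ours, chosen for illustration. Rarer messages carry more bits.

```python
import math

def self_information_bits(p: float) -> float:
    """Information content, in bits, of a message with source probability p: -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

print(self_information_bits(0.5))     # 1.0  (a fair coin flip)
print(self_information_bits(1 / 8))   # 3.0  (one of 8 equally likely messages)
print(self_information_bits(1 / 1024))  # 10.0
```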

  8. Information Age & World Wide Web

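  9. Technologies underpinning the Web: hypertext (HTML) links pieces of information to one another; Uniform Resource Identifiers (URI) locate information precisely on a global scale; a new application-layer protocol (HTTP) enables distributed information sharing. All three are tied to the distribution, retrieval, and use of information. Tim Berners-Lee said: "The Web is an abstract (imaginary) space of information." In other words, as an application architecture on the Internet, the Web's primary task is to provide people with information and information services.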
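To make the URI and HTTP roles concrete: the scheme of a URI selects the protocol, the host part selects the server, and the path selects the resource on that server. Below is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`; the helper `http_get_request` is hypothetical and only builds the text of an HTTP/1.1 GET request, it does not open a connection.

```python
from urllib.parse import urlsplit

def http_get_request(url: str) -> str:
    """Build (but do not send) a minimal HTTP/1.1 GET request for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles plain http:// URLs")
    path = parts.path or "/"          # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname or ""
    if parts.port is not None:        # keep an explicit port in the Host header
        host += f":{parts.port}"
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n")

url = "http://net.pku.edu.cn/~wbia"
parts = urlsplit(url)
print(parts.scheme, parts.netloc, parts.path)  # http net.pku.edu.cn /~wbia
print(http_get_request(url))
```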

  10. Web growth: the number of Web sites ↑↑↑. 1993-1996: from 130 to 600,000 sites. Netcraft: "In the August 2008 survey we received responses from 176,748,506 sites." (135,166,473 sites one year before) Exponential growth
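The "exponential growth" claim can be checked against the slide's own figures. A short Python sketch, assuming "one year before" refers to the August 2007 survey and treating 1993 to 1996 as three yearly steps; the helper name is ours.

```python
def growth_factor_per_year(start: float, end: float, years: float) -> float:
    """Average (compound) multiplicative growth per year between two counts."""
    return (end / start) ** (1.0 / years)

# Figures quoted on the slide.
sites_1993, sites_1996 = 130, 600_000
sites_2007, sites_2008 = 135_166_473, 176_748_506  # Netcraft, Aug 2007 (assumed) and Aug 2008

early = growth_factor_per_year(sites_1993, sites_1996, 3)
late = growth_factor_per_year(sites_2007, sites_2008, 1)
print(f"1993-1996: x{early:.1f} per year")        # 1993-1996: x16.6 per year
print(f"2007-2008: +{late - 1:.1%} in one year")  # 2007-2008: +30.8% in one year
```

A roughly constant multiplicative factor per period is what "exponential" means here; the figures also show that the factor itself shrank considerably between the early Web and 2008.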

  11. Tide of the age: Web 2.0, the Web search wars, the dot-com bubble, the browser wars

  12. Aftermath – Flourish of the Web. 15 Web sites that changed the world in 15 years: www.eBay.com: online auctions and shopping; www.wikipedia.com: free encyclopedia; www.napster.com: music sharing; www.youtube.com: video sharing; www.blogger.com: blogging; www.friendsreunited.com: school and alumni reunions; www.drudgereport.com: personal news media

  13. Rich Web applications: www.myspace.com: social networking; www.amazon.com: online bookstore; www.slashdot.org: technology forum; www.salon.com: online magazine; www.craigslist.org: classified ads; www.google.com: search engine; www.yahoo.com: Web portal; www.easyjet.com: low-cost airline

  14. Web 2.0 buzzwords. The Web as a platform: DoubleClick vs. AdSense, Facebook, mash-ups. Harnessing collective intelligence: Wikipedia; Yahoo, eBay, Amazon; del.icio.us, Flickr. The end of the software release cycle: the perpetual beta.

  15. Web 2.0

  16. What WBIA is concerned with…

  17. The problems we face: “We are currently preparing our students for jobs that don’t yet exist …” “It is estimated that a week’s worth of the New York Times contains more information than a person was likely to come across in a lifetime in the 18th century” “The amount of new technical information is doubling every 2 years” “So what does IT ALL MEAN?”

  18. “We are living in exponential times”

  19. Information overload: "As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes." Denis Diderot (1713-1784, French philosopher, critic, and encyclopedist)

  20. Possible consequences of information overload: “I’m defining information overload as a state of having more information available than one can readily assimilate, that is, people have difficulty absorbing the information into their base of knowledge. This hinders decision-making and judgment by causing stress and cognitive impediments such as confusion, uncertainty and distraction” (Steve Beller)

  21. Causes of information overload: a rapidly increasing rate of new information being produced; the ease of duplication and transmission of data across the Internet; an increase in the available channels of incoming information (e.g. telephone, e-mail, instant messaging, RSS); large amounts of historical information to dig through; contradictions and inaccuracies in available information; a low signal-to-noise ratio; a lack of a method for comparing and processing different kinds of information.

  22. Political theorist Neil Postman spoke to the German Informatics Society in 1990, claiming that we are informing ourselves to death.  He argued that the development of computer technology is not as positive as it has been heralded to be.  With our focus on technology, we are forfeiting our humanity.  We are drowning in information that contains empty promises of improving our lives. (Postman 1990).

  23. How should we cope with information overload?

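  24. Two different “philosophies”: a “thrower-awayer” (Jennifer Widom) vs. MyLifeBits (Gordon Bell). “The cost of throwing things away and finding them again when needed is much lower than the cost of keeping them maintained.” “trying to live an efficient life so that one has time to work and be with one’s family.”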

  25. The Rise of Search Engines: Web search engines have become today's “hottest” topic. Web information search and mining technologies fight information overload.

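  26. WBIA. The core problem: information overload in the Web era. Web search and mining have become a hot area of shared interest. This course introduces and discusses the important problems, ideas, methods, and technologies in this active field. How do “I” deal with this problem, what abilities can “I” gain from learning this material, and how can “I” Fight!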

  27. WBIA course content

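  28. WBIA is a bit like… Information retrieval: Intelligent Information Retrieval and Web Search. Data mining: Data Mining. Machine learning: Machine Learning, Pattern Recognition. Natural language processing: Computational Linguistics. Other: Web warehousing techniques.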

  29. Title | Price | Author | Publication date | Summary

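  30. How to extract information? “Template”-based extraction techniques. Can “templates” be discovered automatically? And for unstructured information, where the information is embedded in running text, how can it be extracted? Title | Author | Summary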
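A hand-written “template” (often called a wrapper) can be sketched as a page layout with named slots that are matched against each page generated from that layout. The Python below assumes a hypothetical book-page layout; the template string, the CSS class names, and the helper names are all invented for illustration. Discovering such a template automatically, the harder question on the slide, is not shown.

```python
import re

# Hypothetical template for one book-detail page layout.
# {field} marks a slot to extract; everything else must match literally.
TEMPLATE = ('<h1 class="title">{title}</h1>'
            '<span class="author">{author}</span>'
            '<div class="desc">{summary}</div>')

def template_to_regex(template: str) -> re.Pattern:
    """Turn a template with {field} slots into a regex with named groups."""
    # With one capture group, re.split alternates: literal, field, literal, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal layout text
        else:
            pattern += f"(?P<{part}>.*?)"     # non-greedy slot for a field value
    return re.compile(pattern, re.DOTALL)

def extract(template: str, page: str) -> dict:
    """Return {field: value} if the page follows the template, else {}."""
    m = template_to_regex(template).search(page)
    return m.groupdict() if m else {}

page = ('<html><body><h1 class="title">Introduction to Information Retrieval</h1>'
        '<span class="author">Manning, Raghavan, Schütze</span>'
        '<div class="desc">A textbook on IR.</div></body></html>')
print(extract(TEMPLATE, page))
```

Because each template encodes one site's layout, a wrapper like this breaks as soon as the layout changes, which is one motivation for learning templates automatically.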

  31. How to do recommendation? Bestseller lists, music charts, recommendations from friends, “Customers who bought this item also bought…” …… • Common insight: personal tastes are correlated: • If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y • especially (perhaps) if Bob knows Alice
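The “customers who bought this item also bought” idea can be sketched as item co-occurrence counting over users' histories: items that often appear together with X are recommended to someone who likes X. This is a minimal illustration of the correlated-tastes insight, not the method any particular site uses; the baskets and helper names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_co_occurrence(baskets):
    """For each item, count how often every other item appears in the same user's history."""
    co = defaultdict(Counter)
    for items in baskets:
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co

def also_bought(co, item, k=3):
    """Up to k items most often seen together with `item` (ties broken alphabetically)."""
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]

# Hypothetical histories, one set of liked/bought items per user.
baskets = [
    {"X", "Y"},        # e.g. Alice likes X and Y
    {"X", "Y", "Z"},
    {"X", "Z"},
    {"X", "Y"},
]
co = build_co_occurrence(baskets)
print(also_bought(co, "X"))  # ['Y', 'Z']  -> suggest Y to Bob, who likes X
```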

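  32. What can be found when we treat the data as a network?

  33. What can be found when we treat the data as a network? How do we measure the importance of data? If a link in the graph means a “recommendation”, then… For each Web page, compute a query-independent measure of relative “importance”, and combine it with how well the page matches the query (along with other factors) to rank the pages.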
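The best-known query-independent importance score built on “links as recommendations” is PageRank. Below is a minimal power-iteration sketch in Python, assuming the commonly used damping factor 0.85 and spreading the rank of pages with no out-links evenly over all pages; the toy graph is hypothetical.

```python
def pagerank(links, damping=0.85, iters=200, tol=1e-10):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform distribution
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)  # p "recommends" each page it links to
                for q in outs:
                    new[q] += share
            else:
                for q in pages:                    # dangling page: spread rank uniformly
                    new[q] += damping * rank[p] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical toy Web: A links to B and C, B links to C, C links back to A.
r = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(v, 3) for p, v in sorted(r.items())})  # {'A': 0.388, 'B': 0.215, 'C': 0.397}
```

In a search engine this score would then be blended with a query-match score and other features; how to weight them is a design choice this sketch leaves open.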

  34. Advanced topics: event tracking, anti-spamming, social networks ……

  35. WBIA course schedule

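  36. Course organization and schedule. Class time: instructor Huang Lian'en; 13-14 lecture sessions; 1-2 sessions of presentations and discussion. Teaching components: (Meditation) reflection: after-class exercises and in-class quizzes; (Practice) hands-on work: programming exercises and course projects. Course website: home page http://net.pku.edu.cn/~wbia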

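  37. Course requirements. Background knowledge: linear algebra, probability and statistics; programming (Java, C/C++, Python, Matlab, whatever...). Grading: in-class quizzes (2), 20%; course projects (2), 40%; final exam, 40%. Other requirements: 3 hours of class per session; after-class time (reading, discussion questions, and programming exercises).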

  38. Textbooks and references. Main references: W. Bruce Croft, Donald Metzler, Trevor Strohman. 2009. Search Engines: Information Retrieval in Practice. Pearson Education. [SE]; Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press. [IIR]

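  39. Summary of today's lecture. The core problem: information overload in the Web era. How do “I” deal with this problem, what abilities can “I” gain from learning this material, and how can “I” Fight! Through the introduction and discussion of the important problems, ideas, methods, and technologies in Web search and mining, we will explore these questions together.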

  40. Thank You! Q&A
