
Talking About LOD!


Presentation Transcript


  1. Talking About LOD. Sharing LOD (Linked Data Party 5) • Talking About LOD! • 2014.6.27 • Wooju Kim (김우주), Department of Information and Industrial Engineering, Yonsei University

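  2. Contents (Talking About LOD!) • The Big Data Era and the Flood of Information • Big Data Use Cases • Limits of Big Data and How to Overcome Them • Building and Using Linked Data • LOD 2 - The Future of Semantic Technology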

  3. Talking About LOD! I. The Big Data Era and the Flood of Information

  4. An Instrumented Interconnected World (The Big Data Era and the Flood of Information) • 4.6 billion camera phones worldwide • 30 billion RFID tags today (1.3B in 2005) • 12+ TBs of tweet data every day • 100s of millions of GPS-enabled devices sold annually • ? TBs of data every day • 2+ billion people on the Web by end of 2011 • 25+ TBs of log data every day • 76 million smart meters in 2009… 200M by 2014

  5. Information Overflow on the Web (The Big Data Era and the Flood of Information) • Growth of the Web • The amount of information available on the Web is growing rapidly. • The February 2014 survey shows that at least 920,120,079 sites exist (http://news.netcraft.com/archives/category/web-server-survey/).

  6. Information Overflow on the Web (The Big Data Era and the Flood of Information) • The Indexed Web contains at least 19.8 billion pages (Sunday, 02 March, 2014). • http://www.worldwidewebsize.com/

  7. What is Big Data? (The Big Data Era and the Flood of Information) • What is big data? (07/11/2013, European Commission) • Every minute the world generates 1.7 million billion bytes of data, equivalent to 360,000 standard DVDs. • The big data sector is growing at a rate of 40% a year. • What makes big data important? • Big data is already affecting all areas of the economy. • Data-driven decision making leads to 5-6% efficiency gains in the different sectors observed. • Intelligent processing of data is also essential for addressing societal challenges.

  8. IBM's Forecast: Six Big Data Trends for 2014 (The Big Data Era and the Flood of Information) • Management driven by analysis rather than intuition • Companies will grow increasingly data driven and willing to apply analytics-derived insights to key business operations. • Big data privacy and security issues • Organizations will make a greater effort to build security, privacy, and governance policies into their big data processes. • Increased investment in big data • The emergence of the CDO (Chief Data Officer) • More organizations will bring a chief data officer (CDO) on board. • More useful big data applications • Growing interest in external data

  9. Talking About LOD! II. Big Data Use Cases

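  10. Google Flu Trends (Big Data Use Cases) • Confirmed the feasibility of forecasting flu by analyzing flu-related search terms • Examines the frequency of search terms that users enter on Google Search and provides information on the distribution and spread of flu cases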

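  11. San Francisco, Crime Prevention System (Big Data Use Cases) • Deploys police personnel based on an analysis of where and when past crimes occurred • Analyzes past crime patterns to predict the likelihood of follow-on crimes • Analyzes offender behavior in past data to suggest solutions for preventing incidents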

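  12. US Internal Revenue Service, Tax Evasion Prevention System (Big Data Use Cases) • Built a system to prevent tax evasion and fraud through big data analysis • Uses anti-fraud solutions, social network analysis, data integration and mining, etc. • Reduces missed taxes and unwarranted tax refunds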

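  13. KT, Seoul Metropolitan Government – Big-Data-Based Policy Support for Late-Night Bus Routes (Big Data Use Cases) • Floating-population and route analysis for deciding late-night bus routes • Derived the optimal stop locations for each area from Seoul's traffic environment (stops / dedicated lanes / transfers), then validated the routes by combining them with late-night floating-population and destination statistics drawn from KT's CDR data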

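  14. BC Card, Store Evaluation Service (Big Data Use Cases) • Big data analysis based on commercial-property data and credit card transaction data, aimed at raising the startup success rate of small business owners • Provides services such as analysis of past conditions (store history, trade-area analysis, business-type recommendation) and sales and profit forecasts for recommended or user-selected business types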

  15. Talking About LOD! III. Limits of Big Data and How to Overcome Them

  16. Information Overflow Problems (Limits of Big Data and How to Overcome Them) • Not data (search), but integration, analysis and insight, leading to decisions and discovery • Problems • How to cover all available information? - Recall • How to find the relevant information? - Precision
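The two questions on this slide correspond to the standard retrieval measures: recall is the share of all relevant items that were retrieved, and precision is the share of retrieved items that are relevant. A minimal sketch, assuming made-up document IDs (`d1` … `d5` are illustrative, not from the talk):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and ground truth
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning more results tends to raise recall and lower precision, which is the tension the slide points at.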

  17. Example Query to Google (Limits of Big Data and How to Overcome Them) • Example search for ‘iPad’

  18. Information Silo Problem (Limits of Big Data and How to Overcome Them) • Stove-piped Systems and Poor Content Aggregation

  19. Semantic Interoperability (Limits of Big Data and How to Overcome Them) • To cope with the problems mentioned in the preceding slide, we need Semantic Interoperability. • Semantics • “The meaning or the interpretation of a word, sentence, or other language form.” • What is Semantic Interoperability? • “Processing or integration of resources based on an understanding of what is intended or expressed by other systems or parties.”

  20. Front-endedness? (Limits of Big Data and How to Overcome Them)

  21. What if I want to ... (Limits of Big Data and How to Overcome Them) • Move my content from one place to another? • RSS? Not enough • Aggregate my data? • An open FriendFeed? • Re-use my Flickr friends on Twitter? • Invite. Again and again ... • The Semantic Web and Ontology can help! • By providing a common framework to interlink data from various providers in an open way.

  22. How is it Possible? (Limits of Big Data and How to Overcome Them) • Ontology: agreement on a common vocabulary and domain knowledge • Semantic Annotation: metadata (manual and automatic metadata extraction) • Reasoning: semantics-enabled search, integration, analysis, mining, discovery

  23. Semantic Web Layer Cake (Limits of Big Data and How to Overcome Them)

  24. Three Technical Building Blocks (Limits of Big Data and How to Overcome Them) • Basic Building Blocks • URIs as unambiguous names for resources, • RDF as a common data model for expressing metadata, • Ontology (OWL) for common vocabularies. • The Semantic Web becomes: • a web of data/things/concepts • What is a Thing/Concept? It can be anything in the world - a movie, a person, a disease, a location… • Machines will be able to understand the concept behind an HTML page. • For example: this page is about ‘Barack Obama’; he is a ‘Person’ and he is the ‘President of USA’.
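The three building blocks can be made concrete with a few lines of code. The sketch below models RDF statements as plain (subject, predicate, object) tuples of URIs and serializes them in N-Triples form. The `http://example.org/` namespace and the `presidentOf` property are hypothetical placeholders chosen for illustration; only the `rdf:type` URI is the standard one.

```python
# Hypothetical namespace for illustration; a real dataset would use its own URIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard rdf:type

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "Barack_Obama", RDF_TYPE, EX + "Person"),
    (EX + "Barack_Obama", EX + "presidentOf", EX + "USA"),
}


def objects(graph, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def to_ntriples(graph):
    """Serialize URI-only triples, one '<s> <p> <o> .' statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


print(objects(triples, EX + "Barack_Obama", RDF_TYPE))
print(to_ntriples(triples))
```

Because every name is a URI, a second dataset that uses the same URI for the same person can be merged by simply taking the union of the two triple sets.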

  25. Who Borrows This Idea? (Limits of Big Data and How to Overcome Them) • Facebook • Facebook Open Graph Protocol and Graph Search • Google • Knowledge Graph • Twitter • Real-time Semantic Web with Twitter Annotations

  26. Talking About LOD! IV. Building and Using Linked Data

  27. Linked Data (Building and Using Linked Data) • Building a “Web of Data” to enhance the current Web • The Linking Open Data (LOD) project: • http://linkeddata.org/ • Translating existing datasets into RDF and linking them together. • For example, DBpedia (Wikipedia) and GeoNames, Freebase, BBC programmes, etc. • Government data is also available as Linked Data • DATA.gov • DATA.gov.uk

  28. The LOD cloud (Building and Using Linked Data) • 2007 • 2008

  29. The LOD cloud (Building and Using Linked Data) • 2008 • 2009

  30. Web of Data (Building and Using Linked Data)

  31. Web of Data (Statistics) (Building and Using Linked Data) • The size of the Web of Data • The size of the Web of Data can be estimated from the data set statistics that the LOD community collects in the ESW wiki. • According to these statistics, the Web of Data consists of 31 billion RDF triples, which are interlinked by around 500 million RDF inter-links (as of 09/19/2011).

  32. Types of Linked Data Applications (Building and Using Linked Data) • Ways to use Linked Data

  33. Semantic Search Engines (Building and Using Linked Data) • Top 7 Semantic Search Engines as an Alternative to Google • Kngine • Hakia • Kosmix: now part of @WalmartLabs • DuckDuckGo • Evri: specialized for iPad and iPhone • Powerset: now part of Bing • Truevert: focuses only on environmental concerns.

  34. Talking About LOD! V. LOD 2 - The Future of Semantic Technology

  35. LOD2 : What is LOD2? (LOD 2 - The Future of Semantic Technology) • LOD2 (Linked Open Data) • LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. • Started in September 2010 • Partners • 14 partners (11 European countries)

  36. LOD2 : Objectives of LOD2 (LOD 2 - The Future of Semantic Technology) • LOD2 Project Objectives • Achieve visibility, deployment, sharing, and accessibility for Linked Open Data through software technology. • Increase the visibility of Linked Data activities [Visualization] • Support the deployment of Linked Data components [Deployment] • Improve information sharing between Linked Data components so that publishing Linked Data becomes easier [Sharing] • Improve access to the content, the online Linked Open Data [Accessibility] • Improve the software technology that supports it [By software technology]

  37. LOD2 Stack : Overview (LOD 2 - The Future of Semantic Technology) • LOD2 Stack • The LOD2 project provides the LOD2 Stack to give easy access to Linked Data software. • The LOD2 software stack is an integrated distribution of aligned tools supporting the life-cycle of Linked Data, from extraction and authoring/creation through enrichment, interlinking, and fusing to visualization and maintenance.

  38. LOD2 Stack 3.0 (LOD 2 - The Future of Semantic Technology)

  39. LOD2 Stack : The overview of tools (LOD 2 - The Future of Semantic Technology) • Apache Stanbol • In the LOD2 Stack, Apache Stanbol can be used for NLP services that rely on the stack-internal knowledge bases, such as named entity recognition and text classification. • CubeViz • CubeViz is a faceted browser for statistical data. It uses the RDF Data Cube vocabulary, the state of the art for representing statistical data in RDF.

  40. LOD2 Stack : The overview of tools (LOD 2 - The Future of Semantic Technology) • DBpedia Spotlight • DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. • D2RQ • D2RQ is a system for accessing relational databases (RDBMS) as virtual RDF graphs.
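To illustrate what "a relational database as a virtual RDF graph" means, the sketch below generates triples from table rows on demand instead of storing them. This is only the general idea, not D2RQ's mapping language or API; the `person` table, its columns, and the `http://example.org/` base URI are hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not a D2RQ default


def rows_as_triples(conn, table, key):
    """Lazily yield one (subject, predicate, object) triple per non-key column.

    The table and key names are interpolated directly into SQL, so this
    sketch assumes they are trusted identifiers.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"  # one resource URI per row
        for col, value in record.items():
            if col != key and value is not None:
                yield (subject, f"{BASE}{table}#{col}", value)


# Made-up sample data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', 30)")

triples = list(rows_as_triples(conn, "person", "id"))
for t in triples:
    print(t)
```

The triples exist only while the generator runs, so the relational database stays the single source of truth.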

  41. LOD2 Stack : The overview of tools (LOD 2 - The Future of Semantic Technology) • DL-Learner • The DL-Learner software learns concepts in Description Logics (DLs) from user-provided examples (supervised learning). • ORE • The ORE (Ontology Repair and Enrichment) tool allows knowledge engineers to improve an OWL ontology by fixing inconsistencies and by suggesting further axioms to add to it.

  42. LOD2 Stack : The overview of tools (LOD 2 - The Future of Semantic Technology) • PoolParty • The PoolParty Extractor (PPX) offers an API providing text mining algorithms based on semantic knowledge models.

  43. LOD2 Stack : The overview of tools (LOD 2 - The Future of Semantic Technology) • SemMap • SemMap visualizes knowledge bases that have a spatial dimension. • Silk • The Silk Link Discovery Framework supports data publishers in setting RDF links from their data sources to other data sources on the Web. Using the declarative Silk - Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked.

  44. LOD2 Stack : The overview of tools (LOD 2 - The Future of Semantic Technology) • Sieve • Sieve allows Web data to be filtered according to different data quality assessment policies and provides for fusing Web data according to different conflict resolution methods. • LIMES • LIMES is a link discovery framework for the Web of Data. It implements time-efficient approaches for large-scale link discovery based on the characteristics of metric spaces.
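One metric-space property that link discovery can exploit is the triangle inequality: for any reference point e, d(s, t) >= |d(s, e) - d(t, e)|. If that lower bound already exceeds the matching threshold, the exact distance never has to be computed. The sketch below shows this pruning idea on 2D points with Euclidean distance; it is a simplified illustration under assumed data, not the LIMES implementation, and the single `exemplar` point is chosen arbitrarily.

```python
import math


def link_candidates(sources, targets, exemplar, threshold):
    """Return (links, skipped): pairs within threshold, and how many exact
    distance computations the triangle-inequality bound let us skip."""
    # Distances from each target to the exemplar are computed once.
    d_te = {t: math.dist(t, exemplar) for t in targets}
    links, skipped = [], 0
    for s in sources:
        d_se = math.dist(s, exemplar)
        for t in targets:
            # Lower bound on d(s, t) from the triangle inequality.
            if abs(d_se - d_te[t]) > threshold:
                skipped += 1
                continue
            if math.dist(s, t) <= threshold:
                links.append((s, t))
    return links, skipped


# Made-up points standing in for numeric descriptions of resources
sources = [(0, 0), (10, 10)]
targets = [(0, 1), (10, 11), (5, 5)]

links, skipped = link_candidates(sources, targets, exemplar=(0, 0), threshold=1.5)
print(links, skipped)
```

The pruning never discards a true match, because the bound is a genuine lower bound on the distance; it only avoids work for pairs that cannot match.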

  45. Silk : Link Discovery Framework (LOD 2 - The Future of Semantic Technology) • Component of the Interlinking and Fusion stage of the LOD2 Stack • Can be used by data providers to generate RDF links between data sets on the Web of Data • In particular, to set explicit RDF links between data items within different data sources • “Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web”

  46. Silk : Silk – Link Specification Language Example (LOD 2 - The Future of Semantic Technology) • Confidence value is the average of the two compared weights • Numeric differences between parameters • Aggregation example: • Combines multiple confidence values into a single value (average)
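The aggregation step described on this slide can be sketched in plain Python: each comparison turns a numeric difference into a confidence value, and an average combines them into one link confidence. This is not Silk-LSL syntax; the property names, the `max_diff` parameter, and the linear scoring function are assumptions made for illustration.

```python
def numeric_confidence(a, b, max_diff):
    """Map a numeric difference to [0, 1]: 1.0 when equal, 0.0 at or beyond max_diff."""
    return max(0.0, 1.0 - abs(a - b) / max_diff)


def average(confidences):
    """Aggregate several confidence values into one by averaging."""
    return sum(confidences) / len(confidences)


# Two made-up descriptions of what may be the same entity
a = {"population": 1000, "area": 50.0}
b = {"population": 900, "area": 50.0}

scores = [
    numeric_confidence(a["population"], b["population"], max_diff=200),
    numeric_confidence(a["area"], b["area"], max_diff=10.0),
]
link_confidence = average(scores)
print(scores, link_confidence)  # [0.5, 1.0] 0.75
```

A link would then be emitted only if `link_confidence` clears a chosen threshold; other aggregation functions (minimum, maximum, weighted mean) slot into the same place.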

  47. DL-Learner (LOD 2 - The Future of Semantic Technology) • Introduction • The goal of DL-Learner is to provide a DL/OWL-based machine learning tool for solving supervised learning tasks. • The DL-Learner software learns concepts in Description Logics (DLs) from examples.

  48. DL-Learner : Features (LOD 2 - The Future of Semantic Technology) • Learning Problems • Positive and Negative Examples (= previous example) • Class Learning • Find a class expression for a given class • Example: father
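Learning from positive and negative examples can be illustrated with a toy search: find the shortest conjunction of atomic properties that covers every positive example and no negative one. This exhaustive search is a stand-in chosen for brevity, not DL-Learner's actual algorithm, and the individuals and property names below are hypothetical.

```python
from itertools import combinations

# Hypothetical individuals, each described by the atomic properties it satisfies
facts = {
    "adam": {"Male", "hasChild"},
    "bob": {"Male", "hasChild"},
    "carl": {"Male"},
    "dana": {"Female", "hasChild"},
}
positives = {"adam", "bob"}   # examples of "father"
negatives = {"carl", "dana"}  # counter-examples


def learn_conjunction(facts, positives, negatives):
    """Return the smallest tuple of atoms whose conjunction covers all
    positives and no negatives, or None if no conjunction works."""
    atoms = sorted(set().union(*facts.values()))
    for size in range(1, len(atoms) + 1):
        for expr in combinations(atoms, size):
            covered = {i for i, props in facts.items() if set(expr) <= props}
            if positives <= covered and not (covered & negatives):
                return expr
    return None


expr = learn_conjunction(facts, positives, negatives)
print(" AND ".join(expr))  # Male AND hasChild
```

The learned expression reads as "male and has a child", which matches the intuitive definition of the target class for this toy data.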

  49. Demo of SDT Plug-in to Protégé (LOD 2 - The Future of Semantic Technology)

  50. SWCL - Sample Example (LOD 2 - The Future of Semantic Technology) • [Diagram labels: PopulationValue, Country, positiveInteger; hasPart; ?; PopulationValue, Province, positiveInteger]
