
CyberMiner Software Architecture Group

Kimberly West, Nadia Noori, Stanislav Minkevych. CyberMiner Software Architecture Group. Basic goal: a web search engine that accepts a list of keywords and returns a list of URLs whose descriptions contain any of the given keywords.


Presentation Transcript


  1. CyberMiner Software Architecture Group: Kimberly West, Nadia Noori, Stanislav Minkevych
     Basic goal: a web search engine that
     • accepts a list of keywords
     • returns a list of URLs whose descriptions contain any of the given keywords
     • uses KWIC (Key Word In Context) to maintain a database of URLs and descriptions

  2. Requirements Specification: Functional
     • After input, the descriptor part of each line is circularly shifted by repeatedly removing the first word and appending it to the end of the line.
     • The system outputs a list of all circular shifts of the descriptor parts of all lines, in ascending alphabetical order, together with their corresponding URLs.
     • No output line starts with a noise word such as “a”, “the”, or “of”.
     • The indices can grow with later additions.
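The functional requirements above can be sketched in a few lines of Python. This is a minimal illustration, not the group's implementation: the function names `circular_shifts` and `kwic_index`, the `(url, descriptor)` input shape, and the case-insensitive sort are assumptions, and the noise-word set holds only the three examples named on the slide.

```python
# Noise words: only the three examples given on the slide (a real list would be longer).
NOISE_WORDS = {"a", "the", "of"}


def circular_shifts(descriptor):
    """Return every circular shift of a descriptor, starting with the original."""
    words = descriptor.split()
    # Shift i moves the first i words to the end of the line.
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def kwic_index(entries):
    """Build a sorted list of (shifted descriptor, url) pairs.

    `entries` is an iterable of (url, descriptor) pairs (assumed input shape).
    Shifts that start with a noise word are dropped.
    """
    index = []
    for url, descriptor in entries:
        for shift in circular_shifts(descriptor):
            first_word = shift.split()[0].lower()
            if first_word not in NOISE_WORDS:
                index.append((shift, url))
    # Ascending alphabetical order; ignoring case is an assumption.
    index.sort(key=lambda pair: (pair[0].lower(), pair[1]))
    return index
```

Because `kwic_index` rebuilds the list from its input, later additions can be handled by calling it again with the extended entries; an incremental insert into the sorted list would be the natural refinement.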

  3. Requirements Specification: Non-Functional
     • Easily understood and used: clear capabilities and features, simple design.
     • Portability and reuse: not restricted to particular operating systems, machines, or developers; anyone can use the system and understand its architecture well enough to adapt it to their environment; few system limitations.
     • Traceability: object-oriented style using abstract data types; each process is linked to a specific individual module.
     • Good performance and responsiveness: reacts readily to changes; output-to-input ratio; time factor.

  4. Components & Connections: Indexing
     Repository
     • contains the full HTML of every web page
     • documents are stored one after another, each prefixed by ID, length, and URL
     • requires no other data structures in order to be accessed (helps with data consistency and makes development easier)
     Index
     • keeps information about each document
     • is a fixed-width index, ordered by docID
     • each entry contains the current document status, a pointer into the repository, a document checksum, and various statistics
     • if the document has been crawled, the entry also contains a pointer into a variable-width file called docinfo, which contains its URL and title
     • otherwise the pointer points into the URL list, which contains just the URL
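To make the repository and index layout concrete, here is a rough sketch using Python's `struct` module. The field widths, the byte order, the use of CRC-32 as the checksum, and the assumption that docIDs are contiguous and start at 0 are all illustrative choices, not details from the slides. The statistics and docinfo pointer are omitted, and `append_document` and `read_document` are hypothetical names.

```python
import struct
import zlib

# Assumed record prefix: docID, HTML length, URL length (little-endian, no padding).
HEADER = struct.Struct("<IIH")
# Assumed fixed-width index entry: status, repository offset, checksum.
INDEX_ENTRY = struct.Struct("<BQI")

STATUS_CRAWLED = 1  # hypothetical status code


def append_document(repository, doc_id, url, html):
    """Append one document to the repository and return its packed index entry."""
    offset = len(repository)
    url_bytes = url.encode("utf-8")
    html_bytes = html.encode("utf-8")
    repository.extend(HEADER.pack(doc_id, len(html_bytes), len(url_bytes)))
    repository.extend(url_bytes)
    repository.extend(html_bytes)
    return INDEX_ENTRY.pack(STATUS_CRAWLED, offset, zlib.crc32(html_bytes))


def read_document(repository, index, doc_id):
    """Look up a document by docID through the fixed-width index."""
    # Fixed-width entries ordered by docID make the lookup simple arithmetic.
    _status, offset, checksum = INDEX_ENTRY.unpack_from(index, doc_id * INDEX_ENTRY.size)
    stored_id, html_len, url_len = HEADER.unpack_from(repository, offset)
    start = offset + HEADER.size
    url = bytes(repository[start:start + url_len]).decode("utf-8")
    html_bytes = bytes(repository[start + url_len:start + url_len + html_len])
    if stored_id != doc_id or zlib.crc32(html_bytes) != checksum:
        raise ValueError("repository record does not match index entry")
    return url, html_bytes.decode("utf-8")
```

A caller would keep `repository` and `index` as two `bytearray` objects and append the entry returned by `append_document` to the index after each write.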

  5. Line Storage
     • Creates, accesses, and possibly deletes characters, words, and lines
     • Listens for InputEvent using the interface LSListener
     • Stores the lines
     • LineStorage generates an event called LSEvent
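The event flow on this slide (input arrives, the line is stored, an LSEvent is generated) can be sketched as a small observer pattern. Only the names `LSEvent` and `LineStorage` come from the slide; the class name `ObservableLineStorage`, the methods `add_listener` and `on_input`, and the fields carried by the event are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSEvent:
    """Event announced after a line is stored (fields are assumed)."""
    line_index: int
    line: str


class ObservableLineStorage:
    """Stores lines and notifies registered listeners on each addition."""

    def __init__(self):
        self._lines = []
        self._listeners = []

    def add_listener(self, callback):
        # A listener is any callable that accepts one LSEvent.
        self._listeners.append(callback)

    def on_input(self, line):
        # Called when an input event delivers a new line.
        self._lines.append(line)
        event = LSEvent(len(self._lines) - 1, line)
        for callback in self._listeners:
            callback(event)
```

A downstream component such as the circular shifter would register itself with `add_listener` instead of being called directly, which is what implicit invocation means here.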

  6. Line Storage
     • Procedure setchar(l-line, w-word, c-char, a)
     • Function char(l-line, w-word, c-char)
       returns a character representing the c-th character in the w-th word of the l-th line; returns a blank if out of range
     • Function word(l-line)
       returns the number of words in line l
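A possible reading of this interface as a Python abstract data type is shown below. The slide does not describe what `setchar` does, so the sketch assumes it stores character `a` at position `c` of word `w` in line `l`, growing the storage with blanks as needed. Zero-based indexing and the nested-list representation are also assumptions.

```python
class LineStorage:
    """Lines of words of characters, addressed by (line, word, char) indices."""

    def __init__(self):
        # Each line is a list of words; each word is a list of characters.
        self._lines = []

    def setchar(self, l, w, c, a):
        """Store character `a` at char `c` of word `w` in line `l` (assumed semantics)."""
        while len(self._lines) <= l:
            self._lines.append([])
        line = self._lines[l]
        while len(line) <= w:
            line.append([])
        word = line[w]
        while len(word) <= c:
            word.append(" ")
        word[c] = a

    def char(self, l, w, c):
        """Return the c-th character of the w-th word of line l, or a blank if out of range."""
        if min(l, w, c) < 0:
            return " "
        try:
            return self._lines[l][w][c]
        except IndexError:
            return " "

    def word(self, l):
        """Return the number of words in line l (0 for a line that does not exist)."""
        if 0 <= l < len(self._lines):
            return len(self._lines[l])
        return 0
```

Hiding the representation behind these three operations is what lets the circular shifter and alphabetizer stay unchanged if the storage format changes.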

  7. [Architecture diagram] Components: Master Control, Input, Output, Line Storage, Circular Shift, Alphabetizing, Searcher Control; Input medium and Output medium. Connector legend: subprogram call, system I/O, implicit invocation.

  8. CyberMiner Engine
     • Searches indexed keywords
     • Uses Boolean arguments
     • Case-sensitivity selector
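As an illustration of keyword search with Boolean arguments and a case-sensitivity selector, consider the sketch below. The slide does not say which Boolean operators are supported, so the `"OR"`, `"AND"`, and `"NOT"` modes, the function name `search`, and the `(url, description)` entry shape are assumptions.

```python
def search(entries, keywords, mode="OR", case_sensitive=False):
    """Return the URLs whose description matches the keywords.

    mode "OR":  description contains any keyword
    mode "AND": description contains every keyword
    mode "NOT": description contains none of the keywords
    """
    def normalize(text):
        return text if case_sensitive else text.lower()

    wanted = {normalize(k) for k in keywords}
    results = []
    for url, description in entries:
        words = {normalize(w) for w in description.split()}
        hits = wanted & words
        if mode == "OR":
            matched = bool(hits)
        elif mode == "AND":
            matched = hits == wanted
        elif mode == "NOT":
            matched = not hits
        else:
            raise ValueError(f"unknown mode: {mode}")
        if matched:
            results.append(url)
    return results
```

The default `"OR"` mode corresponds to the basic goal stated on the first slide: return every URL whose description contains any of the given keywords.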
