Kimberly West, Nadia Noori, Stanislav Minkevych
CyberMiner Software Architecture Group
Basic Goal: a web search engine that:
• Accepts a list of keywords
• Returns a list of URLs whose description contains any of the given keywords
• Uses KWIC (Key Word In Context) to maintain a database of URLs and descriptions
Requirements Specification: Functional
• After input, the descriptor part of each line is circularly shifted by repeatedly removing the first word and appending it to the end of the line
• Outputs a list of all circular shifts of the descriptor parts of all lines in ascending alphabetical order, together with their corresponding URLs
• No noise words such as “a”, “the”, or “of” appear at the start of output lines
• The indices can grow as lines are added later
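The functional requirements above describe the classic KWIC pipeline: circular shift, noise-word filtering, then alphabetizing. Below is a minimal Python sketch. It assumes each input line is a (URL, descriptor) pair and uses a small hypothetical noise-word list. The case-insensitive ordering is also an assumption, since the requirements only say "alphabetically ascending".

```python
# Minimal KWIC sketch: circular shifts of each line's descriptor,
# noise-word filtering, and ascending alphabetical ordering.
# Assumptions: input lines are (url, descriptor) pairs; NOISE_WORDS is hypothetical.

NOISE_WORDS = {"a", "an", "the", "of"}  # hypothetical noise-word list


def circular_shifts(descriptor: str) -> list[str]:
    """Return every circular shift of the descriptor, starting with the original."""
    words = descriptor.split()
    shifts = []
    for _ in range(len(words)):
        shifts.append(" ".join(words))
        # Remove the first word and append it to the end of the line.
        words = words[1:] + words[:1]
    return shifts


def kwic_index(lines: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Build (shift, url) pairs, drop shifts starting with a noise word, sort ascending."""
    entries = []
    for url, descriptor in lines:
        for shift in circular_shifts(descriptor):
            if shift.split()[0].lower() not in NOISE_WORDS:
                entries.append((shift, url))
    # Case-insensitive order (an assumption); ties broken by exact text, then URL.
    entries.sort(key=lambda e: (e[0].lower(), e[0], e[1]))
    return entries


if __name__ == "__main__":
    data = [("http://example.com/kwic", "The KWIC index system")]
    for shift, url in kwic_index(data):
        print(f"{shift}  ->  {url}")
```

To let the index grow with later additions, the new (shift, URL) pairs could be inserted at their sorted positions with Python's `bisect` module. This avoids re-sorting the whole list on every addition.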
Requirements Specification: Non-Functional
• Easy to understand and use: clear capabilities and features, simple design
• Portability/reuse: not restricted to particular operating systems, machines, or developers; anyone can use the system and understand its architecture well enough to adapt it to their environment; few system limitations
• Traceability: object-oriented style using abstract data types; each process is linked to a specific individual module
• Good performance and responsiveness: reacts readily and easily to changes, with attention to the output-to-input ratio and to time
Components & Connections: Indexing
• Repository
  • Contains the full HTML of every web page
  • Documents are stored one after another, each prefixed by its ID, length, and URL
  • Requires no other data structures in order to be accessed, which helps with data consistency and makes development easier
• Index
  • Keeps information about each document; it is a fixed-width index ordered by docID
  • Each entry contains the current document status, a pointer into the repository, a document checksum, and various statistics
  • If the document has been crawled, the entry also contains a pointer into a variable-width file called docinfo, which holds its URL and title
  • Otherwise, the pointer points into the URL list, which contains just the URL
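The repository and the fixed-width index can be pictured with the sketch below. All of the following are assumptions for illustration, not part of the described design: the byte layouts, the field widths, the `STATUS_CRAWLED` code, and the use of CRC-32 as the checksum. The statistics fields and the docinfo/URL-list pointer are omitted.

```python
import struct
import zlib
from typing import Iterator

# Hypothetical big-endian layouts (not from the original design):
#   repository record: docID (u32), HTML length (u32), URL length (u16), URL bytes, HTML bytes
#   index entry:       status (u8), repository offset (u64), CRC-32 of the HTML (u32)
RECORD_HEADER = struct.Struct(">IIH")
INDEX_ENTRY = struct.Struct(">BQI")
STATUS_CRAWLED = 1  # hypothetical status code; 0 marks an empty (zero-filled) slot


class Repository:
    """Documents stored back to back; each record is self-describing via its prefix."""

    def __init__(self) -> None:
        self.data = bytearray()

    def append(self, doc_id: int, url: str, html: str) -> int:
        """Append one document and return the byte offset where its record starts."""
        offset = len(self.data)
        url_b, html_b = url.encode(), html.encode()
        self.data += RECORD_HEADER.pack(doc_id, len(html_b), len(url_b))
        self.data += url_b + html_b
        return offset

    def read(self, offset: int) -> tuple[int, str, str, int]:
        """Decode the record at offset; also return the offset of the next record."""
        doc_id, html_len, url_len = RECORD_HEADER.unpack_from(self.data, offset)
        url_start = offset + RECORD_HEADER.size
        html_start = url_start + url_len
        end = html_start + html_len
        url = self.data[url_start:html_start].decode()
        html = self.data[html_start:end].decode()
        return doc_id, url, html, end

    def scan(self) -> Iterator[tuple[int, str, str]]:
        """Walk every record from the start using only the prefixes, with no index."""
        offset = 0
        while offset < len(self.data):
            doc_id, url, html, offset = self.read(offset)
            yield doc_id, url, html


class DocIndex:
    """Fixed-width index ordered by docID: entry i starts at byte i * INDEX_ENTRY.size."""

    def __init__(self) -> None:
        self.data = bytearray()

    def put(self, doc_id: int, status: int, repo_offset: int, html: str) -> None:
        needed = (doc_id + 1) * INDEX_ENTRY.size
        if len(self.data) < needed:
            self.data += bytes(needed - len(self.data))  # zero-filled empty slots
        checksum = zlib.crc32(html.encode())
        INDEX_ENTRY.pack_into(self.data, doc_id * INDEX_ENTRY.size,
                              status, repo_offset, checksum)

    def get(self, doc_id: int) -> tuple[int, int, int]:
        """Return (status, repository offset, checksum) for doc_id."""
        return INDEX_ENTRY.unpack_from(self.data, doc_id * INDEX_ENTRY.size)
```

Because every entry has the same width, looking up a docID is a single multiplication rather than a search. The repository, by contrast, can be walked with `scan()` without consulting the index at all.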
Line Storage
• Creates, accesses, and possibly deletes characters, words, and lines
• Listens for InputEvent using the interface LSListener
• Stores the lines
• Generates an event called LSEvent
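The event-driven side of Line Storage can be sketched as follows, reading the slide literally. `LineStorage` implements `LSListener` to receive `InputEvent`s, stores each line, and then announces the stored line with an `LSEvent`. Several details are hypothetical and not part of the original design: the event fields, the `input_received` method name, and the `subscribe` registration method.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class InputEvent:
    """Raised by the Input module for each line read (fields are hypothetical)."""
    url: str
    descriptor: str


@dataclass
class LSEvent:
    """Raised by LineStorage after a line has been stored (fields are hypothetical)."""
    line_no: int


class LSListener(Protocol):
    """Interface through which LineStorage receives InputEvents."""
    def input_received(self, event: InputEvent) -> None: ...


class LineStorage:  # structurally satisfies LSListener
    def __init__(self) -> None:
        self.lines: list[tuple[str, list[str]]] = []  # (url, words of the descriptor)
        self._subscribers: list[Callable[[LSEvent], None]] = []

    def subscribe(self, callback: Callable[[LSEvent], None]) -> None:
        """Register a module (e.g. Circular Shift) to be told about new lines."""
        self._subscribers.append(callback)

    def input_received(self, event: InputEvent) -> None:
        # Store the line, then implicitly invoke every subscriber with an LSEvent.
        self.lines.append((event.url, event.descriptor.split()))
        ls_event = LSEvent(line_no=len(self.lines) - 1)
        for callback in self._subscribers:
            callback(ls_event)
```

With implicit invocation, the Input module needs no knowledge of which modules react to a new line. It only delivers the `InputEvent`, and any downstream module can subscribe to `LSEvent` independently.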
Line Storage
• Procedure setchar(l, w, c, a), where l is the line, w the word, and c the character position: sets the c-th character of the w-th word of line l to a
• Function char(l, w, c): returns the c-th character of the w-th word of line l, or a blank if any index is out of range
• Function word(l): returns the number of words in line l
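The setchar/char/word operations describe a character-level abstract data type. The sketch below rests on three assumptions. First, indices are 1-based. Second, `setchar` grows the structure as needed and pads skipped character positions with blanks. Third, `word` returns 0 for a line that does not exist.

```python
class LineStorage:
    """Character-level line store offering setchar/char/word.

    Assumptions: 1-based indices; setchar grows lines, words, and characters
    as needed (padding with blanks); word() returns 0 for a missing line.
    """

    BLANK = " "

    def __init__(self) -> None:
        # self._lines[l - 1][w - 1][c - 1] holds one character.
        self._lines: list[list[list[str]]] = []

    def setchar(self, l: int, w: int, c: int, a: str) -> None:
        """Set the c-th character of the w-th word of line l to a."""
        if min(l, w, c) < 1 or len(a) != 1:
            raise ValueError("indices must be >= 1 and a must be a single character")
        while len(self._lines) < l:
            self._lines.append([])
        line = self._lines[l - 1]
        while len(line) < w:
            line.append([])
        word = line[w - 1]
        while len(word) < c:
            word.append(self.BLANK)
        word[c - 1] = a

    def char(self, l: int, w: int, c: int) -> str:
        """Return the c-th character of the w-th word of line l, or a blank if out of range."""
        if 1 <= l <= len(self._lines):
            line = self._lines[l - 1]
            if 1 <= w <= len(line):
                word = line[w - 1]
                if 1 <= c <= len(word):
                    return word[c - 1]
        return self.BLANK

    def word(self, l: int) -> int:
        """Return the number of words in line l (0 if line l does not exist)."""
        if 1 <= l <= len(self._lines):
            return len(self._lines[l - 1])
        return 0
```

Callers never see the nested lists. This is the point of the abstract-data-type style named under Traceability: the internal representation could change without affecting Circular Shift or Alphabetizing.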
[Figure: architecture diagram with the modules Master Control, Input, Line Storage, Circular Shift, Alphabetizing, Searcher Control, and Output, plus an input medium and an output medium. Legend: subprogram call, system I/O, implicit invocation.]
CyberMiner Engine
• Searches indexed keywords
• Uses Boolean arguments
• Provides a case-sensitivity selector
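Below is a sketch of the searcher. It assumes that "Boolean arguments" means AND, OR, and NOT modes over whole-word matches in each descriptor, and that the case-sensitivity selector is a simple flag. The `search` function, its parameters, and the mode names are hypothetical. OR mode corresponds to the basic goal of returning URLs whose description contains any of the keywords.

```python
def search(lines: list[tuple[str, str]], keywords: list[str],
           mode: str = "OR", case_sensitive: bool = False) -> list[str]:
    """Return the sorted URLs whose descriptor satisfies the Boolean query.

    mode (hypothetical names):
      "OR"  -> descriptor contains any keyword
      "AND" -> descriptor contains every keyword
      "NOT" -> descriptor contains none of the keywords
    """
    # Fold case unless the case-sensitivity selector is on.
    fold = (lambda s: s) if case_sensitive else str.lower
    targets = {fold(k) for k in keywords}
    matched = set()
    for url, descriptor in lines:
        words = {fold(w) for w in descriptor.split()}
        if mode == "OR":
            ok = bool(words & targets)
        elif mode == "AND":
            ok = targets <= words
        elif mode == "NOT":
            ok = not (words & targets)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if ok:
            matched.add(url)
    return sorted(matched)
```

A real searcher would more likely look keywords up in the alphabetized KWIC index, for example by binary search on the first word of each shift, rather than scanning every line. The linear scan here keeps the sketch short.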