Google for storage within an enterprise
Ibrahim El-Dewak, 7/24/2004
Pace University, School of Computer Science and Information Systems
Background
• Google and other search engines are available for the web and the internet, but nothing within an enterprise focuses on storage, uses user-specific context, and relates that context to storage objects within the enterprise.
• The idea is that metadata (name, creation date, last accessed date, size, owner, access rights) related to storage objects/files within an enterprise is gathered and merged with user profile data (name, job type, position, etc.) so that views can be made available to administrators and users based on their credentials and search criteria.
Research Problem
As more and more data is deposited into storage devices, it is becoming increasingly difficult to locate previously saved files. People tend to forget and misplace files within their own folder environment. How do you look for a file on a petabyte storage device? There is a need to reconsider how data should be organized, partitioned and stored.
Research Summary
• The research idea is to automate “googling” all the files on a large storage device such as a NAS (Network Attached Storage) or Symmetrix device.
• Offer a simple search mechanism personalized to each user of the device.
• Gather metadata related to storage objects and merge it with user profile data.
• Generate data views based on user credentials.
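The gather, merge and view steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method proposed in the talk: the `FileMeta` and `UserProfile` types, the `build_view` function, and the idea of modelling access rights as a set of allowed job types are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileMeta:
    """Metadata gathered for one storage object (hypothetical layout)."""
    name: str
    owner: str
    size: int
    created: datetime
    last_accessed: datetime
    # Assumption: access rights are modelled as the job types allowed to read.
    allowed_job_types: frozenset


@dataclass(frozen=True)
class UserProfile:
    """User profile data merged with the file metadata (hypothetical layout)."""
    name: str
    job_type: str
    is_admin: bool = False


def build_view(files, user, query=""):
    """Return the files this user may see whose name contains the query."""
    needle = query.lower()
    visible = [
        f for f in files
        if user.is_admin
        or f.owner == user.name
        or user.job_type in f.allowed_job_types
    ]
    matches = [f for f in visible if needle in f.name.lower()]
    # Assumption: most recently accessed files are listed first.
    return sorted(matches, key=lambda f: f.last_accessed, reverse=True)
```

An administrator sees every match, while other users see only files they own or files their job type is allowed to read.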
Research Summary (continued)
• Going further, the data could be indexed and views ranked based on file contents, file metadata, user profile and context from previous searches.
• As we work with metadata, a value may be assigned to each file storage object based on its metadata and the user profile.
• Create policies based on the value assigned to each storage object.
Research Summary (continued)
• Policies can be used to ensure that files are stored as cost-effectively as possible.
• Policies can be enforced to meet data retention regulations and enterprise requirements.
Research Results (How to do it)
• Design an extensible tag search method and re-sorted data views based on user preferences. The research will revolve around data stream processing and tagging.
• Develop a method for hashing a file to a unique key and validating this key against what is already in the system (global store), with links to who owns the file.
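One way to picture the hashing step is the short Python sketch below. It assumes SHA-256 as the hash function (the talk does not name one) and uses an in-memory dictionary as a stand-in for the global store; `file_key`, `GlobalStore`, `register` and `owners_of` are hypothetical names. A cryptographic hash is not strictly guaranteed to be unique, but collisions are unlikely enough that it is commonly treated as a content key.

```python
import hashlib


def file_key(data: bytes) -> str:
    """Hash file contents to a key (assumption: SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class GlobalStore:
    """In-memory stand-in for the global store: key -> owners of that content."""

    def __init__(self):
        self._owners = {}

    def register(self, data: bytes, owner: str):
        """Validate the key against the store and link the owner to it.

        Returns (key, is_new), where is_new is False if identical content
        was already registered by anyone.
        """
        key = file_key(data)
        is_new = key not in self._owners
        self._owners.setdefault(key, set()).add(owner)
        return key, is_new

    def owners_of(self, key: str):
        """Return everyone linked to this content key."""
        return frozenset(self._owners.get(key, ()))
```

Registering the same bytes for two users yields one key with two owners, which is what lets the store recognise duplicate files while still knowing who holds them.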
Research Results (How to do it)
The real research here is to determine the appropriate method and keying to collect “hints” on files coming into the box and to cross-match these hints into a per-user search match store.
Relevance and significance of the Research
• The idea of “google” for storage takes advantage of the enterprise environment where, unlike the internet, the user profile is available and the employee’s job function is known and can be taken into account when listing or searching.
• Determine the highest ROI (return on investment) of sharing data storage.
• Provide the most cost-effective options for storage.
• Deliver maximum value at the lowest TCO (total cost of ownership).
Make storage most cost effective
Example: A trader’s transaction logs might be kept on high-speed storage such as a Symmetrix with RAID-1 for 30 days, then moved to a RAID-5 NAS device for 6 months, and then to ATA disks (cheap store) for 3 years before being migrated to tape. Conversely, traders might not be allowed to store MP3s at all, although a person in marketing working on advertisements might be allowed to store media files.
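The example policy can be written down as data plus two small functions. The Python sketch below is illustrative only: it assumes the durations are consecutive (30 days, then a further 6 months, then a further 3 years), approximates 6 months as 180 days and 3 years as 1095 days, and uses hypothetical names (`TRADER_LOG_TIERS`, `tier_for_age`, `BLOCKED_EXTENSIONS`, `may_store`).

```python
from datetime import timedelta

# Tiers for trader transaction logs, in order, with how long data stays on each.
# Assumption: 6 months is about 180 days and 3 years is 3 * 365 days.
TRADER_LOG_TIERS = [
    ("Symmetrix RAID-1", timedelta(days=30)),
    ("RAID-5 NAS", timedelta(days=180)),
    ("ATA disks", timedelta(days=3 * 365)),
]
FINAL_TIER = "tape"


def tier_for_age(age, tiers=TRADER_LOG_TIERS, final=FINAL_TIER):
    """Return the tier a file of the given age should live on."""
    boundary = timedelta(0)
    for tier_name, duration in tiers:
        boundary += duration  # durations are treated as consecutive
        if age < boundary:
            return tier_name
    return final


# File types a job type may not store; job types with no entry are unrestricted.
BLOCKED_EXTENSIONS = {"trader": {".mp3"}}


def may_store(job_type, filename):
    """Check the per-job-type file type rule."""
    blocked = BLOCKED_EXTENSIONS.get(job_type, set())
    return not any(filename.lower().endswith(ext) for ext in blocked)
```

Keeping the tiers in a table rather than in code paths means a retention rule can be changed by editing data, which fits the idea of policies driven by metadata and user profile.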
Related Work
“Metadata’s Role in a Scientific Archive”:
References
Metadata’s Role in a Scientific Archive. Judi Thomson, Dan Adams, Paula J. Cowley, Kevin Walker. Publication date: December 2003, pp. 27-34.
Visualising Document Content with Metadata to Facilitate Goal-directed Search. Mischa Weiss-Lijn, Janet T. McDonnell, Leslie James. University College London, London. Publication date: July 2001.
The Skinny on Metadata. Giovanni Flammia. IEEE Intelligent Systems. Publication date: July 1999, pp. 20-22.
A Visual Representation of Search-engine Queries and Their Results. Ratvinder Singh Grewal, Mike Jackson, Peter Burden, Jon Wallis. Publication date: June 2000, pp. 0352.
Tag Insertion Complexity. Yeates, I.H. Witten, D. Bainbridge. Publication date: March 2001, pp. 0243.
References (continued)
A Similarity Search Method of Time Series Data with Combination of Fourier and Wavelet Transforms. Kyoji Kawagoe, Tomohiro Ueda. Publication date: July 2002.
A Meta-search Method Reinforced by Cluster Descriptors. Yipeng Shen, Dik Lun Lee. Publication date: December 2001, pp. 0125.
An Efficient Hash-based Method for Discovering the Maximal Frequent Set. Don-Lin Yang, Ching-Ting Pan, Yeh-Ching Chung. Publication date: October 2001, pp. 511.