Prepared By: Dr. Suresh Jain smj2005123@rediffmail.com M M COLLEGE OF PHARMACY, MM UNIVERSITY, MULLANA-AMBALA (HARYANA)

Information retrieval is a scientific and managerial approach to obtaining relevant information from a variety of information resources. There are many ways to get information. The most common research methods are literature searches, talking with people, focus groups, personal interviews, telephone surveys, mail surveys, email surveys, and internet surveys.

A literature search involves reviewing all readily available materials, such as newspapers, magazines, annual reports, company literature, online databases, internal company information, relevant trade publications, and any other published materials. Although it is a very inexpensive method of gathering information, it often does not yield timely information. Literature searches using the web are the fastest mode of literature survey.

INFORMATION RETRIEVAL MODEL

In the mid-1990s, visiting libraries as a means of retrieving the latest literature was still a common necessity among professionals. Nowadays, professionals simply access information by ‘googling’. Indeed, the name of the Web search engine market leader, “Google”, became a synonym for searching and retrieving information.

INFORMATION RETRIEVAL SYSTEM

As far as pharmaceuticals are concerned, when data meets the US Food and Drug Administration’s definition of an electronic record, the agency requires the record to be readily retrievable and available for review. Regulatory requirements aside, if an organization is unable to quickly retrieve relevant data, business crises may ensue.

Software vendors serving the pharmaceutical industry are now infusing IR technology into their software applications. The feature set is commonly referred to as full text search, and most use familiar paradigms such as the search box and advanced search forms that have become ubiquitous on the World Wide Web. Specific implementations and their capabilities vary widely, however.

Some allow only for simplistic keyword searches within the structured data maintained by the particular software application, while others allow sophisticated concept searches using natural language syntax to be performed against structured and unstructured data that may exist anywhere within the enterprise. These wide variances in information retrieval capabilities among different software applications lead to disparities in the ease of use and overall utility of the software.

An IR system consists of several interconnected modules that enable two basic processes: building an index and querying the index. The index, an integral part of the IR system, contains the searchable “features” and enables fast query answering.
• Once the index is constructed, the IR system can be queried by its users. Just as documents are text-processed during indexing, the query is first analyzed by the text processing engine in exactly the same way before it is transformed into an internal representation. The processed query is then transmitted to the searcher.
• In the final step, the retrieved documents are passed to the ranking module. This module plays a crucial role, as its task is to order the results by relevance. The documents that best match the user’s information need should be located at the top of the result list.
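The two processes described above, building an index and querying it, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names (`tokenize`, `build_index`, `search`), the sample documents, and the simple term-count ranking are hypothetical choices made for this sketch, not a description of any particular IR product.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Shared text-processing step: lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index mapping each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, docs, query):
    """Return ids of documents containing every query term, best match first."""
    terms = tokenize(query)  # the query goes through the same processing as the documents
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    # Ranking module (assumed here): count how often the query terms occur.
    def score(doc_id):
        tokens = tokenize(docs[doc_id])
        return sum(tokens.count(t) for t in terms)

    return sorted(matches, key=lambda d: (-score(d), d))

# Hypothetical sample documents.
docs = {
    "sop-1": "Tablet dissolution test procedure for batch release",
    "dev-7": "Deviation report: dissolution failure, dissolution retest required",
    "crm-3": "Customer complaint about packaging",
}
index = build_index(docs)
results = search(index, docs, "dissolution")
print(results)
```

In this sketch the document that mentions the query term most often is listed first; real systems typically use more elaborate ranking functions.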

Traditional retrieval models:
• Formally, an information retrieval model is a quadruple (D, Q, F, R), where:
• D is a representation of documents,
• Q is a representation of queries (user information needs),
• F is a modeling framework for D, Q, and the relationships among them,
• R is a ranking function which defines an ordering among the documents for the given query.
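One way to make the quadruple concrete is the vector space model, chosen here purely as an illustration; the slides do not single out any particular model. In this sketch, D and Q are bag-of-words term-count vectors, F is the shared vector space, and R is cosine similarity. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def represent(text):
    """D and Q: represent a document or query as a term-count vector."""
    return Counter(text.lower().split())

def cosine(q_vec, d_vec):
    """R: cosine similarity between a query vector and a document vector."""
    dot = sum(q_vec[t] * d_vec[t] for t in q_vec)
    norm_q = math.sqrt(sum(v * v for v in q_vec.values()))
    norm_d = math.sqrt(sum(v * v for v in d_vec.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)

def rank(query, documents):
    """Order documents by decreasing similarity to the query, dropping non-matches."""
    q_vec = represent(query)
    scored = [(cosine(q_vec, represent(text)), name) for name, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]

# Hypothetical sample documents.
documents = {
    "a": "stability study of aspirin tablets",
    "b": "aspirin aspirin assay",
    "c": "cleaning validation report",
}
ranking = rank("aspirin assay", documents)
print(ranking)
```

Document "b" shares both query terms and therefore ranks above "a", while "c" shares none and is omitted.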

The top 10 attributes of a good information retrieval solution:
• Global reach via open, standards-based extensibility to information inside and outside the enterprise.
• Aggregates and sorts results from many sources.
• Natural language concept and synonym searching via thesauri, related words collections, or both.
• Regular expression searching.
• Fuzzy searching to “see through” typographical or spelling errors.
• Accommodates fielded data constraints within searches.
• Stemming searches with international language support.
• Proximity searches with proximity precisely definable.
• Variable term weighting.
• Phonic (homonym) searching.
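Three of the attributes listed above (fuzzy, proximity, and regular expression searching) can be sketched with the Python standard library. This is a rough illustration only: the helper names, the sample vocabulary, and the `LOT-` identifier format are assumptions invented for this example, and production search engines implement these features quite differently.

```python
import difflib
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def fuzzy_lookup(term, vocabulary, cutoff=0.8):
    """Fuzzy searching: return known terms that closely resemble a possibly misspelled term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=cutoff)

def within_proximity(text, term_a, term_b, max_distance):
    """Proximity searching: True if both terms occur within max_distance words of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

def regex_search(text, pattern):
    """Regular expression searching: return every substring matching the pattern."""
    return re.findall(pattern, text)

# Hypothetical vocabulary and texts.
vocabulary = ["acetaminophen", "ibuprofen", "aspirin"]
suggestions = fuzzy_lookup("acetominophen", vocabulary)

sentence = "The batch failed the dissolution test after storage"
near = within_proximity(sentence, "batch", "dissolution", 3)

# Assumed lot-number format: "LOT-" followed by exactly four digits.
lots = regex_search("Samples LOT-1234 and LOT-987 were retained; see LOT-5678.", r"\bLOT-\d{4}\b")
print(suggestions, near, lots)
```

The `cutoff` and `max_distance` parameters correspond to the idea that tolerance for spelling errors and proximity should be precisely definable by the user.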

Web search engines: • On the Web, two types of search engines can be found: general, all-purpose search engines (e.g. Google or Yahoo!) that cover a broad spectrum of content, and specialized search engines (such as Scirus, Bionity, Entrez or ExPASy) that are restricted to a small domain of interest. While the general search engines are known by the majority of users, the specialized search tools are known only by domain experts.

Search in an intranet environment: • Huge amounts of data are not exposed to the Internet but are kept behind an organization’s firewall, accessible only by its members. Data within such a private network (intranet) can encode business processes, administrative tasks, domain expertise, etc. and is often of special value to the organization.

Differences between intranet search and Internet search:
• An intranet environment differs from the Internet in many respects. Even though both use the TCP/IP protocol for communications, the structure in which data is stored differs considerably.
• Internet search has the World Wide Web as the main structure for storing and interlinking information.
• Intranet search is defined as search over all electronic text content of an organization, including search of the organization’s external websites, search of internal websites, and search of other electronic text held by the organization in the form of e‐mail, database records, documents on file shares and the like.

Virtually all IR implementations allow users to search for literal words or phrases, and to look for any or all of the words and phrases in the corresponding data source. These rudimentary capabilities are similar to traditional SQL-based query techniques for structured data in that one or more literal values may be sought, in one or more locations, via Boolean logic expressions.[13]
• A less common and more powerful technique is the concept search. Instead of searching for literal data values, concept searches look for terms with similar or related meanings. IR systems use thesauri or related word collections to process concept searches.
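The difference between a literal search and a concept search can be sketched as follows. This is a toy illustration under stated assumptions: the `RELATED_WORDS` collection, the function names, and the sample records are all hypothetical, and a real system would rely on a curated thesaurus rather than a hand-written dictionary.

```python
import re

# Hypothetical related-words collection standing in for a real thesaurus.
RELATED_WORDS = {
    "headache": {"headache", "cephalalgia", "migraine"},
}

def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def literal_search(docs, term):
    """Return ids of documents containing exactly the given word."""
    return sorted(doc_id for doc_id, text in docs.items() if term in tokenize(text))

def concept_search(docs, term):
    """Return ids of documents containing the word or any related word."""
    related = RELATED_WORDS.get(term, {term})
    return sorted(doc_id for doc_id, text in docs.items() if related & tokenize(text))

# Hypothetical sample records.
docs = {
    "r1": "Patient reported migraine after dosing",
    "r2": "Headache resolved within two hours",
    "r3": "No adverse events observed",
}
literal_hits = literal_search(docs, "headache")
concept_hits = concept_search(docs, "headache")
print(literal_hits, concept_hits)
```

Here the literal search finds only the record that contains the exact word, while the concept search also retrieves the record that uses a related term.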

Extending the reach: • Deployments of enterprise-class software solutions for enterprise resource planning (ERP), customer relationship management (CRM), laboratory information management systems (LIMS), quality management systems (QMS), etc., typically seek to eliminate silos of information through system integration. At the same time, software vendors are beginning to introduce powerful IR features (full text search) into their enterprise-class software products. Since most software vendors limit the reach of their IR features to the data maintained by their own applications, new types of silos are starting to emerge: silos of searchable data (maintained by new applications) and silos of opaque, non-searchable data (maintained by legacy applications). Surely there has to be a better way.

Conclusion: • Recent advances in information retrieval technology can help pharmaceutical manufacturers stem the tides of their ever increasing seas of data. Decision making, problem solving, and regulatory compliance are all optimized when relevant information can be retrieved faster. As software vendors continue to infuse IR technology into their products, it is imperative that information technology professionals in the pharmaceutical industry understand the capabilities and limitations of various techniques if they are to fulfill their mission of providing strategic technology guidance to their organizations. Product and technology selections should be made with an eye on the future. Optimal search and retrieval solutions extend to data in any location, whether inside or outside the enterprise.

References:
• AGOSTI, M. AND SMEATON, A. 1996. Information Retrieval and Hypertext. Kluwer Academic Publishers, Hingham, MA.
• AGRAWAL, R., GEHRKE, J., GUNOPULOS, D., AND RAGHAVAN, P. 1998. Automatic subspace clustering of high dimensional data for data mining applications. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Seattle, WA, June). ACM Press, New York, NY, 94–105.
• AHLBERG, C. AND SHNEIDERMAN, B. 1994. Visual information seeking: Tight coupling of dynamic query filters with starfield displays. In Proceedings of the ACM Conference on Human Factors in Computing Systems: Celebrating Interdependence (CHI ’94, Boston, MA, Apr. 24–28). ACM Press, New York, NY, 313–317.
• AI MAG. 1997. Special issue on intelligent systems on the internet. AI Mag. 18, 4.
• ANDERBERG, M. R. 1973. Cluster Analysis for Applications. Academic Press, Inc., New York, NY.
• BALABANOVIC, M. AND SHOHAM, Y. 1995. Learning information retrieval agents: Experiments with automated web browsing. In Proceedings of the 1995 AAAI Spring Symposium on Information Gathering from Heterogenous Distributed Environments (Stanford, CA, Mar.). AAAI Press, Menlo Park, CA.
• BALABANOVIC, M., SHOHAM, Y., AND YUN, T. 1995. An adaptive agent for automated web browsing. Stanford Univ. Digital Libraries Project, working paper 1995-0023. Stanford University, Stanford, CA.

8. Fox, C., 1992, 'Lexical analysis and stoplists', in Information retrieval: data structures and algorithms, W. B. Frakes & R. Baeza‐Yates, eds., Prentice‐Hall, Inc., Upper Saddle River, NJ, USA, pp. 102‐130.
9. Fox, E., Betrabet, S., Koushik, M., & Lee, W., 1992, 'Extended boolean models', in Information retrieval: data structures and algorithms, W. B. Frakes & R. Baeza‐Yates, eds., Prentice‐Hall, Inc., Upper Saddle River, NJ, USA, pp. 393‐418.
10. Frakes, W. B., 1992, 'Stemming algorithms', in Information retrieval: data structures and algorithms, W. B. Frakes & R. Baeza‐Yates, eds., Prentice‐Hall, Inc., Upper Saddle River, NJ, USA, pp. 131‐160.
11. Hawking, D., 2004, 'Challenges in enterprise search', in Proceedings of the 15th Australasian database conference ‐ Volume 27, Dunedin, New Zealand, Australian Computer Society, Inc., Darlinghurst, Australia, pp. 15‐24.
12. Sugiyama, K., Hatano, K., & Yoshikawa, M., 2004, 'Adaptive web search based on user profile constructed without any effort from users', in Proceedings of the 13th international conference on World Wide Web, New York, NY, USA, ACM, New York, NY, USA, pp. 675‐684.
13. Speretta, M. & Gauch, S., 2005, 'Personalized search based on user search histories', in Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, France, IEEE Computer Society, Washington, DC, USA, pp. 622‐628.

THANK YOU
