InterLab 2002 12/5/02 Marsha Luevane National Renewable Energy Laboratory



Presentation Transcript


  1. Content Rules Again: Evolution of the NREL Search Engine And Search Engine Services • InterLab 2002 • 12/5/02 • Marsha Luevane • National Renewable Energy Laboratory

  2. What this presentation covers • Introduction • Twelve-step evolution of the NREL search engine and search engine services • What’s next

  3. 1: Commitment • NREL used the Harvest search engine until 1997 • Harvest had a lot of problems • We needed a better search engine • We were ready to commit the time and money to research, evaluate, and implement a good search engine • We used Excite in the interim

  4. 2: Research and evaluation • NREL developed a list of criteria • We researched which search engines were available that met the criteria • We evaluated Verity and Infoseek • We selected and implemented Infoseek Ultraseek • For details, see our InterLab ’98 presentation “After the Harvest, Getting Excited About Infoseek”

  5. 3: Collection development • We needed Ultraseek to collect and index content from three mothership sites • SOURCE intranet - thesource.nrel.gov • NREL - www.nrel.gov • EREN - www.eren.doe.gov • SOURCE and NREL were easy because most of the content lives on two servers

  6. 3: Collection development (cont’d) • Energy Efficiency and Renewable Energy Network (EREN) was a different matter • EREN is the official site for DOE Office of Energy Efficiency and Renewable Energy • Integrates information from Web sites at NREL, other labs, and DOE • It is also a portal for information on energy efficiency and renewable energy technologies • Includes information from government sites, state energy offices, universities, trade associations, research organizations, etc. • EREN includes content from >600 Web sites

  7. 3: Collection development (cont’d) • EREN was a different matter (cont’d) • Portal complicates things • Portal indexes content from hundreds of sites, most of which are produced outside of NREL • There are lots of content challenges and surprises • We have filters on some non-DOE content to keep costs down

  8. 4: Optimizing content for Ultraseek • Ultraseek collected and indexed 75K documents • For the first time, we got a good look at our content • We needed more descriptive titles and summaries for search results • We made “meaningful and unique” titles standard on SOURCE, NREL, EREN • We developed optimizing guidelines

  9. 4: Optimizing content for Ultraseek (cont’d) • Basic optimizing guidelines • Focus on your content • Determine key terms that describe the “aboutness” of the document • Think about terms that people use in searches

  10. 4: Optimizing content for Ultraseek (cont’d) • Basic optimizing guidelines (cont’d) • Position key terms in headers, beginning text, and throughout your content • Position key terms in titles and make sure titles describe the content • Meta tag your home pages • Meta tags for other important pages are optional
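The basic guidelines above can be turned into a rough automated check. The following is a minimal sketch in Python, not the tooling NREL used: the names `PageSummary` and `check_page`, the 200-character cutoff for "beginning text", and the choice of the `description` meta tag are all assumptions made for illustration. It reports whether a page has a title and a meta description, and which key terms appear in the title and in the opening text.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the title, meta description, and remaining text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.text = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            # Simplification: script and style text is not filtered out.
            self.text.append(data)


def check_page(html, key_terms, lead_chars=200):
    """Report which key terms appear in the title and in the beginning text."""
    page = PageSummary()
    page.feed(html)
    title = page.title.strip().lower()
    # Collapse whitespace, then keep only the first lead_chars characters.
    lead = " ".join(" ".join(page.text).split())[:lead_chars].lower()
    return {
        "has_title": bool(title),
        "has_description": bool(page.description.strip()),
        "terms_in_title": [t for t in key_terms if t.lower() in title],
        "terms_in_lead": [t for t in key_terms if t.lower() in lead],
    }
```

A site manager could run `check_page` over a home page with the terms people actually search for and see at a glance which terms are missing from the title or the opening text.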

  11. 5: Optimizing content for Web-wide search engines • We also wanted our pages to rank high and display well in Web-wide search engines • Techniques we used to optimize pages for Ultraseek work well in Web-wide search engines • We were getting lots of traffic from Web-wide search engines, and started monitoring and reporting our page rankings • We do searches on terms related to key content • We report to site managers how their pages fare in searches

  12. 6: Content classification • By 2000, EREN portal was so large – over 80K documents – that we needed a better way to get users into content • We implemented the Content Classification Engine (CCE), an add-on to Ultraseek that helps organize content into browsable topics • We spent a year developing 850 topics for eleven energy efficiency and renewable energy technologies

  13. 6: Content classification (cont’d) • Content classification teams and tasks • Content team – NREL science writers • Researched technologies; determined topics; wrote text and technology scope notes; coordinated NREL and DOE topic reviews • CCE team – me • Consulted on topics; developed topic structure; created, tested, and edited topic rules (>3K searches); reviewed topic results

  14. 6: Content classification (cont’d) • Topic considerations • We developed topics for several audiences, including energy professionals, homeowners, and students • We listed topics on technology pages, not on a search page • Energy professionals know the terminology but homeowners, students, and other users don’t • We want technology pages to educate users as well as guide them to content

  15. 6: Content classification (cont’d) • Benefits • EREN content is organized, so users don’t have to know the terminology • Users learn about energy technologies from topics and scope notes • Topics get users into our content • Topics bring lots of traffic to our sites • Topics give site managers a good look at their content

  16. 6: Content classification (cont’d) • Benefits (cont’d) • Site managers can use statistics on topic usage for content development • Webmasters find topics useful for responding to user inquiries • EREN Webmaster inquiries have decreased because users need less help finding information

  17. 6: Content classification (cont’d) • Benefits (cont’d) • Need for optimization is reinforced • Optimized pages rank higher and display better in topic search results • Site managers who have optimized their pages see the rewards • Site managers who have not optimized see how their pages fare in searches and understand better why they need to optimize

  18. 7: Optimizing audits • By 2001, Ultraseek (now called Inktomi) was indexing >100K documents for SOURCE, NREL, and EREN • Web-wide search engines were indexing our content and billions of other documents • Content optimization was more important than ever before • To help site managers know where to concentrate their optimizing efforts, we developed optimizing audits

  19. 7: Optimizing audits (cont’d) • Basic optimizing audits • How many documents are on your site • What are your document formats • What content should you focus on • How do your pages fare out of context • Do your pages have descriptive titles • Does beginning text on pages tell users what the page is about
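Several of the audit questions (document count, document formats, descriptive titles) can be answered from a simple crawl listing. The sketch below is illustrative only: the input format, a list of `(url, title)` pairs, and the function name `audit` are assumptions, not a description of the audits NREL actually ran. It counts documents by file extension and flags pages whose titles are missing or not unique.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlparse


def audit(documents):
    """documents: iterable of (url, title) pairs from a hypothetical site crawl."""
    documents = list(documents)
    formats = Counter()
    titles = Counter()
    for url, title in documents:
        # Use the file extension of the URL path as a rough format indicator.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        formats[suffix or "(none)"] += 1
        titles[title.strip().lower()] += 1
    untitled = [url for url, title in documents if not title.strip()]
    duplicated = [
        url for url, title in documents
        if title.strip() and titles[title.strip().lower()] > 1
    ]
    return {
        "total": len(documents),
        "formats": dict(formats),
        "untitled": untitled,
        "duplicate_titles": duplicated,
    }
```

The remaining audit questions, such as how pages fare out of context, still need a human reader; this kind of script only narrows down where to look.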

  20. 8: Optimizing PDFs and native documents • We learned from optimizing audits that we had lots of important content in PDFs and Word, Excel, and PowerPoint documents • Our search engine indexed these formats but most results were not pretty • We researched how Inktomi indexes and displays these formats, then developed optimizing standards and guidelines

  21. 8: Optimizing PDFs and native documents (cont’d) • PDFs • Titles • Serve as captions for search results • Should contain key terms and describe content • Boost ranking if they contain search terms • Are required per SOURCE, NREL, and EREN standards • Subjects • Display as result summaries in Inktomi • Must be at least 71 characters or won’t display • Should contain key terms and describe content • Boost ranking in Inktomi if they contain search terms • Are required per standards • Authors and keywords are optional

  22. 8: Optimizing PDFs and native documents (cont’d) • Word, Excel, and PowerPoint documents • Titles • Serve as captions for search results • Should contain key terms and describe content • Boost ranking if they contain search terms • Are required per SOURCE standards and recommended practices for NREL and EREN • Subjects • Display as result summaries in Inktomi • Should contain key terms and describe content • Boost ranking in Inktomi if they contain search terms • Are required per SOURCE standards and recommended practices for NREL and EREN • Authors and keywords are optional
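The Title and Subject rules above lend themselves to a simple validation step. This is a minimal sketch under stated assumptions: it takes metadata values that have already been extracted by some other means, the function name `check_metadata` is hypothetical, and it treats Title and Subject as required for every document type, even though the slides make them recommended practice rather than a standard for Word, Excel, and PowerPoint files on NREL and EREN. The 71-character minimum is the figure the slides give for PDF subjects in Inktomi.

```python
MIN_PDF_SUBJECT_CHARS = 71  # minimum stated in the slides for Inktomi to display it


def check_metadata(doc_type, title, subject, key_terms=()):
    """Return a list of problems for one document's Title and Subject fields."""
    problems = []
    title = (title or "").strip()
    subject = (subject or "").strip()
    if not title:
        problems.append("missing title")
    if not subject:
        problems.append("missing subject")
    elif doc_type == "pdf" and len(subject) < MIN_PDF_SUBJECT_CHARS:
        problems.append(
            f"subject is {len(subject)} characters; "
            f"needs at least {MIN_PDF_SUBJECT_CHARS}"
        )
    # Assumption: every supplied key term is expected to appear in the title.
    for term in key_terms:
        if term.lower() not in title.lower():
            problems.append(f"key term not in title: {term}")
    return problems
```

For example, `check_metadata("pdf", "Wind Energy Basics", "Short subject", ["wind"])` reports only that the subject is too short to display as a result summary.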

  23. 9: Dynamic URL conversion • We created a lot of dynamic Web pages using ColdFusion • Our search engine can index dynamic URLs • Web-wide search engines either don’t index dynamic URLs, want sites to individually submit dynamic URLs, or limit the number of dynamic URLs they index • Best solution is to convert URLs to a format that all search engines can index • Before: www.oit.doe.gov/cfm/fullarticle.cfm?id=355 • After: www.oit.doe.gov/cfm/fullarticle.cfm/id=355
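The before/after pair above amounts to replacing the `?` that starts the query string with a `/`. The sketch below shows that rewrite and its inverse in Python purely as an illustration; the actual conversion would have been done in the ColdFusion and Web server setup, which the slides do not describe. The function names, and the handling of more than one parameter, are assumptions.

```python
def to_static_style(url):
    """Rewrite 'page.cfm?id=355' as 'page.cfm/id=355'."""
    base, sep, query = url.partition("?")
    if not sep or not query:
        return base
    # Assumption: multiple parameters are also joined with '/'; the slides
    # only show the single-parameter case.
    return base + "/" + query.replace("&", "/")


def to_dynamic(url, script_suffix=".cfm"):
    """Inverse mapping, as a server-side handler might apply it."""
    marker = script_suffix + "/"
    head, sep, tail = url.partition(marker)
    if not sep or not tail:
        return url
    return head + script_suffix + "?" + tail.replace("/", "&")


print(to_static_style("www.oit.doe.gov/cfm/fullarticle.cfm?id=355"))
# prints: www.oit.doe.gov/cfm/fullarticle.cfm/id=355
```

The point of the design is that the published links contain no `?`, so crawlers treat them like ordinary static paths, while the server can still recover the original parameters.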

  24. 10: Optimizing classes • We had a great deal of optimizing information to share • We went from doing lots of one-on-one consulting to offering formal, regular classes • We offer entire classes on optimizing content, titles, and PDFs • We cover the basics of optimizing Word, Excel, and PowerPoint documents in the content and title classes

  25. 11: Search log and statistics analysis • Search logs and site statistics provide great information about user interests, and we started offering analysis services • Search log analysis • Inktomi automatically creates query logs • We developed a basic tool to manipulate query information • We tell site managers how people are using their search features • How people search • What terms people use in searches • What topics people search for

  26. 11: Search log and statistics analysis (cont’d) • Site statistics analysis • WebTrends provides information on popular documents, popular paths, search engine referrals, search terms, etc. • We tell site managers • What topics are covered in their popular documents • What paths people follow to their popular documents • What terms people use in Web-wide search engines • What topics people search for in Web-wide search engines • Which Web-wide search engines send the most traffic to their sites

  27. 11: Search log and statistics analysis (cont’d) • We correlate the analyses and tell site managers • What topics are popular in site searches and Web-wide searches • Which topics are popular in site searches but not Web-wide searches (and vice versa) • What terms people use in searches • What relationships exist between popular topics and popular documents • We also do searches on popular topics to see what results (documents) people get
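The correlation step above is essentially a set comparison between two ranked lists. As a minimal sketch, assuming topic counts have already been tallied from the site search logs and from the Web-wide referral statistics (the dictionaries and the name `compare_topics` are hypothetical), the overlap and differences can be computed like this:

```python
def compare_topics(site_counts, web_counts, top_n=10):
    """Compare the top-N topics from site search and Web-wide search referrals."""

    def top(counts):
        # Rank by count, highest first; break ties alphabetically.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return {topic for topic, _ in ranked[:top_n]}

    site, web = top(site_counts), top(web_counts)
    return {
        "both": sorted(site & web),
        "site_only": sorted(site - web),
        "web_only": sorted(web - site),
    }
```

Topics that land in `site_only` or `web_only` are the ones worth a closer look, since they show where the two audiences differ.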

  28. 11: Search log and statistics analysis (cont’d) • Some of the things we have learned • Most people do basic searches using 1-3 terms • Searchers rarely use advanced techniques • On some sites, users search for general info more frequently than technical info • Documents that rank high in popular searches are also popular documents in statistics • Google sends a lot of traffic to our sites (from pages that rank high in search results) • On SOURCE and NREL, people use Web search to look for employee information, publications, photos, etc.
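Findings such as "most people do basic searches using 1-3 terms" come from simple counts over the query log. The sketch below assumes the queries have already been extracted from the log as plain strings; the Inktomi log format is not described in the slides, and `summarize_queries` is a hypothetical name, not the "basic tool" mentioned earlier.

```python
from collections import Counter


def summarize_queries(queries, top_n=5):
    """Summarize query lengths and the most common search terms."""
    lengths = Counter()
    terms = Counter()
    for query in queries:
        words = query.lower().split()
        if not words:
            continue  # skip empty queries
        lengths[len(words)] += 1
        terms.update(words)
    total = sum(lengths.values())
    short = sum(n for size, n in lengths.items() if size <= 3)
    return {
        "queries": total,
        "share_1_to_3_terms": short / total if total else 0.0,
        "top_terms": terms.most_common(top_n),
    }
```

Splitting on whitespace is a simplification: it counts quoted phrases and operators as ordinary terms, so a real analysis of advanced search usage would need to parse the query syntax.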

  29. 11: Search log and statistics analysis (cont’d) • How we have applied information from analyses • We use information on search terms and popular topics for content development, optimization, home page redesigns, etc. • We list popular topics on search pages • On SOURCE and NREL search pages, we make it clear where to search Web sites, employee information, publications, photos, etc.

  30. 12: De-optimizing content • After studying search logs and statistics, we have de-optimized some content • For example, old NREL fact sheets on renewable energy that ranked high in searches and were popular documents • We removed the fact sheets • We targeted other content, i.e., we optimized other content that we want to rank high in searches and bring people to the NREL site

  31. Summary of the twelve steps

  32. Summary of the twelve steps (cont’d)

  33. Summary of the twelve steps (cont’d)

  34. What’s next • Evolution of NREL search engine and search services will continue • New Inktomi features and products will drive new services • Search and optimization for an enterprise portal? • Search and optimization integrated with a content management system? • Search personalization – search my email or hard drive?
