22 • Screen Scraping Application • Introducing String Processing
Outline • 22.1 Test-Driving the Screen Scraping Application • 22.2 Fundamentals of Strings • 22.3 Analyzing the Screen Scraping Application • 22.4 Locating Substrings in Strings • 22.5 Extracting Substrings from Strings • 22.6 Replacing Substrings in Strings • 22.7 Other String Methods
Objectives • In this tutorial you will learn to: • Manipulate String objects. • Use properties and methods of class String. • Search for substrings within Strings. • Extract substrings from Strings. • Replace substrings within Strings.
Introduction • HTML (HyperText Markup Language) is a markup language for describing web pages. • Extracting desired information from HTML is called screen scraping.
22.1 Test-Driving the Screen Scraping Application • An online European auction house wants to expand its business to include bidders from the United States. However, all of the auction house’s web pages currently display their prices in euros, not dollars. The auction house wants to generate separate web pages for American bidders that display the prices of auction items in dollars. These new web pages will be generated by applying screen-scraping techniques to the existing web pages.
22.1 Test-Driving the Screen Scraping Application (Cont.) • You have been asked to build a prototype application that tests the screen-scraping functionality. The application must search a sample string of HTML and extract information about the price of a specified auction item. For testing purposes, a ComboBox should be provided that contains auction items listed in the HTML. The selected item’s amount must then be converted to dollars. Assume the exchange rate is one euro to 1.58 dollars (that is, one euro is equivalent to $1.58).
Test-Driving the Screen Scraping Application • Run the completed application (Fig. 22.1). Figure 22.1|Screen Scraping application’s Form (callout: Label containing HTML).
Test-Driving the Screen Scraping Application (Cont.) • Select an item name from the ComboBox, as shown in Figure 22.2. Figure 22.2|Selecting an item name from the ComboBox (callout: ComboBox’s drop-down list).
Test-Driving the Screen Scraping Application (Cont.) • Click the Search Button to display the price for the selected item (Fig. 22.3). Figure 22.3|Searching for the item’s price (callouts: Extracted price, converted to dollars; Price located in HTML string, specified in euros).
22.2 Fundamentals of Strings • A string is a series of characters treated as a single unit, for example "This is a string!" • These characters can be uppercase letters, lowercase letters, digits and various special characters, such as +, -, *, /, $ and others. • String property Length returns the number of characters in the String.
22.2 Fundamentals of Strings (Cont.) • String property Chars returns the character located at a specific index in a String: string1.Chars(0) • Any String method or operator that appears to modify a String actually returns a new String that contains the results. • Strings are immutable objects—that is, characters in Strings cannot be changed after the Strings are created.
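The properties and the immutability rule above can be tried in a small console program. This is a minimal sketch rather than code from the tutorial: the module name StringBasics and the variable upper are illustrative, and ToUpper is used only as a convenient example of a method that appears to modify a String.

```vb
Module StringBasics
    Sub Main()
        Dim string1 As String = "This is a string!"

        ' Length is the number of characters in the String.
        Console.WriteLine(string1.Length)     ' 17

        ' Chars uses zero-based indices, so index 0 is the first character.
        Console.WriteLine(string1.Chars(0))   ' T

        ' ToUpper does not change string1; it returns a new String.
        Dim upper As String = string1.ToUpper()
        Console.WriteLine(string1)            ' This is a string!
        Console.WriteLine(upper)              ' THIS IS A STRING!
    End Sub
End Module
```

Because string1 still holds its original text after the call, the result of such a method must be assigned to a variable to be kept.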
22.2 Fundamentals of Strings (Cont.) • Figure 22.4 lists several String methods. Figure 22.4|String methods introduced in earlier tutorials.
22.3 Analyzing the Screen Scraping Application
When the Form loads:
    Display the HTML that contains the items’ prices in a Label
When the user clicks the Search Button:
    Search the HTML for the item the user selected from the ComboBox
    Extract the item’s price
    Convert the item’s price from euros to dollars
    Display the item’s price in a Label
Action/Control/Event (ACE) Table for the Screen Scraping Application • Use an ACE table to convert pseudocode into Visual Basic (Fig. 22.5). Figure 22.5|ACE table for Screen Scraping application.
Locating the Selected Item’s Price • Double-click the Search Button on the template application’s Form to generate an event handler (Fig. 22.6). Figure 22.6|searchButton_Click event handler.
Locating the Selected Item’s Price (Cont.) • Add lines 17–21 of Figure 22.7 to the searchButton_Click event handler. Figure 22.7|searchButton_Click event-handler declarations.
Locating the Selected Item’s Price (Cont.) • String method IndexOf (Fig. 22.8) locates the first occurrence of the selected item’s name in the HTML string. • If IndexOf finds the item name, it returns the index at which that substring begins in the String. • If IndexOf does not find the substring, it returns –1. Figure 22.8|Locating the desired item name (callout: Search for the SelectedItem in the String html).
Locating the Selected Item’s Price (Cont.) • This version of method IndexOf (Fig. 22.9) takes two arguments: the substring to find and the index in the String at which to begin searching. • In this case, the substring to find (indicating the beginning of the price) is "&euro;". Figure 22.9|Locating the desired item price (callout: Locate the beginning of the price in html).
Locating the Selected Item’s Price (Cont.) • A </TD> tag directly follows every price in the HTML string, so the index of the first </TD> tag after priceBegin marks the end of the current price (Fig. 22.10). Figure 22.10|Locating the end of the item’s price (callout: Locate the end of the price in html).
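The three IndexOf searches described on the preceding slides can be sketched together as follows. The HTML text, the item names, and the names LocatePrice, selectedItem and itemLocation are assumptions made for illustration; only html, priceBegin and priceEnd come from the slides, and the tutorial's actual sample string is not shown here.

```vb
Module LocatePrice
    Sub Main()
        ' Hypothetical sample HTML; the tutorial's real string is not reproduced.
        Dim html As String = _
            "<TABLE><TR><TD>Antique Rocking Chair</TD>" & _
            "<TD>&euro;82.67</TD></TR>" & _
            "<TR><TD>Silver Teapot</TD>" & _
            "<TD>&euro;64.55</TD></TR></TABLE>"
        Dim selectedItem As String = "Silver Teapot"

        ' One-argument IndexOf: first occurrence of the item name, or -1.
        Dim itemLocation As Integer = html.IndexOf(selectedItem)
        If itemLocation = -1 Then
            Console.WriteLine("Item not found")
            Return
        End If

        ' Two-argument IndexOf: start searching at the item's position,
        ' so the price found belongs to the selected item.
        Dim priceBegin As Integer = html.IndexOf("&euro;", itemLocation)

        ' The first </TD> after priceBegin marks the end of the price.
        Dim priceEnd As Integer = html.IndexOf("</TD>", priceBegin)

        Console.WriteLine("{0} {1} {2}", itemLocation, priceBegin, priceEnd)
    End Sub
End Module
```

Starting the second and third searches from the previous result is what keeps the price tied to the selected item instead of to the first price in the HTML.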
22.4 Locating Substrings in Strings • String method LastIndexOf locates the last occurrence of a substring in a String. • If LastIndexOf finds the substring, it returns the starting index of that substring in the String; otherwise, it returns –1. • Figure 22.11 shows examples of the three versions of LastIndexOf. Figure 22.11|LastIndexOf examples.
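Figure 22.11 is not reproduced here, so the following is only a sketch of what three LastIndexOf calls can look like. It assumes the three versions are the overloads taking a substring, a substring plus a starting index, and a substring plus a starting index and a character count; the String letters and the module name are illustrative.

```vb
Module LastIndexOfExamples
    Sub Main()
        ' 26 characters: "abcdefghijklm" repeated twice.
        Dim letters As String = "abcdefghijklmabcdefghijklm"

        ' Search the whole String backward from the end.
        Console.WriteLine(letters.LastIndexOf("def"))          ' 16

        ' Search backward starting at index 12, so only the
        ' first half of the String is examined.
        Console.WriteLine(letters.LastIndexOf("def", 12))      ' 3

        ' Search backward from index 25 across 5 characters
        ' (indices 21 to 25), which do not contain "def".
        Console.WriteLine(letters.LastIndexOf("def", 25, 5))   ' -1
    End Sub
End Module
```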
Retrieving the Desired Item’s Price • String method Substring (Fig. 22.12) copies the price out of html. • The first argument (priceBegin) specifies the starting index. • The second argument (priceEnd - priceBegin) specifies the length of the substring to be copied. Figure 22.12|Retrieving the desired price (callout: Extract price from html).
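A short sketch of the extraction step, assuming the call in Figure 22.12 is Substring with a starting index and a length. The one-row HTML fragment and the names ExtractPrice and price are illustrative.

```vb
Module ExtractPrice
    Sub Main()
        ' Hypothetical one-row HTML fragment.
        Dim html As String = "<TD>Silver Teapot</TD><TD>&euro;64.55</TD>"

        Dim priceBegin As Integer = html.IndexOf("&euro;")
        Dim priceEnd As Integer = html.IndexOf("</TD>", priceBegin)

        ' Copy (priceEnd - priceBegin) characters starting at priceBegin.
        Dim price As String = html.Substring(priceBegin, priceEnd - priceBegin)

        Console.WriteLine(price)   ' &euro;64.55
    End Sub
End Module
```

The second argument is a length, not an ending index, which is why the difference priceEnd - priceBegin is passed.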
Converting the Price to Dollars • String method Replace (Fig. 22.13) returns a new String object in which every occurrence of the substring "&euro;" is replaced with the empty String. • The remaining amount is converted to dollars, and String method Format formats the result as currency for display in resultLabel. Figure 22.13|Converting the price to dollars (callout: Replace "&euro;" with "" and convert the amount to dollars).
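The conversion can be sketched as a console program. This is not the tutorial's listing: the names ConvertPrice, amountText, euros and dollars are illustrative, the parsing call (Decimal.Parse with the invariant culture) is one possible choice, and the result is written to the console instead of resultLabel. The en-US culture is passed explicitly so the currency format does not depend on the machine's regional settings.

```vb
Imports System.Globalization

Module ConvertPrice
    Sub Main()
        Dim price As String = "&euro;64.55"

        ' Replace returns a new String; price itself is unchanged.
        Dim amountText As String = price.Replace("&euro;", String.Empty)

        ' Parse with the invariant culture so "." is the decimal separator.
        Dim euros As Decimal = _
            Decimal.Parse(amountText, CultureInfo.InvariantCulture)

        ' Exchange rate from the problem statement: 1 euro = 1.58 dollars.
        Dim dollars As Decimal = euros * 1.58D

        ' The C format specifier formats the value as currency.
        Console.WriteLine( _
            String.Format(New CultureInfo("en-US"), "{0:C}", dollars))   ' $101.99
    End Sub
End Module
```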
Displaying the HTML String • Double-click the Form to generate an empty Load event handler (Fig. 22.14). Figure 22.14|Load event for the Form.
Displaying the HTML String (Cont.) • A Label treats a single ampersand as the prefix of an access key rather than as text, so each ampersand to be displayed must be prefixed with an additional ampersand. • String method Replace (Fig. 22.15) replaces every occurrence of "&euro;" with "&&euro;". Figure 22.15|Displaying the HTML string in a Label (callout: Replace all occurrences of "&euro;" with "&&euro;").
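The effect of this Replace call can be checked without a Form. In this minimal sketch the HTML fragment and the names EscapeAmpersand and labelText are illustrative, and the text is written to the console rather than assigned to a Label.

```vb
Module EscapeAmpersand
    Sub Main()
        ' Hypothetical HTML fragment.
        Dim html As String = "<TD>&euro;64.55</TD>"

        ' Double each ampersand that starts "&euro;" so that a Label,
        ' which reads "&&" as one literal "&", would show the original text.
        Dim labelText As String = html.Replace("&euro;", "&&euro;")

        Console.WriteLine(labelText)   ' <TD>&&euro;64.55</TD>
    End Sub
End Module
```

In Windows Forms this behavior is controlled by the Label's UseMnemonic property. Setting it to False is another way to display ampersands unchanged, although the slides use the Replace approach.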
22.7 Other String Methods • Figure 22.16 lists some other methods for manipulating Strings. Figure 22.16|Description of some other String methods.
Outline • Figure 22.17 presents the source code for the Screen Scraping application. The callouts in the listing mark these steps: • Search for the SelectedItem in the String html • Locate the beginning of the price in html • Locate the end of the price in html • Extract the price from html • Replace "&euro;" with the empty String • Replace "&euro;" with "&&euro;"