HTML • Darby Tien-Hao Chang • Department of Electrical Engineering, National Cheng Kung University
HTML introduction • HTML stands for Hyper Text Markup Language • An HTML file is a text file containing small markup tags • The markup tags tell the Web browser how to display the page • An HTML file must have an htm or html file extension • An HTML file can be created using a simple text editor
Sample HTML • <html> • <head> • <title>Title of page</title> • </head> • <body> • This is my first homepage. <b>This text is bold</b> • </body> • </html>
HTML elements • HTML tags are used to mark up HTML elements • HTML tags are surrounded by the two characters < and > • The surrounding characters are called angle brackets • HTML tags normally come in pairs like <b> and </b> • The first tag in a pair is the start tag, the second tag is the end tag • The text between the start and end tags is the element content • HTML tags are not case sensitive, <b> means the same as <B>
Sample HTML • <b>This text is bold</b> • Start tag, content, end tag • <body> • This is my first homepage. <b>This text is bold</b> • </body> • <body bgcolor="red"> • Tag attribute
Basic HTML tags • <html> Defines an HTML document • <body> Defines the document's body • <h1> to <h6> Defines heading 1 to heading 6 • <p> Defines a paragraph • <br /> Inserts a single line break • <hr /> Defines a horizontal rule • <!-- ... --> Defines a comment
Sample HTML • <html> • <body> • <h1>This is heading 1</h1> • <h2>This is heading 2</h2> • <h3>This is heading 3</h3> • <h4>This is heading 4</h4> • <h5>This is heading 5</h5> • <h6>This is heading 6</h6> • </body> • </html>
Sample HTML • <html> • <body> • <p> • This paragraph • contains a lot of lines • in the source code, • but the browser • ignores it. • </p> • <p> • To break<br>lines<br>in a<br>paragraph,<br>use the br tag. • </p> • </body> • </html>
Sample HTML • <html> • <body> • <h1 align="center">This is heading 1</h1> • <hr /> • <h2 color="red">This is heading 2</h2> • <!--This comment will not be displayed--> • </body> • </html>
More HTML tags • <b> Defines bold text • <big> Defines big text • <em> Defines emphasized text • <i> Defines italic text • <small> Defines small text • <strong> Defines strong text • <sub> Defines subscripted text • <sup> Defines superscripted text • <ins> Defines inserted text • <del> Defines deleted text • <code> Defines computer code text • <kbd> Defines keyboard text • <samp> Defines sample computer code • <tt> Defines teletype text • <var> Defines a variable • <pre> Defines preformatted text • <abbr> Defines an abbreviation • <acronym> Defines an acronym • <address> Defines an address element • <bdo> Defines the text direction • <blockquote> Defines a long quotation • <q> Defines a short quotation • <cite> Defines a citation • <dfn> Defines a definition term
Haha • s/<[^>]*>//g
Powerful regular expression • s/<[^>]*>//g • s substitute • < left angle bracket • [^>] any character except a right angle bracket • [^>]* all the characters that form the tag (its name and attributes) • > right angle bracket • the empty replacement between the last two slashes deletes each match • g replace globally, i.e. all occurrences
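The substitution above is written in Perl. As a minimal sketch, the same pattern can be tried in Python with the standard `re` module; the function name `strip_tags` is an assumption made for this illustration and does not come from the slides.

```python
import re

# Same pattern as s/<[^>]*>//g: "<", any run of characters other than ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html_text):
    """Delete every tag-like span; sub() replaces all matches, like the g flag."""
    return TAG_PATTERN.sub("", html_text)


sample = "This is my first homepage. <b>This text is bold</b>"
print(strip_tags(sample))
```

A pattern like this only removes the markup. It does not decode character entities, and it can be fooled by a ">" inside an attribute value, so it is a quick tool rather than a full HTML parser.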
Are semantics important? • Yes, sometimes • To extract the headline of a news article • http://news.yam.com/ettoday/society/200608/20060816189987.html • <h2><span class="red1">發票案/李慧芬週五前返澳 近日將與李碧君對質</span></h2> • /^<h2><span class="red1">(.*)<\/span><\/h2>\n$/ • print $1, "\n";
How to display a less than sign (<) in browser? • Character Entities • A character entity has three parts: an ampersand (&), an entity name or a # and an entity number, and finally a semicolon (;). • To display a less than sign in an HTML document we must write: &lt; or &#60;
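The headline extraction above relies on the markup around the text, not on the text itself. A rough Python equivalent of the Perl match is sketched below; `extract_headline` is a hypothetical helper name, and the sample line is made up to stand in for a line from the news page.

```python
import re

# Mirrors the slide's pattern: the captured group is whatever sits inside
# <h2><span class="red1"> ... </span></h2> on a single line.
HEADLINE_PATTERN = re.compile(r'^<h2><span class="red1">(.*)</span></h2>\n?$')


def extract_headline(line):
    """Return the headline text if the line has the expected markup, else None."""
    match = HEADLINE_PATTERN.match(line)
    return match.group(1) if match else None


# Made-up input line for illustration only.
line = '<h2><span class="red1">Sample headline</span></h2>\n'
print(extract_headline(line))
```

The pattern is tied to one site's layout: if the page changes the class name or nests another tag, the match silently fails, which is why the helper returns None instead of raising.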
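For scripts that generate or read HTML, the entity rules above do not have to be applied by hand. The sketch below uses Python's standard `html` module; the sample strings are assumptions chosen for illustration.

```python
import html

# escape() turns the characters that have special meaning in HTML into entities.
escaped = html.escape("1 < 2")
print(escaped)

# unescape() accepts both the entity name and the entity number.
from_name = html.unescape("&lt;")
from_number = html.unescape("&#60;")
print(from_name, from_number)
```

Both forms decode to the same character, so a scraper should unescape extracted text before comparing it with plain strings.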
HTML links • <html> • <body> • <p> • <a href="lastpage.htm"> • This text</a> is a link to a page on • this Web site. • </p> • <p> • <a href="http://www.microsoft.com/"> • This text</a> is a link to a page on • the World Wide Web. • </p> • </body> • </html>
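The two links above differ in one respect that matters when a program follows them: the first href is relative to the current page, the second is absolute. A minimal Python sketch of collecting and resolving links follows; the class name `LinkCollector`, the helper `collect_links`, and the base URL on example.com are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> also matches.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def collect_links(html_text, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


page = (
    '<p><a href="lastpage.htm">This text</a> is a link.</p>'
    '<p><a href="http://www.microsoft.com/">This text</a> is a link.</p>'
)
print(collect_links(page, "http://example.com/dir/index.htm"))
```

`urljoin` leaves absolute URLs untouched and resolves relative ones against the page they were found on, which is what a browser does when the link is clicked.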
HTML frames • <html> • <frameset cols="25%,50%,25%"> • <frame src="frame_a.htm"> • <frame src="frame_b.htm"> • <frame src="frame_c.htm"> • </frameset> • </html> • <html> • <frameset rows="25%,50%,25%"> • <frame src="frame_a.htm"> • <frame src="frame_b.htm"> • <frame src="frame_c.htm"> • </frameset> • </html>
HTML frames • <html> • <frameset rows="50%,50%"> • <frame src="frame_a.htm"> • <frameset cols="25%,75%"> • <frame src="frame_b.htm"> • <frame src="frame_c.htm"> • </frameset> • </frameset> • </html>
HTML tables • <table border="1"> • <tr> • <td>row 1, cell 1</td> • <td>row 1, cell 2</td> • </tr> • <tr> • <td>row 2, cell 1</td> • <td>row 2, cell 2</td> • </tr> • </table>
HTML tables • <html> • <body> • <h4>Cell that spans two columns:</h4> • <table border="1"> • <tr> • <th>Name</th> • <th colspan="2">Telephone</th> • </tr> • <tr> • <td>Bill Gates</td> • <td>555 77 854</td> • <td>555 77 855</td> • </tr> • </table> • <h4>Cell that spans two rows:</h4> • <table border="1"> • <tr> • <th>First Name:</th> • <td>Bill Gates</td> • </tr> • <tr> • <th rowspan="2">Telephone:</th> • <td>555 77 854</td> • </tr> • <tr> • <td>555 77 855</td> • </tr> • </table> • </body> • </html>
HTML lists • <html> • <body> • <h4>An Unordered List:</h4> • <ul> • <li>Coffee</li> • <li>Tea</li> • </ul> • <h4>An Ordered List:</h4> • <ol> • <li>Coffee</li> • <li>Tea</li> • </ol> • <h4>A Definition List:</h4> • <dl> • <dt>Coffee</dt> • <dd>Black hot drink</dd> • <dt>Milk</dt> • <dd>White cold drink</dd> • </dl> • </body> • </html>
HTML forms • <form> • <input> • <input> • </form> • description: <input type="text" name="name" /> • <input type="radio" name="name" value="value" />description • <input type="checkbox" name="name" />description • <select name="name"> • <option value="value 1">description 1 • <option value="value 2">description 2 • </select> • <textarea rows="10" cols="30"> • default text • </textarea>
Form’s action attribute and submit button • <form name="input" action="html_form_action.asp" method="get"> • Username: <input type="text" name="user" /> • <input type="submit" value="Submit" /> • </form>
Methods GET and POST in HTML forms - what's the difference? • http://www.cs.tut.fi/~jkorpela/forms/methods.html • The difference between GET and POST is primarily defined in terms of form data encoding: with GET, the browser encodes the form data into the URL, while with POST the form data appears within the message body • If the processing of a form is idempotent (i.e. it has no lasting observable effect on the state of the world), then the form method should be GET • If the service associated with the processing of a form has side effects (for example, modification of a database or subscription to a service), the method should be POST
Exercise • Extract the resolution, number of units, EC no. and so on for a given PDB ID • http://www.pdb.org/ • Today’s headings • Comics • http://jojo.jojohot.com/ • use LWP::Simple; • $web = get( $url );
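The encoding difference described above can be seen without sending anything over the network. The Python sketch below builds both kinds of request for the earlier form; the host example.com, the field value "darby", and the helper name `build_requests` are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_requests(action_url, fields):
    """Build (but do not send) a GET and a POST request carrying the same form data."""
    encoded = urlencode(fields)
    # GET: the form data becomes the query string of the URL.
    get_request = Request(action_url + "?" + encoded)
    # POST: the URL stays clean and the form data travels in the message body.
    post_request = Request(action_url, data=encoded.encode("ascii"))
    return get_request, post_request


get_request, post_request = build_requests(
    "http://example.com/html_form_action.asp", {"user": "darby"}
)
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Passing either object to `urllib.request.urlopen` would actually submit the form. That step is left out here because the target URL is made up.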
Exercise hints • $web =~ /Title\s*<.td>\s*[^>]*>\s*([^\n]+)/
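To see what the hint pattern captures, it can be run against a small fragment. The fragment below is invented for illustration and is not the real markup of the PDB site; only the regular expression comes from the slide, and `extract_title` is a hypothetical helper name.

```python
import re

# The slide's pattern: the label "Title", the closing tag of its cell (the "."
# stands in for "/"), the opening tag of the next cell, then the rest of the line.
TITLE_PATTERN = re.compile(r"Title\s*<.td>\s*[^>]*>\s*([^\n]+)")


def extract_title(html_text):
    match = TITLE_PATTERN.search(html_text)
    return match.group(1) if match else None


# Hypothetical two-cell table row, with the value on its own line.
fragment = '<tr><td>Title</td>\n<td class="value">Some protein structure\n</td></tr>'
print(extract_title(fragment))
```

Because the capture runs to the end of the line, it would also swallow a closing tag that sits on the same line as the value, so the pattern depends on how the page breaks its lines.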
JavaScript – a case study • http://proteminer.csie.ntu.edu.tw/
A review of dirtycomi • http://dm.www.wangyou.com/ • Encoding (Big5, GB2312, UTF-8) • Retrieve HTML code with the GET method • Traverse multiple pages • Trace the JavaScript code and re-implement it in Perl • Make the script behave exactly like a human using a browser
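The encoding item in the list above is often the first obstacle: the bytes of a page mean nothing until the right charset is applied. One possible approach, sketched in Python, is to look for the charset declared in the page itself and fall back to UTF-8. The helper name `decode_page` and the sample page are assumptions, and a real crawler would also check the HTTP Content-Type header.

```python
import re

# Looks for charset=... as it appears in a <meta> tag; the page is still bytes here.
CHARSET_PATTERN = re.compile(rb'charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def decode_page(raw_bytes, default="utf-8"):
    """Decode page bytes using the charset the page declares, else the default."""
    match = CHARSET_PATTERN.search(raw_bytes)
    charset = match.group(1).decode("ascii") if match else default
    # errors="replace" keeps going on stray bytes; an unknown charset name
    # still raises LookupError.
    return raw_bytes.decode(charset, errors="replace")


# Made-up Big5 page for illustration.
big5_page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=big5">'
    "<p>中文</p>"
).encode("big5")
print(decode_page(big5_page))
```

Guessing by trial decoding is less reliable than reading the declaration, since many byte sequences are valid in both Big5 and GB2312 and decode to different characters.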