XML: From Beginner to Mastery
What is XML? • XML is a markup language for documents containing structured information. • Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays. • A markup language is a mechanism to identify structures in a document. The XML specification defines a standard way to add markup to documents.
What's a Document? • The number of applications currently being developed that are based on, or make use of, XML documents is truly amazing (particularly when you consider that XML is not yet a year old)! For our purposes, the word "document" refers not only to traditional documents, like this one, but also to the myriad of other XML "data formats". These include vector graphics, e-commerce transactions, mathematical equations, object meta-data, server APIs, and a thousand other kinds of structured information.
So XML is Just Like HTML? • No. In HTML, both the tag semantics and the tag set are fixed. • XML specifies neither semantics nor a tag set. In fact XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them. Since there's no predefined tag set, there can't be any preconceived semantics. All of the semantics of an XML document will either be defined by the applications that process them or by stylesheets.
So XML Is Just Like SGML? • No. Well, yes, sort of. XML is defined as an application profile of SGML. SGML is the Standard Generalized Markup Language defined by ISO 8879. SGML has been the standard, vendor-independent way to maintain repositories of structured documentation for more than a decade, but it is not well suited to serving documents over the web (for a number of technical reasons beyond the scope of this article). Defining XML as an application profile of SGML means that any fully conformant SGML system will be able to read XML documents. However, using and understanding XML documents does not require a system that is capable of understanding the full generality of SGML. XML is, roughly speaking, a restricted form of SGML.
Why XML? • In order to appreciate XML, it is important to understand why it was created. XML was created so that richly structured documents could be used over the web. The only viable alternatives, HTML and SGML, are not practical for this purpose. • HTML, as we've already discussed, comes bound with a set of semantics and does not provide arbitrary structure. • SGML provides arbitrary structure, but is too difficult to implement just for a web browser. Full SGML systems solve large, complex problems that justify their expense. Viewing structured documents sent over the web rarely carries such justification.
XML Development Goals • It shall be straightforward to use XML over the Internet. Users must be able to view XML documents as quickly and easily as HTML documents. In practice, this will only be possible when XML browsers are as robust and widely available as HTML browsers, but the principle remains. • XML shall support a wide variety of applications. XML should be beneficial to a wide variety of diverse applications: authoring, browsing, content analysis, etc. Although the initial focus is on serving structured documents over the web, it is not meant to narrowly define XML.
XML shall be compatible with SGML. Most of the people involved in the XML effort come from organizations that have a large, in some cases staggering, amount of material in SGML. XML was designed pragmatically, to be compatible with existing standards while solving the relatively new problem of sending richly structured documents over the web. • It shall be easy to write programs that process XML documents. The colloquial way of expressing this goal while the spec was being developed was that it ought to take about two weeks for a competent computer science graduate student to build a program that can process XML documents.
The number of optional features in XML is to be kept to an absolute minimum, ideally zero. Optional features inevitably raise compatibility problems when users want to share documents and sometimes lead to confusion and frustration. • XML documents should be human-legible and reasonably clear. If you don't have an XML browser and you've received a hunk of XML from somewhere, you ought to be able to look at it in your favorite text editor and actually figure out what the content means.
The XML design should be prepared quickly. Standards efforts are notoriously slow. XML was needed immediately and was developed as quickly as possible. • The design of XML shall be formal and concise. In many ways a corollary to rule 4, it essentially means that XML must be expressed in EBNF (Extended Backus-Naur Form) and must be amenable to modern compiler tools and techniques. There are a number of technical reasons why the SGML grammar cannot be expressed in EBNF. Writing a proper SGML parser requires handling a variety of rarely used and difficult-to-parse language features. XML does not.
XML documents shall be easy to create. Although there will eventually be sophisticated editors to create and edit XML content, they won't appear immediately. In the interim, it must be possible to create XML documents in other ways: directly in a text editor, with simple shell and Perl scripts, etc. • Terseness in XML markup is of minimal importance. Several SGML language features were designed to minimize the amount of typing required to manually key in SGML documents. These features are not supported in XML. From an abstract point of view, these documents are indistinguishable from their more fully specified forms, but supporting these features adds a considerable burden to the SGML parser (or the person writing it, anyway). In addition, most modern editors offer better facilities to define shortcuts when entering text.
How Is XML Defined? What is XML? • XML stands for EXtensible Markup Language • XML is a markup language much like HTML • XML was designed to describe data • XML tags are not predefined. You must define your own tags • XML uses a Document Type Definition (DTD) or an XML Schema to describe the data • XML with a DTD or XML Schema is designed to be self-descriptive • XML is a W3C Recommendation
The main difference between XML and HTML • XML was designed to describe data and to focus on what data is. • HTML was designed to display data and to focus on how data looks.
XML was designed to carry data. • XML is not a replacement for HTML. XML and HTML were designed with different goals: • XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. • HTML is about displaying information, while XML is about describing information.
XML does not DO anything • XML was not designed to DO anything. • Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and to send information. • The following example is a note to Tove from Jani, stored as XML:
<note> • <to>Tove</to> • <from>Jani</from> • <heading>Reminder</heading> • <body>Don't forget me this weekend!</body> </note> • The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.
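As a minimal sketch of the "piece of software" mentioned above, the following Python snippet reads the note with the standard library's xml.etree.ElementTree module and turns it into plain text. The function name format_note and the output layout are illustrative assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def format_note(xml_text: str) -> str:
    """Turn the note document into a short plain-text message (hypothetical layout)."""
    note = ET.fromstring(xml_text)
    # findtext returns the text of the first child element with the given tag.
    return (
        f"To: {note.findtext('to')}\n"
        f"From: {note.findtext('from')}\n"
        f"{note.findtext('heading')}: {note.findtext('body')}"
    )


print(format_note(NOTE_XML))
```

The XML itself stays passive; everything that "happens" to the note is decided by the program that reads it.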
XML is free and extensible • XML tags are not predefined. You must "invent" your own tags. • The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.). • XML allows the author to define his own tags and his own document structure. • The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.
XML is a complement to HTML • XML is not a replacement for HTML. • It is important to understand that XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data. • My best description of XML is this: XML is a cross-platform, software- and hardware-independent tool for transmitting information.
What XML Is Not • First of all, XML is not a programming language. • Second, XML is not a network transport protocol. • Finally, to mention the example where the hype most often obscures the reality, XML is not a database.
XML offers the tantalizing possibility of truly cross-platform, long-term data formats.
XML in future Web development • XML is going to be everywhere. • We have been participating in XML development since its creation. It has been amazing to see how quickly the XML standard has been developed and how quickly a large number of software vendors have adopted the standard. • We strongly believe that XML will be as important to the future of the Web as HTML has been to the foundation of the Web and that XML will be the most common tool for all data manipulation and data transmission.
How can XML be Used? • It is important to understand that XML was designed to store, carry, and exchange data. XML was not designed to display data.
XML can Separate Data from HTML • With XML, your data is stored outside your HTML. • When HTML is used to display data, the data is stored inside your HTML. With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for data layout and display, and be sure that changes in the underlying data will not require any changes to your HTML. • XML data can also be stored inside HTML pages as "Data Islands". You can still concentrate on using HTML only for formatting and displaying the data.
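One way to picture this separation is a small program that keeps the data in XML and generates the HTML from it. The sketch below is only an illustration in Python: the function name note_to_html, the list layout, and the idea that the XML would come from a separate file are all assumptions.

```python
import html
import xml.etree.ElementTree as ET

# Data that would normally live in its own file, e.g. a hypothetical note.xml.
NOTE_XML = "<note><to>Tove</to><from>Jani</from><body>Don't forget me this weekend!</body></note>"


def note_to_html(xml_text: str) -> str:
    """Build an HTML fragment from the note data; the markup is an assumed layout."""
    note = ET.fromstring(xml_text)
    rows = [
        # Escape &, < and > so that data can never be mistaken for HTML markup.
        f"<li>{html.escape(child.tag, quote=False)}: "
        f"{html.escape(child.text or '', quote=False)}</li>"
        for child in note
    ]
    return "<ul>" + "".join(rows) + "</ul>"


print(note_to_html(NOTE_XML))
```

Changing the note's content only touches the XML; the display code stays the same.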
XML is used to Exchange Data • With XML, data can be exchanged between incompatible systems. • In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. • Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications.
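A rough sketch of such a conversion, assuming a flat record held in a Python dict: one function writes the record as XML text and another reads it back. The function names, the <order> element, and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def record_to_xml(tag: str, record: dict) -> str:
    """Serialize a flat dict as an XML element with one child per key."""
    root = ET.Element(tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> dict:
    """Read the same structure back into a dict (all values come back as strings)."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


order = {"customer": "Tove", "item": "notebook", "quantity": 2}  # made-up sample data
xml_text = record_to_xml("order", order)
print(xml_text)
print(xml_to_record(xml_text))
```

Note that the quantity comes back as the string "2": plain XML carries text, and it is up to the receiving application (or a schema) to interpret types.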
XML and B2B • With XML, financial information can be exchanged over the Internet. • Expect to see a lot about XML and B2B (Business To Business) in the near future. • XML is going to be the main language for exchanging financial information between businesses over the Internet. A lot of interesting B2B applications are under development.
XML can be used to Share Data • With XML, plain text files can be used to share data. • Since XML data is stored in plain text format, XML provides a software- and hardware-independent way of sharing data. • This makes it much easier to create data that different applications can work with. It also makes it easier to expand or upgrade a system to new operating systems, servers, applications, and new browsers.
XML can be used to Store Data • With XML, plain text files can be used to store data. • XML can also be used to store data in files or in databases. Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.
XML can make your Data more Useful • With XML, your data is available to more users. • Since XML is independent of hardware, software and application, you can make your data available to more than just standard HTML browsers. • Other clients and applications can access your XML files as data sources, as if they were accessing databases. Your data can be made available to all kinds of "reading machines" (agents), and it is easier to make your data available for blind people, or people with other disabilities.
XML can be used to Create new Languages • XML is the mother of WAP and WML. • The Wireless Markup Language (WML), used to mark up Internet applications for handheld devices like mobile phones, is written in XML.
If Developers have Sense • If they DO have sense, all future applications will exchange their data in XML. • The future might give us word processors, spreadsheet applications and databases that can read each other's data in a pure text format, without any conversion utilities in between. • We can only pray that Microsoft and all the other software vendors will agree.
XML Syntax • The syntax rules of XML are very simple and very strict. The rules are very easy to learn, and very easy to use. • Because of this, creating software that can read and manipulate XML is very easy.
An example XML document • XML documents use a self-describing and simple syntax. • <?xml version="1.0" encoding="ISO-8859-1"?> • <note> • <to>Tove</to> • <from>Jani</from> • <heading>Reminder</heading> • <body>Don't forget me this weekend!</body> • </note>
The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
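To see what the encoding declaration does in practice, here is a small Python sketch using the standard library parser. The byte 0xF6 is "ö" in ISO-8859-1; the name used as content is made-up sample data.

```python
import xml.etree.ElementTree as ET

# "\xf6" is the single ISO-8859-1 byte for "ö"; the name is made-up sample data.
raw = b'<?xml version="1.0" encoding="ISO-8859-1"?><to>Bj\xf6rn</to>'

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(raw)
print(root.text)
```

Without the declaration, a parser would assume UTF-8 (or UTF-16), and that lone 0xF6 byte would not be valid input.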
The next line describes the root element of the document (like it was saying: "this document is a note"): • <note> • The next 4 lines describe 4 child elements of the root (to, from, heading, and body): • <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element: • </note> • Can you detect from this example that the XML document contains a Note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?
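The same walk-through can be done by a program. This sketch, assuming Python's standard xml.etree.ElementTree module, parses the example document and lists the root element and its children.

```python
import xml.etree.ElementTree as ET

raw = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(raw)
# Iterating over an element yields its direct children in document order.
child_tags = [child.tag for child in root]
print(root.tag)
print(child_tags)
```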
All XML elements must have a closing tag • With XML, it is illegal to omit the closing tag. • In HTML some elements do not have to have a closing tag. The following code is legal in HTML: • <p>This is a paragraph • <p>This is another paragraph
In XML all elements must have a closing tag, like this: • <p>This is a paragraph</p> • <p>This is another paragraph</p>
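A quick way to check this rule is to hand both versions to an XML parser. The sketch below uses Python's standard library; is_well_formed is a hypothetical helper name, and the <doc> wrapper is added only so that each sample has a single root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# <doc> is a hypothetical wrapper so that each sample has a single root element.
unclosed = "<doc><p>This is a paragraph<p>This is another paragraph</doc>"
closed = "<doc><p>This is a paragraph</p><p>This is another paragraph</p></doc>"

print(is_well_formed(unclosed))
print(is_well_formed(closed))
```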
Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not an XML element; it belongs to the prolog that precedes the root element, and it should not have a closing tag.
XML tags are case sensitive • Unlike HTML, XML tags are case sensitive. • With XML, the tag <Letter> is different from the tag <letter>. • Opening and closing tags must therefore be written with the same case: • <Message>This is incorrect</message> • <message>This is correct</message>
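Both points can be checked with a parser. In this Python sketch (standard library only), the <mail> wrapper element is an assumption added so the second sample has one root.

```python
import xml.etree.ElementTree as ET

# Mismatched case in the closing tag: the parser rejects the document.
try:
    ET.fromstring("<Message>This is incorrect</message>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

# <Letter> and <letter> are two different element names.
root = ET.fromstring("<mail><Letter>A</Letter><letter>B</letter></mail>")
upper = root.findtext("Letter")
lower = root.findtext("letter")
print(mismatch_accepted, upper, lower)
```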
All XML elements must be properly nested • Improper nesting of tags makes no sense to XML. • In HTML some elements can be improperly nested within each other like this: • <b><i>This text is bold and italic</b></i> • In XML all elements must be properly nested within each other like this: • <b><i>This text is bold and italic</i></b>
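The two samples above can be fed to a parser to see the difference. This is a small sketch with Python's standard library; the exact wording of the error message depends on the underlying parser, so it is printed rather than relied on.

```python
import xml.etree.ElementTree as ET

samples = {
    "improper": "<b><i>This text is bold and italic</b></i>",
    "proper": "<b><i>This text is bold and italic</i></b>",
}

results = {}
for label, text in samples.items():
    try:
        ET.fromstring(text)
        results[label] = "well-formed"
    except ET.ParseError as err:
        # The parser stops at the first closing tag that does not match.
        results[label] = f"rejected ({err})"

for label, outcome in results.items():
    print(label, "->", outcome)
```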
All XML documents must have a root element • All XML documents must contain a single tag pair to define a root element. • All other elements must be within this root element. • All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element: • <root> <child> <subchild>.....</subchild> </child> </root>
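The parent/child structure can be made visible with a short recursive walk, and the single-root rule with a document that breaks it. This is a sketch in Python; depth_listing is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def depth_listing(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    pairs = [(depth, element.tag)]
    for child in element:
        pairs.extend(depth_listing(child, depth + 1))
    return pairs


root = ET.fromstring("<root><child><subchild>.....</subchild></child></root>")
listing = depth_listing(root)
print(listing)

# Two top-level elements means there is no single root, so parsing fails.
try:
    ET.fromstring("<child>one</child><child>two</child>")
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False
print(two_roots_ok)
```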
Attribute values must always be quoted • With XML, it is illegal to omit quotation marks around attribute values. • XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:
<?xml version="1.0" encoding="ISO-8859-1"?> • <note date=12/11/2002> • <to>Tove</to> • <from>Jani</from> • </note>
<?xml version="1.0" encoding="ISO-8859-1"?> • <note date="12/11/2002"> • <to>Tove</to> • <from>Jani</from> • </note>
The error in the first document is that the date attribute in the note element is not quoted. This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
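A parser makes the difference concrete. In this Python sketch (standard library only), the unquoted version is rejected, while the quoted version exposes the attribute value to the program.

```python
import xml.etree.ElementTree as ET

unquoted = "<note date=12/11/2002><to>Tove</to><from>Jani</from></note>"
quoted = '<note date="12/11/2002"><to>Tove</to><from>Jani</from></note>'

try:
    ET.fromstring(unquoted)
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

# Attributes of an element are read with get(); the value is always a string.
date_value = ET.fromstring(quoted).get("date")
print(unquoted_ok, date_value)
```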
With XML, white space is preserved • With XML, the white space in your document is not truncated. • This is unlike HTML. With HTML, a sentence like this: • Hello           my name is Tove, • will be displayed like this: • Hello my name is Tove, • because HTML reduces multiple, consecutive white space characters to a single white space.
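The following Python sketch shows the XML side of this comparison: the text an application receives from the parser still contains every space. The <greeting> tag is made up for the example.

```python
import xml.etree.ElementTree as ET

sentence = "Hello           my name is Tove,"
root = ET.fromstring(f"<greeting>{sentence}</greeting>")  # <greeting> is a made-up tag

# The parser hands the text to the application with every space intact.
print(repr(root.text))
```

Whether the spaces are later collapsed for display is a decision of the application or stylesheet, not of XML itself.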
With XML, CR / LF is converted to LF • With XML, a new line is always stored as LF. • Do you know what a typewriter is? Well, a typewriter is a mechanical device which was used last century to produce printed documents. :-) • After you have typed one line of text on a typewriter, you have to manually return the printing carriage to the left margin position and manually feed the paper up one line. • In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). The character pair bears some resemblance to the typewriter actions of setting a new line. In Unix applications, a new line is normally stored as a LF character. Older (classic Mac OS) Macintosh applications use only a CR character to store a new line. • An XML parser converts all of these line endings to a single LF before passing the text to the application.
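As a small illustration in Python (standard library parser), the sample below mixes all three line-ending conventions in one text node; the <text> tag is made up for the example.

```python
import xml.etree.ElementTree as ET

# One text with a Windows (CR LF), a classic Macintosh (CR) and a Unix (LF) line break.
raw = b"<text>one\r\ntwo\rthree\nfour</text>"  # <text> is a made-up tag

root = ET.fromstring(raw)
# repr() makes the line-break characters visible in the output.
print(repr(root.text))
```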
Comments in XML • The syntax for writing comments in XML is similar to that of HTML. • <!-- This is a comment -->
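How a comment is treated depends on the software reading the document. As one example, Python's standard xml.etree.ElementTree parser, with its default settings, skips comments when building the element tree:

```python
import xml.etree.ElementTree as ET

doc = "<note><!-- This is a comment --><to>Tove</to></note>"
root = ET.fromstring(doc)

# With the default parser settings the comment does not show up among the children.
child_tags = [child.tag for child in root]
print(child_tags)
```

Other parsers and configurations may keep comments, so an application should not rely on them to carry data.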
There is nothing special about XML • There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets. • Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially. • In an XML-aware application however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.
XML Elements • XML Elements are extensible and they have relationships. • XML Elements have simple naming rules.