Understanding How XML Works Ellen Pearlman Eileen Mullin Programming the Web Using XML
Learning Objectives • Understanding the overall structure of an XML document • Familiarity with basic EBNF Characters • Knowing the difference between well formed and valid XML • Making a simple XML file • Looking at a simple DTD • Looking at elements, attributes, entities, and comments • Deciding between elements and attributes
Steps to a Basic XML Document • Write the document in a plain-text editor such as Notepad (for PC users) or BBEdit (for Mac users). Save the document with a “.xml” extension. • Have the .xml document read by an XML-compliant browser. • Display the .xml document to the user.
Basic Markup: EBNF Characters • EBNF stands for Extended Backus-Naur Form, a syntactic metalanguage used to define the components of an XML document. The characters below appear in XML markup and DTD declarations.
    <        begins a tag
    >        ends a tag
    /        shows that the tag is an end tag
    <?xxx?>  starts and ends a processing instruction (PI)
    !        signals that a reserved keyword follows, as in <!ELEMENT
    [ ]      show the start and end of a range
Basic Markup: EBNF Characters (2)
    [        begins an internal DTD
    ]>       ends an internal DTD
    <![CDATA[ ]]>  start and end a CDATA section
    |        separates alternatives; one of the listed items is chosen
    ,        separates items that must appear in the order given
    #        precedes PCDATA and attribute default keywords such as #REQUIRED and #IMPLIED
    &        starts a parsed general entity reference
    ;        ends an entity reference
    ?        the content piece can occur zero times or once
    +        the content piece must occur once or more
    *        the content piece can occur zero or more times, without limit
Well-Formed XML Documents • An XML document is considered well-formed in its syntax when: • Each opening tag has a corresponding closing tag • The case of the opening and closing tags matches, because XML is case-sensitive • There is at least one element • There is exactly one root element, and all other elements are contained within it • Elements and their associated tags nest correctly, without overlapping
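The rules above can be tried out with any XML parser. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module (neither is part of the original slides); the sample strings and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide (hypothetical content).
samples = {
    "matched tags": "<note><to>Ann</to></note>",
    "case mismatch": "<note><to>Ann</To></note>",
    "bad nesting": "<note><b><i>Ann</b></i></note>",
    "two roots": "<note/><note/>",
    "no element": "",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample should be accepted; each of the others breaks one of the rules listed on the slide.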
Rules for Element Names • Element names must follow some simple rules: • They start with either a letter or an underscore • After the first character they may contain letters, digits, periods (.), underscores (_) or hyphens (-) • They contain no whitespace • They cannot begin with the letters "xml" in any combination of upper and lower case, because such names are reserved, and they must adhere to all XML naming conventions
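The naming rules can be sketched as a small checker. This is an illustration only, assuming Python; it covers ASCII names as described on the slide, whereas the XML specification also allows many non-ASCII letters and the colon. The names NAME_PATTERN and is_valid_element_name are hypothetical.

```python
import re

# ASCII-only simplification: a letter or underscore first, then
# letters, digits, periods, underscores or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_element_name(name: str) -> bool:
    """Check a name against the simplified rules from the slide."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Company", "_size", "pasta-sauce.v2", "12oz", "my element", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```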
Valid XML Documents • A valid XML document must, in addition to being well-formed, conform to the rules defined in its DTD (Document Type Definition) or XML Schema. • A DTD ensures that an XML document meets these standards by defining the grammar and vocabulary of the markup language. • The DTD lists everything that a parser needs to know in order to validate and process an XML document, as does the XML Schema.
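The difference between well-formed and valid can be seen with a non-validating parser. The sketch below assumes Python's standard-library xml.etree.ElementTree, which checks well-formedness but does not validate against a DTD; the document is a made-up variation on the Company sample.

```python
import xml.etree.ElementTree as ET

# The DTD requires a Subdivision child, but the document supplies Market.
# The document is therefore well-formed but not valid.
document = """<?xml version="1.0"?>
<!DOCTYPE Company [
  <!ELEMENT Company (Subdivision)>
  <!ELEMENT Subdivision (#PCDATA)>
]>
<Company><Market>Northeast</Market></Company>"""

# No error is raised here, because only well-formedness is checked.
root = ET.fromstring(document)
child_tags = [child.tag for child in root]
print(child_tags)
```

A validating parser would be needed to report that Market is not allowed inside Company.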
Tagging an XML Document • Every XML document should begin with the XML declaration, which is written like a processing instruction (PI): <?xml version="1.0"?> • This declaration tells software programs and processors that they are dealing with an XML document and which version of XML it uses. • It is part of the prolog and contains the version number, but it can also carry other information, such as which character encoding the document uses.
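As an illustration of the extra information the declaration can carry, the sketch below assumes Python's standard-library xml.etree.ElementTree and a made-up greeting document that declares the ISO-8859-1 encoding. It also shows that nothing, not even a space, may come before the declaration.

```python
import xml.etree.ElementTree as ET

# Bytes are used so that the parser itself reads the encoding declaration.
document = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<greeting>caf\xe9</greeting>"
).encode("iso-8859-1")

root = ET.fromstring(document)
print(root.tag, root.text)

# The declaration must be the very first thing in the document.
try:
    ET.fromstring(b' <?xml version="1.0"?><greeting/>')
    declaration_must_be_first = False
except ET.ParseError:
    declaration_must_be_first = True
print(declaration_must_be_first)
```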
First XML Example
    <?xml version="1.0"?>
    <!DOCTYPE veryfirst [
      <!ELEMENT veryfirst (#PCDATA)>
    ]>
    <veryfirst>
      This is my very first XML document
    </veryfirst>
Character References • Character references are used to insert special characters into an XML document. • Character references come from the ISO/IEC 10646 (Unicode) character set, an international numbering system that references characters from all languages. • They are used when you want to insert a specific character into your document and that character is reserved for use by the processor. For example, &#60; is the character reference for the "<" symbol.
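A short sketch of how a parser resolves character references, assuming Python's standard-library xml.etree.ElementTree; the math element is a made-up example. Decimal (&#60;) and hexadecimal (&#x26;) references are shown alongside the predefined &gt; entity.

```python
import xml.etree.ElementTree as ET

# &#60; is "<", &#x26; is "&", &gt; is ">", and &#169; is the copyright sign.
root = ET.fromstring("<math>1 &#60; 2 &#x26; 3 &gt; 2 &#169;</math>")
print(root.text)

# When the element is written back out, reserved characters are escaped again.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```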
Understanding The Tree Structure Of A Document • An XML document can be pictured as a tree. In the PastaPrimavera sample, the descriptive elements (Subdivision, Market, Product, and so on) are more or less on the same hierarchical level. • Think of a tree as your family. There are great-grandparents, grandparents, parents and children, so parents have one level above them and one level below. • The root of the sample is Company, because without the company, there would be no product.
Sample PastaPrimavera DTD
    <?xml version="1.0"?>
    <!DOCTYPE Company [
      <!ELEMENT Company (Subdivision,Market,Product,Type,Packaging,Size,Ingredients)>
      <!ELEMENT Subdivision (#PCDATA)>
      <!ELEMENT Market (#PCDATA)>
      <!ELEMENT Product (#PCDATA)>
      <!ELEMENT Type (#PCDATA)>
      <!ELEMENT Packaging (#PCDATA)>
      <!ELEMENT Size (#PCDATA)>
      <!ELEMENT Ingredients (Spices|Seasonings)>
      <!ELEMENT Spices (#PCDATA)>
      <!ELEMENT Seasonings (#PCDATA)>
    ]>
    <Company>
      <Subdivision>North America</Subdivision>
      <Market>Northeast</Market>
      <Product>Spaghetti Sauce</Product>
      <Type>Alfredo</Type>
      <Packaging>Glass</Packaging>
      <Size>12 oz.</Size>
      <Ingredients>
        <Spices>Pepper</Spices>
      </Ingredients>
    </Company>
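The tree structure of the sample can be made visible by walking it with a parser. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree and a shortened copy of the Company document; the helper name outline is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Company sample (DTD omitted for brevity).
document = """<Company>
  <Subdivision>North America</Subdivision>
  <Market>Northeast</Market>
  <Product>Spaghetti Sauce</Product>
  <Ingredients>
    <Spices>Pepper</Spices>
  </Ingredients>
</Company>"""


def outline(element, depth=0):
    """Return the element names, indented by their level in the tree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Company is the root, its children share one level, and Spices sits one level further down inside Ingredients.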
Creating A Root Element • Every XML document must have exactly one root element. • The root element contains all the other elements and their attributes. • The root element is the most important in the document, and usually takes a name that describes its function.
Comments • Insert comments when you want to write notes about the code, either to guide you through revisions or to leave instructions for others. • Although comments are contained within an XML document, the processor does not treat them as content or as instructions.
    <!--I like spaghetti-->
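To see that a comment does not contribute content, the sketch below assumes Python's standard-library xml.etree.ElementTree, whose default parser discards comments; the menu and dish elements are made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<menu><!--I like spaghetti--><dish>Spaghetti</dish></menu>"
)

# Only the dish element is reported; the comment is skipped.
tags = [child.tag for child in root]
print(tags)
print(ET.tostring(root, encoding="unicode"))
```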
Elements • An element is the basic building block of an XML document. • The best way to think of elements is that they are like the nouns of an XML document. • In a DTD, an element declaration begins with an opening bracket (<) and exclamation point (!) followed by the keyword ELEMENT. • An element name must start with a letter or underscore (_) character.
Empty Element Tags • An empty element is a placeholder for content. It tells the parser that later on something is going to fill in the space. • Why use an empty element? One example might be if you were designing a logo for a company, and it wasn’t ready yet. • Empty elements are also used for things such as line breaks. They have no content of their own, although they may carry attributes.
    <image/>
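A brief sketch of how a parser sees empty elements, assuming Python's standard-library xml.etree.ElementTree. The src attribute and the file name logo.png are hypothetical and only illustrate that an empty element can still carry attributes.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<image/>")
long_form = ET.fromstring("<image></image>")
with_attribute = ET.fromstring('<image src="logo.png"/>')  # hypothetical attribute

# Both spellings describe the same empty element: no text, no children.
print(short_form.text, len(short_form))
print(long_form.text, len(long_form))
print(with_attribute.get("src"))
```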
#PCDATA • Parsed Character Data (#PCDATA) is text that the XML processor parses. That text is a string of characters in which markup and entity references are recognized. • Parsed character data is sent through an XML processor, whereas unparsed data is intended for an application other than the XML processor.
    <?xml version="1.0"?>
    <!DOCTYPE myfirstpcdata [
      <!ELEMENT myfirstpcdata (#PCDATA)>
    ]>
    <myfirstpcdata>
      This is my first xml document
    </myfirstpcdata>
CDATA • CDATA is very important for putting large blocks of text into a special section that the XML processor looks at only as pure text. • This is because the entity reference for the "<" symbol is "&lt;", and the references for other reserved characters are equally cumbersome. It would be extremely odd to have hundreds of lines of text filled with "&lt;". • A CDATA section looks like this:
    <![CDATA[ Text Information (for example "<") ]]>
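The two ways of writing reserved characters can be compared directly. The sketch below assumes Python's standard-library xml.etree.ElementTree; the note element and its text are made up.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<" and "&" are taken as plain text.
with_cdata = ET.fromstring("<note><![CDATA[a < b && b > c]]></note>")

# Without CDATA, each reserved character needs its own reference.
with_references = ET.fromstring("<note>a &lt; b &amp;&amp; b &gt; c</note>")

print(with_cdata.text)
print(with_references.text)
```

Both forms give the application the same text; the CDATA section is simply easier to write and read when there are many reserved characters.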
Attributes (!ATTLIST) • Attributes give more information about an element. In the document they reside inside the element's start tag; in the DTD they are declared with ATTLIST, listed underneath the element with which they are affiliated. • Attributes can further define the behavior of an element and can also allow it to have extended links by giving it an identifier. • Attributes follow the same naming rules as elements, and each attribute has a name and a value.
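Attributes (2) • An attribute declaration gives the attribute a value type (Tokenized, Enumerated or CDATA) and a Default. • Tokenized breaks down into a simple unit, for example a specific name (such as ID or NMTOKEN). It imposes constraints on the values of the attribute. • Enumerated provides a list of valid values, or of notation names, from which the value must be chosen. • CDATA means Character Data; the value is not constrained. • The Default (#REQUIRED, #IMPLIED, #FIXED or a literal value) states whether the attribute must be present and what value applies when it is absent.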
Syntax for an Attribute • The basic syntax for an attribute is:
    <!ELEMENT squarebox (#PCDATA)>
    <!ATTLIST squarebox color CDATA #IMPLIED>
• And in an XML document it would look like:
    <squarebox color="green">Gift Box</squarebox>
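The squarebox example can be read back with a parser to show how the attribute's name and value are exposed separately from the element's content. This is a sketch assuming Python's standard-library xml.etree.ElementTree; the CDATA type in the ATTLIST declaration is an assumption, since the attribute type has to be stated and CDATA is the unconstrained one.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE squarebox [
  <!ELEMENT squarebox (#PCDATA)>
  <!ATTLIST squarebox color CDATA #IMPLIED>
]>
<squarebox color="green">Gift Box</squarebox>"""

root = ET.fromstring(document)
print(root.tag)           # element name
print(root.get("color"))  # attribute value
print(root.text)          # element content
```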
Entities • Entities are the basic storage units of any XML document. • The actual data they store may be derived from a variety of sources, but the sources themselves are not entities. • An entity can be declared once but used countless times, and it does not change the semantic markup of the actual document. Its real purpose is to streamline the code. An entity reference starts with & and ends with a semicolon.
    <!ENTITY PPV "Pasta Primavera Superb">
    <Announcement>This is the home of &PPV;</Announcement>
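Declaring an entity once and using it several times can be sketched as follows, assuming Python's standard-library xml.etree.ElementTree, which expands entities declared in the internal DTD. The second sentence in the Announcement is made up to show reuse.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE Announcement [
  <!ELEMENT Announcement (#PCDATA)>
  <!ENTITY PPV "Pasta Primavera Superb">
]>
<Announcement>This is the home of &PPV;. Try &PPV; today.</Announcement>"""

root = ET.fromstring(document)
print(root.text)

# Leaving out the closing semicolon makes the document not well-formed.
try:
    ET.fromstring(document.replace("&PPV;", "&PPV"))
    semicolon_required = False
except ET.ParseError:
    semicolon_required = True
print(semicolon_required)
```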
How To Decide: Attribute vs. !ELEMENT • The best way to decide between an attribute and an element is to consider their basic functions. • Actual data about anything is usually an element. • Anything that describes the data itself is probably an attribute. • If the actual information needs to be displayed and read, then an element, possibly with child elements, is best because it is more extensible. Attributes work best with identifiers or certain types of formatting.
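One way to picture the guideline is a product whose identifier is an attribute and whose readable data are child elements. The sketch below assumes Python's standard-library xml.etree.ElementTree; the id value PPV-12 and the Name element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Identifier as an attribute, displayable data as child elements.
document = """<Product id="PPV-12">
  <Name>Pasta Primavera</Name>
  <Size>12 oz.</Size>
</Product>"""

root = ET.fromstring(document)
identifier = root.get("id")
fields = {child.tag: child.text for child in root}

print(identifier)
print(fields)
```

Adding another piece of readable data later only means adding another child element, which is the extensibility the slide refers to.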