XML Implementation Issues
XML Reserved Characters
The topic of XML "reserved" characters is normally covered in XML tutorials, but is worth emphasizing here because ignoring these reserved characters is a frequent cause of errors in data. There are five characters that are reserved and cannot be used directly in XML element or attribute data; they must be replaced with what are called “XML Entity References”. Some XML software systems will automatically do the conversion of the XML reserved characters; otherwise logic must be added to perform the conversion of the characters. The following table displays the reserved character in the first column, followed by the XML Entity Reference that it is replaced with in the second column.
Reserved Character | Substitute With | Character Name |
---|---|---|
& | &amp; | Ampersand |
< | &lt; | Less Than |
> | &gt; | Greater Than |
' | &apos; | Apostrophe |
" | &quot; | Quote |
In addition, accented characters, such as the ñ in Peña or the é in résumé, must be properly UTF-8 encoded. This consideration affects all characters whose Unicode numeric value is greater than 127 (decimal), which can also be expressed as 7f (hexadecimal). If UTF-8 encoding is impractical, Numeric Entity Encoding is a simple alternative: write &# followed by the character's Unicode value and a semicolon. The value can be given in decimal or hexadecimal, whichever is more convenient. To distinguish a decimal value from a hexadecimal one, the letter x immediately precedes a hexadecimal value. The following table shows two example characters that require encoding. Its columns give the character, its decimal Unicode value, the decimal Numeric Entity Encoding, its hexadecimal Unicode value, and the hexadecimal Numeric Entity Encoding:
Character | Unicode (decimal) | Substitute | Unicode (hexadecimal) | Substitute |
---|---|---|---|---|
é | 233 | &#233; | e9 | &#xe9; |
ñ | 241 | &#241; | f1 | &#xf1; |
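The substitutions in the two tables above can be sketched in Python as follows. This is a minimal illustration, not part of the SBA specification: the function names are hypothetical, and it assumes the standard-library `xml.sax.saxutils.escape` helper, which handles &, < and > by default and takes the two quote characters as an extra mapping.

```python
from xml.sax.saxutils import escape

# escape() covers &, < and > on its own; the quote characters are supplied here.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_reserved(value: str) -> str:
    """Replace the five reserved characters with their entity references."""
    return escape(value, QUOTE_ENTITIES)


def to_decimal_refs(value: str) -> str:
    """Replace every non-ASCII character with a decimal numeric reference."""
    return value.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_hex_refs(value: str) -> str:
    """Replace every non-ASCII character with a hexadecimal numeric reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):x};" for ch in value)


print(escape_reserved('Smith & Sons <"Bob\'s">'))
print(to_decimal_refs("Pe\u00f1a"))      # Peña
print(to_hex_refs("r\u00e9sum\u00e9"))   # résumé
```

If both steps are needed, the reserved characters should be escaped first; otherwise the ampersand that begins each numeric reference would itself be escaped.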
Specifying the XSD in the XML Data File
Any XML parser will check whether a file or string contains well-formed XML. But to have the parser validate whether the data conforms to a specific XSD, you need to tell the parser which XSD to use. There are two ways to do this. Some parsers use the XML Namespace Declaration method, and others use the XML Schema Instance method. In both cases, you modify the XML itself to say which XSD it adheres to.
Using XML Namespace Declaration
In the SBA_ETran element, you may optionally define an “XML Namespace Declaration” that specifies the location of the XSD that is used to validate the data in the XML file. The following line is a sample XML Namespace Declaration from an XML data file.
Because no prefix is given between “xmlns” and “=”, the entire XML file will be governed by this XSD as the “default namespace”. You can, if you wish, define a namespace prefix, but this is unnecessary and involves more work.
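As a rough illustration of a default namespace declaration, the Python sketch below builds a root element whose namespace is declared without a prefix. The namespace URI and the child element name are placeholders chosen for this example, not the SBA's actual values. It assumes Python 3.8 or later, where `xml.etree.ElementTree.tostring` accepts a `default_namespace` argument.

```python
import xml.etree.ElementTree as ET

# Placeholder URI for illustration only; use the value the SBA publishes.
ETRAN_NS = "https://example.invalid/etran.xsd"

# ElementTree writes namespaced tags in {uri}name form internally.
root = ET.Element(f"{{{ETRAN_NS}}}SBA_ETran")
ET.SubElement(root, f"{{{ETRAN_NS}}}ChildElement")  # hypothetical child element

# default_namespace makes the serializer emit xmlns="..." with no prefix,
# so every element in the document falls under that namespace.
xml_text = ET.tostring(root, encoding="unicode", default_namespace=ETRAN_NS)
print(xml_text)
```

The child element carries no prefix and no declaration of its own in the output, which is the "default namespace" behavior described above.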
Using XML Schema Instance Declaration
In this case, you need an XML Namespace declaration in the XML's root tag to define the "xsi" prefix. (The "xsi" stands for "XML Schema Instance".) Then, in the same root tag, you need to specify the xsi:noNamespaceSchemaLocation attribute. Note that noNamespaceSchemaLocation has the prefix "xsi:", specifying that it's defined at the URL in the previous xmlns:xsi declaration:
Note the position of xsi in each attribute. In the first attribute (xmlns:xsi), xsi comes after the colon. In the second attribute (xsi:noNamespaceSchemaLocation), xsi comes before the colon.
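The two attributes can be produced programmatically as in the following Python sketch. The xsi namespace URI is the standard W3C value; the XSD location is a placeholder for illustration, not the SBA's actual schema URL.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace for XML Schema Instance attributes.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Placeholder location for illustration only.
XSD_LOCATION = "https://example.invalid/etran.xsd"

# Ask the serializer to use the conventional "xsi" prefix for this namespace.
ET.register_namespace("xsi", XSI_NS)

root = ET.Element(
    "SBA_ETran",
    {f"{{{XSI_NS}}}noNamespaceSchemaLocation": XSD_LOCATION},
)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The serialized root tag carries both an `xmlns:xsi` declaration and the `xsi:noNamespaceSchemaLocation` attribute, matching the prefix positions described above.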
White Space in XML Documents
All of the XML sample data shown in this document includes formatting characters to make the data more readable. Each element is on its own line, and data within element containers is indented. These space, tab, carriage return, and line feed characters are known as "white space" and create a nicely-formatted output as shown below.
Sample XML Data with "White Space":
The XML files that are exchanged with the SBA will normally be sent without any "white space" characters as shown below.
Sample XML Data without "White Space":
"White space" characters are generally eliminated from production XML files for two reasons:
- Eliminating the "white space" characters makes the XML files smaller (by approximately 7% to 9%).
- Most Internet browser software and XML editors currently in use automatically format the XML data with "white space" to make it more readable.
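One way to remove formatting "white space" before sending a file is sketched below in Python. The function name and the sample element names are hypothetical. The sketch treats any text made up only of spaces, tabs, carriage returns, and line feeds as formatting, so it is not suitable for data in which an element's value is deliberately white space only.

```python
import xml.etree.ElementTree as ET


def strip_formatting_whitespace(xml_text: str) -> str:
    """Drop whitespace-only text between tags and return compact XML."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # elem.text is the text before the first child; elem.tail follows the end tag.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


formatted = (
    "<Guarantor>\n"
    "  <FirstName>Ann</FirstName>\n"
    "  <LastName>Lee</LastName>\n"
    "</Guarantor>"
)
compact = strip_formatting_whitespace(formatted)
print(compact)
print(len(formatted), "->", len(compact), "characters")
```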
Handling Elements with No Data
There are three valid ways of showing (or not showing) that an element has no data value. For Loan Applications, it does not matter which of the three you choose. For Loan Servicing, however, the choice matters: if the element name is provided with no data, the “nullstring” will be stored in the database column; if the element name is not provided, the database column will not be updated.
The preferred method is to not include the element name in the XML file. The following data sample displays a container record for a guarantor with no middle name and no "MiddleInitial" element.
Some XML generation software will create an "empty" element, a single element tag that includes the "slash" character after the element name, as shown below. This is a valid XML expression for an element with no data; however, if there are multiple values for the element, then all of them must be empty.
Some systems generate both the beginning and end tags with no data between them. This is also a valid XML expression, and it must be used as a placeholder when there are multiple values for the element.
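The three forms can be compared with a short Python sketch. The sample records and the `describe` helper are hypothetical; only the "MiddleInitial" element name comes from the text above. The sketch shows that a parser can tell an omitted element apart from an empty one, which is the distinction that matters for Loan Servicing, while the two empty forms look the same after parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical guarantor records, one per form described above.
SAMPLES = {
    "omitted": "<Guarantor><FirstName>Ann</FirstName></Guarantor>",
    "empty_tag": "<Guarantor><FirstName>Ann</FirstName><MiddleInitial/></Guarantor>",
    "start_end": (
        "<Guarantor><FirstName>Ann</FirstName>"
        "<MiddleInitial></MiddleInitial></Guarantor>"
    ),
}


def describe(xml_text: str, name: str = "MiddleInitial") -> str:
    """Report whether the named child is absent, empty, or populated."""
    elem = ET.fromstring(xml_text).find(name)
    if elem is None:
        return "absent"
    return "present, has data" if elem.text else "present, empty"


results = {label: describe(xml_text) for label, xml_text in SAMPLES.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```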