Xing Hongrui's blog -- Handling text with JAXP

[Java] Handling text with JAXP
Original work, Software technology, Computers and networking

Posted by Xing Hongrui on 2006/4/25 17:22:21

This is an interesting problem. Consider the following file, orders.xml:

    <orders>
      <order>
        <customerid limit="1000">12341</customerid>
        <status>pending</status>
        <item instock="Y" itemid="SA15">
          <name>Silver Show Saddle, 16 inch</name>
          <price>825.00</price>
          <qty>1</qty>
        </item>
        <item instock="N" itemid="C49">
          <name>Premium Cinch</name>
          <price>49.00</price>
          <qty>1</qty>
        </item>
      </order>
      <order>
        <customerid limit="150">251222</customerid>
        <status>pending</status>
        <item instock="Y" itemid="WB78">
          <name>Winter Blanket (78 inch)</name>
          <price>20</price>
          <qty>10</qty>
        </item>
      </order>
    </orders>

How many child nodes does the root node orders have? Five: two element nodes and three whitespace nodes (the whitespace between and around the order elements). This puzzled me. Does whitespace really count as a text node, and how do I ignore it in my program? Here is what I wrote:

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class OrderProcessor {
        public static void main(String[] args) {
            File docFile = new File("h://orders.xml");
            Document doc = null;
            try {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                // dbf.isIgnoringElementContentWhitespace();
                // dbf.setIgnoringElementContentWhitespace(true);
                DocumentBuilder db = dbf.newDocumentBuilder();
                doc = db.parse(docFile);

                // STEP 1: Get the root element
                Element root = doc.getDocumentElement();
                System.out.println("The root element is " + root.getNodeName());

                // STEP 2: Get the children
                NodeList children = root.getChildNodes();
                System.out.println("There are " + children.getLength()
                        + " nodes in this document.");
            } catch (Exception e) {
                System.out.print("Problem parsing the file: " + e.getMessage());
            }
        }
    }

The program prints:

    The root element is orders
    There are 5 nodes in this document.

The reason is that spaces and line breaks both produce TEXT nodes. Between <orders> and </orders> there are three such TEXT nodes plus the two <order> nodes.

Using the same XML file, the following program makes this clearer:

    package com.dom;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import java.io.File;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class OrderProcessor {

        private static void stepThrough(Node start) {
            System.out.println(start.getNodeName() + " = " + start.getNodeValue());

            /* Compare the node's nodeType with the constant ELEMENT_NODE to
               decide whether it is an element. Node carries member constants
               for each kind of node, such as ELEMENT_NODE or ATTRIBUTE_NODE.
               If nodeType matches ELEMENT_NODE, the node is an element. For
               every element found, the application builds a NamedNodeMap of
               all its attributes and iterates over it, printing each
               attribute's name and value, just as it iterates a NodeList. */
            if (start.getNodeType() == Node.ELEMENT_NODE) {
                NamedNodeMap startAttr = start.getAttributes();
                for (int i = 0; i < startAttr.getLength(); i++) {
                    Node attr = startAttr.item(i);
                    System.out.println(" Attribute: " + attr.getNodeName()
                            + " = " + attr.getNodeValue());
                }
            }

            // Start at the first child, then walk through all of its
            // siblings until every one has been visited.
            for (Node child = start.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                stepThrough(child);
            }
        }

        public static void main(String[] args) {
            File docFile = new File("h://orders.xml");
            Document doc = null;
            try {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                // Why does this call have no effect?
                dbf.setIgnoringElementContentWhitespace(true);
                DocumentBuilder db = dbf.newDocumentBuilder();
                doc = db.parse(docFile);
                doc.normalize();

                // STEP 1: Get the root element
                Element root = doc.getDocumentElement();
                System.out.println("The root element is " + root.getNodeName());

                // STEP 2: Get the children
                NodeList children = root.getChildNodes();
                System.out.println("There are " + children.getLength()
                        + " nodes in this document.");

                // STEP 4: Recurse this functionality
                stepThrough(root);
            } catch (Exception e) {
                System.out.print("Problem parsing the file: " + e.getMessage());
            }
        }
    }

Output:

    The root element is orders
    There are 5 nodes in this document.
    orders = null
    #text = 
    order = null
    #text = 
    customerid = null
     Attribute: limit = 1000
    #text = 12341
    #text = 
    status = null
    #text = pending
    #text = 
    item = null
     Attribute: instock = Y
     Attribute: itemid = SA15
    #text = 
    name = null
    #text = Silver Show Saddle, 16 inch
    #text = 
    price = null
    #text = 825.00
    #text = 
    qty = null
    #text = 1
    #text = 
    #text = 
    item = null
     Attribute: instock = N
     Attribute: itemid = C49
    #text = 
    name = null
    #text = Premium Cinch
    #text = 
    price = null
    #text = 49.00
    #text = 
    qty = null
    #text = 1
    #text = 
    #text = 
    #text = 
    order = null
    #text = 
    customerid = null
     Attribute: limit = 150
    #text = 251222
    #text = 
    status = null
    #text = pending
    #text = 
    item = null
     Attribute: instock = Y
     Attribute: itemid = WB78
    #text = 
    name = null
    #text = Winter Blanket (78 inch)
    #text = 
    price = null
    #text = 20
    #text = 
    qty = null
    #text = 10
    #text = 
    #text = 
    #text =

My goal is to get rid of the unwanted text, but even after adding doc.normalize() the whitespace nodes are still there! (normalize() only merges adjacent Text nodes and removes empty ones; a Text node that holds whitespace is not empty.)

Sun's JAXP documentation says:

    setCoalescing()
        To convert CDATA nodes to Text node and append to an adjacent Text node (if any).
    setExpandEntityReferences()
        To expand entity reference nodes.
    setIgnoringComments()
        To ignore comments.
    setIgnoringElementContentWhitespace()
        To ignore whitespace that is not a significant part of element content.

    The default values for all of these properties is false, which preserves all the lexical information necessary to reconstruct the incoming document in its original form. Setting them all to true lets you construct the simplest possible DOM, so the application can focus on the data's semantic content, without having to worry about lexical syntax details.

Yet setIgnoringElementContentWhitespace() had no effect at all. The Javadoc for this method adds that, because it relies on the content model, the parser must be in validating mode, and orders.xml has no DTD.

Later I found that instead of NodeList children = root.getChildNodes(); I could use NodeList children = root.getElementsByTagName("order");. root.getChildNodes() naturally counts the whitespace nodes too. (Note that getElementsByTagName() matches descendants at any depth, not only direct children.) Luckily this XML is small and has only one tag name, order, under the root. For a file with dozens of different tag names under the root, I would need dozens of calls like root.getElementsByTagName("order") and would have to add up the counts for each tag name. I would not dare rely on that.

Then I found that a node's getNodeType() method returns its type: whitespace is Node.TEXT_NODE and an element is Node.ELEMENT_NODE, so a simple if test picks out the nodes I want.

I never had this problem when I used JDOM, but having to use JAXP is not my fault. SAX would also work: whitespace never triggers startElement(String namespaceURI, String sName, String qName, Attributes attrs); it arrives through characters() or ignorableWhitespace(), so a handler that only acts in startElement() never sees it.

But something scarier comes next. The DOM Level 3 approach is:

    DOMConfiguration config = doc.getDomConfig();
    config.setParameter("element-content-whitespace", Boolean.FALSE);
    doc.normalizeDocument();

Sun's description:

    normalizeDocument

    void normalizeDocument()

    This method acts as if the document was going through a save and load cycle, putting the document in a "normal" form. As a consequence, this method updates the replacement tree of EntityReference nodes and normalizes Text nodes, as defined in the method Node.normalize(). Otherwise, the actual result depends on the features being set on the Document.domConfig object and governing what operations actually take place. Noticeably this method could also make the document namespace well-formed according to the algorithm described there, check the character normalization, remove the CDATASection nodes, etc. See DOMConfiguration for details.

        // Keep in the document the information defined
        // in the XML Information Set (Java example)
        DOMConfiguration docConfig = myDocument.getDomConfig();
        docConfig.setParameter("infoset", Boolean.TRUE);
        myDocument.normalizeDocument();

    Mutation events, when supported, are generated to reflect the changes occurring on the document. If errors occur during the invocation of this method, such as an attempt to update a read-only node or a Node.nodeName contains an invalid character according to the XML version in use, errors or warnings (DOMError.SEVERITY_ERROR or DOMError.SEVERITY_WARNING) will be reported using the DOMErrorHandler object associated with the "error-handler" parameter. Note this method might also report fatal errors (DOMError.SEVERITY_FATAL_ERROR) if an implementation cannot recover from an error.

    Since: DOM Level 3

What I got was:

    java.lang.AbstractMethodError: org.apache.xerces.dom.DeferredDocumentImpl.getDomConfig()Lorg/w3c/dom/DOMConfiguration;

AbstractMethodError is thrown when an application tries to call an abstract method. Normally the compiler catches this error; it can only occur at run time if the definition of some class has changed incompatibly since the currently executing method was last compiled. Here it suggests that the Xerces DeferredDocumentImpl being loaded predates DOM Level 3 and does not implement getDomConfig(). The JDK is just too adorable.
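The getNodeType() check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the original program: the class name ChildElementsDemo, its helper methods, and the cut-down XML strings are all made up for the example. The second parse also assumes an invented internal DTD, to show the case where setIgnoringElementContentWhitespace(true) can take effect because the parser is validating against a content model.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; none of these names come from the original post.
final class ChildElementsDemo {

    // Cut-down stand-in for orders.xml: whitespace before, between,
    // and after the two <order> elements.
    static final String XML =
            "<orders>\n"
          + "  <order><status>pending</status></order>\n"
          + "  <order><status>pending</status></order>\n"
          + "</orders>";

    // Same document with an invented internal DTD (an assumption for this
    // sketch) so the parser knows <orders> has element-only content.
    static final String XML_WITH_DTD =
            "<!DOCTYPE orders [\n"
          + "  <!ELEMENT orders (order*)>\n"
          + "  <!ELEMENT order (status)>\n"
          + "  <!ELEMENT status (#PCDATA)>\n"
          + "]>\n"
          + XML;

    // Parses the string and returns the root element. When validateAndStrip
    // is true, validation and whitespace stripping are both switched on.
    static Element parseRoot(String xml, boolean validateAndStrip) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setValidating(validateAndStrip);
            dbf.setIgnoringElementContentWhitespace(validateAndStrip);
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)))
                     .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns only the direct children that are elements, skipping
    // whitespace Text nodes, comments, and anything else.
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node child = parent.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parseRoot(XML, false);
        System.out.println("All child nodes: "
                + root.getChildNodes().getLength());
        System.out.println("Child elements: "
                + childElements(root).size());

        Element stripped = parseRoot(XML_WITH_DTD, true);
        System.out.println("Child nodes with DTD and validation: "
                + stripped.getChildNodes().getLength());
    }
}
```

Run as-is, this reports 5 child nodes and 2 child elements for the plain document, and 2 child nodes for the validated one. Filtering by node type works for any mix of tag names under the root and looks only at direct children, which is why it scales better than summing getElementsByTagName() results per tag.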
