扫一扫
分享文章到微信
扫一扫
关注官方公众号
至顶头条
在本页阅读全文(共19页)
CPAN上的XML模块可以分成三大类:对 XML 数据提供独特的接口(通常有关在XML实例和Perl数据之间的转换),实现某一标准XML API的模块,和对一些特定的XML相关任务进行简化的特殊用途模块。这个月我们先关注第一个,XML Perl专用接口。
<?xml version="1.0"?>
<camelids>
<species name="Camelus dromedarius">
<common-name>Dromedary, or Arabian Camel</common-name>
<physical-characteristics>
<mass>300 to 690 kg.</mass>
<appearance>
The dromedary camel is characterized by a long-curved
neck, deep-narrow chest, and a single hump.
...
</appearance>
</physical-characteristics>
<natural-history>
<food-habits>
The dromedary camel is an herbivore.
...
</food-habits>
<reproduction>
The dromedary camel has a lifespan of about 40-50 years
...
</reproduction>
<behavior>
With the exception of rutting males, dromedaries show
very little aggressive behavior.
...
</behavior>
<habitat>
The camels prefer desert conditions characterized by a
long dry season and a short rainy season.
...
</habitat>
</natural-history>
<conservation status="no special status">
<detail>
Since the dromedary camel is domesticated, the camel has
no special status in conservation.
</detail>
</conservation>
</species>
...
</camelids>
现在我们假设此完整文档(可从本月例子代码中获取)包含骆驼家族所有成员的全部信息,而不仅仅是上面的单峰骆驼信息。为了举例说明每一模块是如何从此文件中提取某一数据子集,我们将写一个很简短的脚本来处理camelids.xml文档和在STDOUT上输出我们找到的每一种类的普通名(common-name),拉丁名(用括号包起来),和当前保存状况。因此,处理完整个文档,每一个脚本的输出应该为如下结果: Bactrian Camel (Camelus bactrianus) endangered Dromedary, or Arabian Camel (Camelus dromedarius) no special status Llama (Lama glama) no special status Guanaco (Lama guanicoe) special concern Vicuna (Vicugna vicugna) endangered
Hash 如下:
my %camelid_links = ( one => { url => ' http://www.online.discovery.com/news/picture/may99/photo20.html', description => 'Bactrian Camel in front of Great ' . 'Pyramids in Giza, Egypt.'}, two => { url => 'http://www.fotos-online.de/english/m/09/9532.htm', description => 'Dromedary Camel illustrates the ' . 'importance of accessorizing.'}, three => { url => 'http://www.eskimo.com/~wallama/funny.htm', description => 'Charlie - biography of a narcissistic llama.'}, four => { url => 'http://arrow.colorado.edu/travels/other/turkey.html', description => 'A visual metaphor for the perl5-porters ' . 'list?'}, five => { url => 'http://www.galaonline.org/pics.htm', description => 'Many cool alpacas.'}, six => { url => 'http://www.thpf.de/suedamerikareise/galerie/vicunas.htm', description => 'Wild Vicunas in a scenic landscape.'} );而我们所期望从hash中创建的文档例子为:
<?xml version="1.0">
<html>
<body>
<a href="http://www.eskimo.com/~wallama/funny.htm">Charlie -
biography of a narcissistic llama.</a>
<a href="http://www.online.discovery.com/news/picture/may99/photo20.html">Bactrian
Camel in front of Great Pyramids in Giza, Egypt.</a>
<a href="http://www.fotos-online.de/english/m/09/9532.htm">Dromedary
Camel illustrates the importance of accessorizing.</a>
<a href="http://www.galaonline.org/pics.htm">Many cool alpacas.</a>
<a href="http://arrow.colorado.edu/travels/other/turkey.html">A visual
metaphor for the perl5-porters list?</a>
<a href="http://www.thpf.de/suedamerikareise/galerie/vicunas.htm">Wild
Vicunas in a scenic landscape.</a>
</body>
</html>
良好缩进的XML结果文件(如上面所显示的)对于阅读很重要,但这种良好的空格处理不是我们案例所要求的。我们所关心的是结果文档是结构良好的/well-formed和它正确地表现了hash里的数据。
任务定义完毕,接下来该是代码例子的时候了。
use XML::Simple; my $file = 'files/camelids.xml'; my $xs1 = XML::Simple->new(); my $doc = $xs1->XMLin($file); foreach my $key (keys (%{$doc->{species}})){ print $doc->{species}->{$key}->{'common-name'} . ' (' . $key . ') '; print $doc->{species}->{$key}->{conservation}->final . "\n"; }
use XML::Simple; require "files/camelid_links.pl"; my %camelid_links = get_camelid_data(); my $xsimple = XML::Simple->new(); print $xsimple->XMLout(\%camelid_links, noattr => 1, xmldecl => '');这数据到文档的任务的条件要求暴露了XML::Simple的一个弱点:它没有允许我们决定hash里的哪个key应该作为元素返回和哪个key该作为属性返回。上面例子的输出虽然接近我们的输出要求但还远远不够。对于那些更喜欢将XML文档内容直接作为Perl数据结构操作,而且需要在输出方面做更细微控制的案例,XML::Simple和XML::Writer配合得很好。
如下例子说明了如何使用XML::Write来符合我们的输出要求。
use XML::Writer; require "files/camelid_links.pl"; my %camelid_links = get_camelid_data(); my $writer = XML::Writer->new(); $writer->xmlDecl(); $writer->startTag('html'); $writer->startTag('body'); foreach my $item ( keys (%camelid_links) ) { $writer->startTag('a', 'href' => $camelid_links{$item}->{url}); $writer->characters($camelid_links{$item}->{description}); $writer->endTag('a'); } $writer->endTag('body'); $writer->endTag('html'); $writer->end();
use XML::Parser; use XML::SimpleObject; my $file = 'files/camelids.xml'; my $parser = XML::Parser->new(ErrorContext => 2, Style => "Tree"); my $xso = XML::SimpleObject->new( $parser->parsefile($file) ); foreach my $species ($xso->child('camelids')->children('species')) { print $species->child('common-name')->{VALUE}; print ' (' . $species->attribute('name') . ') '; print $species->child('conservation')->attribute('status'); print "\n"; }
use XML::TreeBuilder; my $file = 'files/camelids.xml'; my $tree = XML::TreeBuilder->new(); $tree->parse_file($file); foreach my $species ($tree->find_by_tag_name('species')){ print $species->find_by_tag_name('common-name')->as_text; print ' (' . $species->attr_get_i('name') . ') '; print $species->find_by_tag_name('conservation')->attr_get_i('status'); print "\n"; }
use XML::Element; require "files/camelid_links.pl"; my %camelid_links = get_camelid_data(); my $root = XML::Element->new('html'); my $body = XML::Element->new('body'); my $xml_pi = XML::Element->new('~pi', text => 'xml version="1.0"'); $root->push_content($body); foreach my $item ( keys (%camelid_links) ) { my $link = XML::Element->new('a', 'href' => $camelid_links{$item}->{url}); $link->push_content($camelid_links{$item}->{description}); $body->push_content($link); } print $xml_pi->as_XML; print $root->as_XML();
use XML::Twig; my $file = 'files/camelids.xml'; my $twig = XML::Twig->new(); $twig->parsefile($file); my $root = $twig->root; foreach my $species ($root->children('species')){ print $species->first_child_text('common-name'); print ' (' . $species->att('name') . ') '; print $species->first_child('conservation')->att('status'); print "\n"; }
use XML::Twig;
require "files/camelid_links.pl";
my %camelid_links = get_camelid_data();
my $root = XML::Twig::Elt->new('html');
my $body = XML::Twig::Elt->new('body');
$body->paste($root);
foreach my $item ( keys (%camelid_links) ) {
my $link = XML::Twig::Elt->new('a');
$link->set_att('href', $camelid_links{$item}->{url});
$link->set_text($camelid_links{$item}->{description});
$link->paste('last_child', $body);
}
print qq|<?xml version="1.0"?>|;
$root->print;
这些例子举例说明了这些普通XML Perl模块的基本使用方法。我的目标是提供足够多的例子让你感受怎么用每个模块写代码。下个月我们将关注“实现某一标准XML API的模块”,特别说明的,XML::DOM, XML::XPath 和其他大量的 SAX 和类SAX模块。
濠电姷鏁告慨鐑藉极閸涘﹥鍙忛柣鎴濐潟閳ь剙鍊圭粋鎺斺偓锝庝簽閸旓箑顪冮妶鍡楀潑闁稿鎹囬弻娑㈡偄闁垮浠撮梺绯曟杹閸嬫挸顪冮妶鍡楀潑闁稿鎸剧槐鎾愁吋閸滃啳鍚Δ鐘靛仜閸燁偉鐏掗柣鐘叉穿鐏忔瑧绮i悙鐑樷拺鐟滅増甯掓禍浼存煕閹惧娲撮柟顔藉劤鐓ゆい蹇撴噳閹锋椽姊婚崒姘卞闁告娲熷畷濂稿Ψ閵壯勭叄婵犵數濮撮敃銈団偓姘煎弮瀹曪綀绠涢弮鍌滅槇婵犵數濮撮崐缁樻櫠濞戙垺鐓曢悗锝冨妼婵′粙鏌曢崶褍顏€殿喕绮欐俊姝岊槹闁逞屽墯鐢繝寮婚悢鍏煎癄濠㈣泛锕ュ▓濠氭⒑閸濆嫮鐏遍柛鐘崇墵楠炲啫饪伴崼婵堝幐闂佺ǹ鏈粙鎾广亹鐎n喗鐓熼幖娣€ゅḿ鎰箾閸欏顏堟偩濠靛牏鐭欓悹鎭掑妽濞堥箖姊洪崜鎻掍簼婵炲弶鐗犻幃鈥斥槈閵忥紕鍘遍柣蹇曞仜婢т粙鎯岀€n偆绠鹃柛顐ゅ枑閸婃劖鎱ㄦ繝鍕笡闁瑰嘲鎳愮划鐢碘偓锝庝簼閻d即姊绘担瑙勫仩闁告柨顑夊畷锟犲礃閼碱剚娈鹃梺闈涚箞閸婃洟宕橀埀顒€顪冮妶鍡楀闁稿骸宕惃顒勬⒒閸屾瑧鍔嶉悗绗涘懐鐭欓柟瀵稿Л閸嬫挸顫濋悡搴$睄閻庤娲戦崡鍐茬暦閸楃倣鐔兼⒐閹邦喚娉块梻鍌欑窔濞佳囨偋閸℃稑绠犻幖娣灪閸欏繑銇勯幒鍡椾壕闂佸疇顫夐崹鍧楀春閵夆晛骞㈡俊鐐插⒔閸戣绻濋悽闈浶為柛銊︽そ閺佸鏌ч懡銈呬沪濞e洤锕俊鍫曞川椤斿吋顏¢梻浣呵归鍛村磹閸︻厽宕叉繛鎴欏灩楠炪垺淇婇婵愬殭缁炬澘绉归弻锝嗘償閵忥絽顥濆銈忓閺佽顕g拠宸悑闁割偒鍋呴鍥⒒娴e憡鍟為柟鎼佺畺瀹曠増鎯旈…鎴炴櫔闂佹寧绻傞ˇ浠嬪极閸℃ぜ鈧帒顫濋濠傚闂佹椿鍘介〃鍡欐崲濞戙垹绠婚柡澶嬪灩閸斾即姊虹粙娆惧剱闁圭懓娲濠氭晲閸涱亝顫嶅┑鐐叉閸旀洜澹曢幎鑺モ拺闁告繂瀚﹢鎵磼鐎n偄鐏撮柛鈺冨仱楠炲鏁冮埀顒€顔忓┑鍥ヤ簻闁哄洨鍋為崳娲煃鐠囪鍔熺紒杈ㄦ崌瀹曟帒鈻庨幋婵嗩瀴婵$偑鍊戦崝宀勫箠濮椻偓楠炲棗鐣濋崟顐わ紲闂佺粯鍔欏ḿ褏绮婇敃鍌涚厵闁稿繗鍋愰弳姗€鏌涢弬璺ㄧ劯闁诡喚鍋ゅ畷褰掝敃閻樿京鐩庨梻浣告贡閸庛倝宕归悽鍓叉晜闁冲搫鎳忛崐鍨叏濮楀棗澧绘俊鎻掔秺閺屾洟宕惰椤忣厾鈧鍠曠划娆愪繆濮濆矈妲奸梺闈╃祷閸庡磭妲愰幘瀛樺缂佹稑顑呭▓顓炩攽閳藉棗浜濈紒璇茬墕椤曪絾绻濆顓炰簻缂佺偓濯芥ご鎼佸疾閿濆鍋℃繝濠傚暟鏁堥梺璇″枟閿曘垽骞婇悩娲绘晢闁稿本绮g槐鏌ユ⒑閸濆嫷妲搁柣妤€瀚板畷婵囨償閿濆洣绗夐梺缁樺姉閸庛倝鎮″☉銏″€堕柣鎰硾琚氶梺鍝ュУ閿曘垽寮婚埄鍐╁闁荤喐婢橀~鎺楁倵鐟欏嫭绀堥柛鐘崇墵閵嗕礁顫滈埀顒勫箖閳哄懏鎯炴い鎰╁€濋幏濠氭⒒閸屾艾鈧嘲霉閸パ呮殾闁割煈鍋呴崣蹇涙煙閹澘袚闁抽攱姊婚埀顒€绠嶉崕閬嵥囬鐐插瀭闁稿瞼鍋為悡銏′繆椤栨粌鐨戠紒杈ㄥ哺閺屻劌鈹戦崱鈺傂︾紓浣插亾閻庯綆鍋佹禍婊堟煛瀹ュ啫濡块柍钘夘槹缁绘盯宕奸悢铏圭厜濠殿喖锕ㄥ▍锝呪槈閻㈢ǹ宸濇い鏂惧嫎閳ь剚鍔曢—鍐Χ鎼粹€茬凹濠电偠灏欓崰鏍х暦濞差亜鐒垫い鎺嶉檷娴滄粓鏌熼崫鍕棞濞存粓绠栧娲箰鎼淬垻鈹涙繝纰樷偓铏悙閸楅亶鏌熼悧鍫熺凡缂侇偄绉归弻娑㈩敃閿濆洨鐣煎銈嗘尰濡炶棄顫忛搹鍦<婵☆垰鎼~宀勬倵濞堝灝娅橀柛鎾寸懆閻忓啴姊洪崨濠佺繁闁哥姵宀稿畷銏ゅ箹娴e厜鎷洪梺鍛婃尰瑜板啯绂嶆禒瀣厱閻庯綆浜滈顓㈡煙椤旀枻鑰块柡浣稿暣瀹曟帒鈽夊顒€绠為梻浣筋嚙閸戠晫绱為崱娑樼;闁糕剝蓱濞呯姵銇勯幒鎴濃偓鑽ゅ婵傚憡鐓曢悘鐐插⒔閳藉绱掑锕€娲﹂悡娆撴煟閻斿憡绶叉い蹇e弮閺岀喖鎮℃惔銏g闂佺懓寮堕幐鍐茬暦閻斿吋顥堟繛鎴炵懄閻濓繝姊婚崒姘偓鎼佸磹妞嬪海鐭嗗〒姘e亾妤犵偞鐗犻、鏇㈠Χ閸屾矮澹曞┑顔矫畷顒勫储鐎电硶鍋撶憴鍕缂傚秴锕濠氬幢濡ゅ﹤鎮戦梺鍛婁緱閸ㄧ晫妲愰柆宥嗙厽閹艰揪绱曢悾顓㈡煕鎼淬劋鎲鹃挊婵喢归崗鍏肩稇缁炬崘娉曢埀顒€绠嶉崕閬嵥囨导瀛樺亗闁哄洢鍨洪悡娑㈡煕閵夛絽鍔氬┑锛勫帶椤儻顧侀柛銊ゅ嵆濠€渚€姊虹紒妯撳湱绮旈鈧、鏃堝醇閻旇櫣鏆㈤梻鍌氬€烽悞锔锯偓绗涘懏宕查柛灞绢嚤濞戞鏃堝川椤撶姴骞掗梻浣告惈濞层垽宕瑰ú顏呭亗闁告劦浜濋崰鎰節婵犲倻澧曠紒鈧崼鐔稿弿婵☆垱瀵х涵楣冩煢閸愵亜鏋涢柡灞炬礃缁绘稖顦查悗姘卞厴瀹曟垿濡搁埡鍌楁嫼缂傚倷鐒﹂敋濠殿喖娲﹂妵鍕即閵娿儱绫嶉梺绯曟杺閸ㄨ棄顕i幘顔碱潊闁炽儲鏋奸崑鎾绘偨閸涘﹦鍙嗗┑鐘绘涧濡鍩€椤掑倹鍤€闁宠绉瑰畷鍫曞Ω閿濆嫮鐩庨梻濠庡亜濞诧妇绮欓幇鏉跨疅濡わ絽鍟悡娑㈡倶閻愰潧浜剧紒鈧€n兘鍋撶憴鍕濞存粌鐖奸妴浣割潨閳ь剟骞冮姀锛勯檮濠㈣泛顦辨径锟�