Python Crawler, Part 3: Parsing XML Network Messages

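This section explains how to parse an XML message obtained in a project and extract the fields you need from it. The code uses xml.dom.minidom, which ships with the Python standard library. A tutorial on Python's XML parsing modules: http://www.runoob.com/python/python-xml.html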

The XML file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <RegistrationInfo>
    <Date>2018-03-19T03:57:44.2908045</Date>
    <Author>FANBINGLINAdministrator</Author>
    <Description>开机提醒事件</Description>
  </RegistrationInfo>
  <Triggers>
    <LogonTrigger>
      <Enabled>true</Enabled>
    </LogonTrigger>
  </Triggers>
  <Principals>
    <Principal id="Author">
      <UserId>FANBINGLINAdministrator</UserId>
      <LogonType>InteractiveToken</LogonType>
      <RunLevel>LeastPrivilege</RunLevel>
    </Principal>
  </Principals>
  <Settings>
    <MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>
    <DisallowStartIfOnBatteries>true</DisallowStartIfOnBatteries>
    <StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
    <AllowHardTerminate>true</AllowHardTerminate>
    <StartWhenAvailable>false</StartWhenAvailable>
    <RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
    <IdleSettings>
      <StopOnIdleEnd>true</StopOnIdleEnd>
      <RestartOnIdle>false</RestartOnIdle>
    </IdleSettings>
    <AllowStartOnDemand>true</AllowStartOnDemand>
    <Enabled>true</Enabled>
    <Hidden>false</Hidden>
    <RunOnlyIfIdle>false</RunOnlyIfIdle>
    <DisallowStartOnRemoteAppSession>false</DisallowStartOnRemoteAppSession>
    <UseUnifiedSchedulingEngine>false</UseUnifiedSchedulingEngine>
    <WakeToRun>false</WakeToRun>
    <ExecutionTimeLimit>P3D</ExecutionTimeLimit>
    <Priority>7</Priority>
  </Settings>
  <Actions Context="Author">
    <ShowMessage>
      <Title>每日提醒</Title>
      <Body>
1、掌握python基本语法,3.19-3.24
2、VBA程序研究
3、工作任务总结</Body>
    </ShowMessage>
  </Actions>
</Task>
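One detail of this file matters for any parser: the root element declares a default namespace (the xmlns attribute). minidom's nodeName simply reports the tag as written, but namespace-aware APIs such as the standard library's xml.etree.ElementTree need the namespace spelled out. The sketch below is only an illustration, not part of the original project: SAMPLE is a trimmed, hypothetical copy of the file above, and the prefix "t" is an arbitrary name chosen here.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the task file (assumption: only the
# <Actions> branch is kept, and the Body text is shortened).
SAMPLE = """<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Actions Context="Author">
    <ShowMessage>
      <Title>每日提醒</Title>
      <Body>1、掌握python基本语法</Body>
    </ShowMessage>
  </Actions>
</Task>"""

# Map an arbitrary prefix to the namespace URI declared on <Task>.
NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}

root = ET.fromstring(SAMPLE)

# ElementTree stores the namespace inside the tag name, in braces.
print(root.tag)

# An unqualified lookup finds nothing, because every element
# inherits the default namespace.
print(root.find("Actions"))

# Qualified lookups succeed.
title = root.find("t:Actions/t:ShowMessage/t:Title", NS)
body = root.find("t:Actions/t:ShowMessage/t:Body", NS)
print(title.text)
print(body.text)
```

If a lookup with ElementTree unexpectedly returns None on a file like this one, a missing namespace qualifier is a common cause.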

The parsing code (with some debugging statements left in as comments):

#!/usr/bin/python3
# coding: utf-8

import xml.dom.minidom
from xml.dom import Node

# Parse the exported task file into a DOM tree
dom = xml.dom.minidom.parse('开机提醒.xml')
# print(dir(dom))
task = dom.documentElement  # the root <Task> element

for line in task.childNodes:
    # print('nodeName:', line.nodeName, 'nodeType:', line.nodeType, 'nodeValue:', line.nodeValue)
    # Skip the whitespace-only text nodes that sit between elements
    if line.nodeType == Node.TEXT_NODE:
        continue
    if line.nodeName == 'Actions':
        for tmp in line.childNodes:          # <ShowMessage>
            if tmp.nodeType == Node.TEXT_NODE:
                continue
            # print(tmp)
            for tmp1 in tmp.childNodes:      # <Title> and <Body>
                if tmp1.nodeType == Node.TEXT_NODE:
                    continue
                for tmp2 in tmp1.childNodes:  # the text inside each field
                    print(tmp2.nodeValue)
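The nested loops above walk the tree level by level. When only a few named fields are needed, minidom's getElementsByTagName can reach them directly. The following is a minimal sketch under some assumptions: SAMPLE is a trimmed, hypothetical copy of the task file, and element_text is a helper name introduced here for illustration, not something from the original project.

```python
from xml.dom.minidom import parseString

# Trimmed, illustrative copy of the task file (assumption: only the
# <Actions> branch is kept, and the Body text is shortened).
SAMPLE = """<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Actions Context="Author">
    <ShowMessage>
      <Title>每日提醒</Title>
      <Body>1、掌握python基本语法 2、VBA程序研究</Body>
    </ShowMessage>
  </Actions>
</Task>"""


def element_text(dom, tag):
    """Return the text of the first <tag> element, or None if it is absent or empty."""
    nodes = dom.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    # The first child of a simple element like <Title> is its text node.
    return nodes[0].firstChild.data


dom = parseString(SAMPLE)
title = element_text(dom, "Title")
body = element_text(dom, "Body")
print(title)
print(body)
```

Note that getElementsByTagName searches all descendants, so in the full file a tag such as Enabled, which appears under both Triggers and Settings, would match more than once. For tags like that, call getElementsByTagName on the specific parent element instead of on the whole document.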

Result: