Python Web Scraping, Part 3: Parsing XML Network Messages
- January 7, 2020
- Notes
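This section explains how to parse an XML message retrieved in a project and extract the relevant fields from it. The code uses xml.dom.minidom, which ships with the Python standard library. A tutorial on Python's XML parsing modules: http://www.runoob.com/python/python-xml.html
The XML file is as follows: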
<?xml version="1.0" encoding="UTF-8"?>
<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <RegistrationInfo>
    <Date>2018-03-19T03:57:44.2908045</Date>
    <Author>FANBINGLINAdministrator</Author>
    <Description>开机提醒事件</Description>
  </RegistrationInfo>
  <Triggers>
    <LogonTrigger>
      <Enabled>true</Enabled>
    </LogonTrigger>
  </Triggers>
  <Principals>
    <Principal id="Author">
      <UserId>FANBINGLINAdministrator</UserId>
      <LogonType>InteractiveToken</LogonType>
      <RunLevel>LeastPrivilege</RunLevel>
    </Principal>
  </Principals>
  <Settings>
    <MultipleInstancesPolicy>IgnoreNew</MultipleInstancesPolicy>
    <DisallowStartIfOnBatteries>true</DisallowStartIfOnBatteries>
    <StopIfGoingOnBatteries>true</StopIfGoingOnBatteries>
    <AllowHardTerminate>true</AllowHardTerminate>
    <StartWhenAvailable>false</StartWhenAvailable>
    <RunOnlyIfNetworkAvailable>false</RunOnlyIfNetworkAvailable>
    <IdleSettings>
      <StopOnIdleEnd>true</StopOnIdleEnd>
      <RestartOnIdle>false</RestartOnIdle>
    </IdleSettings>
    <AllowStartOnDemand>true</AllowStartOnDemand>
    <Enabled>true</Enabled>
    <Hidden>false</Hidden>
    <RunOnlyIfIdle>false</RunOnlyIfIdle>
    <DisallowStartOnRemoteAppSession>false</DisallowStartOnRemoteAppSession>
    <UseUnifiedSchedulingEngine>false</UseUnifiedSchedulingEngine>
    <WakeToRun>false</WakeToRun>
    <ExecutionTimeLimit>P3D</ExecutionTimeLimit>
    <Priority>7</Priority>
  </Settings>
  <Actions Context="Author">
    <ShowMessage>
      <Title>每日提醒</Title>
      <Body> 1、掌握python基本语法,3.19-3.24 2、VBA程序研究 3、工作任务总结</Body>
    </ShowMessage>
  </Actions>
</Task>
The parsing code (some commented-out debugging statements are left in):
#!/usr/bin/python3
# coding: utf-8
import xml.dom.minidom
from xml.dom import Node

# Parse the task file into a DOM tree and take its root element (<Task>).
Root = xml.dom.minidom.parse('开机提醒.xml')
# print(dir(Root))
task = Root.documentElement

for line in task.childNodes:
    # print('line.nodeName:', line.nodeName, 'line.nodeType:', line.nodeType, 'line.nodeValue:', line.nodeValue)
    # print(line)
    # Skip the whitespace text nodes that sit between elements.
    if Node.TEXT_NODE == line.nodeType:
        continue
    if 'Actions' == line.nodeName:
        # tmp is <ShowMessage>
        for tmp in line.childNodes:
            # print(tmp)
            if Node.TEXT_NODE == tmp.nodeType:
                continue
            # tmp1 is <Title> or <Body>
            for tmp1 in tmp.childNodes:
                if Node.TEXT_NODE == tmp1.nodeType:
                    continue
                # tmp2 is the text node holding the element's content
                for tmp2 in tmp1.childNodes:
                    # print(tmp2)
                    print(tmp2.nodeValue)
    # Earlier debugging attempt that printed the text of every grandchild:
    # for line1 in line.childNodes:
    #     if Node.TEXT_NODE == line1.nodeType:
    #         continue
    #     # print(line1.nodeName)
    #     for line2 in line1.childNodes:
    #         if Node.TEXT_NODE == line2.nodeType:
    #             continue
    #         print(line2.nodeValue)
    #         print(line2.data)
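The nested loops above walk the tree one level at a time. As a rough alternative sketch (not the original author's method), the same two fields can probably be reached more directly with minidom's getElementsByTagName, and a message that arrives over the network as bytes can be parsed with parseString instead of being read from a file. SAMPLE_XML is a trimmed copy of the file above, and get_text and read_show_message are hypothetical helper names introduced only for this illustration.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the task file shown earlier; only the parts used here.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Task version="1.3" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Actions Context="Author">
    <ShowMessage>
      <Title>每日提醒</Title>
      <Body> 1、掌握python基本语法,3.19-3.24 2、VBA程序研究 3、工作任务总结</Body>
    </ShowMessage>
  </Actions>
</Task>"""


def get_text(element):
    """Join the text nodes directly under element (hypothetical helper)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def read_show_message(xml_bytes):
    """Return (title, body) of the first <ShowMessage> (hypothetical helper)."""
    dom = parseString(xml_bytes)
    # getElementsByTagName searches all descendants, so no manual nesting is needed.
    message = dom.getElementsByTagName("ShowMessage")[0]
    title = get_text(message.getElementsByTagName("Title")[0])
    body = get_text(message.getElementsByTagName("Body")[0])
    return title.strip(), body.strip()


if __name__ == "__main__":
    # Assumption: the network layer hands us UTF-8 encoded bytes.
    title, body = read_show_message(SAMPLE_XML.encode("utf-8"))
    print(title)
    print(body)
```

This sketch assumes the message contains at least one ShowMessage element with both Title and Body; real code would want to check the returned node lists before indexing into them.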
Result (screenshot):
