Parsing XML with DOM APIs in Python
Last Updated: 10 May, 2020
The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document as a tree of nodes, and the way that document is accessed and manipulated.
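To make the idea of a "logical structure" concrete, the following minimal sketch parses a tiny inline document and prints its node tree. The inline XML string and the `describe` helper are hypothetical names introduced here for illustration only; the sketch assumes Python's standard `xml.dom.minidom` module.

```python
from xml.dom import minidom

# A tiny inline document (hypothetical, for illustration only).
XML_TEXT = '<company><name>Acme</name><staff id="1"/></company>'


def describe(node, depth=0):
    """Return indented lines describing a node and all of its descendants."""
    if node.nodeType == node.ELEMENT_NODE:
        label = "element <%s>" % node.tagName
    elif node.nodeType == node.TEXT_NODE:
        label = "text %r" % node.data
    else:
        label = "node type %d" % node.nodeType
    lines = ["  " * depth + label]
    # Every DOM node exposes its children through childNodes.
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


doc = minidom.parseString(XML_TEXT)
tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))
```

Note that the text inside `<name>` is itself a child node of the element, which is why text is read through `firstChild.data` rather than from the element directly.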
Parsing XML with the DOM APIs in Python is straightforward using the standard library's xml.dom.minidom module. For this example, we will use a sample XML document (sample.xml), shown below:
<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>

Now, let's parse the above XML using Python. The code below demonstrates the process:
from xml.dom import minidom

# Parse the XML file into a DOM Document object
doc = minidom.parse("sample.xml")

# getElementsByTagName returns a NodeList of all matching elements in
# document order, so the first <name> is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Print the id attribute, name and salary of every <staff> element
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))
Output:

GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA

The same can also be done using a user-defined function, as shown in the code below:
from xml.dom import minidom

doc = minidom.parse("sample.xml")

# User-defined function: join the data of all text nodes
# that are direct children of the given node
def getNodeText(node):
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s\n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))
Output:

Company Name : GeeksForGeeks Company

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
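As a possible variation on the approach above, the sketch below collects each staff entry into a dictionary instead of printing it, and tolerates a missing child element. It embeds the sample document as a string and uses `minidom.parseString` so that it runs without `sample.xml` on disk. The names `SAMPLE_XML`, `get_text` and `load_staff` are hypothetical helpers introduced here, not part of `minidom`.

```python
from xml.dom import minidom

# The sample document from this article, embedded as a string so the
# sketch does not depend on a file being present.
SAMPLE_XML = """<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>"""


def get_text(parent, tag, default=""):
    """Return the text of the first <tag> under parent, or default if absent."""
    matches = parent.getElementsByTagName(tag)
    if not matches:
        return default
    return "".join(child.data for child in matches[0].childNodes
                   if child.nodeType == child.TEXT_NODE)


def load_staff(xml_text):
    """Parse xml_text and return one dict per <staff> element."""
    doc = minidom.parseString(xml_text)
    records = []
    for staff in doc.getElementsByTagName("staff"):
        records.append({
            "id": staff.getAttribute("id"),
            "name": get_text(staff, "name"),
            "salary": get_text(staff, "salary"),
        })
    return records


staff_records = load_staff(SAMPLE_XML)
for record in staff_records:
    print(record)
```

Returning plain dictionaries keeps the DOM objects out of the rest of the program, and the `default` argument avoids an IndexError when an element such as `<salary>` is missing from an entry.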