Extensible Markup Language (XML) is a popular data transfer format used by websites and applications. It lets you describe different types of data in one place, in a form that both humans and machines can read. Because it follows standard rules for data definition, it makes it easy to transfer data between different applications and websites. Along with JSON, XML is one of the de facto data transfer formats on the internet. Sometimes you may need to modify an XML file in Python. In this article, we will learn how to edit an XML file in Python.
XML represents data hierarchically, so we will treat it as a tree. We will use the xml.etree.ElementTree module to parse, search and modify the XML tree. It offers many features and functions, but we will mainly use its ElementTree and Element classes. ElementTree represents the whole XML document as a tree, while Element represents a single node. ElementTree is useful for searching, reading and writing the whole document. Element is useful for reading and writing a single element and its children. Each element has the following properties:
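|Tag||String identifying the type of data the element represents. Accessed using elementname.tag.|
|Attributes||Stored as a Python dictionary. Accessed using elementname.attrib.|
|Text string||Text inside the element, before its first child element. Accessed using elementname.text.|
|Tail string||Optional text that follows the element's closing tag. Accessed using elementname.tail.|
|Child elements||Sub-elements of the node, stored as a sequence. Accessed by iterating over or indexing the element.|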
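The short sketch below shows how these properties can be read from a parsed element. The XML string and the variable names (sample, food, price) are made up for illustration and are not part of the examples used later in this article.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical document used only to illustrate the properties.
sample = '<menu><food id="11">Waffles<price>5.95</price> per plate</food></menu>'

root = ET.fromstring(sample)
food = root[0]     # child elements can be accessed by index, like a list
price = food[0]

print(root.tag)          # menu
print(food.attrib)       # {'id': '11'}
print(food.text)         # Waffles
print(price.text)        # 5.95
print(repr(price.tail))  # ' per plate'
print(len(food))         # 1
```

Note that food.text holds only the text before the first child element, while the text after the closing price tag is stored in price.tail.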
How to Modify XML File in Python
We will look at the different methods & classes available for parsing, searching & modifying XML file in Python.
Parsing XML in Python
Parsing XML in Python means loading an XML file or string into a Python object so that you can work with it using Python functions. Say you have imported the xml.etree.ElementTree module as ET. Here are its built-in functions for parsing.
1. ET.parse('filename').getroot() – ET.parse('filename') reads the file and creates a tree, and .getroot() returns the root element of that tree.
2. ET.fromstring(xml_string) – Parses an XML data string and returns its root element.
Here is Python code to parse XML. It shows how to parse XML directly from a string, as well as from the XML file xmldocument.xml. When you parse an XML document from a file, you need to explicitly call the getroot() function to get the root. When you parse XML from a string, the root is returned directly.
# importing the module.
import xml.etree.ElementTree as ET

XMLexample_stored_in_a_string = '''<?xml version="1.0"?>
<COUNTRIES>
    <country name="INDIA">
        <neighbor name="Dubai" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
</COUNTRIES>
'''

# parsing from the file; getroot() returns the root element.
tree = ET.parse('xmldocument.xml')
root = tree.getroot()

# parsing from the string; fromstring() returns the root element directly.
stringroot = ET.fromstring(XMLexample_stored_in_a_string)

# printing both roots.
print(root)
print(stringroot)
Here is the XML document used in the above python code. It contains the same content as the string.
<?xml version="1.0"?>
<!--COUNTRIES is the root element-->
<COUNTRIES>
    <country name="INDIA">
        <neighbor name="Dubai" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
</COUNTRIES>
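Once the document is parsed, the tree can be searched. The following is a minimal sketch that uses the same countries data, loaded from a string so that it runs without any file on disk. The variable names (xml_text, country_names, neighbors) are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

xml_text = '''<?xml version="1.0"?>
<COUNTRIES>
    <country name="INDIA">
        <neighbor name="Dubai" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
</COUNTRIES>'''

root = ET.fromstring(xml_text)

# findall() returns the direct children of root that have the given tag.
country_names = [country.get('name') for country in root.findall('country')]
print(country_names)  # ['INDIA', 'Singapore']

# iter() walks the whole subtree, so it also reaches nested elements.
neighbors = {n.get('name'): n.get('direction') for n in root.iter('neighbor')}
print(neighbors)  # {'Dubai': 'W', 'Malaysia': 'N'}
```

Here get() reads a single attribute value, which is often more convenient than going through the attrib dictionary.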
Modify XML in Python
Once you have parsed an XML document into a Python object, xml.etree.ElementTree offers many methods to modify different parts of the document as well as single nodes. Here are some of the popular ones, where ET is the imported module, tree is an ElementTree object and element is an Element object.
- element.set('attrname', 'value') – sets (adds or modifies) an attribute of the element.
- ET.SubElement(parent, 'new_childtag') – creates a new child tag under the parent.
- tree.write('filename.xml') – writes the XML tree to a file.
- element.attrib.pop('attrname') – deletes a particular attribute.
- parent.remove(child) – deletes a complete child tag from its parent.
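The calls listed above can be combined as in the following sketch. It is a minimal illustration rather than a complete program: the XML string, the attribute names and the output file name menu_out.xml are assumptions made for this example, and the file is written to a temporary directory so that nothing in the working directory is overwritten.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical input used only for this illustration.
root = ET.fromstring(
    '<menu><food id="11" draft="yes"><name>Waffles</name><note>temp</note></food></menu>'
)
food = root.find('food')

food.set('id', '12')                  # modify an existing attribute
food.attrib.pop('draft', None)        # delete an attribute
price = ET.SubElement(food, 'price')  # add a new child element under food
price.text = '5.95'
food.remove(food.find('note'))        # delete a complete child tag

result = ET.tostring(root, encoding='unicode')
print(result)
# <menu><food id="12"><name>Waffles</name><price>5.95</price></food></menu>

# write() belongs to ElementTree, so the root is wrapped in a tree first.
out_path = os.path.join(tempfile.mkdtemp(), 'menu_out.xml')
ET.ElementTree(root).write(out_path, encoding='utf-8', xml_declaration=True)
```

Note that remove() must be called on the parent of the element being deleted, and it takes the child element itself rather than a tag name.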
Here is a sample XML file that we will modify. It contains the breakfast menu of a restaurant, where each food element describes one food item through child elements such as name, price, description and calories.
<?xml version="1.0"?>
<breakfast_menu>
    <food>
        <name itemid="11">Belgian Waffles</name>
        <price>5.95</price>
        <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
        <calories>650</calories>
    </food>
    <food>
        <name itemid="31">Berry-Berry Belgian Waffles</name>
        <price>8.95</price>
        <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
        <calories>900</calories>
    </food>
    <food>
        <name itemid="41">French Toast</name>
        <price>4.50</price>
        <description>Thick slices made from our homemade sourdough bread</description>
        <calories>600</calories>
    </food>
</breakfast_menu>
Here is the Python code that iterates through the price elements, increases each price by 10 and adds a new attribute 'newprices' to each of them. It assumes the menu above is saved as xmldocument.xml.
import xml.etree.ElementTree as ET

mytree = ET.parse('xmldocument.xml')
myroot = mytree.getroot()

# iterating through the price values.
for prices in myroot.iter('price'):
    # updates the price value
    prices.text = str(float(prices.text) + 10)
    # creates a new attribute
    prices.set('newprices', 'yes')

mytree.write('output.xml')
In the above code, we load xmldocument.xml into the mytree Python object and store its root element in myroot. Then, using element methods, we iterate through the price elements, add 10 to each price and set the attribute 'newprices' on each of them. Finally, we write the modified XML tree to the output.xml file.
In this article, we have learnt how to modify an XML document in Python. Basically, you parse the XML document or string into a Python object, use the element methods that fit your requirement to modify it, and then write the modified XML tree back to an XML file.