How to delete newlines in a DataFrame when writing XML to it?

I would recommend building a dict of lists in a loop, stripping the whitespace from each element's text as you go, and then creating the DataFrame from that dict. Here is an example:

xml = '''<?xml version="1.0" ?>
<TrackingResult>
   <Events>
      <TrackingEvent>
         <DateTimeStamp>202010</DateTimeStamp>
         <Event>Delivered</Event>
         <ExtraInfo>02921</ExtraInfo>
      </TrackingEvent>
      <TrackingEvent>
         <DateTimeStamp>202010</DateTimeStamp>
         <Event>Delivery today</Event>
         <ExtraInfo>31916</ExtraInfo>
      </TrackingEvent>
   </Events>
   <Signatures />
   <Errors />
</TrackingResult>'''

from lxml import etree as ET
import pandas as pd
from collections import defaultdict

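# Map each tag name to the list of its values, one entry per TrackingEvent
d = defaultdict(list)

tree = ET.fromstring(xml)
for child in tree.iter('TrackingEvent'):
    for elem in child.iter():
        if elem.text is not None and elem.text.strip() != '':
            # Leaf with real content: store it without surrounding whitespace
            d[elem.tag].append(elem.text.strip())
        elif len(elem) == 0:
            # Empty leaf: keep a placeholder so all columns stay the same length
            d[elem.tag].append(None)
        # Container elements (whitespace-only text, with children) are skipped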

df = pd.DataFrame(d)

print(df)

Output:

  DateTimeStamp           Event ExtraInfo
0        202010       Delivered     02921
1        202010  Delivery today     31916
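
One thing to watch with the dict-of-lists approach: if a tag is missing entirely from one TrackingEvent, the lists end up with different lengths and pd.DataFrame(d) raises an error. A possible variation, sketched below, builds one dict per event and lets pandas fill the gaps. It uses the standard library's xml.etree.ElementTree instead of lxml; the sample XML (with the second ExtraInfo removed) and the names events_to_rows and df_rows are made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical sample: the second event has no <ExtraInfo> tag at all.
sample = """<TrackingResult>
   <Events>
      <TrackingEvent>
         <DateTimeStamp>202010</DateTimeStamp>
         <Event>Delivered</Event>
         <ExtraInfo>02921</ExtraInfo>
      </TrackingEvent>
      <TrackingEvent>
         <DateTimeStamp>202010</DateTimeStamp>
         <Event>Delivery today</Event>
      </TrackingEvent>
   </Events>
</TrackingResult>"""


def events_to_rows(xml_text):
    """Return one dict per TrackingEvent, with whitespace stripped."""
    root = ET.fromstring(xml_text)
    rows = []
    for event in root.iter("TrackingEvent"):
        row = {}
        # Iterating an element yields only its direct children, so the
        # whitespace-only text of the container itself is never read.
        for field in event:
            text = (field.text or "").strip()
            row[field.tag] = text or None  # empty tag becomes None
        rows.append(row)
    return rows


rows = events_to_rows(sample)
# Keys missing from a row become NaN in the resulting DataFrame.
df_rows = pd.DataFrame(rows)
print(df_rows)
```

Building rows rather than columns keeps each event's values together, so a missing tag cannot shift later values into the wrong row.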
