how do you parse text in a table format shown in a csv file?

It seems that somebody saved the document with word-wrap option turned on.

You can try this script to convert the text to DataFrame (I’m supposing the columns are separated by more than two spaces):

import re
import pandas as pd


txt = '''name                type              attribute
----------------- ----------------------=
---------------- --------------------------=
oscar                  dog               can bark
and can roll over

bruno                  dog              can bark

buddy                 dog             can bark and can 
roll over'''

is_row = re.compile(r'\s{2,}')
is_dashed_line = re.compile(r'-{3,}')

data = []
for line in map(str.strip, txt.splitlines()):
    if not line or is_dashed_line.search(line):
        continue
    if is_row.search(line):
        data.append( is_row.split(line) )
    else:
        data[-1][-1] += ' ' + line

df = pd.DataFrame(data[1:], columns=data[0])
print(df)

Prints:

    name type                   attribute
0  oscar  dog  can bark and can roll over
1  bruno  dog                    can bark
2  buddy  dog  can bark and can roll over

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top