In general iterating over the contents of a pandas series or dataframe is bad for performance and instead vectorized approaches are recommended when possible:
Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided…
See the docs.
That said, for the “non-botched” entries you can do this:
df['date_time'] = df['date_time/full_company_name'].str.slice(0, 10)
df['full_company_name'] = df['date_time/full_company_name'].str.slice(10, None)
df.drop('date_time/full_company_name', axis=1, inplace=True)
For botched entries, if there is no consistent pattern, it’s gonna be hard to come up with a programmatic approach, but something that works for your examples and could be extended with other known entry errors is using extract
with a regular expression:
# The first capture group is either a Y-M-D date, the NaN string or a sequence of digits
# The second one is any string
pattern = r'(\d{4}-\d{2}-\d{2}|NaN|0|\d+)(.+)'
df['date_time/full_company_name'].str.extract(pattern)
Output:
0 1
0 2020-05-19 Lopez-Wallace
1 2020-05-12 Smith-Simon
2 2020-10-02 Jenkins Inc
3 2020-07-06 Moore-Weiss
4 0 Lopez, Barton and Jones
5 NaN Brown, Singleton and Harrell
6 84635 Ball-Thomas
CLICK HERE to find out more related problems solutions.