Counting the number of different combinations that exist in a column

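I believe you need to split the values by the regex \s*-\s* (\s* means zero or more whitespace characters, so a hyphen with or without surrounding spaces is a separator), then flatten all 2-element combinations into one list with a list comprehension: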

import pandas as pd
from itertools import combinations

# raw string so that \s is passed to the regex engine unchanged
L = ['-'.join(y) for x in df['DNA'].str.split(r'\s*-\s*') for y in combinations(x, 2)]
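A self-contained sketch of the steps above. The DataFrame is hypothetical sample data, chosen so that it reproduces the printed output further down; the second row has spaces around the hyphens to show what the \s*-\s* pattern is for. Because the pattern is longer than one character, str.split treats it as a regular expression by default.

```python
import pandas as pd
from itertools import combinations

# Hypothetical sample data (not from the question).
df = pd.DataFrame({'DNA': ['xx345-b324-c82-d13-c14',
                           'xx345 - a22 - c14 - d13',
                           'a34-f12-r27-fg98-tr12']})

# Split on a hyphen with optional whitespace around it.
parts = df['DNA'].str.split(r'\s*-\s*')
print(parts.tolist())

# Every unordered pair of positions in each row, joined back with '-'.
L = ['-'.join(y) for x in parts for y in combinations(x, 2)]
print(len(L))   # 10 + 6 + 10 pairs
print(L[:4])
```

Each row with n values contributes n * (n - 1) / 2 pairs, so the three rows give 10, 6 and 10 entries.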

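If the order inside a pair should not matter (so that e.g. c14-d13 and d13-c14 count as the same combination), sort the values of each pair before joining: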

L = ['-'.join(sorted(y)) for x in df['DNA'].str.split(r'\s*-\s*')
     for y in combinations(x, 2)]
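A small sketch of why sorting matters, using two hypothetical rows in which the same two values appear in opposite order. Note that sorted compares strings lexicographically, so a pair such as xx345/b324 is written as b324-xx345 after sorting.

```python
import pandas as pd
from itertools import combinations

# Hypothetical rows: d13 comes before c14 in the first, after it in the second.
df = pd.DataFrame({'DNA': ['xx345-b324-c82-d13-c14', 'xx345-a22-c14-d13']})
parts = df['DNA'].str.split(r'\s*-\s*')

unsorted_pairs = ['-'.join(y) for x in parts for y in combinations(x, 2)]
sorted_pairs = ['-'.join(sorted(y)) for x in parts for y in combinations(x, 2)]

unsorted_counts = pd.Series(unsorted_pairs).value_counts()
sorted_counts = pd.Series(sorted_pairs).value_counts()

# Without sorting, d13-c14 and c14-d13 are two separate entries.
print(unsorted_counts[['d13-c14', 'c14-d13']].tolist())
# With sorting, both rows produce c14-d13.
print(sorted_counts['c14-d13'])
```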

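Finally, pass the list to a Series and call Series.value_counts (the output below comes from the first, unsorted version, which is why c14-d13 and d13-c14 are listed separately):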

s = pd.Series(L)
print(s)
0     xx345-b324
1      xx345-c82
2      xx345-d13
3      xx345-c14
4       b324-c82
5       b324-d13
6       b324-c14
7        c82-d13
8        c82-c14
9        d13-c14
10     xx345-a22
11     xx345-c14
12     xx345-d13
13       a22-c14
14       a22-d13
15       c14-d13
16       a34-f12
17       a34-r27
18      a34-fg98
19      a34-tr12
20       f12-r27
21      f12-fg98
22      f12-tr12
23      r27-fg98
24      r27-tr12
25     fg98-tr12
dtype: object

s1 = s.value_counts()
print(s1)
xx345-c14     2
xx345-d13     2
c14-d13       1
f12-tr12      1
xx345-a22     1
a34-fg98      1
f12-r27       1
a34-r27       1
c82-c14       1
f12-fg98      1
a22-c14       1
a34-tr12      1
a34-f12       1
b324-d13      1
r27-tr12      1
xx345-c82     1
d13-c14       1
b324-c14      1
xx345-b324    1
r27-fg98      1
fg98-tr12     1
b324-c82      1
c82-d13       1
a22-d13       1
dtype: int64
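The same counts can also be produced without building an intermediate Python list, by turning each row into a list of pairs and expanding it with Series.explode (available in pandas 0.25 and later). This is an alternative sketch, not the approach above; the data is hypothetical. A row with a single value yields an empty list, which explode turns into NaN and value_counts then drops, so such rows disappear here just as they do with the list comprehension.

```python
import pandas as pd
from itertools import combinations

# Hypothetical sample data.
df = pd.DataFrame({'DNA': ['xx345-b324-c82-d13-c14',
                           'xx345-a22-c14-d13',
                           'a34-f12-r27-fg98-tr12']})

# One list of sorted pairs per row, then one Series row per pair.
pairs = (df['DNA'].str.split(r'\s*-\s*')
                  .apply(lambda x: ['-'.join(sorted(y)) for y in combinations(x, 2)])
                  .explode())

s1 = pairs.value_counts()
print(s1.head())
```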

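EDIT: If some rows contain only a single value (no hyphen), combinations(x, 2) yields nothing for them, so they vanish from the counts. To keep such values, append them unchanged: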

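import pandas as pd
from itertools import combinations

L = []
for x in df['DNA'].str.split(r'\s*-\s*'):
    if len(x) > 1:
        # two or more values: add every sorted pair
        for y in combinations(x, 2):
            L.append('-'.join(sorted(y)))
    else:
        # a single value has no pairs, so keep the value itself
        L.append(x[0])

s = pd.Series(L)
print(s)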
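The EDIT logic can be wrapped in a small reusable function that also does the counting. The name count_combinations is hypothetical, the data is made up, and the sketch assumes the column has no missing values (str.split returns NaN for those, and len(NaN) would raise a TypeError).

```python
import pandas as pd
from itertools import combinations


def count_combinations(values, sep=r'\s*-\s*'):
    """Hypothetical helper: count sorted pairs per row; a row holding a
    single value is counted as that value. Assumes no missing values."""
    out = []
    for x in values.str.split(sep):
        if len(x) > 1:
            # two or more values: every sorted pair of them
            out.extend('-'.join(sorted(y)) for y in combinations(x, 2))
        else:
            # one value: no pairs exist, so keep the value itself
            out.append(x[0])
    return pd.Series(out).value_counts()


# Hypothetical data; the last row holds a single value.
df = pd.DataFrame({'DNA': ['xx345-b324-c82', 'xx345 - c82', 'a34']})
s1 = count_combinations(df['DNA'])
print(s1)
```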
