Example data:
000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108
Some values in the first column also appear in the fourth column. I want to remove every row whose fourth-column value also appears in any row of the first column.
Therefore, in this example, the following rows should be removed:
000000008,2, 1,000000010
000000010,1, 1,000000008
000000012,3, 1,000000009
000000009,1, 1,000000011
Code starting point:
import numpy as np
import pandas as pd
from io import StringIO

T = u'''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''

df = pd.read_csv(StringIO(T), header=None)
print(df)
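For reference, the rule described above could probably be written as a boolean mask built with `Series.isin`. This is only a sketch of one possible approach, assuming the default integer column labels `0` to `3` that `header=None` produces; the names `mask` and `kept` are made up for illustration. Note that `read_csv` parses the values as integers here, so the leading zeros are lost; passing `dtype=str` would keep them if that matters.

```python
import pandas as pd
from io import StringIO

T = '''000000008,2, 1,000000010
000000009,1, 1,000000011
000000010,1, 1,000000008
000000011,2, 1,000000032
000000012,3, 1,000000009
000000013,2, 1,000000108'''

# header=None gives integer column labels 0..3
df = pd.read_csv(StringIO(T), header=None)

# True for each row whose fourth-column value occurs anywhere in the first column
mask = df[3].isin(df[0])

# Keep only the rows where that is NOT the case
kept = df[~mask]
print(kept)
```

With the example data this keeps only the rows starting with 000000011 and 000000013, which matches the expected removals listed above.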