I have a dataframe df like the following:
Col1 Col2
0 1 T
1 1 B
2 3 S
3 2 A
4 1 C
5 2 A
etc...
I would like to create two dataframes: df1 is a random sample of 10 rows for which Col2 == 'T', and df2 is df with the rows of df1 removed.
Assuming you have a uniquely indexed dataframe (and if you don't, you can simply call .reset_index(), apply this, and then .set_index() afterwards), you could use DataFrame.sample. [Actually, you should be able to use sample even if the frame doesn't have a unique index, but you couldn't use the method below to get df2.]
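For the non-unique-index case, the reset_index/set_index workaround might look like the following minimal sketch. It assumes the original index is unnamed, so reset_index() stores the old labels in a column called "index". The frame with a duplicated label 0 is made up for illustration, and random_state=0 is added only to make the draw repeatable.

```python
import pandas as pd

# Made-up frame whose index label 0 appears twice (non-unique index)
df = pd.DataFrame(
    {"Col1": [1, 1, 3, 2, 1, 2], "Col2": ["T", "B", "S", "A", "C", "A"]},
    index=[0, 0, 1, 2, 3, 4],
)

# Give every row a unique 0..n-1 label; the old labels move to a column named "index"
tmp = df.reset_index()

# Same selection as in the answer, now safe because tmp's index is unique
df1 = tmp[tmp.Col2 == "A"].sample(1, random_state=0)
df2 = tmp[~tmp.index.isin(df1.index)]

# Put the original (possibly duplicated) labels back and drop the "index" axis name
df1 = df1.set_index("index").rename_axis(None)
df2 = df2.set_index("index").rename_axis(None)

print(df1)
print(df2)
```

Every row of df ends up in exactly one of df1 and df2, and both keep their original labels.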
Note that I'm using A instead of T in this example because A is the only repeated value of Col2 in the example you gave, and I'll select only 1 row at random rather than 10.
>>> df1 = df[df.Col2 == "A"].sample(1)
>>> df2 = df[~df.index.isin(df1.index)]
>>> df1
Col1 Col2
3 2 A
>>> df2
Col1 Col2
0 1 T
1 1 B
2 3 S
4 1 C
5 2 A
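Applied to the original question (10 random rows where Col2 == 'T'), the same idea might look like the sketch below. The 30-row frame is invented for illustration. Capping the sample size with min() is an assumption about what you'd want when fewer than 10 'T' rows exist, since sample() without replacement raises a ValueError if n exceeds the number of rows. df.drop(df1.index) is swapped in as an alternative to the isin() mask and relies on the same unique-index assumption.

```python
import pandas as pd

# Invented data: 30 rows with a unique RangeIndex, 15 of them with Col2 == "T"
df = pd.DataFrame({
    "Col1": list(range(30)),
    "Col2": ["T", "B", "T", "A", "T", "S"] * 5,
})

n = 10
candidates = df[df.Col2 == "T"]

# Cap n so sample() doesn't raise when there are fewer than n "T" rows
df1 = candidates.sample(n=min(n, len(candidates)), random_state=42)

# drop() removes rows by index label; with a unique index this matches
# df[~df.index.isin(df1.index)]
df2 = df.drop(df1.index)

print(len(df1), len(df2))
```

Passing random_state makes the split reproducible; leave it out to get a different sample on each run.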