I am looking for the best way to do random stratified sampling, as used in surveys and polls. I don't want to use sklearn.model_selection.StratifiedShuffleSplit, since I am not doing supervised learning and I have no target. I just want to draw random stratified samples from a pandas DataFrame (https://www.investopedia.com/terms/stratified_random_sampling.asp).
Python is my main language.
Thank you for any help.
Given that the variables are already binned, the following one-liner should give you the desired output. scikit-learn is mainly used for purposes other than yours, but borrowing one of its functions does no harm: the stratify argument of train_test_split only needs the columns to stratify on, not a target.
Note that with scikit-learn versions earlier than 0.19.0, the sampled rows might contain duplicates.
If you try the following method, please share whether it behaves as expected.
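If the columns are not binned yet, `pd.cut` is one way to turn continuous variables into discrete strata first. This is a minimal sketch assuming a hypothetical raw DataFrame with continuous `income` and `age` columns and a categorical `sex` column; the synthetic data, bin edges and labels are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical raw population: 'income' and 'age' are continuous, 'sex' is categorical.
rng = np.random.default_rng(0)
n = 10_000
raw = pd.DataFrame({
    "income": rng.integers(10_000, 150_000, size=n),
    "sex": rng.choice(["F", "M"], size=n),
    "age": rng.integers(18, 90, size=n),
})

# Bin the continuous columns so that each (income, sex, age) combination is a stratum.
# The bin edges and labels below are assumptions chosen only for illustration.
population = raw.assign(
    income=pd.cut(raw["income"], bins=[0, 40_000, 80_000, np.inf],
                  labels=["low", "mid", "high"]),
    age=pd.cut(raw["age"], bins=[17, 35, 55, 120],
               labels=["18-35", "36-55", "56+"]),
)
print(population.head())
```

With 3 income bins, 2 sexes and 3 age bins this yields up to 18 strata; every stratum needs at least a couple of rows for the stratified split to work.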
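```python
from sklearn.model_selection import train_test_split
# `population` is your DataFrame; 'income', 'sex' and 'age' must already be binned.
# test_size=0.999 keeps about 0.1% of the rows, in proportion to each stratum.
stratified_sample, _ = train_test_split(population, test_size=0.999, stratify=population[['income', 'sex', 'age']])
```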