Pandas: randomly sampling entries of a specific column at a 1:1 ratio

I have a pandas DataFrame with the columns ['text', 'label'], where the label is either 'pos' or 'neg'.

The problem is that I have many more rows labeled 'neg' than rows labeled 'pos'.

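My question: is there a way to randomly select as many 'neg' sentences as there are 'pos' sentences, so that I end up with a new DataFrame in which the two labels are split 50:50?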

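Do I have to count the 'pos' sentences, put them all into a new DataFrame, then run neg_df = dataframe.sample(n=pos_count) and append the result to the all-positive DataFrame created earlier, or is there a faster way?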
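For reference, the manual approach described above might look roughly like the sketch below. The sample data is made up, and the names pos_df and balanced_df are only illustrative. Note that the sampling is applied to the 'neg' rows only, not to the whole frame.

```python
import pandas as pd

# Hypothetical sample data; the real frame would hold the actual sentences.
dataframe = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd', 'e'],
    'label': ['pos'] * 2 + ['neg'] * 3,
})

# Keep every 'pos' row and count them.
pos_df = dataframe[dataframe['label'] == 'pos']
pos_count = len(pos_df)

# Draw the same number of 'neg' rows, without replacement (the default).
neg_df = dataframe[dataframe['label'] == 'neg'].sample(n=pos_count, random_state=0)

# Stack the two frames row-wise, then shuffle so the labels are interleaved.
balanced_df = (
    pd.concat([pos_df, neg_df])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
```

The random_state arguments are only there to make the sketch reproducible; drop them for a different draw on each run.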

Thanks for your help.

2016-02-11 d.a.d.a

import pandas as pd

# Sample data.
df = pd.DataFrame({'text': ['a', 'b', 'c', 'd', 'e'],
                   'label': ['pos'] * 2 + ['neg'] * 3})
>>> df
  label text
0   pos    a
1   pos    b
2   neg    c
3   neg    d
4   neg    e

# Select the 'neg' and 'pos' text as separate Series.
neg_text = df.loc[df.label == 'neg', 'text']
pos_text = df.loc[df.label == 'pos', 'text']

# Equally sample 'pos' and 'neg' with replacement and concatenate
# them side by side (axis=1) into a dataframe.
result = pd.concat([neg_text.sample(n=5, replace=True).reset_index(drop=True),
                    pos_text.sample(n=5, replace=True).reset_index(drop=True)], axis=1)

result.columns = ['neg', 'pos'] 

>>> result
  neg pos
0   c   b
1   d   a
2   c   b
3   d   a
4   e   a


2016-02-11 18:01:51 Alexander

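Thanks, this led to the behavior I wanted. First, I can't use the same row of text more than once because I'm using it to train a classifier, but removing 'replace=True' did the trick. Second, I needed to append the two new frames rather than concat them side by side, otherwise my classifier throws an error.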
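A more compact way to get the same result (no repeated rows, frames stacked row-wise) might be to sample within each label group. This is only a sketch: it assumes pandas 1.1 or later, where DataFrameGroupBy.sample is available, and the data and the names n_per_label and balanced are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with more 'neg' rows than 'pos' rows.
df = pd.DataFrame({
    'text': ['a', 'b', 'c', 'd', 'e'],
    'label': ['pos'] * 2 + ['neg'] * 3,
})

# Size of the smallest label group; every group is cut down to this size.
n_per_label = int(df['label'].value_counts().min())

# Sample n_per_label rows from each label, without replacement (the default).
balanced = df.groupby('label').sample(n=n_per_label, random_state=42)

# Shuffle the rows and rebuild the index before training.
balanced = balanced.sample(frac=1, random_state=42).reset_index(drop=True)
```

Taking the minimum group size keeps this working even if 'pos' turns out to be the larger class.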
