Plotting a histogram of a pandas column using a map function

I have a CSV file that I process with pandas. The column is called manual_raw_value. I want to retrieve the unique characters in this column and make a histogram of them.

To retrieve all the unique values, I did the following:

unique_values = set(df.manual_raw_value.apply(list).sum()) 

{' ', 
'!', 
'"', 
'%', 
'&', 
"'", 
'(', 
')', 
'*', 
'+', 
',', 
'-', 
'.', 
'/', 
'0', 
'1', 
'2', 
'3', 
'4', 
'5', 
'6', 
'7', 
'8', 
'9', 
':', 
'=', 
'>', 
'?', 
'@', 
'_', 
'a', 
'b', 
'c', 
'd', 
'e', 
'f', 
'g', 
'h', 
'i', 
'j', 
'k', 
'l', 
'm', 
'n', 
'o', 
'p', 
'q', 
'r', 
's', 
't', 
'u', 
'v', 
'w', 
'x', 
'y', 
'z'}

Here is the data:

manual_raw_value 
    6,35 
    11,68 
    VOTRE 
    AVEL AR VRO 
    2292 
    questions. 
    nb 
    les 
    937,99 
    à 
    et 
    TTC 
    1 
    620 
    Echéance 
    vos 
    ROB21 
    Pièce 
    AGRIAL 
    désignation 
    des 
    taux 
    13s 
    2 
    par 
    le 
    mois, 
    32 
    21/07/2016 
    FR 
    au 
    0 
    téléphonique 
    BROYEUR 
    et 
    ST 
    TVA 
    de 
    des 
    ECHEANCIER 
    à 
    ne 
    lieu 
    481,67 
    N°0016 
    de 
    ministère 
    de 
    20/11/2015 
    Si 
    vous 
    59 
    cas 
    EUR 
    3.19 
    2 
    contrôle 
    assurances 
    BAS 
    et 
    4423873 
    renseignements 
    6104219 
    C9DECOMPTEDIVERS 
    6635 
    DE 
    10825

Now that I have the unique values, I want to make a histogram. Here is what I have tried:

import pandas as pd

def find_group(val):
    unique_values = set(df.manual_raw_value.apply(list).sum())
    for unique in unique_values:
        # get the number of occurrences of all the unique values
        # then make a histogram
        pass  # not implemented yet


df = pd.read_csv('words.csv', sep=',')
df = df.astype(str)
df.manual_raw_value = df.manual_raw_value.str.lower()
df.manual_raw_value.apply(find_group)
df.manual_raw_value.apply(find_group).value_counts().plot(kind='bar')

The unique values are those returned by unique_values = set(df.manual_raw_value.apply(list).sum()), that is {' ', '!', '"', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', ... and so on. Now take, for example, the two values 6,35 and 11,68 from manual_raw_value. For these we can say that 1 appears twice, 6 twice, ',' twice, 3 once, 5 once and 8 once.
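For illustration only, the counting described for these two cells could be sketched with collections.Counter from the standard library. This is one possible approach, not something taken from the original attempt; the names cells and counts are hypothetical.

```python
from collections import Counter

# The two sample cells quoted above
cells = ["6,35", "11,68"]

# Counter accepts any iterable of characters; joining the cells first
# counts every character across both values at once
counts = Counter("".join(cells))

# Print each character with its number of occurrences
for char, n in sorted(counts.items()):
    print(repr(char), n)
```

The same idea extends to a whole column by joining all of its cells instead of just two.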

Edit 1: I tried this code to make a histogram of the number of occurrences of alpha cells, alphanumeric cells and special-character cells:

def find_group(val):
    val = str(val)
    if val.isalpha():
        return 'Alpha'
    # isalnum must be called; the bare attribute is always truthy
    elif val.isalnum() and any(c.isalpha() for c in val):
        return 'Alphanumeric'
    else:
        return 'Special'

df.Column_values.apply(find_group)
df.Column_values.apply(find_group).value_counts().plot(kind='bar')

Now I want to make a histogram at the character level:

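Get the unique characters in the column by looping over every cell. (Done)
Count how many times these characters occur across all cells and make a histogram. (I am stuck at this step.)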

Edit 2: Let's take a practical example. Say my column is called Column_value:

Column_value 
    hello 
    good 
    morning 
    how 
    are 
    you

1. For each row, I count each character:

hello : h=1 l=2 o=1 e=1 
good : g=1 o=2 d=1 
morning : m=1 o=1 r=1 n=2 i=1 g=1 
how: h=1 o=1 w=1 
are : a=1 r=1 e=1 
you: y=1 o=1 u=1

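2. I sum these counts to get the number of occurrences of each character across all rows:

h=1+1=2 
l=2 
o=1+2+1+1+1=6 
e=1+1=2 
g=1+1=2 
d=1

and so on. Now I want to make a histogram of h=2, l=2, o=6, e=2, g=2, d=1, etc.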
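The two steps above (count per row, then sum across rows) can be sketched as follows. This is a minimal illustration using only the standard library; the names column_value, per_row and total are hypothetical and not part of the original post.

```python
from collections import Counter

# The example column from Edit 2
column_value = ["hello", "good", "morning", "how", "are", "you"]

# Step 1: one Counter per row, mapping each character to its count in that row
per_row = [Counter(word) for word in column_value]

# Step 2: add the per-row Counters together to get totals across all rows
total = sum(per_row, Counter())

# Print each character with its total number of occurrences
for char, n in sorted(total.items()):
    print(char, n)
```

Assuming pandas and matplotlib are available, pd.Series(total).plot(kind='bar') would then draw one bar per character.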

Source

2017-06-12 vincent75

@ImportanceOfBeingErnest, please see my update. I want to count the occurrences of each character in that column by looping over the cells, and then make a histogram – vincent75

@ImportanceOfBeingErnest, please see Edit 2 for a practical example – vincent75

@vincent75 I think you don't want a histogram of the unique values, because that histogram would always be 1. – suvy

Taking the example from the OP:

import pandas as pd 
words=["hello","good","morning","how","are","you"] 
df=pd.DataFrame(words,columns=['words']) 

pd.Series(list(df.words.str.cat())).value_counts().plot(kind="bar")

After df.words.str.cat(), one can also filter the characters with a regular expression.
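As a rough sketch of that regex filtering step: the sample values below are borrowed from the question's data (lowercased), and the pattern that keeps only the letters a-z is an assumption chosen for illustration, not something specified in the answer.

```python
import re
import pandas as pd

# Sample cells resembling the question's data (already lowercased)
words = ["6,35", "11,68", "votre", "avel ar vro", "questions."]
df = pd.DataFrame(words, columns=["words"])

# Concatenate every cell into one long string
joined = df.words.str.cat()

# Assumed filter: drop everything that is not a lowercase letter a-z
letters = re.sub(r"[^a-z]", "", joined)

# Count each remaining character
counts = pd.Series(list(letters)).value_counts()
print(counts)
# counts.plot(kind="bar") would draw the bar chart (needs matplotlib)
```

A different character class, such as [^0-9] to keep only digits, would work the same way.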

Source

2017-06-12 13:44:35 suvy

A pandas Series has a built-in histogram function. For example:

import pandas as pd
df = pd.DataFrame({'col': [1,1,1,2,3,4,4]})
df.col.hist()

This draws a histogram of how often each value occurs. However, since non-numeric values may cause errors, you can also use the value_counts and plot(kind='bar') methods.

df.col.value_counts()

This returns a Series with the distinct values as the index and their counts as the values.

Then you can call plot to display the histogram:

df.col.value_counts().plot(kind = 'bar')
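To make the non-numeric case concrete, here is a small sketch with a hypothetical column of strings; the data is invented purely for illustration.

```python
import pandas as pd

# Hypothetical non-numeric column
df = pd.DataFrame({"col": ["a", "b", "a", "c", "a", "b"]})

# Distinct values become the index, their counts become the data,
# sorted from most to least frequent
counts = df.col.value_counts()
print(counts)
# counts.plot(kind="bar") draws one bar per distinct value (needs matplotlib)
```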

Source

2017-06-12 13:13:14 Dimgold

Please see Edits 2 and 1. I am looking for a histogram that counts the occurrences of each character across all the strings – vincent75

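The update indicates that the first problem (a single character per cell) is solved, so I provided a solution starting from there. If you have trouble splitting the characters, first try splitting the column by character and then use "pandas.melt" (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html) – Dimgold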
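One way to read the split-then-melt suggestion is sketched below. It assumes the example column from Edit 2; the intermediate names chars and long_chars are hypothetical, and this is only one possible interpretation of the comment.

```python
import pandas as pd

df = pd.DataFrame({"words": ["hello", "good", "morning", "how", "are", "you"]})

# Split each string into one character per column;
# shorter words are padded with NaN on the right
chars = df.words.apply(lambda s: pd.Series(list(s)))

# Melt the wide table into a single long column of characters
# and drop the NaN padding
long_chars = pd.melt(chars)["value"].dropna()

# Count how often each character occurs across all rows
counts = long_chars.value_counts()
print(counts)
# counts.plot(kind="bar") would draw the bar chart (needs matplotlib)
```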
