2017-03-01 100 views
1

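Pandas automatically converts my string column to float

What can I do to prevent pandas from converting my string values to float? The columns Billing Doc. and Sales Order contain 10-11 digit numbers that need to be stored in MySQL table columns of data type CHAR(15). When I run the script below, I see .0 at the end of each number. I want them to be treated as strings/characters in the database. The Billing Doc. field contains numbers such as 3206790137, 3209056079, 3209763880, 3209763885, 3206790137, which end up stored in the DB as 3206790137.0, 3209056079.0, 3209763880.0, 3209763885.0, 3206790137.0. The column data type for Billing Doc. in the database is CHAR(15).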

import datetime as DT

import pandas as pd


def insert_billing(df):
    # Replace missing values with None so they can be mapped to SQL NULL below.
    df = df.where((pd.notnull(df)), None)
    for row in df.to_dict(orient="records"):
        bill_item = row['Bill.Item']
        bill_qty = row['Billed Qty']
        bill_doct_date = row['Billi.Doc.Date']
        bill_doc = row['Billing Doc.']
        bill_net_value = row['Billi.Net Value']
        sales_order = row['Sales Order']
        import_date = DT.datetime.now().strftime('%Y-%m-%d')

        query = "INSERT INTO sap_billing(" \
                "bill_item, " \
                "bill_qty, " \
                "bill_doc_date, " \
                "bill_doc, " \
                "bill_net_value, " \
                "sales_order, " \
                "import_date" \
                ") VALUES (" \
                "\"{}\", \"{}\", \"{}\", \"{}\"," \
                "\"{}\", \"{}\", \"{}\"" \
                ") ON DUPLICATE KEY UPDATE " \
                "bill_qty = VALUES(bill_qty), " \
                "bill_doc_date = VALUES(bill_doc_date), " \
                "bill_net_value = VALUES(bill_net_value), " \
                "import_date = VALUES(import_date) " \
                "".format(
                    bill_item,
                    bill_qty,
                    bill_doct_date,
                    bill_doc,
                    bill_net_value,
                    sales_order,
                    import_date
                )
        # Turn the formatted None/NaT placeholders into SQL NULL.
        query = query.replace('\"None\"', 'NULL')
        query = query.replace('(None', '(NULL')
        query = query.replace('\"NaT\"', 'NULL')
        query = query.replace('(NaT', '(NULL')

        try:
            q1 = gesdb_connection.execute(query)
        except Exception as e:
            print(bill_item, bill_doc, sales_order, e)



if __name__ == "__main__":
    engine_str = 'mysql+mysqlconnector://root:[email protected]/mydb'

    file_name = "tmp/dataload/so_tracking.XLSX"
    df = pd.read_excel(file_name)
    if df.shape[1] == 35 and compare_columns(list(df.columns.values)) == 1:
        insert_billing(df)
    else:
        print("Incorrect column count, column order or column headers.\n")

When I create a simple df and print it, the problem does not show up.

import pandas as pd
df = pd.DataFrame({'Sales Order': [1217252835, 1217988754, 1219068439],
                   'Billing Doc.': [3222102723, 3209781889, 3214305818]})
>>> df
   Billing Doc.  Sales Order
0    3222102723   1217252835
1    3209781889   1217988754
2    3214305818   1219068439

However, when I read the data from Excel and then print it, the column is read as float64.

file_name = "tmp/dataload/so_tracking.XLSX"
df = pd.read_excel(file_name)
print(df['Billing Doc.'])

680 3.252170e+09 
681 3.252170e+09 
682 3.252170e+09 
683 3.252170e+09 
684 3.252170e+09 
685 3.252170e+09 
686 3.252170e+09 
687 3.252170e+09 
688 3.252170e+09 
689 3.252170e+09 
690 3.252170e+09 
. 
. 
. 
694 3.251601e+09 
695 3.251631e+09 
696 3.252013e+09 
697    NaN 
698 3.252272e+09 
699 3.252360e+09 
700 3.252474e+09 
. 
. 
Name: Billing Doc., dtype: float64 
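
A likely explanation, sketched here with made-up values rather than the actual spreadsheet: the output above contains a NaN, and a plain integer column cannot hold NaN, so pandas stores the whole column as float64. Formatting such a value into a string then produces the trailing .0. The series names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the real spreadsheet is not available here.
complete = pd.Series([3222102723, 3209781889])
with_gap = pd.Series([3222102723, np.nan])

print(complete.dtype)  # integer dtype: no missing values
print(with_gap.dtype)  # float dtype: the NaN forces an upcast

# This mirrors what "{}".format(bill_doc) does in the INSERT statement.
formatted = "{}".format(with_gap.iloc[0])
print(formatted)
```

If this is what happens in the real file, the fix has to be applied when reading the data or just before formatting the query, not in the database schema.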
+2

Can you boil this down to a reproducible example? Nobody else has access to your database or spreadsheet, so any attempt to help is just guessing. –

+0

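Pandas purists may not like this quick fix, but I used `pd.read_csv("FILE.CSV", dtype=object)` and it keeps pandas from converting the numbers to floats. I'm fairly sure you can replace `read_csv()` with the other DataFrame creation functions. – pshep123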
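
A minimal sketch of the `dtype=object` suggestion in the comment above, using an in-memory CSV in place of a real file. The CSV content is invented for illustration and includes one empty cell to reproduce the float conversion.

```python
import io

import pandas as pd

# Hypothetical CSV content; the second row has an empty Billing Doc. cell.
csv_text = "Billing Doc.,Sales Order\n3206790137,1217252835\n,1217988754\n"

default_df = pd.read_csv(io.StringIO(csv_text))
object_df = pd.read_csv(io.StringIO(csv_text), dtype=object)

# With type inference the empty cell turns the column into floats.
print(default_df["Billing Doc."].dtype)

# With dtype=object the digits are kept as the original text.
print(object_df["Billing Doc."].iloc[0])
```

The empty cell is still read as NaN in both cases, so missing values need separate handling before they reach the database.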

+0

@PaulH I added an example. – nomad

Answers

0

I found the solution myself and am posting it here to document it.

df = pd.read_excel(file_name, converters={'Billing Doc.' : str}) 
print(df['Billing Doc.']) 

695 3251631331 
696 3252012614 
697   NaN 
698 3252272451 
699 3252359504 
700 3252473894 
701   NaN 
702   NaN 
703   NaN 
704 3252652940 
705   NaN 
706   NaN 
707   NaN 
708   NaN 
Name: Billing Doc., dtype: object 
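
The output above still contains NaN for empty cells. A small sketch, assuming the goal is to send those as NULL, of mapping them to None before building the query parameters. The series below is a hypothetical stand-in for the column returned by `read_excel`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the string column with missing cells.
billing_doc = pd.Series(["3251631331", "3252012614", np.nan], dtype=object)

# Missing cells become None; present values are left untouched.
cleaned = [None if pd.isna(value) else value for value in billing_doc]
print(cleaned)
```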
-1

Try this:

df = df.astype(str) 

Note that this is very inefficient.

Or convert each value to int before inserting it into the query.
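
One way the int conversion could look, as a sketch rather than a definitive implementation. The helper name `to_doc_string` is hypothetical. It assumes the floats hold whole numbers, which is safe for 10-11 digit values because float64 represents integers of that size exactly.

```python
import math


def to_doc_string(value):
    """Return a digits-only string for a document number, or None if missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        # int() drops the fractional part, so 3206790137.0 becomes 3206790137.
        return str(int(value))
    return str(value)


print(to_doc_string(3206790137.0))
print(to_doc_string(float("nan")))
```

In the question's loop this would be applied as `bill_doc = to_doc_string(row['Billing Doc.'])` before formatting the query.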