Calculating statistics directly from a CSV file

I have a transaction log file in CSV format that I'd like to run statistics on. The log has the following fields:

 
date: Time/date stamp 
salesperson: The username of the person who closed the sale 
promo: sum total of items in the sale that were promotions. 
amount: grand total of the sale

I'd like to get the following data:

 
salesperson: The username of the salesperson being analyzed. 
minAmount: The smallest grand total of this salesperson's transaction. 
avgAmount: The mean grand total. 
maxAmount: The largest grand total. 
minPromo: The smallest promo amount by the salesperson. 
avgPromo: The mean promo amount...

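I'm tempted to build a database structure, import this file, write SQL, and pull the statistics out. I don't need this data for anything other than these statistics. Is there an easier way? I'm hoping some bash script can make this simple.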

Source

2010-04-16 User1

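Which part of the problem are you having trouble with? The statistics themselves? The data structure? The parsing? This doesn't seem like a very hard problem for any scripting language you're familiar with. – Kena 2010-04-16 19:30:58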

*hugs PowerShell* ... – Joey 2010-04-16 19:42:00

You could also knock out an awk script to do this. It's just a CSV with a few variables.

Source

2010-04-16 19:27:51 AlG

Awk is a natural fit - it even has BEGIN/END blocks, so you can initialize and then compute the averages easily – 2010-04-16 19:38:01

Very interesting. How would I do a "GROUP BY salesperson" with awk? – User1 2010-04-16 20:17:25

@User1: Use associative arrays, or asort() or asorti(). – 2010-04-16 20:36:43
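To make the associative-array suggestion concrete, here is a minimal sketch of a "GROUP BY salesperson" in awk, wrapped in a shell function. The function name `sales_stats` is made up for illustration, and the sketch assumes things the question does not state: the columns appear in the order date,salesperson,promo,amount, the file has one header row, and no field is quoted or contains a comma. It sticks to plain awk arrays and sorts with the external `sort` command, since asort() and asorti() are GNU awk extensions.

```shell
#!/usr/bin/env bash
# Per-salesperson min/avg/max of amount and min/avg of promo.
# Each awk array is keyed by salesperson, which plays the role of GROUP BY.
# Assumptions: column order date,salesperson,promo,amount; one header row;
# no quoted fields or embedded commas.
sales_stats() {
    echo "salesperson,minAmount,avgAmount,maxAmount,minPromo,avgPromo"
    awk -F, '
        NR == 1 { next }                      # skip the header row
        {
            who = $2; promo = $3 + 0; amount = $4 + 0
            if (!(who in n)) {                # first row seen for this salesperson
                minAmount[who] = maxAmount[who] = amount
                minPromo[who] = promo
            }
            if (amount < minAmount[who]) minAmount[who] = amount
            if (amount > maxAmount[who]) maxAmount[who] = amount
            if (promo  < minPromo[who])  minPromo[who]  = promo
            sumAmount[who] += amount          # running totals for the averages
            sumPromo[who]  += promo
            n[who]++
        }
        END {
            for (who in n) {
                printf "%s,%.2f,%.2f,%.2f,%.2f,%.2f\n", who,
                    minAmount[who], sumAmount[who] / n[who], maxAmount[who],
                    minPromo[who], sumPromo[who] / n[who]
            }
        }
    ' "$1" | sort                             # awk iteration order is unspecified
}
```

Called as `sales_stats transactions.csv`, it should print one CSV line per salesperson under a header line, sorted by username.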

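You can loop over the lines in the CSV and use bash variables to hold the min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).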
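A rough sketch of that loop, for the amount column only and without grouping by salesperson. Bash arithmetic is integer-only, so this version works in cents, which assumes every amount is non-negative and written with exactly two decimals (e.g. 12.50). It also assumes a header row and no quoted fields. The function name `amount_stats` is hypothetical.

```shell
#!/usr/bin/env bash
# Overall min/avg/max of the amount column using only bash variables.
# Assumptions: header row present; columns date,salesperson,promo,amount;
# amounts are non-negative with exactly two decimals; no quoted fields.
amount_stats() {
    local file=$1 date salesperson promo amount cents avg
    local min=0 max=0 total=0 count=0
    while IFS=, read -r date salesperson promo amount || [[ -n $amount ]]; do
        [[ -z $amount ]] && continue          # skip blank lines
        cents=$((10#${amount/./}))            # "12.50" -> 1250; 10# avoids octal parsing
        (( count == 0 || cents < min )) && min=$cents
        (( count == 0 || cents > max )) && max=$cents
        total=$(( total + cents ))            # running total for the average
        count=$(( count + 1 ))
    done < <(tail -n +2 "$file")              # drop the header row
    (( count == 0 )) && return 1              # no data rows
    avg=$(( total / count ))                  # integer division: truncates below one cent
    printf 'min=%d.%02d avg=%d.%02d max=%d.%02d\n' \
        $((min / 100)) $((min % 100)) \
        $((avg / 100)) $((avg % 100)) \
        $((max / 100)) $((max % 100))
}
```

Per-salesperson figures would need one set of variables per username (for example bash 4 associative arrays), at which point awk is probably the simpler tool.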

Here are some useful snippets for working with CSV files in bash.

If your data might be quoted (for example, because a field contains a comma), processing it with bash, sed, etc. becomes more complicated.
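To illustrate why quoting matters, the sketch below counts fields the naive way and adds a simple guard that refuses any file containing a double quote. Both function names are made up for this example, and the guard is deliberately crude: it rejects quoted files rather than trying to parse them.

```shell
#!/usr/bin/env bash
# Count fields by splitting on every comma, ignoring quotes (the naive approach).
naive_field_count() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Succeeds only when the file contains no double quote at all, i.e. when
# plain comma splitting is safe under the assumptions used in this thread.
csv_is_simple() {
    ! grep -q '"' "$1"
}

# A record whose salesperson field is quoted because it contains a comma.
# It has 4 logical fields, but naive splitting reports 5, so "amount" would
# no longer be the fourth field.
naive_field_count '2010-04-16 19:30,"Smith, Jo",1.50,20.00'
```

A script could run `csv_is_simple transactions.csv || exit 1` before any comma-splitting logic and fall back to a real CSV parser when the check fails.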

Source

2010-04-16 19:28:03

TxtSushi does this:

tssql -table trans transactions.csv \
'select
    salesperson,
    min(as_real(amount)) as minAmount,
    avg(as_real(amount)) as avgAmount,
    max(as_real(amount)) as maxAmount,
    min(as_real(promo)) as minPromo,
    avg(as_real(promo)) as avgPromo
from trans
group by salesperson'

I have a bunch of example scripts showing how to use it.

Edit: fixed the syntax

Source

2010-04-17 14:56:03 Keith

+1 This looks much better. I'll have to use it next time. By the way: are you an aardvark? – User1 2010-04-19 15:00:32

No sir, I'm a sugar glider – Keith 2010-04-19 17:10:41

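That's what I was afraid of... I hope your owner doesn't get busted for having you in his home (if you're in the US). I've heard you guys are "especially smart", but wow, answering questions is quite a feat. So, is it the sugar that makes you so smart? – User1 2010-04-20 14:23:02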
