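Group data by month over 6 years

I have a CSV file containing 6 years of data, starting on 2006-01-01 and running to 2011-01-01, and I need to group the data by each month across the 6 years. Here is an overview of my CSV file: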

timestamp,heure,lat,lon,impact,type 
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1 
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1 
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2 
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1 
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1 
.... 
2011-12-31 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1

And here is the expected output:

month 01 10 (counts of the columns) 
month 02 20 
..... 
month 12 30

Any ideas?

Source

2017-06-04 Marie Antoinette

So, you want 12 rows, summarizing all the Januarys, Februarys, ... Decembers? – piRSquared

Yes, that is what I want –

Consider the sample dataframe df

import numpy as np
import pandas as pd

np.random.seed([3,1415]) 

tidx = pd.date_range('2006-01-01', '2011-01-01', name='Date') 

df = pd.DataFrame(dict(
     heure=pd.to_timedelta(np.random.randint(24*60*60, size=len(tidx))), 
     lat=np.random.rand(len(tidx)) * 10 + 30, 
     lon=np.random.rand(len(tidx)) * 10 - 20, 
     impact=np.random.rand(len(tidx)), 
     type=np.random.randint(3, size=len(tidx)) 
    ), tidx) 

df.head() 

                     heure    impact        lat        lon  type
Date
2006-01-01 00:00:00.000037  0.312643  39.324254 -14.715073     1
2006-01-02 00:00:00.000019  0.121450  30.560726 -10.879014     0
2006-01-03 00:00:00.000060  0.080082  38.489212 -11.899611     1
2006-01-04 00:00:00.000021  0.270159  34.832683 -14.924849     0
2006-01-05 00:00:00.000066  0.112194  32.193704 -19.083123     0

Use groupby

df.groupby(df.index.month).size() 

Date 
1  156 
2  141 
3  155 
4  150 
5  155 
6  150 
7  155 
8  155 
9  150 
10 155 
11 150 
12 155 
dtype: int64

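You can do everything you would normally do with a groupby. Here is an example using describe

# group by month number; df.index.strftime('%B') would group by month name instead
df.groupby(df.index.month).impact.describe() 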

      count      mean       std       min       25%       50%       75%       max
Date
1     156.0  0.529216  0.279498  0.003298  0.292654  0.538437  0.774256  0.998507
2     141.0  0.501540  0.295111  0.001063  0.243723  0.491919  0.727560  0.999231
3     155.0  0.516168  0.306878  0.001178  0.227668  0.556316  0.783676  0.997126
4     150.0  0.472035  0.263685  0.004031  0.246738  0.491169  0.665894  0.987965
5     155.0  0.523897  0.320709  0.003486  0.221323  0.538594  0.841909  0.994280
6     150.0  0.542496  0.297215  0.003550  0.273098  0.589802  0.807086  0.995538
7     155.0  0.513857  0.285404  0.000933  0.285383  0.519170  0.746735  0.999551
8     155.0  0.516404  0.284407  0.004662  0.288900  0.545429  0.739392  0.996601
9     150.0  0.490965  0.299312  0.011958  0.206851  0.487708  0.737785  0.993217
10    155.0  0.513743  0.304779  0.010712  0.199390  0.563746  0.796143  0.995488
11    150.0  0.465428  0.271936  0.006345  0.221753  0.470793  0.684867  0.995886
12    155.0  0.498415  0.301704  0.004538  0.215730  0.471139  0.757360  0.997268
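
If month names are preferred over month numbers, one option is to group by the month number first and relabel the index afterwards, because grouping directly on name strings sorts the rows alphabetically. The following is only a sketch: the frame `demo` and its random `impact` values are made up for illustration and are not the data above.

```python
import calendar

import numpy as np
import pandas as pd

# Hypothetical stand-in data: one row per day over the same date range.
rng = np.random.default_rng(0)
idx = pd.date_range("2006-01-01", "2011-01-01", name="Date")
demo = pd.DataFrame({"impact": rng.random(len(idx))}, index=idx)

# Group on the month number so the rows come out in calendar order.
stats = demo.groupby(demo.index.month)["impact"].describe()

# Relabel 1..12 with month names only after the grouping is done.
stats.index = [calendar.month_name[m] for m in stats.index]

print(stats[["count", "mean"]])
```

The `count` column should match the sizes shown earlier, since it depends only on the dates and not on the random values.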

Source

2017-06-04 22:58:20 piRSquared

Thank you very much, it is working now.... magic :D –

@MarieAntoinette You're welcome... just remember to keep your head :-) Sorry, I couldn't help myself. – piRSquared

This will work:

# assumes df["timestamp"] already has a datetime dtype (e.g. via pd.to_datetime)
df["month"] = df["timestamp"].dt.month 
df.groupby(["month"]).size()
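
For a runnable version of this idea, here is a minimal sketch that starts from CSV text. It uses only the six rows visible in the question's overview, not the full file, and the names `csv_text`, `sample` and `counts` are made up for the example.

```python
import io

import pandas as pd

# A few rows copied from the question's CSV overview (not the full file).
csv_text = """timestamp,heure,lat,lon,impact,type
2006-01-01 00:00:00,13:58:43,33.837,-9.205,10.3,1
2006-01-02 00:00:00,00:07:28,34.5293,-10.2384,17.7,1
2007-02-01 00:00:00,23:01:03,35.0617,-1.435,-17.1,2
2007-02-02 00:00:00,01:14:29,36.5685,0.9043,36.8,1
2008-01-01 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
2008-01-02 00:00:00,05:03:51,34.1919,-12.5061,-48.9,1
"""

# parse_dates converts the timestamp column to datetime64 so that .dt works.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Month number (1-12) regardless of year, then count the rows per month.
sample["month"] = sample["timestamp"].dt.month
counts = sample.groupby("month").size()

print(counts)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path.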

Source

2017-06-04 22:59:55 FdMon

Yes, I tried it and it works... thank you very much –

I mostly use resample to do this.

Here is my sample:

import numpy as np 
import pandas as pd 
index = pd.date_range('2017/1/1', '2017/10/1') 
df = pd.DataFrame(np.ones((274, 1)), index) 
df 
      0 
2017-01-01 1.0 
2017-01-02 1.0 
...   ... 
2017-09-29 1.0 
2017-09-30 1.0 
2017-10-01 1.0 

df.resample('M').count() # use resample to aggregate the data ('M' is spelled 'ME' from pandas 2.2 onward)
2017-01-31 31 
2017-02-28 28 
2017-03-31 31 
2017-04-30 30 
2017-05-31 31 
2017-06-30 30 
2017-07-31 31 
2017-08-31 31 
2017-09-30 30 
2017-10-31 1
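
Note that monthly buckets of this kind are per calendar month of each year, so on data spanning several years they yield one row per year and month, not 12 rows. A rough sketch of collapsing them follows. It assumes made-up data (`demo`) and uses `to_period("M")` as a stand-in for `resample`, to avoid the frequency-alias differences between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one row per day over six calendar years.
idx = pd.date_range("2006-01-01", "2011-01-01")
demo = pd.DataFrame({"n": np.ones(len(idx))}, index=idx)

# One bucket per year-and-month, analogous to what monthly resampling gives.
per_period = demo.groupby(demo.index.to_period("M")).size()

# Collapse the years: sum the buckets that share the same month number.
per_month = per_period.groupby(per_period.index.month).sum()

print(len(per_period))  # number of year-month buckets
print(per_month)        # 12 rows, one per month
```

Grouping directly on the month number of the index reaches the same 12 rows in one step; the two-step form is only useful when the per-year monthly totals are also needed.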

Source

2017-06-04 23:05:28

Thank you very much, this was really helpful –
