2017-10-13 99 views

Pandas picks up the wrong datetime from data scraped from a website

enter image description here

I am scraping data together with a timestamp of when it was scraped. As you can see, I have no dates between 2017-09-14 13:56:28 and 2017-09-16 14:43:05, but when I use the code below:

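import glob

import pandas as pd

path = 'law_scraped'
files = glob.glob(path + "/*.csv")

frame = pd.DataFrame()

for f in files:
    df = pd.read_csv(f)

    # Replace the literal "|" separator with a space.
    # regex=False because "|" is a regex metacharacter.
    df['dtScraped'] = df['dtScraped'].str.replace("|", " ", regex=False)

    try:
        # Try the explicit format first
        df['dtScraped'] = pd.to_datetime(df['dtScraped'], format="%Y/%m/%d %H:%M:%S")
    except Exception as e:
        # Otherwise fall back to letting pandas infer the format
        df['dtScraped'] = pd.to_datetime(df['dtScraped'])

    frame = pd.concat([frame, df], ignore_index=True)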

I get datetimes that do not match the data, as you can see below:

+--------+---------------------+-------+-------------------+
|        | dtScraped           | odds  | team              |
+--------+---------------------+-------+-------------------+
|  15117 | 2017-09-14 14:00:00 | 7.75  | Feyenoord         |
|  15118 | 2017-09-14 14:00:00 | 1.446 | Manchester City   |
|  15119 | 2017-09-14 14:00:00 | 5.01  | Draw              |
|  15120 | 2017-09-14 14:00:00 | 4.73  | NK Maribor        |
|  15121 | 2017-09-14 14:00:00 | 1.869 | Spartak Moscow    |
|  15122 | 2017-09-14 14:00:00 | 3.65  | Draw              |
|  15123 | 2017-09-14 14:00:00 | 1.694 | Liverpool         |
|  15124 | 2017-09-14 14:00:00 | 5.16  | Sevilla           |
|  15125 | 2017-09-14 14:00:00 | 4.25  | Draw              |
|  15126 | 2017-09-14 14:00:00 | 3.53  | Shakhtar Donetsk  |
|  15127 | 2017-09-14 14:00:00 | 2.19  | Napoli            |
|  15128 | 2017-09-14 14:00:00 | 3.58  | Draw              |
|  15129 | 2017-09-14 14:00:00 | 2.15  | RB Leipzig        |
|  15130 | 2017-09-14 14:00:00 | 3.5   | AS Monaco         |
|  15131 | 2017-09-14 14:00:00 | 3.73  | Draw              |
|  15132 | 2017-09-14 14:00:00 | 1.044 | Real Madrid       |
|  15133 | 2017-09-14 14:00:00 | 34.68 | APOEL Nicosia     |
|  15134 | 2017-09-14 14:00:00 | 23.04 | Draw              |
|  15135 | 2017-09-14 14:00:00 | 2.33  | Tottenham Hotspur |
|  15136 | 2017-09-14 14:00:00 | 3.12  | Borussia Dortmund |
|  15137 | 2017-09-14 14:00:00 | 3.69  | Draw              |
|  15138 | 2017-09-14 14:00:00 | 1.52  | FC Porto          |
|  15139 | 2017-09-14 14:00:00 | 7.63  | Besiktas JK       |
|  15140 | 2017-09-14 14:00:00 | 4.32  | Draw              |
| 144009 | 2017-09-14 14:00:00 | 7.75  | Feyenoord         |
| 144010 | 2017-09-14 14:00:00 | 1.446 | Manchester City   |
| 144011 | 2017-09-14 14:00:00 | 5.01  | Draw              |
| 144012 | 2017-09-14 14:00:00 | 4.609 | NK Maribor        |
| 144013 | 2017-09-14 14:00:00 | 1.892 | Spartak Moscow    |
| 144014 | 2017-09-14 14:00:00 | 3.64  | Draw              |
| 144015 | 2017-09-14 14:00:00 | 1.694 | Liverpool         |
| 144016 | 2017-09-14 14:00:00 | 5.16  | Sevilla           |
| 144017 | 2017-09-14 14:00:00 | 4.25  | Draw              |
| 144018 | 2017-09-14 14:00:00 | 3.53  | Shakhtar Donetsk  |
| 144019 | 2017-09-14 14:00:00 | 2.19  | Napoli            |
| 144020 | 2017-09-14 14:00:00 | 3.58  | Draw              |
| 144021 | 2017-09-14 14:00:00 | 2.15  | RB Leipzig        |
| 144022 | 2017-09-14 14:00:00 | 3.5   | AS Monaco         |
| 144023 | 2017-09-14 14:00:00 | 3.73  | Draw              |
| 144024 | 2017-09-14 14:00:00 | 1.044 | Real Madrid       |
| 144025 | 2017-09-14 14:00:00 | 34.68 | APOEL Nicosia     |
| 144026 | 2017-09-14 14:00:00 | 23.04 | Draw              |
| 144027 | 2017-09-14 14:00:00 | 2.33  | Tottenham Hotspur |
| 144028 | 2017-09-14 14:00:00 | 3.12  | Borussia Dortmund |
| 144029 | 2017-09-14 14:00:00 | 3.69  | Draw              |
| 144030 | 2017-09-14 14:00:00 | 1.52  | FC Porto          |
| 144031 | 2017-09-14 14:00:00 | 7.63  | Besiktas JK       |
| 144032 | 2017-09-14 14:00:00 | 4.32  | Draw              |
+--------+---------------------+-------+-------------------+

Your format string does not match your data!
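One way to check whether a format string matches the data is to parse with errors="coerce" and count the values that come back as NaT. The snippet below is only a sketch: the two sample strings are hypothetical and assume the column looks like "YYYY-MM-DD HH-MM-SS" once "|" has been replaced by a space.

```python
import pandas as pd

# Hypothetical sample values; the real column layout is an assumption.
s = pd.Series(["2017-09-14 13-56-28", "2017-09-16 14-43-05"])

# errors="coerce" turns every value that does not match the format into NaT
# instead of raising, so the mismatches can be counted.
attempt = pd.to_datetime(s, format="%Y/%m/%d %H:%M:%S", errors="coerce")
n_bad = int(attempt.isna().sum())

print(f"{n_bad} of {len(s)} values do not match the format")
```

If the count is non-zero, a bare try/except that falls back to pd.to_datetime without a format will hide the mismatch and leave pandas to guess the layout.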

Answer


Assuming your timestamps have the same format as the file names in your screenshot, this should work (after replacing "|" with " "):

df['dtScraped'] = pd.to_datetime(df['dtScraped'], format="%Y-%m-%d %H-%M-%S")
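For illustration, here is a small self-contained version of that suggestion. The sample strings are hypothetical and assume the raw values look like "YYYY-MM-DD|HH-MM-SS"; adjust the format string if the real files differ.

```python
import pandas as pd

# Hypothetical raw values; the "YYYY-MM-DD|HH-MM-SS" layout is an assumption.
raw = pd.DataFrame({"dtScraped": ["2017-09-14|13-56-28", "2017-09-16|14-43-05"]})

# regex=False so that "|" is replaced literally rather than treated as a
# regex alternation.
raw["dtScraped"] = raw["dtScraped"].str.replace("|", " ", regex=False)

# Parse with a format that mirrors the cleaned strings exactly.
raw["dtScraped"] = pd.to_datetime(raw["dtScraped"], format="%Y-%m-%d %H-%M-%S")

print(raw["dtScraped"].min(), raw["dtScraped"].max())
```

With an explicit, matching format there is no need for a fallback branch, and any value in a different layout raises an error instead of being silently reinterpreted.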