Python: put certain text from a large string into a dictionary

I'm trying to calculate the duration between tag on and tag off.

Below is an example of two lines from a string:

01/01/2015 7:30:10 a.m. Tag off : 16 Address Ave  $1.00 $26.00 
01/01/2015 7:40:17 a.m. Tag on : 127 Address St   $27

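For now I want to ignore the addresses and focus on calculating the durations. Each line carries either Tag off or Tag on information. I have about 60 lines (so 30 pairs), all read from a .txt file.

From the example above, the duration is 10 minutes and 7 seconds.

Here is my code: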
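As a quick sanity check on that figure, the arithmetic can be done by hand in Python by converting each clock time to seconds since midnight. This assumes both taps happen on the same day, as in the sample:

```python
# Seconds since midnight for each sample timestamp (both a.m., same day)
tag_off = 7 * 3600 + 30 * 60 + 10   # 7:30:10 a.m.
tag_on = 7 * 3600 + 40 * 60 + 17    # 7:40:17 a.m.

elapsed = tag_on - tag_off           # difference in seconds
minutes, seconds = divmod(elapsed, 60)
print(minutes, seconds)              # → 10 7
```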

def import_file(filename): 
    input_file = open(filename, 'r') 
    file_contents = input_file.read() 
    input_file.close() 

def strip(): 
    contents = import_file("data.txt") 

def duration_cal(): 
    pass

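So what is the best way to strip out all the unnecessary information and put the relevant tag on/off times and dates into a dictionary or list (in order to calculate the duration between on and off)?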


2015-04-05 user927584

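Multiple spaces get collapsed into a single space; put it in a code block. – 2015-04-05 11:05:52

How do these timestamps relate to a time difference of 10 minutes and 7 seconds? – 2015-04-05 11:09:51

Fixed now. – user927584 2015-04-05 11:12:07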

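So far it looks like you haven't done much research: all you do is open a file, and you don't even do that the recommended way, because you're writing a function to handle something Python has a language construct for (the with statement).

Then, you don't return the file contents from import_file(), so strip() will always set contents to None. In fact, from a design point of view, your functions aren't very useful.

A better way to do this would be: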

#!/usr/bin/env python3

import os, sys

def print_durations(durations):
    # this is where the durations get printed nicely
    pass

def calculate_durations(contents):
    # this is where the fun shall be, see implementation below
    pass

def main():
    if len(sys.argv) != 2:
        print("Usage: {} filename".format(sys.argv[0]))
        sys.exit(1)
    if not os.path.isfile(sys.argv[1]):
        print("Error: {} should be an existing file!".format(sys.argv[1]))
        sys.exit(2)
    with open(sys.argv[1], 'r') as f:
        durations = calculate_durations(f.readlines())
        print_durations(durations)

if __name__ == "__main__":
    main()

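That is the simplest way to create a script that takes a file name as its first argument. If you want a better CLI tool, you might want to try docopt or argparse.

Now let's get to the most interesting part. Even though you obviously haven't really tried to implement the algorithm yourself, which is reason enough to downvote your question... just because it's fun, here's my take on it:

To get the interesting bits of your lines, you can fire up your Python CLI and split your string to extract the relevant parts. If the format is consistent across lines, you don't need to resort to advanced stuff like crazy regular expressions: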
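As a sketch of the argparse route, the same entry point could look like this. The helpers are placeholders (as in the skeleton), and the argument name filename and the help text are my own choices, not part of the answer:

```python
#!/usr/bin/env python3
import argparse
import os


def calculate_durations(contents):
    # placeholder, as in the skeleton: the real logic comes later
    return []


def print_durations(durations):
    # placeholder, as in the skeleton
    for stamp, status in durations:
        print("{} for {}".format(status, stamp))


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Compute durations between tag on/off lines.")
    parser.add_argument("filename", help="text file with one tag entry per line")
    # argv=None makes argparse read sys.argv[1:]
    args = parser.parse_args(argv)
    if not os.path.isfile(args.filename):
        # parser.error() prints the usage message and exits with status 2
        parser.error("{} should be an existing file!".format(args.filename))
    with open(args.filename, "r") as f:
        durations = calculate_durations(f.readlines())
    print_durations(durations)

# When run as a script, call it with:
# if __name__ == "__main__":
#     main()
```

Besides the usage message, argparse also generates a --help option automatically.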

>>> line = '01/01/2015 7:30:10 a.m. Tag off : 16 Address Ave  $1.00 $26.00' 
>>> line.split(' : ') 
['01/01/2015 7:30:10 a.m. Tag off', '16 Address Ave  $1.00 $26.00']
>>> line.split(' : ')[0]
'01/01/2015 7:30:10 a.m. Tag off'
>>> line.split(' : ')[0].split(' Tag ')
['01/01/2015 7:30:10 a.m.', 'off']
>>> timestr, status = line.split(' : ')[0].split(' Tag ') 
>>> print(status) 
off 
>>> print(timestr) 
01/01/2015 7:30:10 a.m.
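For comparison, if the spacing were not consistent across lines, a regular expression is one alternative to chained split() calls. This is a swapped-in technique, not what the answer uses, and the pattern below is an assumption inferred only from the two sample lines in the question:

```python
import re

# Hypothetical pattern based on the two sample lines only:
# "<d>/<m>/<yyyy> <h>:<mm>:<ss> a.m.|p.m. Tag on|off : <rest>"
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [ap]\.m\.)"
    r"\s+Tag\s+(?P<status>on|off)\s*:"
)


def extract(line):
    """Return (timestr, status), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("time"), match.group("status")


print(extract("01/01/2015 7:30:10 a.m. Tag off : 16 Address Ave  $1.00 $26.00"))
# → ('01/01/2015 7:30:10 a.m.', 'off')
```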

Now you need to convert the time into a form that lets you compute deltas. Python doesn't understand a.m. as an AM/PM marker, so you need to convert it first:

>>> timestr = timestr.replace('a.m.', 'AM') 
>>> import datetime 
>>> timestamp = datetime.datetime.strptime(timestr, "%d/%m/%Y %I:%M:%S %p") 
>>> timestamp 
datetime.datetime(2015, 1, 1, 7, 30, 10)
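The snippet above only replaces 'a.m.'; the function further down also handles 'p.m.'. As a small sketch, both steps can be wrapped in one helper. The name parse_timestamp is my own, and it assumes day/month/year dates as in the sample and an English (or C) locale, since %p is locale-dependent. It also shows how %I and %p treat 12 o'clock:

```python
import datetime


def parse_timestamp(timestr):
    """Parse e.g. '01/01/2015 7:30:10 a.m.' (day/month/year assumed)."""
    # strptime's %p expects AM/PM, so normalise the dotted forms first
    timestr = timestr.replace("a.m.", "AM").replace("p.m.", "PM")
    return datetime.datetime.strptime(timestr, "%d/%m/%Y %I:%M:%S %p")


print(parse_timestamp("01/01/2015 7:30:10 a.m."))   # → 2015-01-01 07:30:10
print(parse_timestamp("01/01/2015 12:05:00 a.m."))  # → 2015-01-01 00:05:00
print(parse_timestamp("01/01/2015 12:05:00 p.m."))  # → 2015-01-01 12:05:00
```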

Finally, to get the delta between two timestamps, you just need to subtract them:

>>> line2 = '01/01/2015 7:40:17 a.m. Tag on : 127 Address St   $27'
>>> timestamp2 = datetime.datetime.strptime(line2.split(' : ')[0].split(' Tag ')[0].replace('a.m.', 'AM'), "%d/%m/%Y %I:%M:%S %p")
>>> timestamp2 - timestamp 
datetime.timedelta(0, 607) 
>>> print(timestamp2 - timestamp) 
0:10:07
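If you want the delta as plain numbers rather than the H:MM:SS string printed above, timedelta.total_seconds() combined with divmod gives minutes and seconds. A small sketch, with the timedelta built directly from the 607 seconds of the example:

```python
import datetime

delta = datetime.timedelta(seconds=607)  # 7:40:17 minus 7:30:10

total = int(delta.total_seconds())        # whole seconds in the delta
minutes, seconds = divmod(total, 60)
print("{} min {} s".format(minutes, seconds))  # → 10 min 7 s
print(delta.total_seconds())                    # → 607.0
```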

There you go! Here is the same thing as functions:

import datetime

def calculate_durations(contents):
    last_stamp = None
    durations = []
    for line in contents:
        # extract time and status from the line
        timestr, status = line.split(' : ')[0].split(' Tag ')
        # fix a.m./p.m. to be AM/PM
        timestr = timestr.replace('a.m.', 'AM').replace('p.m.', 'PM')
        # load the time as a Python datetime
        timestamp = datetime.datetime.strptime(timestr, "%d/%m/%Y %I:%M:%S %p")
        # for the first line, store the status with a zero duration
        if last_stamp is None:
            durations.append((datetime.timedelta(0), status))
        # otherwise calculate the time elapsed since the previous line
        else:
            durations.append((timestamp - last_stamp, status))
        # save timestamp for next line
        last_stamp = timestamp
    return durations

def print_durations(durations):
    for stamp, status in durations:
        print("{} for {}".format(status, stamp))

You can copy this into the Python shell to test it, and it will output:

>>> contents = [ 
... '01/01/2015 7:30:10 a.m. Tag off : 16 Address Ave  $1.00 $26.00', 
... '01/01/2015 7:40:17 a.m. Tag on : 127 Address St   $27'] 
>>> print_durations(calculate_durations(contents)) 
off for 0:00:00 
on for 0:10:07

Or run it as a script, once you've put it all together:

% python3 myscript.py myfile.log 
off for 0:00:00 
on for 0:10:07
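The question asks for the tag off/on times in a dict or list. As a hedged sketch building on the same split-based parsing, and assuming the lines strictly alternate (so 60 lines give 30 pairs), consecutive lines can be grouped into pairs, each stored as a dict. The function names and key names are my own; an odd trailing line is silently dropped by zip():

```python
import datetime


def parse_line(line):
    # same split-based extraction as calculate_durations above
    timestr, status = line.split(' : ')[0].split(' Tag ')
    timestr = timestr.replace('a.m.', 'AM').replace('p.m.', 'PM')
    stamp = datetime.datetime.strptime(timestr, "%d/%m/%Y %I:%M:%S %p")
    return stamp, status.strip()


def pair_durations(contents):
    """Group consecutive non-blank lines into (first, second) pairs."""
    entries = [parse_line(line) for line in contents if line.strip()]
    pairs = []
    for (start, start_status), (end, end_status) in zip(entries[0::2], entries[1::2]):
        pairs.append({
            'start': start,
            'start_status': start_status,
            'end': end,
            'end_status': end_status,
            'duration': end - start,
        })
    return pairs


contents = [
    '01/01/2015 7:30:10 a.m. Tag off : 16 Address Ave  $1.00 $26.00',
    '01/01/2015 7:40:17 a.m. Tag on : 127 Address St   $27',
]
for pair in pair_durations(contents):
    print(pair['start_status'], '->', pair['end_status'], pair['duration'])
# → off -> on 0:10:07
```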

HTH


2015-04-05 12:03:30 zmo

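Thanks for the answer, but the better way of extracting the contents from the .txt file is a bit confusing, since I'm still learning to code. – user927584 2015-04-06 03:05:27

I get this error when I run the code on 56 elements: "timestr, status = line.split(' : ')[0].split(' Tag ')  ValueError: need more than 1 value to unpack". It only seems to work when the list has 2 elements. – user927584 2015-04-06 07:32:06

That means the sample data you posted is not consistent with the data you are parsing. I believe the problem is that I assumed your log contains 'Tag' only once. So you could change it to 'timestr, status, _ = ...', so that it captures all the other parts of the split into '_', which is a way of ignoring them. – zmo 2015-04-06 09:19:40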
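To connect this back to the code in the question: the minimal change is to open the file with "with" (so it is closed for you) and to return what was read. A sketch keeping the question's own function names; data.txt is the file name used in the question:

```python
def import_file(filename):
    # 'with' closes the file automatically, even if an error occurs
    with open(filename, 'r') as input_file:
        # return the lines so the caller can actually use them
        return input_file.readlines()


def strip():
    # contents is now a list of strings, one per line of data.txt
    contents = import_file("data.txt")
    return contents
```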
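For what it's worth, in Python the message "need more than 1 value to unpack" means split() returned a single element, i.e. some line contains no ' Tag ' at all (a blank line or a header line, for example); a line with ' Tag ' twice would instead give "too many values to unpack". A hedged sketch that skips such lines rather than crashing; the name parse_line and the skip-on-mismatch policy are my own assumptions:

```python
import datetime


def parse_line(line):
    """Return (timestamp, status), or None for lines that are not entries."""
    # only the text before the first ' : ' matters
    head = line.split(' : ', 1)[0]
    # maxsplit=1 means at most two parts, so "too many values" cannot happen
    parts = head.split(' Tag ', 1)
    if len(parts) != 2:
        # e.g. a blank or header line: the case that raised the ValueError
        return None
    timestr, status = parts
    timestr = timestr.strip().replace('a.m.', 'AM').replace('p.m.', 'PM')
    timestamp = datetime.datetime.strptime(timestr, "%d/%m/%Y %I:%M:%S %p")
    return timestamp, status.strip()


lines = [
    '01/01/2015 7:30:10 a.m. Tag off : 16 Address Ave  $1.00 $26.00\n',
    '\n',
    '01/01/2015 7:40:17 a.m. Tag on : 127 Address St   $27\n',
]
entries = [parse_line(line) for line in lines]
valid = [e for e in entries if e is not None]
print(len(valid))  # → 2
```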
