2017-04-22 53 views

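I am trying to write a custom log parser. I am stuck on parsing the log file and extracting the values that lie between two time entries. The log file looks like this: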

 09:57:25Host_Name Trace      00000             
     <MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
     some string --- 
     SQ-> 
    09:57:25Host_Name Trace      00000             
    <MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
     some string --- 
     D--> 
     SQ-> 
    09:57:28Host_Name Trace      00000             
    <MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd 
     some string --- 
     D--> 
     SQ-> 
    09:58:28Host_Name Trace      00000             
    <MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcd 
     some string --- 
     D--> 
     SQ-> 

The goal is to produce JSON output in the following format:
[{'host_name': host_name, 'time': '2017-04-13T09:58:28.1393344+00:00', 'msg 
: '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
    some string --- 
    D--> 
    SQ->'}, {'host_name': host_name, 'time': '2017-04-13T09:58:28.1393344+00:00', 'msg 
: '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd 
    some string --- 
    D--> 
    SQ->'}] 

The problem is finding the two time entries and collecting the values between them.

Here is what I tried:

jsonlist = [] 
jsonout = {} 
li = [i.strip().split() for i in open(filepath).readlines()] 
start_index, end_index=0,0 
msg = '' 
with open(filepath, 'r') as f: 
    for index, line in enumerate(f): 
    if start_index !=0 and end_index!=0: 
     result = list(itertools.chain.from_iterable(li[start_index: end_index])) 
      msg = ''.join(str(x) for x in result) 
      jsonoutput['message'] = msg.replace('"', '\\').strip() 
      jsonoutput['time'] = msg. 
      start_index, end_index = 0,0 
    try: 
     if start_index !=0: 
     if parser(line.split()[0].split('Host_Name')[0]): 
      end_index = index 
     else: 
      start_index = index 

I am not able to get the time value or the correct msg. Any suggestions on a better way to do this would be very helpful.

Answers


I wrote my own code. Based on the data you provided, the variable final looks like the list shown after the function:

import json
import re


def logs(file_path):
    """
    :param file_path: path to your log file, example: /home/user/my_file.log
    :return: the parsed time-sections serialized as a JSON string
    """
    msg = ''
    final = []

    # The context manager closes the file once the lines are read
    with open(file_path, 'r') as our_log:
        log_lines = our_log.readlines()

    for line in log_lines:
        # A new time-section starts with a short timestamp such as "09:57:25"
        # (leading whitespace is tolerated)
        time = re.search(r"^\s*(\d+:\d+:\d+)", line)

        if time:
            if msg:
                # Store the message collected for the previous time-section
                final[-1].update(msg=msg)
                msg = ''

            time = time.group(1)
            host_name = re.search(re.escape(time) + '(.*)' + ' Trace', line).group(1)

            # If you need the time like "09:57:25", instead of "2017-04-13T09:57:25.1393344+00:00"
            # then uncomment the line below
            # info = dict(time=time, host_name=host_name)

            # and comment the one below
            info = dict(host_name=host_name)

            final.append(info)

        else:
            # and also comment the next 3 lines
            if 'Time="' in line:
                time = re.search('Time="([^"]*)"', line).group(1)
                final[-1].update(time=time)
            msg += line.strip()

    if final:
        final[-1].update(msg=msg)  # adds message for the last time-section

    json_out = json.dumps(final)
    return json_out

[{'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'}, {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:25.1393344+00:00', 'host_name': 'Host_Name'}, {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:57:28.1393344+00:00', 'host_name': 'Host_Name '}, {'msg': '<MessageLogTraceRecord Time="2017-04-13T09:58:28.1393344+00:00" abcdsome string ---D-->SQ->', 'time': '2017-04-13T09:58:28.1393344+00:00', 'host_name': 'Host_Name '}] 
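
The output above joins the message lines without any separator, while the format asked for in the question keeps the line breaks inside msg. As a rough sketch of one way to keep them, the variant below collects each section's lines in a list and joins them with newlines at the end. The names parse_log, HEADER, FULL_TIME and SAMPLE are made up for this illustration, and the header pattern assumes the host name contains no whitespace and is always followed by the word Trace, which is only what the sample log suggests.

```python
import json
import re

# Assumed header shape, taken from the sample: "09:57:25Host_Name Trace      00000"
HEADER = re.compile(r"^(\d{2}:\d{2}:\d{2})(\S+)\s+Trace\b")
# Full timestamp inside a record, e.g. Time="2017-04-13T09:57:25.1393344+00:00"
FULL_TIME = re.compile(r'Time="([^"]*)"')

# Small hypothetical sample in the same shape as the log from the question
SAMPLE = """\
09:57:25Host_Name Trace      00000
<MessageLogTraceRecord Time="2017-04-13T09:57:25.1393344+00:00" abcd
some string ---
SQ->
09:57:28Host_Name Trace      00000
<MessageLogTraceRecord Time="2017-04-13T09:57:28.1393344+00:00" abcd
some string ---
D-->
SQ->
"""


def parse_log(lines):
    """Group log lines into one dict per header line."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        header = HEADER.match(line)
        if header:
            # Start a new section; the short time is a fallback value
            records.append({"host_name": header.group(2),
                            "time": header.group(1),
                            "msg": []})
        elif records:
            found = FULL_TIME.search(line)
            if found:
                # Prefer the full timestamp from the record itself
                records[-1]["time"] = found.group(1)
            records[-1]["msg"].append(line)
    for record in records:
        # Keep the original line breaks inside the message
        record["msg"] = "\n".join(record["msg"])
    return records


if __name__ == "__main__":
    print(json.dumps(parse_log(SAMPLE.splitlines()), indent=2))
```

Since parse_log accepts any iterable of lines, an open file object can be passed directly, so the file does not have to be read into memory first. Lines that appear before the first header are silently dropped in this sketch.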