2016-03-01 83 views
-1

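Looping from a specified index

I have two large lists, each with about 100,000 elements (one is larger than the other), that I want to iterate over. My loop looks like this: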

for i in list1:
    for j in list2:
        function()

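The current loop takes too long. list1 holds the values that need to be checked against list2, but past a certain index there are no more matching entries in list2. That means starting the loop from an index might be faster, but I don't know how to do that.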

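In my project, list2 is a list of dictionaries with three keys: value, name, and timestamp. list1 is a list of timestamps in sorted order. The function takes the value that belongs to a given timestamp and writes it into the CSV file under the column for the matching name.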

Here is an example of the items in list1:

[1364310855.004000, 1364310855.005000, 1364310855.008000] 

This is what list2 looks like:

{"name":"vehicle_speed","value":2,"timestamp":1364310855.004000} 
{"name":"accelerator_pedal_position","value":4,"timestamp":1364310855.004000} 
{"name":"engine_speed","value":5,"timestamp":1364310855.005000} 
{"name":"torque_at_transmission","value":-3,"timestamp":1364310855.008000} 
{"name":"vehicle_speed","value":1,"timestamp":1364310855.008000} 

In my final CSV file, I should end up with something like this:

http://s000.tinyupload.com/?file_id=03563948671103920273
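
The "start from an index" idea in the question can be sketched as a single forward pass that remembers where it stopped in list2. This is only a sketch under assumptions the question does not confirm: list2 is sorted by timestamp in ascending order, and the timestamps in list1 are unique. The name `match_sorted` is hypothetical.

```python
def match_sorted(list1, list2):
    """Yield (timestamp, record) pairs in one forward pass.

    Assumes both lists are sorted by timestamp in ascending order.
    """
    j = 0  # position in list2; it only ever moves forward
    for ts in list1:
        # Skip records that are older than the current timestamp.
        while j < len(list2) and list2[j]["timestamp"] < ts:
            j += 1
        # Consume every record that carries exactly this timestamp.
        while j < len(list2) and list2[j]["timestamp"] == ts:
            yield ts, list2[j]
            j += 1
        if j == len(list2):
            break  # list2 is exhausted, so later timestamps cannot match


list1 = [1364310855.004, 1364310855.005, 1364310855.008]
list2 = [
    {"name": "vehicle_speed", "value": 2, "timestamp": 1364310855.004},
    {"name": "accelerator_pedal_position", "value": 4, "timestamp": 1364310855.004},
    {"name": "engine_speed", "value": 5, "timestamp": 1364310855.005},
    {"name": "torque_at_transmission", "value": -3, "timestamp": 1364310855.008},
    {"name": "vehicle_speed", "value": 1, "timestamp": 1364310855.008},
]

matches = list(match_sorted(list1, list2))
```

Each element of list2 is visited at most once, so the work grows with len(list1) + len(list2) rather than with their product.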

+2

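I think it would be worth explaining what you intend to do in "function()" - maybe you don't need a nested loop at all. What is your end goal? What do you want to do with these two lists? – MaxU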

+2

Describe exactly how the values in list1 determine the range of values in list2 that the function is applied to. – martineau

+0

Are you talking about taking a slice of list2, such as list2[0:2]? – sabbahillel

Answers

2

If you want this to be fast, you should restructure the data you have in list2 so that your lookups are quicker:

# The following code converts list2 into a multivalue dictionary 

from collections import defaultdict 

list2_dict = defaultdict(list) 

for item in list2: 
    list2_dict[item['timestamp']].append((item['name'], item['value'])) 

This gives you a much faster way to look up your timestamps:

Using list2_dict:
print(list2_dict) 

defaultdict(<type 'list'>, { 
    1364310855.008: [('torque_at_transmission', -3), ('vehicle_speed', 1)],
    1364310855.005: [('engine_speed', 5)],
    1364310855.004: [('vehicle_speed', 2), ('accelerator_pedal_position', 4)]})

The lookup will then be much more efficient:

for i in list1:
    for j in list2_dict[i]:
        # here j is a tuple in the form (name, value)
        function()
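
To make the lookup concrete, here is a minimal end-to-end sketch that indexes the sample records by timestamp and then writes one CSV row per timestamp with one column per name. The names `by_timestamp` and `write_rows` are hypothetical, and filling missing cells with 0 is an assumption about the desired output, not something the question specifies.

```python
import csv
import io
from collections import defaultdict

list1 = [1364310855.004, 1364310855.005, 1364310855.008]
list2 = [
    {"name": "vehicle_speed", "value": 2, "timestamp": 1364310855.004},
    {"name": "accelerator_pedal_position", "value": 4, "timestamp": 1364310855.004},
    {"name": "engine_speed", "value": 5, "timestamp": 1364310855.005},
    {"name": "torque_at_transmission", "value": -3, "timestamp": 1364310855.008},
    {"name": "vehicle_speed", "value": 1, "timestamp": 1364310855.008},
]

# Index list2 by timestamp once, so each later lookup is a dictionary access.
by_timestamp = defaultdict(list)
for item in list2:
    by_timestamp[item["timestamp"]].append((item["name"], item["value"]))

# One CSV column per distinct signal name, in alphabetical order.
names = sorted({item["name"] for item in list2})


def write_rows(out):
    """Write one row per timestamp in list1; missing cells become 0."""
    writer = csv.DictWriter(out, fieldnames=["timestamp"] + names, restval=0)
    writer.writeheader()
    for ts in list1:
        row = {"timestamp": ts}
        # .get() avoids creating empty entries for timestamps with no records.
        for name, value in by_timestamp.get(ts, []):
            row[name] = value
        writer.writerow(row)


buffer = io.StringIO()
write_rows(buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of a string buffer, pass a file opened with `open('output.csv', 'w', newline='')` to `write_rows`.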
+0

Great answer. It solves the problem, but now another one has come up, because the order has been lost. Is there a way to sort a defaultdict by key? – Reginsmal

+1

You can use an ordered dictionary: OrderedDict(sorted(d.items(), key=lambda t: t[0])) – purpletentacle

+0

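I see. Just so I know, though, how can I access the value itself? Accessing by index gives me both the value and the key, but accessing by key gives me an error – Reginsmal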
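
The ordering and access questions in these comments can be illustrated with a small sketch. It assumes the dictionary is keyed by the float timestamps, as in the answer above; the variable names `in_order`, `ordered`, `first_key`, and `first_value` are hypothetical. Note that an OrderedDict is still indexed by key rather than by position, so `ordered[0]` raises a KeyError unless 0 happens to be a key.

```python
from collections import OrderedDict, defaultdict

list2_dict = defaultdict(list)
list2_dict[1364310855.008].append(("torque_at_transmission", -3))
list2_dict[1364310855.004].append(("vehicle_speed", 2))
list2_dict[1364310855.005].append(("engine_speed", 5))

# Option 1: leave the dictionary alone and iterate over its keys in sorted order.
in_order = [(ts, list2_dict[ts]) for ts in sorted(list2_dict)]

# Option 2: build an OrderedDict whose insertion order is the sorted key order.
ordered = OrderedDict(sorted(list2_dict.items(), key=lambda t: t[0]))

# Values are reached through the timestamp key, not through a position.
first_key = next(iter(ordered))
first_value = ordered[first_key]
```

If the only goal is to process timestamps in order, iterating over list1 (which is already sorted) and looking each timestamp up in the dictionary avoids the extra sort entirely.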

0

It looks like you only want to use the elements of list2 that correspond to i*2 and i*2+1, i.e. elements 0,1 and 2,3 ...

You only need a single loop.

for i in range(len(list1)):
    j = list2[i*2]      # first element of the pair for list1[i]
    k = list2[i*2 + 1]  # second element of the pair
    # Process function using j and k

You will only process up to the end of list1.

0

I think the pandas module is a perfect fit for your goal...

import ujson   # 'ujson' (Ultra fast JSON) is faster than the standard 'json' 
import pandas as pd 

filter_list = [1364310855.004000, 1364310855.005000, 1364310855.008000] 

def file2list(fn): 
    with open(fn) as f: 
        return [ujson.loads(line) for line in f]

# Use pd.read_json('data.json') instead of pd.DataFrame(file2list('data.json'))
# if you have a proper JSON file 
# 
# df = pd.read_json('data.json') 
df = pd.DataFrame(file2list('data.json')) 

# filter DataFrame with 'filter_list' 
df = df[df['timestamp'].isin(filter_list)] 

# convert UNIX timestamps to readable format 
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s') 

# pivot data frame 
# fill NaN's with zeroes 
df = df.pivot(index='timestamp', columns='name', values='value').fillna(0) 

# save data frame to CSV file 
df.to_csv('output.csv', sep=',') 

#pd.set_option('display.expand_frame_repr', False) 
#print(df) 

output.csv:

timestamp,accelerator_pedal_position,engine_speed,torque_at_transmission,vehicle_speed 
2013-03-26 15:14:15.004,4.0,0.0,0.0,2.0 
2013-03-26 15:14:15.005,0.0,5.0,0.0,0.0 
2013-03-26 15:14:15.008,0.0,0.0,-3.0,1.0 

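PS I don't know where you got the [latitude, longitude] columns from, but it is easy to add those columns to your resulting data frame - just add the following lines before calling df.to_csv():
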
df.insert(0, 'latitude', 0) 
df.insert(1, 'longitude', 0) 

This will result in:

timestamp,latitude,longitude,accelerator_pedal_position,engine_speed,torque_at_transmission,vehicle_speed 
2013-03-26 15:14:15.004,0,0,4.0,0.0,0.0,2.0 
2013-03-26 15:14:15.005,0,0,0.0,5.0,0.0,0.0 
2013-03-26 15:14:15.008,0,0,0.0,0.0,-3.0,1.0