2017-01-16

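Scrapy - Getting duplicated items when appending items using a for loop

I am scraping data from a JSON response. I use a for loop to extract the data into items, but after the loop every previously appended record has been overwritten by the last one.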

Here is my code:

def parse_centers_and_ambulances(self, response):
    json_response = json.loads(response.body_as_unicode())
    facility = MedFacilityItem()
    facility["name"] = "Med Facility #1"
    centers = []
    med_centers = MedCenterItem()
    for center in json_response:
        if center["name"].startswith("Center"):
            med_centers["response_url"] = center["product_id"]
            med_centers["name"] = center["name"]
            med_centers["address"] = (center["name_short"] + "." +
                                      center["adr_name"] + " " +
                                      center["adr_dom"])
            med_centers["lat"] = center["latitude"]
            med_centers["lon"] = center["longitude"]
            med_centers["phoneInfo"] = [{"number": center["tel1"],
                                         "description": center["tel1_descr"]},
                                        {"number": center["tel2"],
                                         "description": center["tel2_descr"]}]
            centers.append(med_centers)

    facility["facility_type"] = centers
    return facility

What am I missing?

Answers

Since Scrapy items basically behave like dicts, I will use dicts in the examples below. Consider:

In [1]: dict_list = []
   ...: d = {}
   ...: for i in range(3):
   ...:     d['i'] = i
   ...:     dict_list.append(d)
   ...: print(dict_list)
   ...: print([id(e) for e in dict_list])
   ...:
[{'i': 2}, {'i': 2}, {'i': 2}]
[4557722520, 4557722520, 4557722520]

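Dicts are mutable objects, and in this case you are appending the same dict instance to the list multiple times. The resulting list does not contain distinct items, only several references to the same dict object. The next example shows the same behavior: it appends the same dict to the list three times and only then sets a value: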

In [2]: dict_list = []
   ...: d = {}
   ...: for i in range(3):
   ...:     dict_list.append(d)
   ...: d['some'] = 'value'
   ...: print(dict_list)
   ...:
[{'some': 'value'}, {'some': 'value'}, {'some': 'value'}]

What you need to do is create a distinct dict on each iteration by initializing it inside the for loop, like this:

In [3]: dict_list = []
   ...: for i in range(3):
   ...:     d = {}
   ...:     d['i'] = i
   ...:     dict_list.append(d)
   ...: print(dict_list)
   ...: print([id(e) for e in dict_list])
   ...:
[{'i': 0}, {'i': 1}, {'i': 2}]
[4557901904, 4557724760, 4557843264]
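
A different way to get distinct objects, sketched here with plain dicts again, is to keep one working dict and append a copy of it on each iteration. This is an alternative to creating the dict inside the loop, not what the examples above do. It uses a shallow copy, so it is only safe when the values are not nested mutable objects shared between iterations; `copy.deepcopy` would be the safer choice for nested data.

```python
import copy

dict_list = []
d = {}
for i in range(3):
    d['i'] = i
    # Append a snapshot of d instead of another reference to d itself
    dict_list.append(copy.copy(d))

print(dict_list)
# Count how many distinct objects the list holds
print(len({id(e) for e in dict_list}))
```

Creating the dict (or the item) inside the loop is usually simpler and avoids the shallow-copy caveat.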

You could try creating the item inside the loop instead of outside it.

def parse_centers_and_ambulances(self, response):
    json_response = json.loads(response.body_as_unicode())
    facility = MedFacilityItem()
    facility["name"] = "Med Facility #1"
    centers = []
    # med_centers = MedCenterItem()  <-- this
    for center in json_response:
        if center["name"].startswith("Center"):
            med_centers = MedCenterItem()  # <-- should be here
            med_centers["response_url"] = center["product_id"]
            med_centers["name"] = center["name"]
            med_centers["address"] = (center["name_short"] + "." +
                                      center["adr_name"] + " " +
                                      center["adr_dom"])
            med_centers["lat"] = center["latitude"]
            med_centers["lon"] = center["longitude"]
            med_centers["phoneInfo"] = [{"number": center["tel1"],
                                         "description": center["tel1_descr"]},
                                        {"number": center["tel2"],
                                         "description": center["tel2_descr"]}]
            centers.append(med_centers)

    facility["facility_type"] = centers
    return facility
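
One way to make it harder to reuse an object by accident is to build each record in a small helper that always returns a new object. The following is only a minimal sketch: plain dicts stand in for `MedCenterItem`, only some of the fields are shown, and the names `build_center` and `collect_centers` as well as the sample data are hypothetical.

```python
def build_center(center):
    """Return a brand-new dict for one entry of the JSON response."""
    return {
        "response_url": center["product_id"],
        "name": center["name"],
        "address": center["name_short"] + "." +
                   center["adr_name"] + " " + center["adr_dom"],
    }


def collect_centers(json_response):
    """Build one separate record per entry whose name starts with 'Center'."""
    return [build_center(c) for c in json_response
            if c["name"].startswith("Center")]


# Hypothetical sample data shaped like the response in the question
sample = [
    {"product_id": 1, "name": "Center A", "name_short": "st",
     "adr_name": "Main", "adr_dom": "1"},
    {"product_id": 2, "name": "Ambulance B", "name_short": "st",
     "adr_name": "Oak", "adr_dom": "7"},
    {"product_id": 3, "name": "Center C", "name_short": "av",
     "adr_name": "Park", "adr_dom": "12"},
]

centers = collect_centers(sample)
print([c["name"] for c in centers])  # ['Center A', 'Center C']
```

Because `build_center` creates its return value on every call, each element of `centers` is a separate object and later records cannot overwrite earlier ones.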