2017-07-04 65 views

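I want to convert multiple lists into a DataFrame in Python. My code:

import urllib.request

import pandas as pd
from bs4 import BeautifulSoup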

webpage_urls = ["https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=departmentofagriculturefisheriesandforestry&_groups_limit=0", 
       "https://data.gov.au/dataset?q=&organization=commonwealthscientificandindustrialresearchorganisation&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0", 
       "https://data.gov.au/dataset?q=&organization=bureauofmeteorology&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0", 
       "https://data.gov.au/dataset?q=&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&organization=tasmanianmuseumandartgallery&_groups_limit=0", 
       "https://data.gov.au/dataset?q=&organization=department-of-industry&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc&_organization_limit=0&groups=sciences&_groups_limit=0"] 

for url in webpage_urls:
    page = urllib.request.urlopen(url)

    # naming the parser explicitly avoids bs4's "no parser specified" warning
    soup = BeautifulSoup(page, "html.parser")

    # fetching organisations
    data3 = soup.find_all('li', class_="nav-item active")

    lobbying1 = []
    for element in data3:
        lobbying1.append(element.span.get_text())
    print(lobbying1)

    df = pd.DataFrame({'Organisation': lobbying1})

The code above prints:

['Reserve Bank of Aus... (24)', 'Business Support an... (24)'] 
['Department of Finance (16)', 'Business Support an... (16)'] 
['Department of Agric... (13)', 'Business Support an... (13)'] ... and so on

These are multiple separate lists rather than one nested list, and the DataFrame I end up with looks like this:

Organisation 
0 Australian Charitie... (1) 
1 Business Support an... (1) 

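I want the output as two columns, with the first element of each list in column 1 and the second element in column 2, and I want all the entries: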

Organisation   Groups 
Australian Cha...  Business Support and... 

Please help me with this.

Answers

1

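I think you need to wrap the list in [] to make it a list of lists and then use the DataFrame constructor. That way you get a two-column DataFrame simply by calling pd.DataFrame as follows: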

# inside the loop, in place of the original df = ... line
df = pd.DataFrame([lobbying1], columns=['Organization', 'Groups'])
print(df)

        Organization  Groups 
0 Department of Agric... (35) Science (35) 
       Organization  Groups 
0 Commonwealth Scient... (8) Science (8) 
       Organization  Groups 
0 Bureau of Meteorology (4) Science (4) 
       Organization  Groups 
0 Tasmanian Museum an... (1) Science (1) 
       Organization  Groups 
0 Department of Indus... (1) Science (1) 
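
To see why the extra brackets matter, here is a small self-contained sketch. It uses one of the lists printed in the question in place of freshly scraped data, so the values are only sample input. A flat list passed as a column yields one column with two rows, while the same list wrapped in [] yields one row with two columns.

```python
import pandas as pd

# Sample values copied from the question's printed output (stand-in for scraped data).
lobbying1 = ['Department of Finance (16)', 'Business Support an... (16)']

# Flat list used as a column: each string becomes its own row.
one_column = pd.DataFrame({'Organisation': lobbying1})

# List of lists: the inner list is treated as a single row with two fields.
one_row = pd.DataFrame([lobbying1], columns=['Organization', 'Groups'])

print(one_column.shape)
print(one_row.shape)
```

The first shape has two rows and one column, the second has one row and two columns, which is the layout the question asks for.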

If you need a single DataFrame with all the data, append lobbying1 to a data list inside the loop, then call the DataFrame constructor once after the loop:

data = []
for url in webpage_urls:
    page = urllib.request.urlopen(url)
    soup = BeautifulSoup(page, "html.parser")

    # fetching organisations
    data3 = soup.find_all('li', class_="nav-item active")

    lobbying1 = []
    for element in data3:
        lobbying1.append(element.span.get_text())

    # one inner list per page becomes one row of the final DataFrame
    data.append(lobbying1)

# build the DataFrame once, after all pages have been collected
df = pd.DataFrame(data, columns=['Organization', 'Groups'])
print(df)
        Organization  Groups 
0 Department of Agric... (35) Science (35) 
1 Commonwealth Scient... (8) Science (8) 
2 Bureau of Meteorology (4) Science (4) 
3 Tasmanian Museum an... (1) Science (1) 
4 Department of Indus... (1) Science (1) 
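
One caveat with this pattern: with columns=['Organization','Groups'], the constructor expects each inner list to hold two items, and a page that yields more than two active filters would make it fail with a column-count mismatch. The sketch below is one possible guard, not part of the answer above. The helper name to_pair is hypothetical, and the short and long rows are made up for illustration.

```python
import pandas as pd

def to_pair(items):
    """Hypothetical helper: trim or pad a scraped list to exactly two fields."""
    first_two = list(items[:2])
    return first_two + [None] * (2 - len(first_two))

# Stand-in for the per-page lists collected in the loop.
pages = [
    ['Department of Agric... (35)', 'Science (35)'],
    ['Bureau of Meteorology (4)'],                           # made-up short row
    ['Department of Indus... (1)', 'Science (1)', 'extra'],  # made-up long row
]

data = [to_pair(p) for p in pages]
df = pd.DataFrame(data, columns=['Organization', 'Groups'])
print(df)
```

Padding with None keeps the row in the result as a missing value; whether to pad, skip, or raise for such pages depends on what the data should mean.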

Thank you very much. It does exactly what I wanted. – Arti123

Glad I could help. By the way, I really like Australia ;) – jezrael

Good to know that you like Australia :) – Arti123

0

Collected together, your lobbying1 lists form a list of lists:

import pandas as pd

lobbying1 = [['Reserve Bank of Aus... (24)', 'Business Support an... (24)'],
             ['Department of Finance (16)', 'Business Support an... (16)'],
             ['Department of Agric... (13)', 'Business Support an... (13)']]
# pass the list of lists itself; each inner list becomes one row
df = pd.DataFrame(lobbying1, columns=['Organization', 'Groups'])

This gives the following output:

>>> df.head() 
        Organization      Groups 
0 Reserve Bank of Aus... (24) Business Support an... (24) 
1 Department of Finance (16) Business Support an... (16) 
2 Department of Agric... (13) Business Support an... (13) 
>>>