2017-06-20 105 views

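How do I merge different CSV files in a folder that have different entries but share a column?

I have 6 different CSV files of training data, described in detail below: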

1 chefmozaccepts.csv 
Instances: 1314 
Attributes: 2 
placeID: Nominal 
Rpayment: Nominal, 12 [cash,VISA,MasterCard-Eurocard,American_Express,bank_debit_cards,checks,Discover,Carte_Blanche,Diners_Club,Visa,Japan_Credit_Bureau,gift_certificates] 
%--- 
2 chefmozcuisine.csv 
Instances: 916 
Attributes: 2 
placeID: Nominal 
Rcuisine: Nominal, 59 [Afghan,African,American,Armenian,Asian,Bagels,Bakery,Bar,Bar_Pub_Brewery,Barbecue,Brazilian,Breakfast-Brunch,Burgers,Cafe-Coffee_Shop,   Cafeteria,California,Caribbean,Chinese,Contemporary,Continental-European,Deli-Sandwiches,Dessert-Ice_Cream,Diner,Dutch-Belgian,Eastern_European,Ethiopian,Family,Fast_Food,Fine_Dining,French,,Game,German,Greek,Hot_Dogs,   International,Italian,Japanese,Juice,Korean,Latin_American,Mediterranean,Mexican,Mongolian,Organic-Healthy,Persian,   Pizzeria,Polish,Regional,Seafood,Soup,Southern,Southwestern,Spanish,Steaks,Sushi,Thai,Turkish,Vegetarian,Vietnamese] 
%--- 
3 chefmozhours4.csv 
Instances: 2339 
Attributes: 3 
placeID: Nominal 
hours: Nominal, Range:00:00-23:30 
days:Nominal, 7 [Mon;Tue;Wed;Thu;Fri;Sat;Sun] 
%--- 
4 chefmozparking.csv 
Instances: 702 
Attributes: 2 
placeID: Nominal 
parking_lot:Nominal, 7[public,none,yes,valet_parking,free,street,validated_parking] 
%--- 
5 geoplaces2.csv 
Instances: 130 
Attributes: 21 
placeID: Nominal 
latitude: Numeric 
longitude: Numeric 
the_geom_meter: Nominal (Geospatial) 
name: Nominal 
address: Nominal,Missing: 27 
city: Nominal, Missing: 18 
state: Nominal, Missing: 18 
country: Nominal, Missing: 28 
fax: Numeric, Missing: 130 
zip: Nominal,Missing: 74 
alcohol: Nominal, Values: 3 [No_Alcohol_Served,Wine_Beer,Full_Bar] 
%--- 
6 rating_final.csv 
Instances: 1161 
Attributes: 5 
userID: Nominal 
placeID: Nominal 
rating: Numeric, 3 [0,1,2] 
food_rating: Numeric, 3 [0,1,2] 
service_rating: Numeric, 3 [0,1,2] 
%--- 
7 usercuisine.csv 
Instances: 330 
Attributes: 2 
userID: Nominal 
Rcuisine: Nominal, 103 

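As you can see, the files share a common column, placeID, but the number of instances differs from file to file.

I need to merge all the CSV files into one final CSV, using placeID as the unique key. For files with more instances per placeID, I want to split the data so that all columns end up evenly filled, and the remaining metadata is copied onto the rows where the instance counts are uneven.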

Sample input:

File 1:

placeID Rpayment 
135110 cash 
135110 VISA 
135110 MasterCard-Eurocard 
135110 American_Express 
135110 bank_debit_cards 
135109 cash 
135107 cash 
135107 VISA 
135107 MasterCard-Eurocard 
135107 American_Express 
135107 bank_debit_cards 
135106 cash 
135106 VISA 
135106 MasterCard-Eurocard 
135105 cash 

File 2:

placeID Rcuisine 
135110 Spanish 
135109 Italian 
135107 Latin_American 
135106 Mexican 
135105 Fast_Food 
135104 Mexican 
135103 Burgers 
135103 Dessert-Ice_Cream 
135103 Fast_Food 
135103 Hot_Dogs 

File 3:

placeID hours   days 
135110 08:00-19:00; Mon;Tue;Wed;Thu;Fri; 
135110 00:00-00:00; Sat; 
135110 00:00-00:00; Sun; 
135109 08:00-21:00; Mon;Tue;Wed;Thu;Fri; 
135109 08:00-21:00; Sat; 
135109 08:00-21:00; Sun; 
135108 00:00-23:30; Mon;Tue;Wed;Thu;Fri; 

File 4:

placeID parking_lot 
135110 public 
135109 none 
135108 none 
135107 none 
135106 none 
135105 none 

File 5:

placeID latitude longitude name address city state country fax zip alcohol smoking_area dress_code accessibility price url Rambience franchise area other_services 
135109 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet 
135107 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135106 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none 

Sample output:

placeID payment Cuisine parking_lot hours days latitude longitude name address city state country fax zip alcohol smoking_area dress_code accessibility price url ambience franchise area other_services 
135110 cash Spanish public 08:00-19:00; Mon;Tue;Wed;Thu;Fri;                    
135110 VISA Spanish public 00:00-00:00; Sat;                    
135110 MasterCard-Eurocard Spanish public 00:00-00:00; Sun;                    
135110 American_Express Spanish public 08:00-19:00; Mon;Tue;Wed;Thu;Fri;                    
135110 bank_debit_cards Spanish public 00:00-00:00; Sat;                    
135110 bank_debit_cards Spanish public 00:00-00:00; Sun;                    
135109 cash Italian none 08:00-21:00; Mon;Tue;Wed;Thu;Fri; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet 
135109 cash Italian none 08:00-21:00; Sat; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet 
135109 cash Italian none 08:00-21:00; Sun; 18.9217848 -99.2353499 Paniroles ? ? ? ? ? ? Wine-Beer not permitted informal no_accessibility medium ? quiet f closed Internet 
135107 cash Latin_American none 07:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135107 VISA Latin_American none 07:00-23:30; Sat; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135107 MasterCard-Eurocard Latin_American none 07:00-23:30; Sun; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135107 American_Express Latin_American none 07:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135107 bank_debit_cards Latin_American none 07:00-23:30; Sat; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135107 MasterCard-Eurocard Latin_American none 07:00-23:30; Sun; 22.1362534 -100.9335852 Potzocalli Carretera Central Sn San Luis Potosi ? ? ? ? No_Alcohol_Served none informal completely low ? familiar f closed none 
135106 cash Mexican none 18:00-23:30; Mon;Tue;Wed;Thu;Fri; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none 
135106 VISA Mexican none 18:00-23:30; Sat; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none 
135106 MasterCard-Eurocard Mexican none 18:00-21:00; Sun; 22.1497088 -100.9760928 El Rincón de San Francisco Universidad 169 San Luis Potosi San Luis Potosi Mexico ? 78000 Wine-Beer only at bar informal partially medium ? familiar f open none 


I know this is a tedious task, but any help would be greatly appreciated. I am trying to do this with pandas, not csvreader.

Answers

1

Try this:

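import pandas as pd

# Start from the first file, then merge each remaining file on placeID.
df_out = pd.read_csv('file1.csv')

for f in ('file2.csv', 'file3.csv', 'file4.csv', 'file5.csv'):
    df_out = df_out.merge(pd.read_csv(f), how='inner', on='placeID')

# index=False keeps the row index out of the output file.
df_out.to_csv('output.csv', index=False)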
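
Since the question asks about all the files in a folder, the same loop could be written without a hard-coded file list. The following is only a sketch, not part of the original answer: the helper name merge_csv_folder, its parameters, and the choice to skip files that lack the key column are assumptions made for illustration.

```python
import glob
import os
from functools import reduce

import pandas as pd


def merge_csv_folder(folder, key="placeID", how="left"):
    """Merge every CSV in `folder` that has a `key` column (hypothetical helper)."""
    # Sort the paths so the merge order is deterministic.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = [pd.read_csv(p) for p in paths]
    # Skip files that lack the key column (e.g. a file keyed only on userID).
    frames = [df for df in frames if key in df.columns]
    if not frames:
        raise ValueError(f"no CSV file in {folder!r} has a {key!r} column")
    # Fold the list into one frame, merging left to right on the key.
    return reduce(lambda left, right: left.merge(right, how=how, on=key), frames)
```

With how="left", the first file in sorted order decides which placeID values survive, so the file with the widest coverage should sort first, or how="outer" can be passed instead. A placeID that appears several times in more than one file still produces every combination of those rows in the result.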

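This partly works. Some placeIDs, for example 135110, have no data in file 5, so after the merge I would like those columns left blank but the rows still merged. Your code runs fine, but it drops the rows that are missing data, so the final output has no placeID 135110 or any of its related data... – lightyagami96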


Change how='inner' to how='left' and it will do what you describe. – brunormoreira


Thanks a lot, man. – lightyagami96
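
The difference between the two join types discussed in the comments can be seen on a cut-down version of the sample data. This is an illustrative sketch only: the two frames are typed in by hand from a few of the question's sample rows and columns, and the variable names are assumptions.

```python
import pandas as pd

# Hand-copied subset of file 4 (parking) and file 5 (places) from the question.
parking = pd.DataFrame({
    "placeID": [135110, 135109, 135107],
    "parking_lot": ["public", "none", "none"],
})
places = pd.DataFrame({
    "placeID": [135109, 135107],
    "name": ["Paniroles", "Potzocalli"],
})

# Inner join: keeps only placeIDs present in both frames, so 135110 is dropped.
inner = parking.merge(places, how="inner", on="placeID")

# Left join: keeps every row of the left frame; missing columns become NaN.
left = parking.merge(places, how="left", on="placeID")

print(inner["placeID"].tolist())  # [135109, 135107]
print(left["placeID"].tolist())   # [135110, 135109, 135107]
print(left.loc[left["placeID"] == 135110, "name"].isna().all())  # True
```

When the result is written with to_csv, NaN values are emitted as empty fields by default, which matches the request to keep the missing columns blank.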
