2016-11-25 51 views

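Python csv writer appends the escape character after every word

I am trying to write some CSV data, but I keep getting the escape character after every word in the CSV file.

Setup: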

import csv
from itertools import izip_longest

with open('gibber.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_NONE, escapechar=" ")
    # transpose csv_data, padding the shorter lists with a filler item
    for values in izip_longest(*csv_data, fillvalue="-,-"):
        writer.writerow([unicode(s).encode("utf-8") for s in values])

If I print what is passed to writer.writerow(...) above, the following line is a sample.

['dipey,1', 'you have,2', 'at the beginning,1', 'brilliant charles brown truly,1', 'great the first also was,1', 'identical to this one as far,1', 'be when pie mood mark lake a,1', 'shardely uptown is you free on a stone,1', 'let it rest and sun it those it super,1'] 

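I have tried many things like this, and almost everything else I could find by searching. Why does the csv writer add the escape character after every word?

My expected output should look like this: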

-------------------------------------------------------------------------- 
word1 | word_count1 | word2 | word_count2 | .. wordN | word_countN 
-------------------------------------------------------------------------- 
word |  3  | word word |  7  | .............. N 

Instead, using a space as my escapechar, I get something like this:

[] = escapecharacter 
-------------------------------------------------------------------------- 
word1 | word_count1 | word2 | word_count2 | .. wordN | word_countN 
-------------------------------------------------------------------------- 
word[]|  3  |word[] word[]| 7  | .............. N 

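So I get an extra space after every word. Using a tab or a newline as the escapechar breaks the row/column layout. Using any single letter, digit, or even \ puts that character at the far right of every row item, but the double spaces disappear.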
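A likely explanation, sketched here rather than confirmed by the thread: with quoting=csv.QUOTE_NONE the writer cannot quote a field, so it prefixes every special character inside a field (the delimiter, and the escape character itself) with the escapechar. Each item above already contains a comma, so that comma gets escaped; when the escapechar is a space, every space inside a phrase is also escaped, which would account for the doubled spaces. The snippet below is written for Python 3 (the question uses Python 2), and render_row is a hypothetical helper name. It uses a backslash as the escapechar so the inserted character is visible.

```python
import csv
import io

def render_row(fields, escapechar="\\"):
    """Return the text csv.writer emits for one row with quoting disabled."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_NONE,
                        escapechar=escapechar)
    writer.writerow(fields)
    return buf.getvalue()

# Each field already contains a comma, so the writer has to escape it.
print(repr(render_row(["dipey,1", "you have,2"])))

# With the phrase and the count as separate fields, nothing needs escaping.
print(repr(render_row(["dipey", "1", "you have", "2"])))
```

The second call suggests that passing the phrase and the count as separate fields avoids the escaping altogether.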

The sample list I posted above is what I pass to writer.writerow(...).

Test data

data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8") 

data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Many Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8") 

data2 = unicode("Tropical forests cover many a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8") 

A sample of csv_data (see the full data here), as an example of the list:

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(csv_data)

[ [ u'shrubs,1', 
     u'chile,1', 
     u'equatorial,1', 
     u'china,1', 
     u'may,1', 
     u'zone7,1'], 
    [ u'washington oregon,1', 
     u'new zealand10,1', 
     u'moist forests,1', 
     u'biotic species,1', 
     u'and tropic,1', 
     u'term jungle,1', 
     u'sometimes misnamed,1', 
     u'japan and,1', 
     u'the world,1', 
     u'200 cm,1', 
     u'between the,1', 
     u'canopy is,1', 
     u'as hawaii,1', 
     u'and temperate,1', 
     u'many australia,1', 
     u'but temperate,1'], 
    [ u'cancer and tropic,1', 
     u'black sea including,1', 
     u'asia in southern,1', 
     u'some areas of,1', 
     u'also known as,1', 
     u'as well as,1', 
     u'areas of a,1', 
     u'central america eg,1', 
     u'250 and 450,1'], 
    [ u'rainforest the monsoon trough,1', 
     u'shrubs and small trees,1', 
     u'dense tangled growth of,1', 
     u'of the british isles,1'], 
    [ u'sometimes misnamed oxygen production4 processing,1', 
     u'a significant role in creating,1', 
     and,1', 
     u'are also responsible for 28 of the worlds oxygen,1', 
     u'the climatic conditions necessary for the earths tropical rainforests,1', 
     u'growth of vines shrubs and small trees called a,1', 
     u'columbia washington oregon and california in europe parts of,1']] 

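As you can see from the sample data above, I then izip csv_data to transpose it and write out each row.

Edit

This is how I build the data that I want to write in a row.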
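The transposition step can be sketched in isolation as follows. This is a minimal Python 3 illustration (izip_longest is named zip_longest there), and transpose is a hypothetical helper name, not part of the original code. Lists of unequal length are padded with the fill value so that every output row has one item per input list.

```python
from itertools import zip_longest

def transpose(columns, fillvalue="-,-"):
    """Turn a list of columns into a list of rows, padding short columns."""
    return [list(row) for row in zip_longest(*columns, fillvalue=fillvalue)]

columns = [
    ["shrubs,1", "chile,1"],   # one-word phrases
    ["washington oregon,1"],   # two-word phrases (shorter list)
]
for row in transpose(columns):
    print(row)
```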

csv_data = []
for index, item in enumerate(package.count_set[0]):
    payload = []
    phrase = item[0]
    for pindex, pitem in enumerate(phrase):  # pitem is a Counter
        # print(index, pindex, " ".join(pitem), phrase[pitem])
        _str = " ".join(pitem)
        _cnt = phrase[pitem]
        _data = _str + ",%d" % (_cnt)
        payload.append(_data)
    csv_data.append(payload)

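So I create a list of items like this: [ "word,count,", "word1,count1,", "word2,count2,", "wordN,countN," ]

I also tried it without the trailing comma: [ "word,count", "word1,count1", "word2,count2", "wordN,countN" ]

Could the problem be the way I create this list and then append it to the csv_data list?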
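On that last question, one option is to keep each phrase and its count as a separate pair instead of joining them into a single "word,count" string, so that no field ever contains the delimiter. The sketch below is written for Python 3 and rests on assumptions: build_payload is a hypothetical helper, and its input is assumed to be a Counter keyed by tuples of words, which is only a guess at the structure inside package.count_set.

```python
from collections import Counter

def build_payload(phrase_counts):
    """Turn a Counter keyed by word tuples into (phrase, count) pairs."""
    return [(" ".join(words), count) for words, count in phrase_counts.items()]

# Hypothetical input: two phrases with their counts.
counts = Counter({("you", "have"): 2, ("dipey",): 1})
for phrase, count in build_payload(counts):
    print(phrase, count)
```

Each pair can then be flattened into two adjacent cells when the row is written.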

What is some sample input? What is the expected output? What is the actual output?

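@MarkTolonen I have edited in some more information. Why does a space escapechar put a space to the right of every word, making the output double-spaced? For example, instead of "hello word" it writes "hello  word". – user1610950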

This looks like a syntax error in the code (`writer.writerow([unicode(s).encode("utf-8") for s in values]) values])`). Also, please provide your input (in particular, what is `csvdata`?).

Answer

I don't usually like answering my own questions, but I solved the problem by building the strings myself and writing them to the file.

_range = files_to_load + 1
with open('data.csv', 'wb') as csvfile:
    header = ["%d word phrase, phrase count" % i for i in range(1, _range)]

    # build the header row, pluralizing "phrase" for columns of 2+ words
    header_line = ""
    for item in header:
        word, count = item.split(",")
        # compare the whole leading number, not just its first digit
        if int(word.split()[0]) > 1:
            word = word.replace("phrase", "phrases")
        header_line += word + "," + count + ","
    header_line = header_line[:-1] + "\n"
    csvfile.write(header_line)

    # transpose csv_data and write each "word,count" item as two columns
    for values in izip_longest(*csv_data, fillvalue="-,0"):
        line_list = [unicode(s).encode("utf-8") for s in values]
        line_str = ""
        for item in line_list:
            word, count = item.split(",")
            line_str += word + "," + count + ","
        line_str = line_str[:-1] + "\n"
        csvfile.write(line_str)

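The code above could probably be cleaned up a lot, but no matter what I did, I could not get the Python csv module to work properly with my data.

This is most likely user error and some oversight on my part, but still. The code above writes out the CSV format I need, without any strange artifacts.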
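For comparison, here is a hedged sketch of how the same layout might be produced with the csv module itself. It is written for Python 3, and the function name write_phrase_table is hypothetical. It assumes one column pair per list in csv_data, which stands in for files_to_load in the code above. The idea is to split every "word,count" item into two fields before calling writerow, so the writer never meets a delimiter inside a field and needs neither quoting nor an escapechar.

```python
import csv
import io
from itertools import zip_longest

def write_phrase_table(csv_data, out):
    """Write transposed "word,count" items as separate CSV columns."""
    writer = csv.writer(out, lineterminator="\n")

    # Header: one (phrase, count) column pair per list in csv_data.
    header = []
    for n in range(1, len(csv_data) + 1):
        noun = "phrase" if n == 1 else "phrases"
        header.extend(["%d word %s" % (n, noun), "phrase count"])
    writer.writerow(header)

    # Transpose, then split each item on its last comma into two fields.
    for values in zip_longest(*csv_data, fillvalue="-,0"):
        row = []
        for item in values:
            phrase, count = item.rsplit(",", 1)
            row.extend([phrase, count])
        writer.writerow(row)

buf = io.StringIO()
write_phrase_table([["shrubs,1", "chile,1"], ["washington oregon,1"]], buf)
print(buf.getvalue())
```

When writing to a real file in Python 3, the csv documentation recommends opening it with newline="" so the writer controls the line endings.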
