2016-08-30

Troubleshooting encoding/decoding of CSV and JSON files in Python

I start with a file containing a particular sentence, dumped using:

import json

with open(labelFile, "wb") as out:
    json.dump(result, out, indent=4)

In the JSON, the sentence looks like this:

"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .", 

I then load it with:

with open(sys.argv[1]) as sentenceFile: 
    sentenceFile = json.loads(sentenceFile.read()) 

process it, and then write it out to a CSV using:

with open(sys.argv[2], 'wb') as csvfile:
    fieldnames = ['x', 'y', 'z']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for sentence in sentence2locations2values:
        # Python 2: encode the unicode sentence to UTF-8 bytes before writing.
        sentence = unicode(sentence['parsedSentence']).encode("utf-8")
        writer.writerow({'x': sentence})

Opening the CSV file in Excel for Mac turns the sentence into:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' . 

I then take this from Excel for Mac to Google Sheets, where it is:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' . 

Note the slight difference: Â has replaced Ã.

I then label it and bring it back into Excel for Mac, at which point it turns back into:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' . 
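The shifting characters are what one would expect if each program interprets the same bytes under a different character encoding. Below is a minimal sketch of that effect in Python 3 (the code in this question is Python 2). The helper name `decode_under` and the list of candidate encodings are illustrative assumptions, not something established about Excel or Google Sheets:

```python
def decode_under(raw, encodings=("utf-8", "latin-1", "mac_roman")):
    """Decode the same bytes under several encodings (hypothetical helper)."""
    # errors="replace" keeps the demo from raising on bytes that are
    # invalid in a given encoding.
    return {name: raw.decode(name, errors="replace") for name in encodings}


raw = "\u00c3".encode("utf-8")  # the character Ã becomes the bytes b'\xc3\x83'
for name, text in decode_under(raw).items():
    print(name, repr(text))
```

Only the UTF-8 decoding gives back the single character Ã; the other decodings turn the same two bytes into two unrelated characters.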

How do I read in a CSV containing a sentence like:

-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' . 

into a value that is:

"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating 45,000 per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in Hong Kong are granted a `` one way permit '' .", 

so that it matches the original json dump from the start of this question?

EDIT

From this check, I see that the encoding of \u00c3 to Ã, as it appears in the Google Sheets version, is actually Latin 8.

EDIT

I ran enca and saw that the originally dumped file is 7-bit ASCII, and my CSV is unicode. So do I need to load it as unicode and convert it to 7-bit ASCII?
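A likely reason enca reports 7-bit ASCII for the dump: by default `json.dump` escapes every non-ASCII character as `\uXXXX`, so the file contains only ASCII bytes even though the decoded strings are unicode. Here is a small Python 3 sketch, using a shortened version of the sentence above as the sample string:

```python
import json

text = "growth \u00c3 cents \u00c2 $"

escaped = json.dumps(text)                  # default: ensure_ascii=True
raw = json.dumps(text, ensure_ascii=False)  # keeps non-ASCII characters as-is

print(escaped)  # non-ASCII characters appear as \u00c3 and \u00c2
print(escaped.isascii(), raw.isascii())
print(json.loads(escaped) == text)  # loading restores the unicode string
```

Under this reading, no separate conversion to ASCII is needed: decoding the CSV to unicode and passing the value through `json.dumps` would reproduce the escaped form.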

Reading it as a normal file, rather than using the CSV class, should do the trick.

Could you post a solution or an example?

Answer

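I figured out the solution. It is to decode the CSV file from its original format (identified as UTF-8), after which the sentence returns to its original form. So: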

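import csv
import sys

fieldnames = ("x", "y", "z")

with open(sys.argv[1], 'r') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames)
    next(reader)  # skip the header row

    for row in reader:
        # Python 2: csv returns byte strings; decode them as UTF-8.
        row['x'] = row['x'].decode("utf-8")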
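For readers on Python 3, where `str` has no `.decode` method, the same idea can be sketched by letting `open()` do the decoding. This is an assumed equivalent, not the code used in this answer. The helper `read_sentences` and the temporary file are illustrative stand-ins for the real CSV:

```python
import csv
import json
import os
import tempfile


def read_sentences(path, encoding="utf-8"):
    """Return the 'x' column of a CSV file, decoded with the given encoding."""
    # In Python 3 the decoding happens in open(); csv then yields str values.
    with open(path, encoding=encoding, newline="") as csvfile:
        return [row["x"] for row in csv.DictReader(csvfile)]


# Build a tiny UTF-8 CSV to read back (stand-in for the real file).
with tempfile.NamedTemporaryFile("w", suffix=".csv", encoding="utf-8",
                                 newline="", delete=False) as tmp:
    writer = csv.DictWriter(tmp, fieldnames=["x", "y", "z"])
    writer.writeheader()
    writer.writerow({"x": "growth \u00c3 cents \u00c2 $"})

sentences = read_sentences(tmp.name)
os.remove(tmp.name)
print(json.dumps(sentences[0]))  # escaped form, as in the original JSON dump
```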

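The very strange thing that happened is that whenever I edited the CSV file in Excel for Mac and saved it, it seemed to be converted to a different encoding each time. I warn other users about this, because it is a massive headache.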
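One commonly suggested mitigation, offered here as an assumption rather than something tested in this thread, is to write the CSV as UTF-8 with a byte order mark (`utf-8-sig` in Python), which some spreadsheet programs use as a hint for the encoding. Below is a Python 3 sketch; the function name `write_csv_with_bom` and the file name are hypothetical:

```python
import csv
import os
import tempfile


def write_csv_with_bom(path, rows, fieldnames=("x", "y", "z")):
    """Write rows as UTF-8 with a leading BOM (hypothetical helper)."""
    with open(path, "w", encoding="utf-8-sig", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "sentences.csv")
write_csv_with_bom(path, [{"x": "growth \u00c3 cents"}])

with open(path, "rb") as f:
    head = f.read(3)
print(head)  # the three BOM bytes precede the header row
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so the header row is still recognized as `x`, `y`, `z`.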