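Unicode characters to Turkish characters

(Edit: my original question was posted here, but the problem has been solved and the code below is correct.) I am looking for advice on how to convert Unicode characters to Turkish characters. The following code (published online) scrapes an individual user's tweets and writes them to a csv file, but the Turkish characters come out as escape codes, e.g. \xc4. I am using Python 3 on a Mac.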

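import sys

# Python 2 leftover: on Python 3 sys.getdefaultencoding() is always 'utf-8',
# so this branch never runs (reload() and sys.setdefaultencoding() do not exist there).
default_encoding = 'utf-8'
if sys.getdefaultencoding() != default_encoding:
    reload(sys)
    sys.setdefaultencoding(default_encoding)

import tweepy #https://github.com/tweepy/tweepy
import csv
import string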

#Twitter API credentials 
consumer_key = "" 
consumer_secret = "" 
access_key = "" 
access_secret = "" 

def get_all_tweets(screen_name):
    #Twitter only allows access to a user's most recent 3240 tweets with this method

    #authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    #initialize a list to hold all the tweepy Tweets
    alltweets = []

    #make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name = screen_name,count=200)

    #save most recent tweets
    alltweets.extend(new_tweets)

    #save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    #keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        #print("getting tweets before %s" % oldest)

        #all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name = screen_name,count=200,max_id=oldest)

        #save most recent tweets
        alltweets.extend(new_tweets)

        #update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

    #transform the tweepy tweets into a 2D array that will populate the csv
    #(tweet.text is left as str; the file object does the encoding)
    outtweets = [[tweet.id_str, tweet.created_at, tweet.text] for tweet in alltweets]

    #write the csv, named after the account
    with open('%s_tweets.csv' % screen_name, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(["id","created_at","text"])
        writer.writerows(outtweets)

    pass

if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("")
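
For anyone who wants to try the max_id paging used in the script without Twitter credentials, here is a minimal sketch. The names `Tweet`, `FAKE_TIMELINE`, `fake_user_timeline` and `fetch_all` are made up for illustration and are not part of tweepy; the stand-in only mimics the "newest first, id <= max_id" behaviour the script relies on.

```python
from collections import namedtuple

# Hypothetical stand-in for a tweepy status object (only the id is needed here).
Tweet = namedtuple("Tweet", ["id"])

# Hypothetical in-memory timeline, newest first, with ids 1000 down to 1.
FAKE_TIMELINE = [Tweet(i) for i in range(1000, 0, -1)]


def fake_user_timeline(count=200, max_id=None):
    """Stand-in for api.user_timeline: newest tweets whose id is <= max_id."""
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t.id <= max_id]
    return eligible[:count]


def fetch_all(fetch_page, count=200):
    """Page backwards through a timeline until an empty page comes back."""
    all_tweets = []
    new_tweets = fetch_page(count=count)
    while new_tweets:
        all_tweets.extend(new_tweets)
        # one below the oldest id seen so far, so the next page has no duplicates
        oldest = all_tweets[-1].id - 1
        new_tweets = fetch_page(count=count, max_id=oldest)
    return all_tweets


tweets = fetch_all(fake_user_timeline)
print(len(tweets))
```

Unlike the script above, this variant only computes `oldest` after a non-empty page, so an account with no tweets would simply return an empty list.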

Source

2016-09-12 bayrah

What happens if you *don't* encode tweet.text? –

@MarkRansom If I just enter "tweet.text" instead of "tweet.text.encode("utf-8")", I get the following error: "UnicodeEncodeError: 'ascii' codec can't encode character '\xd6' in position 55: ordinal not in range(128)" – bayrah

'setdefaultencoding()' is [not recommended](https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/). –
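
The two symptoms in these comments can probably be reproduced in isolation. The sketch below is an assumption about what was going on, not the asker's actual session: on Python 3, `.encode("utf-8")` returns `bytes`, and the csv writer stringifies non-string values, so the escape codes end up in the file; an ASCII codec, on the other hand, cannot represent a character like Ö at all. The helper `one_row` and the sample text are made up for the example.

```python
import csv
import io

text = "İstanbul"                # U+0130 is b'\xc4\xb0' in UTF-8
encoded = text.encode("utf-8")   # bytes, not str


def one_row(value):
    """Write a single-field row with csv.writer and return the raw output."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow([value])
    return buf.getvalue()


as_bytes = one_row(encoded)   # the bytes repr, escape codes included
as_text = one_row(text)       # the real characters

# Encoding a non-ASCII character with the ASCII codec fails outright.
try:
    "Ö".encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)

print(repr(as_bytes))
print(repr(as_text))
print(ascii_error)
```

If that is what happened here, the cure is to pass plain `str` values to the writer and let the file object do the encoding.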

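The csv module docs recommend that you specify the encoding when you open the file (and that you also pass newline='', so that the csv module can handle line endings itself). Don't encode Unicode strings when writing the rows.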

import csv 

with open('test.csv', 'w', newline='', encoding='utf-8') as f: 
    writer = csv.writer(f) 
    writer.writerow(['id','created_at','text']) 
    writer.writerows([[123, 456, 'Äβç']])

Source

2016-09-12 23:05:42 roeland

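Got it, thanks. Now when I open the file in Excel I have to import it as a UTF-8 file; I assume I'll figure out a way to avoid doing that every time. Also, when I import the data this way, for some reason the columns I set up in Python no longer exist (e.g. id, created_at and text all end up in one column). Here is the revised code: – bayrah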

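I have edited the code above. If anyone has any further suggestions, please let me know (about setting up the import and dealing with the columns). I can't use a comma as the delimiter because there are commas in the tweets. – bayrah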
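
On the "import as UTF-8 every time" part: the edited code in the question opens the file with encoding='utf-8-sig'. A small sketch of what that changes on disk is below; the helper `write_csv` and the sample rows are made up for the example. The only difference is a three-byte byte order mark at the start of the file, which spreadsheet programs such as Excel generally use as a hint that the file is UTF-8 (that last part is an assumption about the importing program, not something the csv module guarantees).

```python
import codecs
import csv
import os
import tempfile

rows = [["id", "created_at", "text"],
        ["123", "2016-09-12", "Öğle yemeği İstanbul'da"]]


def write_csv(path, encoding):
    """Write the sample rows with the given encoding and return the raw bytes."""
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    plain = write_csv(os.path.join(tmp, "plain.csv"), "utf-8")
    with_bom = write_csv(os.path.join(tmp, "bom.csv"), "utf-8-sig")

# Only the 'utf-8-sig' file starts with the UTF-8 byte order mark.
print(plain.startswith(codecs.BOM_UTF8))
print(with_bom.startswith(codecs.BOM_UTF8))
```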

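@bayrah Then take a look at the rest of the documentation. The CSV import settings (delimiter, etc.) have to match the way your script writes the CSV file. – roeland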
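
To illustrate the point about matching settings, here is a rough sketch (the helper `round_trip` and the sample tweet are made up for the example). With the default dialect, csv.writer puts quotes around any field that contains the delimiter, so commas inside a tweet do not split the column as long as the reader uses the same settings. A different delimiter such as a tab also works, provided both sides agree on it.

```python
import csv
import io


def round_trip(rows, delimiter=","):
    """Write rows with csv.writer, then parse them back with the same delimiter."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    buf.seek(0)
    parsed = list(csv.reader(buf, delimiter=delimiter))
    return buf.getvalue(), parsed


rows = [["123", "2016-09-12", "Günaydın, İstanbul, nasılsın?"]]

raw_comma, parsed_comma = round_trip(rows)               # text field gets quoted
raw_tab, parsed_tab = round_trip(rows, delimiter="\t")   # no quoting needed

print(raw_comma)
print(len(parsed_comma[0]), len(parsed_tab[0]))
```

So when everything lands in one column, the likely culprit is the delimiter or quote setting chosen in the import dialog rather than the commas in the tweets.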
