2016-10-03 103 views
0

Saving tweets to a Python dictionary

I want to analyze Twitter data. I have downloaded some tweets and saved them in a .txt file.

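When I tried to extract useful information from the tweet data, I could not make any progress: for a beginner like me, it seems hard to pull out the tweet text, the location, and so on.

While searching on Google, I found that the information is easy to extract once the JSON has been converted to a dictionary.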

Now I want to convert my JSON data to a Python dictionary, but I don't know how to proceed.
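To make the goal concrete, here is a minimal sketch of what "JSON to dictionary" means in Python. The tweet below is made up and heavily trimmed; the field names (`text`, `user`, `location`) are assumed to follow the Twitter v1.1 tweet layout, and a real tweet carries many more fields.

```python
import json

# Made-up, trimmed sample in the assumed shape of a Twitter v1.1 tweet.
raw = '{"id": 1, "text": "Hello #SomeHashtag", "user": {"screen_name": "example_user", "location": "Berlin"}}'

# json.loads turns a JSON object (text) into a Python dict.
tweet = json.loads(raw)

text = tweet["text"]
# .get() avoids a KeyError if a field is missing from a particular tweet.
location = tweet.get("user", {}).get("location")

print(type(tweet).__name__, text, location)
```

Once the data is a dict, nested fields are reached by chaining keys, which is why the conversion makes extraction easier.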

Here is the code I used to save the tweets:

import sys

import tweepy
import json
import jsonpickle

consumer_key = "*********"
consumer_secret = "*******"

access_token = "************"
access_token_secret = "**********"

# OAuthHandler is used because AppAuthHandler has no set_access_token method.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)


# Make the Tweepy API call wait (sleep) automatically when it hits the rate limit and continue once the window expires.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

if not api:
    print("Can't authenticate")
    sys.exit(-1)


searchQuery = 'SomeHashtag' 
maxTweets = 10000000 # Some arbitrary large number 
tweetsPerQry = 100 
fName = 'file.txt' 

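sinceId = None
# Must be an integer, because it is compared with 0 below: -1 starts from the
# most recent tweet; otherwise use the integer ID of the tweet to page back from.
max_id = -1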

tweetCount = 0 
print("Downloading max {0} tweets".format(maxTweets)) 
with open(fName, 'a') as f:
    while tweetCount < maxTweets:
        try:
            if max_id <= 0:
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry,
                                            since_id=sinceId)
            else:
                if not sinceId:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                else:
                    new_tweets = api.search(q=searchQuery, lang="en", count=tweetsPerQry,
                                            max_id=str(max_id - 1),
                                            since_id=sinceId)

            if not new_tweets:
                print("No more tweets found")
                break
            for tweet in new_tweets:
                # tweet._json is the raw tweet as a plain dict; one JSON object is written per line.
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')

            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id
        except tweepy.TweepError as e:
            # Just exit on any error
            print("some error : " + str(e))
            break

    print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
+1

What does your .txt file look like? –

+0

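I edited the grammar of your question. Please check that it is still clear. Please also add the requested information: the contents of the txt file and enough code for us to test. –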

Answers

0

Judging from your code, you can simply read the file line by line and decode each line with the jsonpickle.decode method:

import jsonpickle

filename = 'file.txt'  # the file written by the code in the question
tweets = []
with open(filename) as f:
    for line in f:
        tweets.append(jsonpickle.decode(line))

I also think you can do without the third-party library:

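import json

# Writing: tweet._json is the plain dict behind each tweepy Status object.
# Passing the Status object itself to json.dumps raises "is not JSON serializable".
with open(filename, 'w') as f:
    for tweet in new_tweets:
        f.write(json.dumps(tweet._json) + '\n')

# Reading: each line parses back into a dict.
tweets = []
with open(filename) as f:
    for line in f:
        tweets.append(json.loads(line))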
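As a rough end-to-end sketch of the approach above, the following writes a couple of stand-in records to a temporary file, one JSON object per line, reads them back as dicts, and pulls out the text and location. The sample records and the `load_tweets` helper are hypothetical, and the field names are assumed to match the Twitter v1.1 layout.

```python
import json
import os
import tempfile

# Hypothetical records standing in for the tweet._json dicts from the question.
sample_tweets = [
    {"id": 1, "text": "first", "user": {"location": "Paris"}},
    {"id": 2, "text": "second", "user": {"location": ""}},
]

# Write one JSON object per line, as the download loop does.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    for t in sample_tweets:
        f.write(json.dumps(t) + "\n")


def load_tweets(filename):
    """Read one JSON object per line and return a list of dicts."""
    tweets = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                tweets.append(json.loads(line))
    return tweets


tweets = load_tweets(path)

# Keep only the fields of interest; an empty location becomes None.
summary = [
    {
        "id": t["id"],
        "text": t["text"],
        "location": t.get("user", {}).get("location") or None,
    }
    for t in tweets
]
print(summary)
```

Skipping blank lines is a small safety measure, since a stray empty line would otherwise make `json.loads` fail.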
+0

When I tried to download the tweets using json.dumps() instead of jsonpickle, the error window showed some tweet data along with the error "is not JSON serializable" – Khurshid
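That error most likely comes from handing json.dumps the tweet object itself rather than its underlying dict. The sketch below reproduces the behaviour with a hypothetical `FakeStatus` class, which only imitates the `_json` attribute used in the question and is not the real tweepy class.

```python
import json


class FakeStatus:
    """Hypothetical stand-in for a tweepy Status object (not the real class)."""

    def __init__(self, data):
        self._json = data  # plain dict, like the attribute used in the question
        self.id = data["id"]


tweet = FakeStatus({"id": 1, "text": "hello"})

# json.dumps only knows built-in types (dict, list, str, numbers, ...),
# so an arbitrary object raises TypeError.
try:
    json.dumps(tweet)
    failed = False
    message = ""
except TypeError as e:
    failed = True
    message = str(e)

# Serializing the plain dict works.
line = json.dumps(tweet._json)

print(failed, message)
print(line)
```

So with the standard library the call would be json.dumps(tweet._json), mirroring what the jsonpickle version in the question already passes in.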