2017-12-02 172 views

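I am working with the Twitter Search API, which returns a dictionary of dictionaries. My goal is to create a DataFrame from a list of keys in the response dictionary. How can I create a pandas DataFrame from the Twitter Search API?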

An example of the API response is here: Example Response

The list of keys in my statuses dictionary:

keys = ["created_at", "text", "in_reply_to_screen_name", "source"] 

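I want to loop through the value of each of these keys in the statuses dictionary and put them in a DataFrame with the keys as columns.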

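I currently have code that loops through a single key, assigns the values to a list, and then adds that list to a DataFrame, but I want a way to handle several keys at once. My current code is below: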

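import pandas as pd

# w is the word to be queried
w = 'keyword'
# count of tweets to return
count = 1000

# API call (`twitter` is assumed to be an authenticated client created earlier)
query = twitter.search.tweets(q=w, count=count)

def data_l2(q, k1, k2):
    # Collect q[k1][i][k2] for every result
    data = []
    for results in q[k1]:
        data.append(results[k2])
    return data

def data_l3(q, k1, k2, k3):
    # Same as data_l2, but one level deeper: q[k1][i][k2][k3]
    data = []
    for results in q[k1]:
        data.append(results[k2][k3])
    return data

tweets = data_l2(query, "statuses", "text")
screen_names = data_l3(query, "statuses", "user", "screen_name")

data = {'screen_names': screen_names,
        'tweets': tweets}
frame = pd.DataFrame(data)
frame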

A few questions: does the Twitter API return JSON? Can you use pd.read_json()? Can you fix the indentation of your function call? Cheers! – Evan

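I was not able to read it with the pandas read_json function. Thanks, I have corrected the indentation. You can see the actual API response through the Example Response link –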

Answer

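I will share a more general solution that I came up with while working with the Twitter API. Let's say you have the IDs of the tweets you want to fetch in a list called my_ids: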

# Fetch tweets from the twitter API using the following loop:
list_of_tweets = []
# Tweets that can't be found are saved in the list below:
cant_find_tweets_for_those_ids = []
for each_id in my_ids:
    try:
        list_of_tweets.append(api.get_status(each_id))
    except Exception as e:
        cant_find_tweets_for_those_ids.append(each_id)

Then, in this code block, we isolate the JSON part of each tweepy status object that we have downloaded and add them all to a list...

my_list_of_dicts = [] 
for each_json_tweet in list_of_tweets: 
    my_list_of_dicts.append(each_json_tweet._json) 

...and we write this list to a txt file:

import json

with open('tweet_json.txt', 'w') as file:
    file.write(json.dumps(my_list_of_dicts, indent=4))

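Now we are going to create a DataFrame from the tweet_json.txt file (I picked some keys that were relevant to the use case I was working on, but you can use your own specific keys instead):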

import json
import pandas as pd

my_demo_list = []
with open('tweet_json.txt', encoding='utf-8') as json_file:
    all_data = json.load(json_file)
    for each_dictionary in all_data:
        tweet_id = each_dictionary['id']
        whole_tweet = each_dictionary['text']
        # Keep everything from the first 'https' onwards (the link in the tweet)
        only_url = whole_tweet[whole_tweet.find('https'):]
        favorite_count = each_dictionary['favorite_count']
        retweet_count = each_dictionary['retweet_count']
        created_at = each_dictionary['created_at']
        whole_source = each_dictionary['source']
        # Strip the surrounding <a ... rel="nofollow"> and </a> tags
        only_device = whole_source[whole_source.find('rel="nofollow">') + 15:-4]
        source = only_device
        # Fall back to a marker string when the 'retweeted_status' key is absent
        retweeted_status = each_dictionary.get('retweeted_status', 'Original tweet')
        if retweeted_status == 'Original tweet':
            url = only_url
        else:
            retweeted_status = 'This is a retweet'
            url = 'This is a retweet'

        my_demo_list.append({'tweet_id': str(tweet_id),
                             'favorite_count': int(favorite_count),
                             'retweet_count': int(retweet_count),
                             'url': url,
                             'created_at': created_at,
                             'source': source,
                             'retweeted_status': retweeted_status,
                             })

# Build the DataFrame once, after all tweets have been collected
tweet_json = pd.DataFrame(my_demo_list, columns=['tweet_id', 'favorite_count',
                                                 'retweet_count', 'created_at',
                                                 'source', 'retweeted_status', 'url'])
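
For the keys listed in the question, the same list-of-dicts idea could probably be applied directly to the search response, without the intermediate file. What follows is only a sketch under assumptions: `sample_response` is made-up data standing in for the real search result, and `statuses_to_frame` is a hypothetical helper name, not part of pandas or of any Twitter library.

```python
import pandas as pd

# Made-up stand-in for the dictionary returned by the search call (assumption).
sample_response = {
    "statuses": [
        {
            "created_at": "Sat Dec 02 10:00:00 +0000 2017",
            "text": "first tweet",
            "in_reply_to_screen_name": None,
            "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
            "user": {"screen_name": "alice"},
        },
        {
            "created_at": "Sat Dec 02 10:05:00 +0000 2017",
            "text": "second tweet",
            "in_reply_to_screen_name": "alice",
            "source": '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
            "user": {"screen_name": "bob"},
        },
    ]
}

keys = ["created_at", "text", "in_reply_to_screen_name", "source"]


def statuses_to_frame(response, keys):
    """Hypothetical helper: one row per status, one column per requested key."""
    rows = []
    for status in response["statuses"]:
        # Top-level keys are copied as-is; a missing key becomes None.
        row = {key: status.get(key) for key in keys}
        # Nested values such as the author's screen name need one extra lookup.
        row["screen_name"] = status.get("user", {}).get("screen_name")
        rows.append(row)
    return pd.DataFrame(rows, columns=keys + ["screen_name"])


frame = statuses_to_frame(sample_response, keys)
print(frame[["text", "screen_name"]])
```

With a real response, `sample_response` would be replaced by the result of the search call. Using `.get()` rather than indexing means a status that lacks one of the keys leaves an empty cell instead of raising a `KeyError`.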