2017-05-15 34 views

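Python script is returning an empty data frame for the geo_enabled field in tweets

In my Python script, I am trying to display the geo_enabled value for a given set of streamed tweets. When geo_enabled is false I want to display it as false, and otherwise as true. I would also like to display the place and country tags as null if the person tweeting has not filled in the corresponding values. The problem is that I am currently stuck, because my script keeps throwing a KeyError.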

My impression is that the KeyError is raised because the value of geo_enabled is false. Is there any workaround?

My Python script:

import time 
import json 
import pandas as pd 
import re 

#tweepy based modules 
import tweepy 
from tweepy import OAuthHandler 
from tweepy import Stream 
from tweepy.streaming import StreamListener 


#initializing authentication credentials 
consumer_key = '' 
consumer_secret = '' 
access_key = '' 
access_secret = '' 


#This is a basic listener that just prints received tweets to stdout. 
class StdOutListener(StreamListener) : 
    def __init__(self,time_limit) : 
     self.start_time = time.time() 
     self.limit = time_limit 
     self.saveFile = open('requests.json','a') 
     super(StdOutListener,self).__init__() 

    def on_data(self, data) : 
     if ((time.time() - self.start_time) < self.limit) : 
      self.saveFile.write(data) 
      self.saveFile.write('\n') 
      return True 
     else : 
      self.saveFile.close() 
      return False 

    def on_error(self, status) : 
     print(status) 

def getwords(string) : 
    return re.findall(r"[\w'#]+|[.,!?;]",string) 

if __name__ == '__main__' : 
    #This handles Twitter authentication and the connection to the Twitter Streaming API 
    auth = OAuthHandler(consumer_key, consumer_secret) 
    auth.set_access_token(access_key, access_secret) 

    time_limit = input("Enter the time limit in minutes : ") 
    time_limit *= 60 

    stream = Stream(auth,listener = StdOutListener(time_limit)) 
    string = raw_input("Enter the list of keywords/hashtags to be compared : ") 

    keyword_list = getwords(string) 

    #This line filters the Twitter stream to capture data by the keywords in keyword_list 
    stream.filter(track = keyword_list) 

    tweets_data_path = 'requests.json' 

    tweets_data = [] 
    tweet_list = [] 

    tweets_file = open(tweets_data_path, "r") 

    for i, line in enumerate(tweets_file) : 
     if line.rstrip() : 
      tweet = json.loads(line) 
      tweet_list.append(tweet) 

    num_tweets_collected = len(tweet_list) 
    count = 0 

    #Creates a data frame structure 
    text_dump = open('text_dump.txt', 'w') 


    #Populating the location field of the data frame 

    #tweet_dataframe['location'] = map(lambda tweet : tweet['location'], tweet_list) 
    #print(tweet_dataframe['location']) 

    #index column for data frame 
    index_dataframe = [] 

    for i in range(0, num_tweets_collected) : 
     index_dataframe.append(i) 

    tweet_text = [tweet['text'].encode('utf-8') for tweet in tweet_list] 
    tweet_text_dataframe = pd.DataFrame(tweet_text, index = index_dataframe, columns = ['text']) 

    tweet_geolocation_dataframe = pd.DataFrame() 
    tweet_geolocation_dataframe['geo_enabled'] = map(lambda tweet: tweet['geo_enabled'] if tweet['geo_enabled'] != "false" else "false", tweet_list) 

    tweet_text_ = tweet_text_dataframe['text'] 

    print(tweet_geolocation_dataframe['geo_enabled']) 

Output:

abhijeet-mohanty-2:Desktop SubrataMohanty$ python twitter_stream_dump.py 
Enter the time limit in minutes : 1 
Enter the list of keywords/hashtags to be compared : python ruby scala 
Traceback (most recent call last): 
    File "twitter_stream_dump.py", line 94, in <module> 
    tweet_geolocation_dataframe['geo_enabled'] = map(lambda tweet: tweet['geo_enabled'] if tweet['geo_enabled'] != "false" else "false", tweet_list) 
    File "twitter_stream_dump.py", line 94, in <lambda> 
    tweet_geolocation_dataframe['geo_enabled'] = map(lambda tweet: tweet['geo_enabled'] if tweet['geo_enabled'] != "false" else "false", tweet_list) 
KeyError: 'geo_enabled' 

Edit:

So I made the following change to my Python script, but it now returns an empty data frame instead.

I replaced the following line:

tweet_geolocation_dataframe['geo_enabled'] = map(lambda tweet: tweet['geo_enabled'] if tweet['geo_enabled'] != "false" else "false", tweet_list) 

with the following lines:

for tweet in tweet_list : 
    if 'geo_enabled' in tweet : 
     tweet_geolocation_dataframe['geo_enabled'] = map(lambda tweet: tweet['geo_enabled'] if tweet['geo_enabled'] != "false" else "false", tweet_list) 
    else: 
     tweet_geolocation_dataframe['geo_enabled'] = False 

Output:

abhijeet-mohanty-2:Desktop SubrataMohanty$ python twitter_stream_dump.py 
Enter the time limit in minutes : 1 
Enter the list of keywords/hashtags to be compared : python ruby scala 
Series([], Name: geo_enabled, dtype: bool) 

Is there any way to fix the data frame problem for the geo_enabled field?

You need to show us the stack trace of your program's failure. – BoarGules

@BoarGules I have posted the output I get. –

You get the KeyError not because geo_enabled has a false value, but because geo_enabled does not exist in the tweet. – Codeformer
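
To illustrate the point about the missing key: in Twitter's v1.1 tweet JSON, geo_enabled is a field of the nested user object rather than of the tweet itself, and the stream can also deliver non-tweet messages (such as limit notices) that carry no user at all. Below is a minimal sketch of the difference between direct indexing and a tolerant lookup. The sample dicts are hand-written (assumed shapes, not real API output) and the helper name geo_enabled_of is made up for this illustration.

```python
# Hand-written sample records (assumed shapes, not real API output).
sample_tweet = {
    "text": "learning python",
    "user": {"screen_name": "example_user", "geo_enabled": False},
    "place": None,
}
limit_notice = {"limit": {"track": 5}}  # a non-tweet stream message


def geo_enabled_of(record):
    """Return the user's geo_enabled flag, or False when it is absent."""
    user = record.get("user") or {}
    return bool(user.get("geo_enabled", False))


# Direct indexing fails because the key is not at the top level.
try:
    sample_tweet["geo_enabled"]
    raised = False
except KeyError:
    raised = True

print(raised)                        # True
print(geo_enabled_of(sample_tweet))  # False
print(geo_enabled_of(limit_notice))  # False
```

Using dict.get with a default means a missing key yields a value instead of an exception, so the lookup can run over every record without a separate membership test.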

Answers

1

Try this:

if 'geo_enabled' in tweet: 
    tweet_geolocation_dataframe['geo_enabled'] = map(lambda tweet: tweet['geo_enabled'] if tweet['geo_enabled'] != "false" else "false", tweet_list) 
else: 
    tweet_geolocation_dataframe['geo_enabled'] = False 
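
Note that the snippet above refers to tweet outside of any loop, and tests for geo_enabled at the top level of the tweet. A possible alternative, sketched here under the assumption that the records follow Twitter's v1.1 layout (geo_enabled under user, full_name and country under place), is to extract the fields per record and build the rows first. The function names extract_geo_fields and load_rows and the sample lines are hypothetical, written only for this example.

```python
import json


def extract_geo_fields(tweet):
    """Pick geo_enabled, place name and country from one decoded record."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}
    return {
        "geo_enabled": bool(user.get("geo_enabled", False)),
        "place": place.get("full_name"),   # None when no place is attached
        "country": place.get("country"),   # None when no place is attached
    }


def load_rows(lines):
    """Decode JSON lines and keep only records that look like tweets."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if "text" in record:  # skips limit/delete notices
            rows.append(extract_geo_fields(record))
    return rows


# Hand-written sample lines standing in for requests.json.
sample_lines = [
    '{"text": "a", "user": {"geo_enabled": true}, '
    '"place": {"full_name": "Pune, India", "country": "India"}}',
    '{"text": "b", "user": {"geo_enabled": false}, "place": null}',
    '{"limit": {"track": 3}}',
]
rows = load_rows(sample_lines)
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) to get one column per field and one row per tweet, with missing place and country values left empty.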

It returns an empty data frame. –

Can you help? –
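
On the empty result reported in this thread: one likely explanation is that assigning a scalar such as False to a column of a DataFrame with no rows creates the column but leaves it empty, because there is no index to broadcast the value over. If the 'geo_enabled' in tweet test never succeeds, only that scalar branch runs. A small sketch of the behaviour, assuming pandas is installed; the variable names are illustrative.

```python
import pandas as pd

# Scalar assignment on a frame with no rows: the column exists but stays empty.
empty = pd.DataFrame()
empty["geo_enabled"] = False
print(len(empty))  # 0

# Building the frame from a list of values gives one row per value.
values = [True, False, False]
filled = pd.DataFrame({"geo_enabled": values})
print(len(filled))  # 3
```

Collecting the per-tweet values into a list first and constructing the DataFrame from that list avoids the empty column.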