2017-08-01 70 views

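Looping through multiple Twitter search queries with the REST API

I have a working REST API search script that pulls tweets, based on https://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./

Problem: this code works, but it pulls tweets that match searchQuery1 and searchQuery2 together (e.g. tweets with Prostate Cancer + Colon Cancer). I don't want this. Instead, I want all tweets from searchQuery1 (tweets containing only Prostate Cancer) and all tweets from searchQuery2 (tweets containing only Colon Cancer). The queries should run separately.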

Goal: loop through X number of search queries in sequence (e.g. searchQuery1, searchQuery2, etc.).

Thanks!

import time

import jsonpickle
import tweepy

# `api` is assumed to be an authenticated tweepy.API instance created earlier in the script.

searchQuery1 = 'Prostate Cancer'
searchQuery2 = 'Colon Cancer'


maxTweets = 10000
tweetsPerQry = 100
fprefix = 'REST'
sinceId = None
max_id = -1


tweetCount = 0
with open('/Users/eer/Desktop/' + fprefix + '.' + time.strftime('%Y-%m-%d_%H-%M-%S') + '.json', 'a+') as f:  # open file
    while tweetCount < maxTweets:
        try:

            if (max_id <= 0):
                if (not sinceId):
                    for x, y in zip(searchQuery1, searchQuery2):
                        new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry)
                else:
                    print("sinceID 1")
                    new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry,
                                            since_id=sinceId)

            else:
                if (not sinceId):
                    print("not sinceID 2")
                    new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                else:
                    print("sinceID 1")
                    new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry,
                                            max_id=str(max_id - 1),
                                            since_id=sinceId)
            if not new_tweets:
                print("No more tweets found")
                break

            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) +
                        '\n')

            tweetCount += len(new_tweets)
            max_id = new_tweets[-1].id

        except tweepy.TweepError as e:
            print("some error : " + str(e))
            break

print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fprefix))

Do you want to get all tweets from the last week that contain searchQuery1 but not searchQuery2, and then all tweets from the last week that contain searchQuery2 but not searchQuery1? – Jonas
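If the answer to that is yes, one way to express it is to build one query string per phrase that excludes the other phrases. The sketch below is only an illustration: the helper name `exclusive_queries` is hypothetical, and it assumes the search endpoint accepts the `-` negation operator in front of a quoted phrase, which is worth checking against the search operator documentation before relying on it.

```python
def exclusive_queries(phrases):
    """Build one query per phrase that excludes every other phrase.

    Hypothetical helper: assumes quoted phrases can be negated with a leading '-'.
    """
    queries = []
    for phrase in phrases:
        # Negate every phrase except the one this query is for.
        excluded = ' '.join('-"{0}"'.format(p) for p in phrases if p != phrase)
        queries.append('"{0}" {1}'.format(phrase, excluded).strip())
    return queries


for query in exclusive_queries(['Prostate Cancer', 'Colon Cancer']):
    print(query)
```

Each resulting string could then be passed as the `q` argument of a separate search call.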

Answers

import time

import jsonpickle
import tweepy

# `api` is assumed to be an authenticated tweepy.API instance created earlier in the script.

searchQuery = ['Prostate Cancer', 'Colon Cancer']
i = 0  # index of the query currently being downloaded


maxTweets = 1000
tweetsPerQry = 100
fprefix = 'REST'
language = 'en'  # passed as `lang`, the search API's language filter

sinceId = None
max_id = -1

tweetCount = 0
print("Downloading max {0} tweets".format(maxTweets))
with open('/Users/eer/Desktop/' + fprefix + '.' + time.strftime('%Y-%m-%d_%H-%M-%S') + '.json', 'a+') as f:
    while tweetCount < maxTweets:
        try:
            if (max_id <= 0):
                if (not sinceId):
                    # Every pass of this loop sends the same request, because q is searchQuery[i].
                    for search in searchQuery:
                        new_tweets = api.search(q=searchQuery[i], count=tweetsPerQry, lang=language)

                else:
                    for search in searchQuery:
                        new_tweets = api.search(q=searchQuery[i], count=tweetsPerQry,
                                                since_id=sinceId, lang=language)

            else:
                if (not sinceId):
                    print("not sinceID 2")
                    for search in searchQuery:
                        new_tweets = api.search(q=searchQuery[i], count=tweetsPerQry,
                                                max_id=str(max_id - 1), lang=language)
                else:
                    for search in searchQuery:
                        new_tweets = api.search(q=searchQuery[i], count=tweetsPerQry,
                                                max_id=str(max_id - 1),
                                                since_id=sinceId, lang=language)
            if not new_tweets:
                print("No more tweets found; checking next query")
                i = i + 1  # move on to the next query in the list

                try:
                    for search in searchQuery:
                        new_tweets = api.search(q=searchQuery[i], count=tweetsPerQry, lang=language)
                except IndexError:
                    # searchQuery[i] no longer exists: every query has been processed.
                    break

            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) +
                        '\n')

            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id

        except tweepy.TweepError as e:
            print("some error : " + str(e))
            break

print("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fprefix))

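'searchQuery = ['Prostate Cancer', 'Colon Cancer']', 'i = 0', 'for search in searchQuery:' and 'q=searchQuery[i]' are the relevant new pieces of code. In addition, under 'if not new_tweets:' there are some new lines, starting at 'i = i + 1', that help move on to the next search query (e.g. 'Colon Cancer') once the first query, 'Prostate Cancer', has finished. –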
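The same idea, one query at a time with paging restarted for each query, can be sketched more compactly as a function. This is an illustration rather than the script above: `collect_tweets` and its `search` argument are hypothetical names, with `search` standing in for a call such as `api.search` so the loop can be tried without network access. As in the script, `max_tweets` is a total cap across all queries.

```python
def collect_tweets(search, queries, max_tweets=1000, per_query=100):
    """Run each query on its own, paging backwards through results with max_id.

    `search` is a hypothetical callable accepting q, count and an optional
    max_id, returning a list of objects that have an `id` attribute.
    Returns a dict mapping each query to the tweets collected for it.
    """
    results = {}
    total = 0
    for query in queries:
        results[query] = []
        max_id = None  # restart paging for every query
        while total < max_tweets:
            kwargs = {'q': query, 'count': min(per_query, max_tweets - total)}
            if max_id is not None:
                kwargs['max_id'] = max_id - 1  # only ask for older tweets
            page = search(**kwargs)
            if not page:
                break  # this query is exhausted; move on to the next one
            results[query].extend(page)
            total += len(page)
            max_id = page[-1].id
    return results
```

With tweepy, something like `collect_tweets(api.search, ['Prostate Cancer', 'Colon Cancer'])` would be the intended call, assuming `api` is the authenticated client from the script.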


I would change your query to '"Prostate Cancer" OR "Colon Cancer"' and store the results, then sort them out afterwards. It sounds like the pseudocode you want is as follows:

tweets_with_Prostate_Cancer = [] 
tweets_with_Colon_Cancer = [] 

for each tweet in the result set:
    if tweet contains "Prostate Cancer" and does not contain "Colon Cancer":
        tweets_with_Prostate_Cancer.Add(tweet)
    if tweet contains "Colon Cancer" and does not contain "Prostate Cancer":
        tweets_with_Colon_Cancer.Add(tweet)

final_results = Concatenate(tweets_with_Prostate_Cancer, tweets_with_Colon_Cancer)
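
The pseudocode above could look roughly like this in Python. It is a sketch under a few assumptions: the function name `split_by_phrase` is hypothetical, tweets are represented by their text only, and matching is case-insensitive, which the pseudocode does not specify.

```python
def split_by_phrase(texts, phrase_a, phrase_b):
    """Return (only_a, only_b): texts containing one phrase but not the other.

    Matching is case-insensitive (an assumption, not part of the pseudocode).
    """
    a, b = phrase_a.lower(), phrase_b.lower()
    only_a, only_b = [], []
    for text in texts:
        lowered = text.lower()
        has_a, has_b = a in lowered, b in lowered
        if has_a and not has_b:
            only_a.append(text)
        elif has_b and not has_a:
            only_b.append(text)
    return only_a, only_b


# Made-up sample texts standing in for the results of the combined OR query.
sample = [
    'New prostate cancer screening guidance',
    'Colon cancer awareness month',
    'Prostate cancer and colon cancer risk factors',
]
prostate_only, colon_only = split_by_phrase(sample, 'Prostate Cancer', 'Colon Cancer')
final_results = prostate_only + colon_only
```

Texts that mention both phrases, or neither, end up in neither list, which matches the two conditions in the pseudocode.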