2012-04-05

Answer


I don't know whether you're still interested in an answer to this, but from a pure Python standpoint, this is how I store the raw tweet JSON:

import tweetstream # Needed for Twitter API capture (make sure you are using the modified version with proxy support)
import argparse    # Needed for taking command-line input
import gzip        # Needed for compressing the output
import json        # Needed for serializing each tweet for easier DB import

collector = argparse.ArgumentParser(description='Collect a lot of Tweets')  # Set up the argument parser
collector.add_argument('--username', dest='username', action="store")  # Twitter username
collector.add_argument('--password', dest='password', action="store")  # Twitter password
collector.add_argument('--outputfilename', dest='outputfilename', action="store")  # Path of the gzip output file

args = collector.parse_args()  # Parse the command-line arguments

output = gzip.open(args.outputfilename, "a")  # Open the output file for appending gzip-compressed data

with tweetstream.TweetStream(args.username, args.password) as stream:  # Open the Twitter stream
    for tweet in stream:            # Each tweet arrives as a Python dictionary
        line = json.dumps(tweet)    # Serialize the dictionary as valid JSON
        output.write(line + "\n")   # Write one JSON object per line

To run it, just use: "python myscript.py --username yourusername --password yourpassword --outputfilename yourpathandfilename"
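As a rough illustration of how the three flags in that command end up as attributes on the parsed arguments, here is a minimal sketch. It passes an explicit argument list to parse_args instead of reading the real command line. Note that required=True is my own addition (an assumption; the script above leaves the flags optional), and build_parser and tweets.json.gz are placeholder names.

```python
import argparse


def build_parser():
    """Build a parser with the same three flags the script uses."""
    parser = argparse.ArgumentParser(description="Collect a lot of Tweets")
    # required=True is an assumption added here so a missing flag fails early.
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--outputfilename", required=True)
    return parser


# An explicit list stands in for sys.argv[1:] so the mapping is easy to inspect.
example_args = build_parser().parse_args([
    "--username", "yourusername",
    "--password", "yourpassword",
    "--outputfilename", "tweets.json.gz",  # placeholder file name
])

print(example_args.username)
print(example_args.outputfilename)
```

With required=True, leaving out any of the flags makes argparse print a usage message and exit instead of passing None further down the script.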

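The only module you have to install is tweetstream, which is available via pip or easy_install. Everything else the script imports ships with Python's standard library. argparse has been part of it since Python 2.7, so only older interpreters need it installed separately, via pip, easy_install, or most Ubuntu/Fedora package managers.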

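The output file the script creates is a plain gzip-compressed text file in which each line is a separate JSON string containing one complete tweet object. Because the script runs until it hits the rate limit and never closes the file explicitly, the gzip file does not end with a proper EOF marker. That hasn't been a problem for me: Python can still open it from another script, and so can 7zip and WinRAR.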
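For reading such a file back, something along these lines should work. It is only a sketch, written for Python 3, and the helper names write_tweets and read_tweets as well as the sample records are made up for illustration. As far as I know, Python 3's gzip module raises EOFError once it reaches a missing end-of-stream marker, so the reader catches that and simply stops. The writer uses a with block, which closes the gzip stream cleanly when the loop ends normally.

```python
import gzip
import json
import os
import tempfile


def write_tweets(path, tweets):
    """Append each tweet dict as one JSON line; the with block closes the gzip stream."""
    with gzip.open(path, "at", encoding="utf-8") as out:
        for tweet in tweets:
            out.write(json.dumps(tweet) + "\n")


def read_tweets(path):
    """Yield one dict per line, stopping quietly if the gzip stream is truncated."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        try:
            for line in handle:
                line = line.strip()
                if line:
                    yield json.loads(line)
        except EOFError:
            # The end-of-stream marker is missing; keep what was read so far.
            return


# Made-up sample records standing in for real tweets.
sample = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
demo_path = os.path.join(tempfile.mkdtemp(), "tweets.json.gz")

write_tweets(demo_path, sample)
for tweet in read_tweets(demo_path):
    print(tweet["id"], tweet["text"])
```

Reading line by line keeps memory use flat even for large captures, since only one tweet is decoded at a time.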

I hope that helps. :)