2016-03-05 131 views
2

下面是在.json文件中獲取的一組tweet的可視化分析代碼。解釋後,map()函數顯示一個錯誤。任何方式來解決它?解析json文件

import json 
import pandas as pd 
import matplotlib.pyplot as plt 


tweets_data_path = 'import_requests.txt' 

tweets_data = [] 
tweets_file = open(tweets_data_path, "r") 

for line in tweets_file: 
    try: 
    tweet = json.loads(line) 
    tweets_data.append(tweet) 
    except: 
     continue 

print(len(tweets_data)) 

tweets = pd.DataFrame() 

tweets['text'] = map(lambda tweet: tweet['text'], tweets_data) 

這些都是導致對「ValueError異常」消息我收到上面的代碼行:

Traceback (most recent call last): File "tweet_len.py", line 21, in tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1887, in setitem self._set_item(key, value)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1966, in _set_item
self._ensure_valid_index(value) File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1943, in _ensure_valid_index
raise ValueError('Cannot set a frame with no defined index ' ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

我使用Python3。

編輯:下面是收集的twitter數據(.json格式)的示例。

{ 
    "created_at": "Sat Mar 05 05:47:23 +0000 2016", 
    "id": 705993088574033920, 
    "id_str": "705993088574033920", 
    "text": "Tumi Inc. civil war: Staff manning US ceasefire hotline 'can't speak Arabic' #fakeheadlinebot #learntocode #makeatwitterbot #javascript", 
    "source": "\u003ca href=\"http://javascriptiseasy.com\" rel=\"nofollow\"\u003eJavaScript is Easy\u003c/a\u003e", 
    "truncated": false, 
    "in_reply_to_status_id": null, 
    "in_reply_to_status_id_str": null, 
    "in_reply_to_user_id": null, 
    "in_reply_to_user_id_str": null, 
    "in_reply_to_screen_name": null, 
    "user": { 
     "id": 4382400263, 
     "id_str": "4382400263", 
     "name": "JavaScript is Easy", 
     "screen_name": "javascriptisez", 
     "location": "Your Console", 
     "url": "http://javascriptiseasy.com", 
     "description": "Get learning!", 
     "protected": false, 
     "verified": false, 
     "followers_count": 167, 
     "friends_count": 68, 
     "listed_count": 212, 
     "favourites_count": 11, 
     "statuses_count": 55501, 
     "created_at": "Sat Dec 05 11:18:00 +0000 2015", 
     "utc_offset": null, 
     "time_zone": null, 
     "geo_enabled": false, 
     "lang": "en", 
     "contributors_enabled": false, 
     "is_translator": false, 
     "profile_background_color": "000000", 
     "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
     "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
     "profile_background_tile": false, 
     "profile_link_color": "FFCC4D", 
     "profile_sidebar_border_color": "000000", 
     "profile_sidebar_fill_color": "000000", 
     "profile_text_color": "000000", 
     "profile_use_background_image": false, 
     "profile_image_url": "http://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg", 
     "profile_image_url_https": "https://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg", 
     "profile_banner_url": "https://pbs.twimg.com/profile_banners/4382400263/1449314370", 
     "default_profile": false, 
     "default_profile_image": false, 
     "following": null, 
     "follow_request_sent": null, 
     "notifications": null 
    }, 
    "geo": null, 
    "coordinates": null, 
    "place": null, 
    "contributors": null, 
    "is_quote_status": false, 
    "retweet_count": 0, 
    "favorite_count": 0, 
    "entities": { 
     "hashtags": [{ 
      "text": "fakeheadlinebot", 
      "indices": [77, 93] 
     }, { 
      "text": "learntocode", 
      "indices": [94, 106] 
     }, { 
      "text": "makeatwitterbot", 
      "indices": [107, 123] 
     }, { 
      "text": "javascript", 
      "indices": [124, 135] 
     }], 
     "urls": [], 
     "user_mentions": [], 
     "symbols": [] 
    }, 
    "favorited": false, 
    "retweeted": false, 
    "filter_level": "low", 
    "lang": "en", 
    "timestamp_ms": "1457156843690" 
} 
+0

你可以添加'json'文件的樣本? – jezrael

+0

謝謝。但是這個'json'是無效的。請檢查它[jsonlint](http://jsonlint.com/)。 – jezrael

+0

我已更新它。 –

回答

2

我認爲你可以使用read_json

import pandas as pd 

df = pd.read_json('file.json') 
print df.head() 
     contributors coordinates   created_at entities \ 
contributors_enabled   NaN   NaN 2016-03-05 05:47:23  NaN 
created_at      NaN   NaN 2016-03-05 05:47:23  NaN 
default_profile     NaN   NaN 2016-03-05 05:47:23  NaN 
default_profile_image   NaN   NaN 2016-03-05 05:47:23  NaN 
description      NaN   NaN 2016-03-05 05:47:23  NaN 

         favorite_count favorited filter_level geo \ 
contributors_enabled    0  False   low NaN 
created_at       0  False   low NaN 
default_profile      0  False   low NaN 
default_profile_image    0  False   low NaN 
description       0  False   low NaN 

             id    id_str \ 
contributors_enabled 705993088574033920 705993088574033920 
created_at    705993088574033920 705993088574033920 
default_profile  705993088574033920 705993088574033920 
default_profile_image 705993088574033920 705993088574033920 
description   705993088574033920 705993088574033920 

            ...    is_quote_status lang \ 
contributors_enabled    ...       False en 
created_at       ...       False en 
default_profile      ...       False en 
default_profile_image    ...       False en 
description       ...       False en 

         place retweet_count retweeted \ 
contributors_enabled  NaN    0  False 
created_at    NaN    0  False 
default_profile   NaN    0  False 
default_profile_image NaN    0  False 
description    NaN    0  False 

                    source \ 
contributors_enabled <a href="http://javascriptiseasy.com" rel="nof... 
created_at    <a href="http://javascriptiseasy.com" rel="nof... 
default_profile  <a href="http://javascriptiseasy.com" rel="nof... 
default_profile_image <a href="http://javascriptiseasy.com" rel="nof... 
description   <a href="http://javascriptiseasy.com" rel="nof... 

                    text \ 
contributors_enabled Tumi Inc. civil war: Staff manning US ceasefir... 
created_at    Tumi Inc. civil war: Staff manning US ceasefir... 
default_profile  Tumi Inc. civil war: Staff manning US ceasefir... 
default_profile_image Tumi Inc. civil war: Staff manning US ceasefir... 
description   Tumi Inc. civil war: Staff manning US ceasefir... 

           timestamp_ms truncated \ 
contributors_enabled 2016-03-05 05:47:23.690  False 
created_at   2016-03-05 05:47:23.690  False 
default_profile  2016-03-05 05:47:23.690  False 
default_profile_image 2016-03-05 05:47:23.690  False 
description   2016-03-05 05:47:23.690  False 

               user 
contributors_enabled       False 
created_at    Sat Dec 05 11:18:00 +0000 2015 
default_profile         False 
default_profile_image       False 
description        Get learning! 

[5 rows x 25 columns] 
+0

我收到了一個'ValueError'錯誤信息upong運行上面的代碼片段。 –

+0

有趣的是,我仍然得到相同的錯誤 - ValueError:Trailing Data錯誤消息。 –

+0

嗯,是有效的原始'json'?如果是,可以分享這個'文件'嗎? – jezrael