Automatically multiprocessing a 'function apply' on a dataframe column
I have a simple dataframe with two columns.
+---------+-------+
| subject | score |
+---------+-------+
| wow     | 0     |
+---------+-------+
| cool    | 0     |
+---------+-------+
| hey     | 0     |
+---------+-------+
| there   | 0     |
+---------+-------+
| come on | 0     |
+---------+-------+
| welcome | 0     |
+---------+-------+
For each record in the 'subject' column, I call a function and write the result to the 'score' column:
df['score'] = df['subject'].apply(find_score)
Here find_score is a function that processes a string and returns a score:
def find_score(row):
    # Imports the Google Cloud client library
    from google.cloud import language
    # Instantiates a client
    language_client = language.Client()
    import re
    # Strip HTML tags, then replace non-word characters with spaces
    pre_text = re.sub('<[^>]*>', '', row)
    text = re.sub(r'[^\w]', ' ', pre_text)
    document = language_client.document_from_text(text)
    # Detects the sentiment of the text
    sentiment = document.analyze_sentiment().sentiment
    print("Sentiment score - %f " % sentiment.score)
    return sentiment.score
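This works as expected, but it is slow because it processes the records one by one.
Is there a way to parallelize this without manually splitting the dataframe into smaller chunks? Is there a library that does this automatically?
Cheers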
Can you show the def of your find_score func? – Allen
Consider using dask – Boud
@Allen I have added the function def to the question – gnanagurus
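For what it's worth, here is a minimal sketch of one way to run the per-row calls concurrently without chunking the dataframe by hand. It uses the standard library's thread pool rather than dask, on the assumption that find_score spends most of its time waiting on the network, so threads should be enough. The names parallel_apply and fake_score are hypothetical (fake_score only stands in for find_score so the sketch runs offline), and max_workers=4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_apply(values, func, max_workers=8):
    """Apply func to every item of values using a pool of threads.

    Results are returned in the same order as the inputs, so the list
    can be assigned directly to a dataframe column of the same length.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even when the
        # underlying calls finish out of order.
        return list(pool.map(func, values))


def fake_score(text):
    # Hypothetical stand-in for find_score (no network call).
    return len(text)


if __name__ == "__main__":
    subjects = ["wow", "cool", "hey", "there", "come on", "welcome"]
    print(parallel_apply(subjects, fake_score, max_workers=4))
```

With the dataframe from the question this would be used as df['score'] = parallel_apply(df['subject'], find_score). Threads only help here because the work is I/O-bound; for CPU-heavy functions a process pool or dask would likely be a better fit.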