0
我想在Pyspark中定義一個UseDefinedFunction來處理數據幀的實例值,以便基於這些值生成新的屬性。如何通過數據框使用UserDefinedFunction來解決「Method __getnewargs __([])不存在」的錯誤?
我有這樣的代碼:
# I have a global attribute named 'global_dataframe' which is a dataframe containing some interesant instances.
from pyspark.sql.functions import UserDefinedFunction
def method1(instance, list_attribute_names):
if(instance['Att1'] != '-'):
return instance['Att1']
else:
i = 0
result = "-"
query_SQL = ""
while(i < len(list_attribute_names)):
atr_imp = lista_2[i]
query_SQL = query_SQL + atr_imp + " = '" + instance[atr_imp] + "',"
i = i + 1
query_SQL = query_SQL[:-1]
# Here I filter the global_dataframe to get the results which are interesting according to the query generated before with the values of the instance passed to the method as a parameter
result_df = global_dataframe.filter(query_SQL)
if(result_df.head() != None):# If dataframe is empty
result = "None"
else:
result = query_SQL
return result
def method0(df, important_attributes):
udf_func = UserDefinedFunction(lambda instance: method1(instance, important_attributes), StringType())
column = udf_func(df)
df = df.withColumnRenamed("Att1", column)
return df
當我執行這一行:
example = method0(dataframe_example, attribute_list_example)
我得到了一個錯誤:
y4JError: An error occurred while calling o710.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
at
py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at
py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:272)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
這段代碼的想法是通過數據框執行method0並根據其屬性值獲取另一列。
我該如何解決這個錯誤?
所以,你建議傳遞一個參數is_empty來確保數據幀是否爲空? – jartymcfly
那麼,是否有機會將lambda函數中的數據框實例傳遞給UDF,並在該函數中檢查該實例的不同屬性的值? – jartymcfly
不,沒有辦法將'DataFrame'傳遞給UDF。 UDF只能在單個數據點上運行,而不能在整個數據集上運行。關鍵是你不能在執行程序執行的函數中調用'DataFrame'或'RDD'上的函數。 – timchap