I want to partition a DataFrame into windows and then, for each column and window, apply a custom function (UDF) using Spark's Python API. Defining a UDF over a window in PySpark does not seem to work:
import sys

from pyspark.sql import Window
from pyspark.sql import functions as func
from pyspark.sql.types import DoubleType

w = Window().partitionBy(["col"]).rowsBetween(-sys.maxsize, sys.maxsize)

def median_polish(rows, cols, values):
    # shape values as a matrix defined by rows/cols
    # compute the median polish
    # cast the matrix back to a vector
    return values

med_pol_udf = func.udf(median_polish, DoubleType())

for x in df.columns:
    if x.startswith("some string"):
        df = df.withColumn(x, med_pol_udf("rows", "cols", x).over(w))
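For reference, the body I have in mind for `median_polish` can be sketched in plain NumPy, outside of Spark. This is only a sketch of the three commented steps (shape as matrix, polish, flatten back); the fixed iteration count and the classic Tukey row/column-median sweep are my assumptions, not something the snippet above pins down:

```python
import numpy as np

def median_polish(rows, cols, values, n_iter=10):
    # Shape the flat values into a matrix indexed by rows/cols.
    r = np.asarray(rows)
    c = np.asarray(cols)
    v = np.asarray(values, dtype=float)
    r_idx = {x: i for i, x in enumerate(np.unique(r))}
    c_idx = {x: i for i, x in enumerate(np.unique(c))}
    m = np.zeros((len(r_idx), len(c_idx)))
    ri = [r_idx[x] for x in r]
    ci = [c_idx[x] for x in c]
    m[ri, ci] = v

    # Tukey's median polish: alternately subtract row and column medians
    # (assumed variant; a real implementation would check for convergence).
    resid = m.copy()
    for _ in range(n_iter):
        resid -= np.median(resid, axis=1, keepdims=True)
        resid -= np.median(resid, axis=0, keepdims=True)

    # Cast the residual matrix back to a flat vector in the input order.
    return resid[ri, ci]
```

The question remains how to run this per window, since a plain `udf` receives one row's values at a time rather than the whole partition.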
Is this possible, or does it have to be done in Scala?