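PySpark dataframe: filter or include based on a list

I want to filter a dataframe in PySpark using a list. I want to either filter out the records whose value appears in the list, or include only the records whose value appears in the list. My code below does not work: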
# define a dataframe
rdd = sc.parallelize([(0,1), (0,1), (0,2), (1,2), (1,10), (1,20), (3,18), (3,18), (3,18)])
df = sqlContext.createDataFrame(rdd, ["id", "score"])
# define a list of scores
l = [10,18,20]
# filter out records whose score is in list l
records = df.filter(df.score in l)
# expected: (0,1), (0,1), (0,2), (1,2)
# include only records with these scores in list l
records = df.where(df.score in l)
# expected: (1,10), (1,20), (3,18), (3,18), (3,18)
It gives the following error: ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.