0
我有一個火花數據幀,這裏是架構:摺疊wrappedarray元素融入不同的列
|-- eid: long (nullable = true)
|-- age: long (nullable = true)
|-- sex: long (nullable = true)
|-- father: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: long (containsNull = true)
和行的樣本:.
df.select(df['father']).show()
+--------------------+
| father|
+--------------------+
|[WrappedArray(-17...|
|[WrappedArray(-11...|
|[WrappedArray(13,...|
+--------------------+
和類型是
DataFrame[father: array<array<bigint>>]
我想是摺疊father
列例如說,如果13
是這陣中的一員,創建新列,並返回1
,否則返回0
這是我嘗試的第一件事:
def modify_values(r):
if 13 in r:
return 1
else:
return 0
my_udf = udf(modify_values, IntegerType())
df.withColumn("new_col",my_udf(df['father'].getItem(0))).show()
並且它返回這個錯誤:
Py4JJavaError: An error occurred while calling o817.showString.
TypeError: argument of type 'NoneType' is not iterable
,然後我這個嘗試之一:
df.withColumn("new_col", F.when(1 in df["father"].getItem(0), 1).otherwise(0))
和抱怨是:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.