2016-12-04

Spark: add two columns and populate them with data computed from other columns, using PySpark 2.0.1

I have a DataFrame like this:

+-----------+----------+
| Longitude | Latitude |
+-----------+----------+
| 1         | 3        |
| 2         | 1        |
| 2         | 3        |
+-----------+----------+

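I want to efficiently add two columns, City and Province, to every row. Their values should come from a Python function I have already written, which takes the Longitude and Latitude column values as input and returns the city and province. The output should look like this: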

+-----------+----------+--------+----------+
| Longitude | Latitude | City   | Province |
+-----------+----------+--------+----------+
| 1         | 3        | London | London   |
| 2         | 1        | Paris  | Paris    |
| 2         | 3        | Dubai  | Dubai    |
+-----------+----------+--------+----------+

Answer

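from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Placeholders for your existing lookup functions; each must return a string.
def city(lat, lon):
    ...  # your code

def province(lat, lon):
    ...  # your code

# Wrap the Python functions as Spark UDFs that return strings.
cityUdf = udf(city, StringType())
provinceUdf = udf(province, StringType())

# Arguments follow the function signatures: latitude first, then longitude.
df2 = df.withColumn("City", cityUdf(df["Latitude"], df["Longitude"]))
df3 = df2.withColumn("Province", provinceUdf(df2["Latitude"], df2["Longitude"]))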
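
Since the question says one function already returns both the city and the province, a possible alternative is a single UDF that returns a struct, which is then split into two columns. This is only a sketch under assumptions: `lookup_city_province` and its toy lookup table are hypothetical stand-ins for your own function, `add_city_province` is a helper name made up for illustration, and the column names `Latitude` and `Longitude` are taken from the question.

```python
def lookup_city_province(lat, lon):
    """Hypothetical stand-in for your function: returns (city, province)."""
    # Toy table matching the sample rows in the question; replace with real logic.
    table = {
        (3, 1): ("London", "London"),
        (1, 2): ("Paris", "Paris"),
        (3, 2): ("Dubai", "Dubai"),
    }
    return table.get((lat, lon), (None, None))


def add_city_province(df):
    """Add City and Province columns from one struct-returning UDF."""
    # Imported here so lookup_city_province can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType, StructField, StructType

    # The UDF returns a struct with two nullable string fields.
    schema = StructType([
        StructField("City", StringType(), True),
        StructField("Province", StringType(), True),
    ])
    lookup_udf = udf(lookup_city_province, schema)

    # Compute the struct into a temporary column, then unpack its fields.
    with_loc = df.withColumn("loc", lookup_udf(df["Latitude"], df["Longitude"]))
    return (with_loc
            .withColumn("City", with_loc["loc"]["City"])
            .withColumn("Province", with_loc["loc"]["Province"])
            .drop("loc"))
```

With the DataFrame from the question, `df3 = add_city_province(df)` should give the same four columns as the two-UDF version above. Whether it is actually faster depends on how expensive your lookup is, so it is worth timing both on your data.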