How do I map a string to JSON in Spark/Scala? I have a pairRDD that looks like this, where the second element is a JSON string:
(1, {"id":1, "picture": "url1"})
(2, {"id":2, "picture": "url2"})
(3, {"id":3, "picture": "url3"})
...
The string values come from the get() function described at http://alvinalexander.com/scala/how-to-write-scala-http-get-request-client-source-fromurl. Here is that function:
@throws(classOf[java.io.IOException])
@throws(classOf[java.net.SocketTimeoutException])
def get(url: String,
        connectTimeout: Int = 5000,
        readTimeout: Int = 5000,
        requestMethod: String = "GET") = {
  import java.net.{URL, HttpURLConnection}
  val connection = (new URL(url)).openConnection.asInstanceOf[HttpURLConnection]
  connection.setConnectTimeout(connectTimeout)
  connection.setReadTimeout(readTimeout)
  connection.setRequestMethod(requestMethod)
  val inputStream = connection.getInputStream
  val content = io.Source.fromInputStream(inputStream).mkString
  if (inputStream != null) inputStream.close
  content
}
Now I want to convert each string to JSON so I can get the picture URL out of it (following https://stackoverflow.com/a/38271732/1456026):
val step2 = pairRDD_1.map({ case (x, y) => {
  val jsonStr = y
  val rdd = sc.parallelize(Seq(jsonStr))
  val df = sqlContext.read.json(rdd)
  (x, y("picture"))
}})
But I keep getting

Exception in thread "main" org.apache.spark.SparkException: Task not serializable

It does work when I print out the first 20 elements and convert the strings to JSON manually, one by one, outside of the RDD:
val rdd = sc.parallelize(Seq("""{"id":1, "picture": "url1"}"""))
val df = sqlContext.read.json(rdd)
println(df)
>>>[id: string, picture: string]
How do I convert a string to JSON in Spark/Scala inside .map?
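For reference on the failure mode: `sc` and `sqlContext` live on the driver and are not serializable, so they cannot be captured inside a `.map` closure; the usual workaround is to parse each string with plain Scala code (or a JSON library such as json4s) inside the closure instead. Below is a minimal sketch under that assumption; `extractField` is a hypothetical helper (not from the original post) that pulls one field out of a flat JSON object with a regex, which is only good enough for simple strings like the ones above:

```scala
// Hypothetical helper: extracts a string-valued field from a flat JSON
// object using a regex. Fine for the simple, unescaped strings in this
// question; use a real JSON library (json4s, circe) for nested JSON.
def extractField(json: String, field: String): Option[String] = {
  val pattern = ("\"" + field + "\"\\s*:\\s*\"([^\"]*)\"").r
  pattern.findFirstMatchIn(json).map(_.group(1))
}

val sample = """{"id":1, "picture": "url1"}"""
println(extractField(sample, "picture"))  // prints Some(url1)
```

Inside the map this would look something like `pairRDD_1.map { case (x, y) => (x, extractField(y, "picture").getOrElse("")) }`, with no SparkContext or SQLContext captured by the closure.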