
I am hitting a problem in my single-node Hadoop POC environment (Ubuntu 14.04) when I run R and connect to Spark through SparkR 1.5. I have run this test several times before and never had a problem until today: SparkR 1.5.2 connecting to Hive stopped working and now throws errors.

My goal is to use SparkR to connect to Hive and pull in tables (and eventually write DataFrame results back to Hive). All of the work below is done from the R console in RStudio. I am completely stumped, so any advice is appreciated.
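For context, the end-to-end flow I am after looks roughly like this in the SparkR 1.5 API (the database, table, and output names here are placeholders, not my real schema):

# Rough sketch of the intended workflow; hiveContext comes from
# sparkRHive.init(sc) as in the session below. All names are placeholders.
SparkR::sql(hiveContext, "USE MyHiveDB")
df <- SparkR::sql(hiveContext, "SELECT * FROM some_table")
head(df)   # inspect a few rows on the driver

# ... transform df with SparkR operations ...

# write the result back to Hive as a table
saveAsTable(df, tableName = "results_table", source = "parquet", mode = "overwrite")

Here is the actual session: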

library(SparkR, lib.loc="/usr/hdp/2.3.6.0-3796/spark/R/lib/") 
sc <- sparkR.init(sparkHome = "/usr/hdp/2.3.6.0-3796/spark/") 

Launching java with spark-submit command /usr/hdp/2.3.6.0-3796/spark//bin/spark-submit sparkr-shell /tmp/RtmpdGojW1/backend_portb8b949c8f0e2 
17/08/15 15:50:18 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:19 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:19 INFO SparkContext: Running Spark version 1.5.2 
17/08/15 15:50:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/08/15 15:50:20 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:20 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 10.100.0.11 instead (on interface eth0) 
17/08/15 15:50:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
17/08/15 15:50:20 INFO SecurityManager: Changing view acls to: rstudio 
17/08/15 15:50:20 INFO SecurityManager: Changing modify acls to: rstudio 
17/08/15 15:50:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rstudio); users with modify permissions: Set(rstudio) 
17/08/15 15:50:22 INFO Slf4jLogger: Slf4jLogger started 
17/08/15 15:50:22 INFO Remoting: Starting remoting 
17/08/15 15:50:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:43827] 
17/08/15 15:50:23 INFO Utils: Successfully started service 'sparkDriver' on port 43827. 
17/08/15 15:50:23 INFO SparkEnv: Registering MapOutputTracker 
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:23 INFO SparkEnv: Registering BlockManagerMaster 
17/08/15 15:50:23 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-bea658dc-145f-48a6-bb28-6f05af529547 
17/08/15 15:50:23 INFO MemoryStore: MemoryStore started with capacity 530.0 MB 
17/08/15 15:50:23 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:23 INFO HttpFileServer: HTTP File server directory is /tmp/spark-6b719b9d-3d54-48bc-8894-cd2ddf9b0755/httpd-e7371ee1-5574-476d-9d53-679a9781af2d 
17/08/15 15:50:23 INFO HttpServer: Starting HTTP Server 
17/08/15 15:50:23 INFO Server: jetty-8.y.z-SNAPSHOT 
17/08/15 15:50:23 INFO AbstractConnector: Started [email protected]:39275 
17/08/15 15:50:23 INFO Utils: Successfully started service 'HTTP file server' on port 39275. 
17/08/15 15:50:23 INFO SparkEnv: Registering OutputCommitCoordinator 
17/08/15 15:50:23 INFO Server: jetty-8.y.z-SNAPSHOT 
17/08/15 15:50:24 INFO AbstractConnector: Started [email protected]:4040 
17/08/15 15:50:24 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
17/08/15 15:50:24 INFO SparkUI: Started SparkUI at http://10.100.0.11:4040 
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:50:24 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 
17/08/15 15:50:24 INFO Executor: Starting executor ID driver on host localhost 
17/08/15 15:50:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43075. 
17/08/15 15:50:24 INFO NettyBlockTransferService: Server created on 43075 
17/08/15 15:50:24 INFO BlockManagerMaster: Trying to register BlockManager 
17/08/15 15:50:24 INFO BlockManagerMasterEndpoint: Registering block manager localhost:43075 with 530.0 MB RAM, BlockManagerId(driver, localhost, 43075) 
17/08/15 15:50:24 INFO BlockManagerMaster: Registered BlockManager 

hiveContext <- sparkRHive.init(sc) 

17/08/15 15:51:17 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:51:19 INFO HiveContext: Initializing execution hive, version 1.2.1 
17/08/15 15:51:19 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.3.6.0-3796 
17/08/15 15:51:19 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.3.6.0-3796 
17/08/15 15:51:19 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:51:20 INFO metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083 
17/08/15 15:51:20 INFO metastore: Connected to metastore. 
17/08/15 15:51:21 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 
17/08/15 15:51:22 INFO SessionState: Created local directory: /tmp/a4f76c27-cf73-45bf-b873-a0e97ca43309_resources 
17/08/15 15:51:22 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309 
17/08/15 15:51:22 INFO SessionState: Created local directory: /tmp/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309 
17/08/15 15:51:22 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/a4f76c27-cf73-45bf-b873-a0e97ca43309/_tmp_space.db 
17/08/15 15:51:22 INFO HiveContext: default warehouse location is /user/hive/warehouse 
17/08/15 15:51:22 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 
17/08/15 15:51:22 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.3.6.0-3796 
17/08/15 15:51:22 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.3.6.0-3796 
17/08/15 15:51:22 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead. 
17/08/15 15:51:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/08/15 15:51:25 INFO metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083 
17/08/15 15:51:25 INFO metastore: Connected to metastore. 
17/08/15 15:51:27 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded. 
17/08/15 15:51:27 INFO SessionState: Created local directory: /tmp/16b5f51f-f570-4fc0-b3a6-eda3edd19b59_resources 
17/08/15 15:51:27 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59 
17/08/15 15:51:27 INFO SessionState: Created local directory: /tmp/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59 
17/08/15 15:51:27 INFO SessionState: Created HDFS directory: /tmp/hive/rstudio/16b5f51f-f570-4fc0-b3a6-eda3edd19b59/_tmp_space.db 

showDF(sql(hiveContext, "USE MyHiveDB")) 

Error: is.character(x) is not TRUE 

showDF(sql(hiveContext, "SELECT * FROM table")) 

Error: is.character(x) is not TRUE 

'Trying to connect to metastore with URI thrift://localhost.localdomain:9083' .. are you running Hive on localhost? –


Thanks cricket_007 for the formatting edit. Much appreciated. – DKane


https://forums.databricks.com/questions/9898/using-r-libraries.html –

Answer


Solved. The problem was exactly what cricket_007 suggested with the Databricks link: some packages loaded in the R session were conflicting with the SparkR instance.

Detaching them from the current R session resolved the problem and got the code working.
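The specific assertion also makes sense in that light: dplyr exports its own sql() helper, which expects a character string as input, so once dplyr is attached after SparkR the bare name sql most likely resolves to dplyr's version, and sql(hiveContext, "...") hands it a Java context object instead of a string, tripping the is.character(x) check. A quick base-R way to see which package a name currently resolves to (a diagnostic sketch, not part of the original fix):

# list names that are defined by more than one attached package
conflicts(detail = TRUE)

# confirm where the bare name `sql` resolves; this should print "SparkR"
environmentName(environment(sql))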

The packages to detach were (see the sketch after this list):

  • plyr
  • dplyr
  • dbplyr
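A minimal sketch of the fix, assuming the three packages above are the only conflicts in the session:

# drop the conflicting packages from the search path;
# this does not uninstall them from the library
detach("package:dbplyr", unload = TRUE)
detach("package:dplyr", unload = TRUE)
detach("package:plyr", unload = TRUE)

# the original calls now resolve to SparkR again
showDF(SparkR::sql(hiveContext, "USE MyHiveDB"))

Explicitly namespacing the calls as SparkR::sql(...) also sidesteps the masking without detaching anything.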