2017-08-09

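AWS EMR job using PowerShell cmdlets

I have a Pig script that accepts some parameters, and I need to submit it using only the AWS PowerShell cmdlets. I was able to create a cluster with Pig installed using the following commands: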

$app = New-Object Amazon.ElasticMapReduce.Model.Application 
$app.Name="Pig" 
$jobid = Start-EMRJobFlow -Name "Pig Job" -Application $app -Instances_MasterInstanceType "m3.xlarge" -Instances_KeepJobFlowAliveWhenNoSteps $true -Instances_InstanceCount 1 -LogUri "s3://mybucket/logs" -VisibleToAllUsers $true -ReleaseLabel "emr-5.7.0" -SecurityConfiguration "my-sec-grp" -JobFlowRole "EMR_EC2_DefaultRole" -ServiceRole "EMR_DefaultRole" 

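But I can't add a step for the Pig job. The articles I found were either very old or submitted the job through a custom JAR. I only need to submit a Pig script that accepts some parameters. Any help would be appreciated. Note: I need PowerShell-specific commands; I can already do this with the AWS CLI.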

Answer


I found a way to submit a Pig script from PowerShell. I was following this link, but it covers Hive scripts. There, the step arguments are built like this:

$runhivescriptargs = @("s3://us-east-1.elasticmapreduce/libs/hive/hive-script", `
     "--base-path", "s3://us-east-1.elasticmapreduce/libs/hive", `
     "--hive-versions", "latest", `
     "--run-hive-script", `
     "--args", `
     "-f", "s3://elasticmapreduce/samples/hive-ads/libs/join-clicks-to-impressions.q", `
     "-d", "SAMPLE=s3://elasticmapreduce/samples/hive-ads", `
     "-d", "DAY=2009-04-13", `
     "-d", "HOUR=08", `
     "-d", "NEXT_DAY=2009-04-13", `
     "-d", "NEXT_HOUR=09", `
     "-d", "INPUT=s3://elasticmapreduce/samples/hive-ads/tables", `
     "-d", "OUTPUT=s3://my-output-bucket/joinclick1", `
     "-d", "LIB=s3://elasticmapreduce/samples/hive-ads/libs")

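So I followed the same approach, except that a Pig script takes its parameters with the -p option (Pig parameter substitution) rather than Hive's -d option. My step arguments therefore look like this: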

$runpigscriptargs = @("s3://us-east-1.elasticmapreduce/libs/pig/pig-script", `
     "--base-path", "s3://us-east-1.elasticmapreduce/libs/pig", `
     "--run-pig-script", `
     "--args", `
     "-f", $scriptfile, `
     "-p", "Id=$Id", `
     "-p", "jarPath=$jarPath", `
     "-p", "inputPath=$newInputPath", `
     "-p", "outputPath=$outputPath")
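The argument array still has to be wrapped in a step and submitted to the running cluster. The following is a minimal sketch of that, not a verified recipe. It assumes the AWS Tools for PowerShell are loaded with credentials and a default region already configured, and that the step runs through EMR's `script-runner.jar` at the region-specific path shown (the same pattern AWS uses for its Hive-from-PowerShell example). The job flow ID, script path and parameter values are hypothetical placeholders.

```powershell
# Sketch only: assumes the AWS Tools for PowerShell are installed and that
# credentials and a default region (us-east-1 here) are already configured.
# On newer installs the module may be AWS.Tools.ElasticMapReduce instead.
Import-Module AWSPowerShell

# Hypothetical placeholders: replace with your own cluster ID and values.
$jobid        = "j-XXXXXXXXXXXXX"    # e.g. the ID returned by Start-EMRJobFlow
$scriptfile   = "s3://mybucket/scripts/myscript.pig"
$Id           = "123"
$jarPath      = "s3://mybucket/jars/myudfs.jar"
$newInputPath = "s3://mybucket/input/"
$outputPath   = "s3://mybucket/output/"

# Same arguments as above: the pig-script wrapper plus -p parameter substitutions.
$runpigscriptargs = @("s3://us-east-1.elasticmapreduce/libs/pig/pig-script",
    "--base-path", "s3://us-east-1.elasticmapreduce/libs/pig",
    "--run-pig-script",
    "--args",
    "-f", $scriptfile,
    "-p", "Id=$Id",
    "-p", "jarPath=$jarPath",
    "-p", "inputPath=$newInputPath",
    "-p", "outputPath=$outputPath")

# Wrap the arguments in a Hadoop JAR step that runs script-runner.jar
# (assumed path; region-specific like the pig-script path above).
$jarstep = New-Object Amazon.ElasticMapReduce.Model.HadoopJarStepConfig
$jarstep.Jar = "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"
$jarstep.Args = [System.Collections.Generic.List[string]]$runpigscriptargs

$stepconfig = New-Object Amazon.ElasticMapReduce.Model.StepConfig
$stepconfig.Name = "Run Pig Script"
$stepconfig.ActionOnFailure = "CANCEL_AND_WAIT"   # cancel pending steps on failure, keep the cluster
$stepconfig.HadoopJarStep = $jarstep

# Submit the step to the running cluster.
Add-EMRJobFlowStep -JobFlowId $jobid -Step $stepconfig
```

Because the cluster in the question was started with `-Instances_KeepJobFlowAliveWhenNoSteps $true`, it should stay up after the step finishes, so further Pig steps can be added the same way.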

I am not specifying a Pig version, because I created the EMR cluster with the latest version of Pig installed. Thanks.