
I am fairly new to GCP. Currently, I am working on a POC to create a scheduled Dataflow job that ingests (inserts) data from Google Cloud Storage into BigQuery. After reading some tutorials and documentation, I came up with the following, based on "Running a Dataflow job from App Engine":

  1. I first created a Dataflow job that reads Avro files and loads them into BigQuery. This pipeline has been tested and works well.

    (self.pipeline
     | output_table + ': read table' >> ReadFromAvro(storage_input_path)
     | output_table + ': filter columns' >> beam.Map(self.__filter_columns, columns=columns)
     | output_table + ': write to BigQuery' >> beam.Write(
         beam.io.BigQuerySink(
             output_table,
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))
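The `__filter_columns` helper is not shown in the question; presumably it keeps only the whitelisted columns of each Avro record, along the lines of this hypothetical sketch (the function body and example data are assumptions, not taken from the question):

```python
def filter_columns(record, columns):
    """Keep only the whitelisted columns of one Avro record (a dict).

    Hypothetical stand-in for the question's __filter_columns method.
    """
    return {key: value for key, value in record.items() if key in columns}

row = {'user': 'alice', 'score': 7, 'debug_blob': 'x' * 100}
print(filter_columns(row, columns=['user', 'score']))
# {'user': 'alice', 'score': 7}
```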
    
  2. To schedule the job, I then created a simple web service as follows:

    import logging
    from flask import Flask
    from common.tableLoader import TableLoader
    from ingestion import IngestionToBigQuery
    from common.configReader import ConfigReader

    app = Flask(__name__)

    @app.route('/')
    def hello():
        """Return a friendly HTTP greeting."""
        logging.getLogger().setLevel(logging.INFO)
        config = ConfigReader('columbus-config')  # TODO read from args
        tables = TableLoader('experience')
        ingestor = IngestionToBigQuery(config.configuration, tables.list_of_tables)
        ingestor.ingest_table()
        return 'Hello World!'
    
  3. I also created an app.yaml:

    runtime: python
    env: flex
    entrypoint: gunicorn -b :$PORT recsys_data_pipeline.main:app
    threadsafe: yes
    runtime_config:
        python_version: 2
    resources:
        memory_gb: 2.0
    

I then deployed it with `gcloud app deploy`; however, I got the following error:

    default[20170417t173837] ERROR:root:The gcloud tool was not found.
    default[20170417t173837] Traceback (most recent call last):
      File "/env/local/lib/python2.7/site-packages/apache_beam/internal/gcp/auth.py", line 109, in _refresh
        ['gcloud', 'auth', 'print-access-token'], stdout=processes.PIPE)
      File "/env/local/lib/python2.7/site-packages/apache_beam/utils/processes.py", line 52, in Popen
        return subprocess.Popen(*args, **kwargs)
      File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
        errread, errwrite)
      File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
        raise child_exception
    OSError: [Errno 2] No such file or directory

From the message above, I found that the error comes from apache_beam's auth.py class, specifically from the following function:

    def _refresh(self, http_request):
        """Gets an access token using the gcloud client."""
        try:
            gcloud_process = processes.Popen(
                ['gcloud', 'auth', 'print-access-token'], stdout=processes.PIPE)
        except OSError as exn:
            logging.error('The gcloud tool was not found.', exc_info=True)
            raise AuthenticationException('The gcloud tool was not found: %s' % exn)
        output, _ = gcloud_process.communicate()
        self.access_token = output.strip()
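The `OSError` in the traceback means the `gcloud` binary simply is not on the PATH inside the Flex container, so the `Popen` call above fails before any authentication happens. The failing check can be reproduced in isolation (a minimal sketch using only the standard library; `gcloud --version` stands in for the token call):

```python
import subprocess

def gcloud_available():
    """Return True if the gcloud CLI can be spawned, i.e. whether the
    Popen call in _refresh above would succeed at all."""
    try:
        proc = subprocess.Popen(['gcloud', '--version'],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        proc.communicate()
        return True
    except OSError:
        # Same failure mode as the traceback: the binary is not on PATH.
        return False
```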

which is called when the credentials (`service_account_name` and `service_account_key_file`) are not given:

    if google_cloud_options.service_account_name:
        if not google_cloud_options.service_account_key_file:
            raise AuthenticationException(
                'key file not provided for service account.')
        if not os.path.exists(google_cloud_options.service_account_key_file):
            raise AuthenticationException(
                'Specified service account key file does not exist.')
    else:
        try:
            credentials = _GCloudWrapperCredentials(user_agent)
            # Check if we are able to get an access token. If not fallback to
            # application default credentials.
            credentials.get_access_token()
            return credentials

So I have two questions:

  1. Is there a way to "attach" the credentials (`service_account_name` and `service_account_key_file`) somewhere in my code or configuration files (for example, in app.yaml)?
  2. What is the best practice for triggering a Dataflow job from Google App Engine?
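On question 1, one possible sketch of what "attaching" the credentials might look like, based on the pair of options checked in the auth.py snippet above: pass `--service_account_name` / `--service_account_key_file` as pipeline arguments, or point application default credentials at a key file bundled with the app via an environment variable. The paths and account name below are hypothetical placeholders, not values from the question:

```python
import os

# Hypothetical location of a service-account JSON key bundled with the app.
KEY_PATH = '/app/keys/service-account.json'

# Option A: let application default credentials discover the key file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = KEY_PATH

# Option B: pass the pair that auth.py checks as pipeline arguments
# (option names taken from the GoogleCloudOptions snippet quoted above;
# the account name itself is a made-up placeholder).
pipeline_args = [
    '--service_account_name=ingest@my-project.iam.gserviceaccount.com',
    '--service_account_key_file=' + KEY_PATH,
]
```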

Thanks a lot; any suggestions and comments would be really helpful!

Answer


Hey @jkff, thanks for your reply. I tried the step-by-step guide provided in the link above, but I still get the same error when deploying app.yaml: 'ERROR:root:The gcloud tool was not found.' – bohr


Are you using the 'custom' runtime in your app.yaml, and do you have a 'Dockerfile' next to it in the directory, as in Amy's example? – jkff


I am using the 'custom' runtime in my app.yaml. I also created a Dockerfile, the same as in Amy's example. – bohr
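For reference, a custom-runtime setup like the one discussed in these comments pairs an app.yaml with `runtime: custom` / `env: flex` and a Dockerfile that installs the Cloud SDK, so that `gcloud auth print-access-token` can actually be spawned inside the container. A minimal sketch (the base image and paths are assumptions, not taken from Amy's example):

```dockerfile
# Hypothetical Dockerfile for a custom flex runtime with gcloud on the PATH.
FROM gcr.io/google-appengine/python

# Install the Google Cloud SDK so the `gcloud` binary exists in the container.
RUN curl -sSL https://sdk.cloud.google.com | bash
ENV PATH $PATH:/root/google-cloud-sdk/bin

ADD . /app
RUN pip install -r /app/requirements.txt
CMD gunicorn -b :$PORT recsys_data_pipeline.main:app
```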