I am fairly new to GCP. Currently, I am working on a POC that creates a scheduled Dataflow job to ingest (insert) data from Google Cloud Storage into BigQuery. After reading some tutorials and the documentation, I came up with the following approach: execute the Dataflow job from App Engine.

I first created an Avro-reading Dataflow job that loads files into BigQuery. This pipeline has been tested and runs fine:
```python
(self.pipeline
 | output_table + ': read table ' >> ReadFromAvro(storage_input_path)
 | output_table + ': filter columns' >> beam.Map(self.__filter_columns, columns=columns)
 | output_table + ': write to BigQuery' >> beam.Write(
     beam.io.BigQuerySink(
         output_table,
         create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)))
```
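The implementation of `self.__filter_columns` is not shown above; as a rough sketch of what such a step typically does with `beam.Map`, it would keep only the requested keys from each Avro record dict (the function body below is my assumption, not the original code):

```python
# Hypothetical sketch of a column filter used with beam.Map: keep only
# the keys listed in `columns` from each record dictionary.
def filter_columns(record, columns):
    return {key: record[key] for key in columns if key in record}

row = {'user_id': 1, 'score': 0.9, 'debug_blob': '...'}
print(filter_columns(row, columns=['user_id', 'score']))
# → {'user_id': 1, 'score': 0.9}
```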
To create the scheduled job, I then wrote a simple web service as follows:
```python
import logging

from flask import Flask

from common.tableLoader import TableLoader
from common.configReader import ConfigReader
from ingestion import IngestionToBigQuery

app = Flask(__name__)


@app.route('/')
def hello():
    """Return a friendly HTTP greeting."""
    logging.getLogger().setLevel(logging.INFO)
    config = ConfigReader('columbus-config')  # TODO read from args
    tables = TableLoader('experience')
    ingestor = IngestionToBigQuery(config.configuration, tables.list_of_tables)
    ingestor.ingest_table()
    return 'Hello World!'
```
I also created the `app.yaml`:
```yaml
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT recsys_data_pipeline.main:app
threadsafe: yes
runtime_config:
  python_version: 2
resources:
  memory_gb: 2.0
```
Then I deployed it with `gcloud app deploy`, but I got the following error:
```
default[20170417t173837] ERROR:root:The gcloud tool was not found.
default[20170417t173837] Traceback (most recent call last):
  File "/env/local/lib/python2.7/site-packages/apache_beam/internal/gcp/auth.py", line 109, in _refresh
    ['gcloud', 'auth', 'print-access-token'], stdout=processes.PIPE)
  File "/env/local/lib/python2.7/site-packages/apache_beam/utils/processes.py", line 52, in Popen
    return subprocess.Popen(*args, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
```
From the message above, I found that the error comes from the apache_beam `auth.py` module, specifically from the following function:
```python
def _refresh(self, http_request):
  """Gets an access token using the gcloud client."""
  try:
    gcloud_process = processes.Popen(
        ['gcloud', 'auth', 'print-access-token'], stdout=processes.PIPE)
  except OSError as exn:
    logging.error('The gcloud tool was not found.', exc_info=True)
    raise AuthenticationException('The gcloud tool was not found: %s' % exn)
  output, _ = gcloud_process.communicate()
  self.access_token = output.strip()
```
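The `OSError: [Errno 2]` in the traceback is simply what `subprocess` raises when the named executable is not on `PATH`; the App Engine flexible runtime image apparently does not ship the `gcloud` CLI, so this fallback cannot work. A minimal sketch reproducing that failure mode (the binary name is made up):

```python
import subprocess

# Launching a binary that does not exist on PATH raises OSError with
# errno ENOENT -- the same failure that _refresh() catches and re-raises
# as "The gcloud tool was not found."
try:
    subprocess.Popen(['no-such-gcloud', 'auth', 'print-access-token'],
                     stdout=subprocess.PIPE)
except OSError as exn:
    print('missing executable: %s' % exn)
```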
which is called when the credentials (`service_account_name` and `service_account_key_file`) are not given:
```python
if google_cloud_options.service_account_name:
  if not google_cloud_options.service_account_key_file:
    raise AuthenticationException(
        'key file not provided for service account.')
  if not os.path.exists(google_cloud_options.service_account_key_file):
    raise AuthenticationException(
        'Specified service account key file does not exist.')
else:
  try:
    credentials = _GCloudWrapperCredentials(user_agent)
    # Check if we are able to get an access token. If not fallback to
    # application default credentials.
    credentials.get_access_token()
    return credentials
```
So I have two questions:

- Is there a way to "attach" the credentials (`service_account_name` and `service_account_key_file`) somewhere in my code or in a config file (e.g. in `app.yaml`)?
- What is the best practice for triggering a Dataflow job from Google App Engine?
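For the first question, the `auth.py` excerpt above suggests the credentials are read from pipeline options rather than from `app.yaml`. A minimal sketch of passing them as pipeline arguments, where the project, bucket, account name, and key path are all hypothetical placeholders (this assumes the key file is bundled with the deployed app):

```python
# Sketch: supply service_account_name and service_account_key_file as
# pipeline arguments, so auth.py takes the key-file branch instead of
# falling back to the gcloud CLI. All values below are placeholders.
def build_pipeline_args(project, staging_location, key_file):
    """Assemble Dataflow arguments including the service-account options
    that apache_beam's auth.py checks for."""
    return [
        '--project=%s' % project,
        '--staging_location=%s' % staging_location,
        '--runner=DataflowRunner',
        '--service_account_name=dataflow@%s.iam.gserviceaccount.com' % project,
        '--service_account_key_file=%s' % key_file,
    ]

args = build_pipeline_args('my-project', 'gs://my-bucket/staging',
                           '/app/keys/service-account.json')
print(args[-1])
# → --service_account_key_file=/app/keys/service-account.json
```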
Thanks a lot; any suggestions and comments would be very helpful!
Hi @jkff, thanks for your reply. I tried the step-by-step guide in the link above, but I still get the same error when deploying: `ERROR:root:The gcloud tool was not found.` – bohr

Are you using the `custom` runtime in your `app.yaml`, and do you have a `Dockerfile` next to it in the directory, as in Amy's example? – jkff

I am using the `custom` runtime in my `app.yaml`, and I also created a `Dockerfile` just like in Amy's example. – bohr