
I am executing the code below from the Python Client Libraries for the Google BigQuery API (https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#jobs > Querying data (asynchronous)). When it comes to retrieving the results, the line

rows, total_count, token = query.fetch_data() # API request

always raises ValueError: too many values to unpack (expected 3). (By the way, I think there is a typo in the docs: it should be results.fetch_data(), not query.fetch_data()!)

However, the following code works fine:

results = job.results()
rows = results.fetch_data()   # returns a single iterator, not a 3-tuple
tbl = [x for x in rows]       # consumes every row into a list

All of the table's rows (a list of tuples) are returned in a single shot into tbl: more than 225K rows!

Can anyone explain why I am getting the error, or whether there is a mistake in the documentation?

Also, how can I retrieve the results in batches (by iterating through the pages)?

Thanks a lot in advance!

Answers


A while ago I opened this issue asking for the documentation to be updated, but as you can see from the answers there, it still needs an official release before it can change.

See the code base itself for better docstrings (in this case, specifically the Iterator class):

"""Iterators for paging through API responses. 
These iterators simplify the process of paging through API responses 
where the response is a list of results with a ``nextPageToken``. 
To make an iterator work, you'll need to provide a way to convert a JSON 
item returned from the API into the object of your choice (via 
``item_to_value``). You also may need to specify a custom ``items_key`` so 
that a given response (containing a page of results) can be parsed into an 
iterable page of the actual objects you want. You then can use this to get 
**all** the results from a resource:: 
    >>> def item_to_value(iterator, item): 
    ...  my_item = MyItemClass(iterator.client, other_arg=True) 
    ...  my_item._set_properties(item) 
    ...  return my_item 
    ... 
    >>> iterator = Iterator(..., items_key='blocks', 
    ...      item_to_value=item_to_value) 
    >>> list(iterator) # Convert to a list (consumes all values). 
Or you can walk your way through items and call off the search early if 
you find what you're looking for (resulting in possibly fewer 
requests):: 
    >>> for my_item in Iterator(...): 
    ...  print(my_item.name) 
    ...  if not my_item.is_valid: 
    ...   break 
At any point, you may check the number of items consumed by referencing the 
``num_results`` property of the iterator:: 
    >>> my_iterator = Iterator(...) 
    >>> for my_item in my_iterator: 
    ...  if my_iterator.num_results >= 10: 
    ...   break 
When iterating, not every new item will send a request to the server. 
To iterate based on each page of items (where a page corresponds to 
a request):: 
    >>> iterator = Iterator(...) 
    >>> for page in iterator.pages: 
    ...  print('=' * 20) 
    ...  print(' Page number: %d' % (iterator.page_number,)) 
    ...  print(' Items in page: %d' % (page.num_items,)) 
    ...  print('  First item: %r' % (next(page),)) 
    ...  print('Items remaining: %d' % (page.remaining,)) 
    ...  print('Next page token: %s' % (iterator.next_page_token,)) 
    ==================== 
     Page number: 1 
     Items in page: 1 
     First item: <MyItemClass at 0x7f1d3cccf690> 
    Items remaining: 0 
    Next page token: eav1OzQB0OM8rLdGXOEsyQWSG 
    ==================== 
     Page number: 2 
     Items in page: 19 
     First item: <MyItemClass at 0x7f1d3cccffd0> 
    Items remaining: 18 
    Next page token: None 
To consume an entire page:: 
    >>> list(page) 
    [ 
     <MyItemClass at 0x7fd64a098ad0>, 
     <MyItemClass at 0x7fd64a098ed0>, 
     <MyItemClass at 0x7fd64a098e90>, 
    ] 
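
Applied to the asker's job, here is a minimal sketch of page-by-page retrieval, assuming a client version where fetch_data() returns such an iterator; the max_results page size and the process() handler are illustrative assumptions:

results = job.results()
iterator = results.fetch_data(max_results=10000)  # page size is an assumption

for page in iterator.pages:   # each page corresponds to one API request
    for row in page:          # row is a tuple of column values
        process(row)          # hypothetical per-row handler
    print('page %d, next token: %s'
          % (iterator.page_number, iterator.next_page_token))

Because pages are fetched lazily, this avoids materialising all 225K+ rows in memory at once.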

Thanks a lot mate! That really clears things up :)


Yes, you are right about the documentation. There is a typo:

results = job.results()
rows, total_count, token = results.fetch_data()  # API request

while True:
    do_something_with(rows)
    if token is None:
        break
    rows, total_count, token = results.fetch_data(page_token=token)  # API request here

For large data sets, we run hourly queries in our daily jobs to fetch the data.
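
For result sets that large, a hedged variant of the loop above that caps each request keeps every response small; this assumes fetch_data() accepts a max_results argument (as the tuple-returning client versions did), and 25000 is an illustrative page size:

results = job.results()
rows, total_count, token = results.fetch_data(max_results=25000)  # API request

while True:
    do_something_with(rows)
    if token is None:
        break
    rows, total_count, token = results.fetch_data(
        max_results=25000, page_token=token)  # API request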