Scan using the HBase shell

46

Try this. It's a bit ugly, but it works for me.

import org.apache.hadoop.hbase.filter.CompareFilter 
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter 
import org.apache.hadoop.hbase.filter.SubstringComparator 
import org.apache.hadoop.hbase.util.Bytes 
# Keep "(" on the same line as ".new" so Ruby parses the arguments as part of the call.
scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
    SingleColumnValueFilter.new(
     Bytes.toBytes('family'),
     Bytes.toBytes('qualifier'),
     CompareFilter::CompareOp.valueOf('EQUAL'),
     SubstringComparator.new('somevalue'))
}

The HBase shell loads whatever you have in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements welcome):

# imports like above 
def scan_substr(table, family, qualifier, substr, *cols)
    # Scans `table`, keeping rows where family:qualifier contains `substr`.
    scan table, { COLUMNS => cols, FILTER =>
     SingleColumnValueFilter.new(
      Bytes.toBytes(family), Bytes.toBytes(qualifier),
      CompareFilter::CompareOp.valueOf('EQUAL'),
      SubstringComparator.new(substr)) }
end

Then, in the shell, you can say:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'


2011-09-16 16:07:29 havanki4j

+0

That is indeed super ugly. Thanks though, I couldn't find an example like this in the HBase docs, the book, or the O'Reilly book. – mumrah

8

Use the FILTER parameter of scan, as shown in the usage help:

hbase(main):002:0> scan 

ERROR: wrong number of arguments (0 for 1) 

Here is some help for this command: 
Scan a table; pass table name and optionally a dictionary of scanner 
specifications. Scanner specifications may include one or more of: 
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, 
or COLUMNS. If no columns are specified, all columns will be scanned. 
To scan all members of a column family, leave the qualifier empty as in 
'col_family:'. 

Some examples: 

    hbase> scan '.META.' 
    hbase> scan '.META.', {COLUMNS => 'info:regioninfo'} 
    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} 
    hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)} 
    hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]} 

For experts, there is an additional option -- CACHE_BLOCKS -- which 
switches block caching for the scanner on (true) or off (false). By 
default it is enabled. Examples: 

    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}


2011-08-31 21:03:58 Tony

28

scan 'test', {COLUMNS => ['F'],FILTER => \ 
"(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \ 
(SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

More information can be found here. Note that the attached Filter Language.docx file contains several examples.
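Because the filter expression is an ordinary string, the shell's Ruby can assemble it for you. The following is only a sketch: `scvf` and `all_of` are hypothetical helper names (not part of HBase), and the labels given to the last two boolean arguments (filter-if-missing, latest-version-only) are my reading of the filter language, so verify them against your HBase version.

```ruby
# Hypothetical helper (not part of HBase): formats one SingleColumnValueFilter
# clause for the shell's filter-language string. Values are inserted as-is;
# no quoting or escaping is attempted.
def scvf(family, qualifier, op, comparator, value,
         filter_if_missing = true, latest_version_only = true)
  "(SingleColumnValueFilter('#{family}','#{qualifier}',#{op}," \
    "'#{comparator}:#{value}',#{filter_if_missing},#{latest_version_only}))"
end

# Hypothetical helper: joins clauses with AND, as in the example above.
def all_of(*clauses)
  clauses.join(' AND ')
end

filter = all_of(
  scvf('F', 'u', '=', 'regexstring', 'http:.*pdf'),
  scvf('F', 's', '=', 'binary', '2')
)
puts filter
# In the HBase shell: scan 'test', {COLUMNS => ['F'], FILTER => filter}
```

The resulting string is the same expression as the hand-written one above, just built from parts so each clause can be reused.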


2012-06-28 02:13:25 dape

+0

I think this filter parsing language only works in later versions of HBase. On 0.90.6 (cdh 3u6) I couldn't get any variation to work. – Mikeb

+0

I find looking at the javadoc very useful; here is the 0.94 javadoc: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html – mooreds

6

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

// If you have multiple SingleColumnValueFilters, choose whether a row
// must pass all conditions (MUST_PASS_ALL) or at least one (MUST_PASS_ONE).

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
        Bytes.toBytes("SOME COLUMN FAMILY"),
        Bytes.toBytes("SOME COLUMN NAME"),
        CompareOp.EQUAL,
        Bytes.toBytes("SOME VALUE"));

// Skip rows that do not have the column at all. Adding the column filter
// alone does not exclude such rows: without this call they are still
// included in the result set.
filter_by_name.setFilterIfMissing(true);

list.addFilter(filter_by_name);

scan.setFilter(list);


2014-02-18 07:03:53 KannarKK

+0

This code is written in Java; the question asks about the HBase shell. – Tony

3

One of the available filters is ValueFilter, which can be used to filter on the values of all columns.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

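binary is one of the comparators that can be used inside a filter. Depending on what you want to do, you can use a different comparator in the filter.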
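To illustrate swapping comparators, here is a small sketch in plain Ruby, which the shell can run. The helper `value_filter` is hypothetical, and the comparator names other than binary (binaryprefix, substring) are the ones I know from the HBase filter language, so check them against your version before relying on them.

```ruby
# Hypothetical helper (not part of HBase): builds a ValueFilter expression
# string of the form ValueFilter(<op>,'<comparator>:<value>').
def value_filter(op, comparator, value)
  "ValueFilter(#{op},'#{comparator}:#{value}')"
end

exact   = value_filter('=', 'binary', '2016-01-26')       # whole value must match
prefix  = value_filter('=', 'binaryprefix', '2016-01')    # value starts with this
partial = value_filter('=', 'substring', '01-26')         # value contains this
puts exact, prefix, partial
# In the HBase shell: scan 'dummytable', {FILTER => exact}
```

Only the comparator prefix and the value change between the three expressions; the scan call itself stays the same.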

You can refer to the following URL: http://www.hadooptpoint.com/filters-in-hbase-shell/. It provides good examples of how to use different filters in the HBase shell.


2016-02-12 21:17:21

+0

Link-only answers are not good answers. Post some code and explain it to be helpful. – KittMedia
