2016-09-30

Answers

Yes, this can be done easily by setting a time range on the scanner and then deleting the rows in the returned result set.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BulkDeleteDriver {
        // Column family and column are added to reduce the scan I/O
        private static final byte[] COL_FAM = Bytes.toBytes("<column family>");
        private static final byte[] COL = Bytes.toBytes("column");
        private static final TableName TEST_TABLE = TableName.valueOf("<TableName>");
        // Path to hbase-site.xml
        private static final String HBASE_SITE_PATH = "<path to hbase-site.xml>";
        // Rows fetched per scanner RPC; set as required
        private static final int SCAN_CACHE = 1000;
        // Give any suitable batch size here
        private static final int BATCH_SIZE = 1000;

        public static void main(final String[] args) throws IOException {
            // Window offsets relative to now, e.g. "-1 -6" = from 1 day ago to 6 hours ago
            int day = Integer.parseInt(args[0]);
            int hour = Integer.parseInt(args[1]);

            // Create the HBase configuration from hbase-site.xml
            Configuration conf = HBaseConfiguration.create();
            conf.addResource(new Path(HBASE_SITE_PATH));

            // Get calendar instances and compute the start and end timestamps
            Calendar calStart = Calendar.getInstance();
            calStart.add(Calendar.DAY_OF_MONTH, day);
            Calendar calEnd = Calendar.getInstance();
            calEnd.add(Calendar.HOUR, hour);
            long startTS = calStart.getTimeInMillis();
            long endTS = calEnd.getTimeInMillis();

            // Most important part of the code: set the time range properly!
            // Here the purpose is to delete everything up to (present time - 6 hours);
            // the range is [startTS, endTS), i.e. the end bound is exclusive.
            Scan scan = new Scan();
            scan.setTimeRange(startTS, endTS);
            scan.setCaching(SCAN_CACHE);
            scan.addColumn(COL_FAM, COL);

            List<Delete> listOfBatchDeletes = new ArrayList<>(BATCH_SIZE);
            long recordCount = 0;

            // try-with-resources closes the scanner, table and connection, even on errors
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TEST_TABLE);
                 ResultScanner resultScanner = table.getScanner(scan)) {
                System.out.println("Got the table: " + table.getName());
                for (Result scanResult : resultScanner) {
                    // Delete(row) removes the whole row: all columns and versions,
                    // including cells newer than endTS
                    listOfBatchDeletes.add(new Delete(scanResult.getRow()));
                    recordCount++;
                    // Send the deletes in batches
                    if (listOfBatchDeletes.size() == BATCH_SIZE) {
                        System.out.println("Firing batch delete now...");
                        table.delete(listOfBatchDeletes);
                        // Don't forget to clear the list
                        listOfBatchDeletes.clear();
                    }
                }
                if (!listOfBatchDeletes.isEmpty()) {
                    System.out.println("Firing final batch of deletes...");
                    table.delete(listOfBatchDeletes);
                }
            }
            System.out.println("Total records deleted: " + recordCount);
        }
    }
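
The scan's time range is the critical setting in the code above. As a hedged sketch, the same window can be computed with `java.time`; `TimeWindow` and `window` are hypothetical names, and the 7-day start in `main` is only an example value. The end bound is treated as exclusive, as with `Scan.setTimeRange(minStamp, maxStamp)`.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class TimeWindow {
    // Returns {startMillis, endMillis} for the range [now - daysBack, now - hoursBack).
    // The end bound is exclusive, like Scan.setTimeRange(minStamp, maxStamp).
    static long[] window(Instant now, int daysBack, int hoursBack) {
        long start = now.minus(daysBack, ChronoUnit.DAYS).toEpochMilli();
        long end = now.minus(hoursBack, ChronoUnit.HOURS).toEpochMilli();
        if (start >= end) {
            // An empty or inverted range is almost certainly a mistake, so fail early
            throw new IllegalArgumentException("start must be before end");
        }
        return new long[] {start, end};
    }

    public static void main(String[] args) {
        // Example values: from 7 days ago up to 6 hours ago
        long[] w = window(Instant.now(), 7, 6);
        System.out.println("scan.setTimeRange(" + w[0] + ", " + w[1] + ")");
    }
}
```

One difference from the `Calendar` version: `Instant.minus(n, ChronoUnit.DAYS)` always subtracts exactly 24 hours per day, whereas `Calendar.add` works in the local time zone and can differ across daylight-saving transitions.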

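The table has about 200,000 rows. Fetching every row will hurt performance. If we pass only the table name and a timestamp, is it possible to delete all data older than that timestamp this way, without actually passing each row?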


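Is the timestamp your row key, or part of the row key? If not, how will you find out which rows to delete? You have to be clear about which timestamp you are checking: the timestamp in the row key, or the timestamp of some other column. If it is part of the row key, just pass the rowKey (timestamp) as the range-scan bounds, or use a fuzzy row filter (FuzzyRowFilter) to find those keys. Whatever you do, a scan will happen. To make the scan faster you can look into coprocessors, but don't try them unless you are an HBase expert.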
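
To illustrate the range-scan idea from the comment above: if the row key starts with the timestamp, all rows older than a cutoff form one contiguous key range, so a scan with a stop row touches only those rows. The sketch below assumes a hypothetical key layout (8-byte big-endian timestamp followed by an id) and uses an in-memory `TreeMap` with unsigned byte-wise ordering in place of a real table; `TimestampPrefixKeys` and its methods are illustrative names, not HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class TimestampPrefixKeys {
    // Hypothetical row key layout: 8-byte big-endian timestamp, then an id.
    static byte[] rowKey(long ts, String id) {
        byte[] idBytes = id.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + idBytes.length).putLong(ts).put(idBytes).array();
    }

    // Exclusive stop row: every key whose timestamp prefix is < stopTs sorts before it,
    // while keys with timestamp == stopTs are longer and therefore sort after it.
    static byte[] stopRow(long stopTs) {
        return ByteBuffer.allocate(8).putLong(stopTs).array();
    }

    // Lexicographic comparison of unsigned bytes, the order HBase keeps rows in.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // In-memory stand-in for a table: row key -> value, sorted like HBase rows.
    static NavigableMap<byte[], String> newTable() {
        return new TreeMap<>(TimestampPrefixKeys::compareUnsigned);
    }

    // What a scan from the first row up to stopRow(stopTs) (exclusive) would return.
    static List<String> valuesOlderThan(NavigableMap<byte[], String> rows, long stopTs) {
        return new ArrayList<>(rows.headMap(stopRow(stopTs), false).values());
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> rows = newTable();
        rows.put(rowKey(2000L, "c"), "c@2000");
        rows.put(rowKey(100L, "a"), "a@100");
        rows.put(rowKey(1000L, "b"), "b@1000");
        System.out.println(valuesOlderThan(rows, 1000L)); // prints [a@100]
    }
}
```

With a real client, `stopRow(stopTs)` would be the scan's stop row (`Scan.setStopRow`, or `withStopRow` on newer clients), which is exclusive just like `headMap(..., false)` here. The encoding only sorts correctly for non-negative timestamps, which covers epoch milliseconds.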


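HBase has no concept of a range delete marker. This means that if you need to delete multiple cells, you have to place a delete marker for each cell, which in turn means you have to scan every row, either on the client side or on the server side. That leaves you with two options: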

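  1. BulkDeleteProtocol: this uses a coprocessor endpoint, which means the whole operation runs on the server side. The link has an example of how to use it. A web search will quickly show you how to enable coprocessor endpoints in HBase.
  2. Scan and delete: this is the cleanest and simplest option. Since you said you need to delete all column families older than a specific timestamp, the scan-and-delete can be greatly optimized by using server-side filtering to read only the first key of each row.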

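    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.FilterList;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

    public class DeleteOlderThan {
        // stopTs: the timestamp in question; cells older than it are deleted
        static void deleteOlderThan(Table table, long stopTs) throws IOException {
            Scan scan = new Scan();
            scan.setTimeRange(0, stopTs); // cells with timestamp in [0, stopTs)
            // Crucial optimization: make sure you process multiple rows together
            scan.setCaching(1000);
            // Crucial optimization: retrieve only row keys
            FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL,
                new FirstKeyOnlyFilter(), new KeyOnlyFilter());
            scan.setFilter(filters);
            List<Delete> deletes = new ArrayList<>(1000);
            try (ResultScanner scanner = table.getScanner(scan)) {
                Result[] rr;
                do {
                    // We set caching to 1000 above;
                    // make full use of it and get the next 1000 rows in one go
                    rr = scanner.next(1000);
                    for (Result r : rr) {
                        // Delete(row, ts) removes every version with timestamp <= ts,
                        // so stopTs - 1 keeps cells written exactly at stopTs
                        deletes.add(new Delete(r.getRow(), stopTs - 1));
                    }
                    if (!deletes.isEmpty()) {
                        table.delete(deletes);
                        deletes.clear();
                    }
                } while (rr.length > 0);
            }
        }
    }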
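
The reasoning above (no range delete marker, so a scan has to find the rows and one delete marker is placed per row) can be modelled in a few lines. This is a toy in-memory model, not HBase code: `ScanAndDeleteModel` is a hypothetical class, each row maps cell timestamps to values, and `deleteRow(row, ts)` mimics the documented behaviour of `new Delete(row, ts)`, which removes cells with a timestamp less than or equal to `ts`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ScanAndDeleteModel {
    // Toy model of a table, not HBase code: row key -> (cell timestamp -> value).
    final Map<String, TreeMap<Long, String>> rows = new TreeMap<>();

    void put(String row, long ts, String value) {
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(ts, value);
    }

    // Scan step: rows with at least one cell whose timestamp is in [0, stopTs).
    List<String> scanRowsOlderThan(long stopTs) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, TreeMap<Long, String>> e : rows.entrySet()) {
            if (!e.getValue().headMap(stopTs, false).isEmpty()) {
                found.add(e.getKey());
            }
        }
        return found;
    }

    // Delete step: one marker per row. Like new Delete(row, ts), it removes
    // every cell with timestamp <= ts and leaves newer cells in place.
    void deleteRow(String row, long ts) {
        TreeMap<Long, String> cells = rows.get(row);
        cells.headMap(ts, true).clear();
        if (cells.isEmpty()) {
            rows.remove(row);
        }
    }

    // There is no single "delete everything older than stopTs" operation,
    // so scan first, then place a marker on every row found.
    int deleteOlderThan(long stopTs) {
        List<String> found = scanRowsOlderThan(stopTs);
        for (String row : found) {
            // stopTs - 1 keeps a cell written exactly at stopTs, matching the scan range
            deleteRow(row, stopTs - 1);
        }
        return found.size();
    }

    public static void main(String[] args) {
        ScanAndDeleteModel m = new ScanAndDeleteModel();
        m.put("row1", 100L, "old");
        m.put("row1", 900L, "new");
        m.put("row2", 200L, "old");
        int marked = m.deleteOlderThan(500L);
        System.out.println(marked + " rows marked; remaining: " + m.rows);
        // prints: 2 rows marked; remaining: {row1={900=new}}
    }
}
```

In the model, cells newer than the cutoff survive in the same row, and the marker is placed at `stopTs - 1` so that a cell written exactly at the cutoff is kept, matching the half-open scan range `[0, stopTs)`.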
    