2013-02-25 71 views
0

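Solr import locks tables in Postgres

I am busy implementing Solr for a website about holiday homes. The site uses Postgres as its main database. For the search results, we want to use Solr as the back end to fetch the available holiday homes.

Parts of the database are imported using the DataImportHandler together with a JdbcDataSource.

A shortened version of the DataImportHandler configuration: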

<?xml version="1.0" encoding="UTF-8"?> 
<dataConfig> 
    <dataSource type="JdbcDataSource" driver="org.postgresql.Driver" name="Solr" 
        url="jdbc:postgresql://host:port/database" user="*" password="*" readOnly="true"/> 
    <document> 
<entity name="availabilities" transformer="RegexTransformer" pk="id" 
     query=" 
      SELECT concat('A',pa.availability_id,'-',pad.start_date,'-',pad.period_type_id) as unique_availability_id, 
      pa.property_id, 
      NULLIF(CONCAT(ST_X(pl.position),',',ST_Y(pl.position)),',') as locationhash, 
      pl.position_accurate, 
      true as is_availability, 
      region.child_id as city_id, 
      region.ancestor_id as province_id, 
      (
       SELECT array_to_string(array(SELECT binnen.ancestor_id 
       FROM fewo_Location_Ancestry binnen 
       WHERE binnen.child_id = region.child_id 
       AND binnen.ancestor_type_id = 12), ',') 
      ) AS region_id, 
      pl.country_id, 
      pl.min_persons, 
      pl.max_persons, 
      fap.bedrooms, 
      pl.specifications, 
      pl.property_state_id, 
      pa.availability_id, 
      pad.period_type_id, 
      pad.start_date, 
      pad.end_date, 
      (
       SELECT COUNT(*) &gt; 0 FROM fewo_last_minute_details flmd 
       WHERE flmd.property_id = pa.property_id 
       AND flmd.details_id = pad.details_id 
       LIMIT 1 
      ) AS last_minute, 
      CASE (
       SELECT COUNT(*) &gt; 0 FROM fewo_last_minute_details flmd 
       WHERE flmd.property_id = pa.property_id 
       AND flmd.details_id = pad.details_id 
       LIMIT 1 
      ) WHEN true THEN pad.discount_price 
        ELSE pad.price 
      END as price, 
      pl.positioning_fee, 
      pl.sort_order 
      FROM fewo_property_availability_details pad 
       INNER JOIN fewo_property_availability pa USING (availability_id) 
       INNER JOIN fewo_Property_Location pl ON pa.property_id=pl.property_id 
       INNER JOIN fewo_all_properties fap ON pl.property_id=fap.property_id 
       INNER JOIN fewo_Location_Ancestry region ON (region.child_id =pl.location_id AND region.ancestor_type_id = 7) 
      WHERE pad.start_date &gt; current_date 
     "> 
    <field name="id" column="unique_availability_id"/> 
     <field name="property_id" column="property_id"/> 
     <field name="parent_id" column="property_id"/> 
     <field name="is_availability" column="is_availability"/> 
     <field name="positionCoord" column="locationhash"/> 
     <field name="position_accurate" column="position_accurate"/> 
     <field name="city_id" column="city_id"/> 
     <field name="province_id" column="province_id"/> 
     <field name="region_id" column="region_id" splitBy="," sourceColName="region_id"/> 
     <field name="country_id" column="country_id"/> 
     <field name="min_persons" column="min_persons"/> 
     <field name="max_persons" column="max_persons"/> 
     <field name="bedrooms" column="bedrooms"/> 
     <entity name="fewo_all_property_specifications" transformer="foo.SpecTransformer" pk="property_id" 
      cacheKey="property_id" 
      cacheLookup="availabilities.property_id" 
      query="SELECT property_id, specification_id, COALESCE(value,'true') as val FROM fewo_all_property_specifications" 
      processor="CachedSqlEntityProcessor"> 
     </entity> 
     <field name="property_state_id" column="property_state_id"/> 
     <field name="availability_id" column="availability_id"/> 
     <field name="period_type_id" column="period_type_id"/> 
     <field name="start_date" column="start_date"/> 
     <field name="end_date" column="end_date"/> 
     <field name="last_minute" column="last_minute" /> 
     <field name="price" column="price"/> 
     <field name="positioning_fee" column="positioning_fee"/> 
     <field name="sort_order" column="sort_order"/> 
    </entity> 
    </document> 
</dataConfig> 

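The import of 13 million records into Solr runs for about an hour. The problem is that during the import the table fewo_property_availability_details cannot be updated, because there is an AccessShareLock on the table. This prevents updates and inserts on the table, and those queries are queued. After a while too many of them pile up and the database fails.

My question is: is there a good way to import the data without getting in the way of regular queries too much? Something like starting a new transaction after every x imported records, so that other queries also get time to run?

I am using Solr 4.0 and Postgres 9.1, running on Ubuntu 12.04.

Thanks

Answers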

2

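AccessShareLock only conflicts with AccessExclusiveLock, per the documentation:

ACCESS EXCLUSIVE is acquired only by ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX, CLUSTER, VACUUM FULL, and unqualified LOCK TABLE statements.

Ordinary INSERT and UPDATE statements take a ROW EXCLUSIVE lock, which does not conflict with ACCESS SHARE, so the import's lock alone should not block them.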

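Take a look at pg_catalog.pg_locks to see whether you can get more information about which locks are involved. You can also find some useful lock-monitoring queries on the PostgreSQL wiki: http://wiki.postgresql.org/wiki/Lock_Monitoring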
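As a starting point, a minimal sketch of such a check could look like the query below. It only assumes the table name from the question. It lists every lock held or awaited on that table in the current database. Rows with granted = false are sessions waiting for a lock, and the mode column of the granted rows shows what they are waiting behind.

```sql
-- Locks currently held or awaited on the table from the question.
-- pg_locks is cluster-wide, so restrict it to the current database
-- before resolving the relation OID through pg_class.
SELECT l.pid,
       l.locktype,
       l.mode,
       l.granted,
       c.relname
FROM pg_catalog.pg_locks l
JOIN pg_catalog.pg_class c ON c.oid = l.relation
WHERE c.relname = 'fewo_property_availability_details'
  AND l.database = (SELECT d.oid
                    FROM pg_catalog.pg_database d
                    WHERE d.datname = current_database())
ORDER BY l.granted, l.pid;  -- waiting (granted = false) rows sort first
```

If the only granted lock during an import is AccessShareLock and no rows are waiting, the slowdown is probably not caused by locking.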

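It is entirely possible that the real problem is not locking at all, but that your database cannot cope with the heavy concurrent read/write load. This is especially likely if you have a small cache, are running on ordinary (non-SSD) disks without a BBU RAID controller, and/or have not tuned the PostgreSQL configuration for your environment.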
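To see whether the configuration is still at its defaults, something like the following could be used. The list of settings is only my assumption about what is commonly tuned first on 9.1. It is not a complete checklist, and sensible values depend on the hardware.

```sql
-- Show a few memory and checkpoint settings that are often left at their defaults.
-- The choice of names is illustrative; names that do not exist in the
-- running version simply return no row.
SELECT name, setting, unit
FROM pg_catalog.pg_settings
WHERE name IN ('shared_buffers',
               'effective_cache_size',
               'work_mem',
               'maintenance_work_mem',
               'checkpoint_segments')
ORDER BY name;
```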

0

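Alternatively, you could create a materialized view based on what you select in your DIH (this is possible in Oracle and MySQL), and set the refresh option to FAST (which means the view will always contain fresh data). What you would gain:

- faster imports
- no locks on the tables

After that you can do partial imports (not full imports) that fetch only new or changed data. See this link. Hope this helps.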
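PostgreSQL 9.1, which the question uses, has no built-in CREATE MATERIALIZED VIEW (it arrived in 9.3), so the closest equivalent there is a snapshot table that is rebuilt periodically. The sketch below is one way that might look, not a tested setup. The name solr_availability_snapshot is hypothetical and the column list is a reduced version of the query in the question. Building the snapshot still reads the source tables, but the DIH query would then run against the snapshot instead of the live tables.

```sql
-- Rebuild a snapshot table for the Solr import (hypothetical table names).
-- Run this while no import is reading the snapshot: DROP TABLE needs an
-- ACCESS EXCLUSIVE lock on the old snapshot and would wait for readers.
BEGIN;

DROP TABLE IF EXISTS solr_availability_snapshot_new;

CREATE TABLE solr_availability_snapshot_new AS
SELECT pad.details_id,
       pa.availability_id,
       pa.property_id,
       pad.period_type_id,
       pad.start_date,
       pad.end_date,
       pad.price,
       pad.discount_price
FROM fewo_property_availability_details pad
INNER JOIN fewo_property_availability pa USING (availability_id)
WHERE pad.start_date > current_date;

-- Swap the fresh copy into place.
DROP TABLE IF EXISTS solr_availability_snapshot;
ALTER TABLE solr_availability_snapshot_new RENAME TO solr_availability_snapshot;

COMMIT;
```

The DIH entity query would then select from solr_availability_snapshot and join the remaining lookup tables as before.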