2014-09-01 47 views

Active Record text matching

I want to do some basic text matching in Ruby using Active Record.

Here is my code so far:

require 'active_record' 
require 'yaml' 
require 'pg' 
require 'pry' 
require 'fileutils'

$config = ' 
adapter: postgresql 
database: edgar 
username: YYYYY 
password: 
host: 127.0.0.1' 

ActiveRecord::Base.establish_connection(YAML::load($config)) 
class Doc < ActiveRecord::Base; end 
class Eightk < ActiveRecord::Base; end 



directory = "disease"  # Name of the output directory
FileUtils.mkpath(directory)  # Creates the directory if it doesn't exist

cancer = Eightk.where("text ilike '%cancer%'") 
death = Eightk.where("text ilike '%death%'") 


cancer.each do |filing|  # filing is each matching Eightk record
    filename = "#{directory}/#{filing.doc_id}.html"
    File.open(filename, "w") { |f| f.puts filing.text }  # block form closes the file
    puts "Storing #{filing.doc_id}..."
end

death.each do |filing|  # filing is each matching Eightk record
    filename = "#{directory}/#{filing.doc_id}.html"
    File.open(filename, "w") { |f| f.puts filing.text }  # block form closes the file
    puts "Storing #{filing.doc_id}..."
end

I have a long list of terms I want to search for:

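  1. Is there a way to combine the search terms into a single query? I tried "cancer" | "death" but had no luck.
  2. I would like to do an exact word match rather than ILIKE, but I don't know the command.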

Thanks

Answer

Maybe something like:

keywords = %w(cancer death anotherone) 
records = Eightk.where keywords.map{|w| "(text ILIKE '%#{w}%')"}.join(' OR ') 

records.each do |filing| 
    filename = "#{directory}/#{filing.doc_id}.html" 
    File.open(filename, "w") { |f| f.puts filing.text }  # block form closes the file
end 
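As a variation on the snippet above, the condition can be built with `?` placeholders so the keywords are bound as values instead of being interpolated into the SQL string. This is only a sketch: `ilike_any` is a hypothetical helper name, not part of Active Record, and it assumes the `text` column and the `Eightk` model from the question.

```ruby
# Hypothetical helper (not an Active Record method): returns a SQL fragment
# with one "?" placeholder per keyword, plus the values to bind to them.
def ilike_any(column, keywords)
  sql = keywords.map { "#{column} ILIKE ?" }.join(' OR ')
  binds = keywords.map { |w| "%#{w}%" }
  [sql, binds]
end

sql, binds = ilike_any('text', %w(cancer death))

# With the Eightk model from the question, this would be used as:
#   records = Eightk.where(sql, *binds)
```

The list of keywords can then grow without the query code changing; only the array passed to the helper does.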

Otherwise, you can use SIMILAR TO or POSIX regular expressions (http://www.postgresql.org/docs/8.1/static/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP), which let you match against a regular expression.

Eightk.where "text SIMILAR TO '%(#{keywords.join '|' })%'" 

POSIX regular expressions let you anchor on the start and end of a word, so you can match whole words only (e.g. it will match "death" and "death." but not "deathbed").
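A rough sketch of the whole-word idea, assuming PostgreSQL's `~*` operator (case-insensitive POSIX regex match) and its `\m` / `\M` escapes for the start and end of a word. `whole_word_pattern` is a hypothetical helper name, and Ruby's `Regexp.escape` is used only as an approximation of regex escaping; for plain words like these it changes nothing.

```ruby
# Hypothetical helper: builds a PostgreSQL regex that matches any of the
# keywords as a whole word. \m and \M are PostgreSQL's word-start and
# word-end escapes.
def whole_word_pattern(keywords)
  alternatives = keywords.map { |w| Regexp.escape(w) }.join('|')
  "\\m(#{alternatives})\\M"
end

pattern = whole_word_pattern(%w(cancer death))

# With the Eightk model from the question, this would be used as:
#   records = Eightk.where("text ~* ?", pattern)
```

Passing the pattern as a bound value keeps the SQL string itself fixed, whatever the keywords are.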

I'll leave the regex details to someone with greater regex-fu :)

Hi @BigFive, thanks. Is there a way to match exact words in Active Record rather than using a LIKE-style command? My search is getting bloated. – wazza2013 2014-09-01 10:13:28