Parse "Description (tag)" into "Description, tag"

I have a text file with many thousands of lines like this, each one a category description followed by a keyword enclosed in parentheses:

Chemicals (chem) 
Electrical (elec)

I need to convert these lines into comma-separated values, like this:

Chemicals, chem 
Electrical, elec

This is what I am using right now:

lines = line.gsub!('(', ',').gsub!(')', '').split(',')

I would like to know if there is a better way to do this.
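One thing to watch for in the chained one-liner: String#gsub! returns nil when it makes no substitution, so a line without a parenthesis would raise NoMethodError on the next call in the chain. A minimal sketch of the same idea using the non-destructive gsub follows; the helper name to_fields is made up for illustration.

```ruby
# Hypothetical helper: same approach as the one-liner in the question, but
# with the non-destructive gsub, which always returns a String.
def to_fields(line)
  line.chomp.gsub('(', ',').gsub(')', '').split(',').map(&:strip)
end

p to_fields("Chemicals (chem)\n")
p to_fields("No tag here\n")

# gsub! returns nil when nothing was replaced, which breaks a method chain.
p "No tag here".dup.gsub!('(', ',')
```

The trade-off is one extra intermediate string per call, which is unlikely to matter for a few thousand lines.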

For posterity, this is the complete code (based on the answers):

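require 'rubygems'
require 'csv'

csvfile = CSV.open('output.csv', 'w')
File.open('c:/categories.txt') do |f|
    f.readlines.each do |line|
        (desc, cat) = line.split('(')   # description on the left, "tag)" on the right
        desc.strip!
        cat.strip!
        csvfile << [desc, cat[0, cat.length - 1]]   # drop the trailing ")"
    end
end
csvfile.close   # make sure all rows are written out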

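Source

2011-06-08 Kinjal Dixit

Why are you using the split method at the end? That will actually create an array like '[description, keyword]'. – robertodecurnex 2011-06-08 14:08:21

@NeX By creating an array, I can do csvfile << line, where csvfile comes from CSV.open, and that takes care of all the escaping. – 2011-06-08 17:51:53

@sawa You are right. Years of explaining things to beginners have made me used to saying round brackets, curly brackets, angle brackets and square brackets. – 2011-06-08 17:54:16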

Try something like this:

line.sub!(/ \((\w+)\)$/, ', \1')

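The \1 is replaced with the first capture group of the given regular expression (in this case it will always be the category keyword). So it basically changes " (chem)" into ", chem".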
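To see the backreference at work without a file, here is a small sketch on in-memory strings. The constant TAG_PATTERN and the helper convert are names invented for this example. Note that the replacement is written in single quotes so that \1 reaches sub untouched; a double-quoted replacement needs the backslash escaped.

```ruby
# Illustrative names, not part of the original answer.
TAG_PATTERN = / \((\w+)\)$/

def convert(line)
  line.sub(TAG_PATTERN, ', \1')
end

puts convert("Chemicals (chem)")
puts convert("Electrical (elec)")

# Same substitution with a double-quoted replacement string.
puts "Chemicals (chem)".sub(TAG_PATTERN, ", \\1")
```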

Let's build an example using a text file:

lines = []
File.open('categories.txt', 'r') do |file|
    while line = file.gets
        lines << line.sub(/ \((\w+)\)$/, ', \1')
    end
end

Based on the update to the question, I can suggest this:

require 'csv' 

csv_file = CSV.open('output.csv', 'w') 

File.open('c:/categories.txt') do |f| 
    f.each_line {|c| csv_file << c.scan(/^(.+) \((\w+)\)$/)} 
end 

csv_file.close

Source

2011-06-08 14:15:49 robertodecurnex

The code provided does not insert a comma between the description and the keyword. But thanks for the effort. – 2011-06-09 03:38:21
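The missing comma is probably explained by String#scan: with capture groups it returns an array of arrays, so the whole [description, keyword] pair ends up appended as a single field. A sketch of one possible fix, taking the first match before handing it to CSV, is shown here. The helper name row_for is hypothetical, and CSV.generate_line is used only to show the resulting row without writing a file.

```ruby
require 'csv'

# Hypothetical helper: returns [description, keyword], or nil if the line
# does not have the expected "description (keyword)" shape.
def row_for(line)
  line.chomp.scan(/^(.+) \((\w+)\)$/).first
end

# scan with groups yields a nested array: one inner array per match.
p "Chemicals (chem)".scan(/^(.+) \((\w+)\)$/)

row = row_for("Chemicals (chem)\n")
puts CSV.generate_line(row) if row
```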

Changed this to the accepted answer based on the benchmarks. – 2011-06-12 17:02:17

-1

I don't know Ruby at all, but it is easy in PHP:

preg_match('~(.+) \((.+)\)~', 'Chemicals (chem)', $m);

$result = $m[1] . ', ' . $m[2]; // "Chemicals, chem"

Source

2011-06-08 14:23:47 sam

As of Ruby 1.9, you can do it in a single method call:

str = "Chemicals (chem)\n" 
mapping = { ' (' => ', ', 
      ')' => ''} 

str.gsub(/ \(|\)/, mapping) #=> "Chemicals, chem\n"

Source

2011-06-08 14:29:49 steenslag

In Ruby, a cleaner, more efficient way to do this is:

# split(' ', 2) returns a two-element array: all characters up to the first
# space, and all characters after it. Multiple assignment then puts each
# element into its own local variable.
description, tag = line.split(' ', 2)
tag = tag[1, (tag.length - 1) - 1] # extract the inner characters (not the first or last)
new_line = description << ", " << tag # rejoin the parts into a new string

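This will be computationally faster (if you have a lot of lines) because it uses direct string operations instead of regular expressions.

Source

2011-06-08 17:10:37 hundredwatt

@hundredwatt Yes, speed is important. – 2011-06-08 17:55:47

Spoke too soon. There are descriptions with spaces in them, like "Dyes and Intermediates". I have modified the example to split on "(", strip the first item, and remove the last character from the second item. – 2011-06-08 18:09:19

This is both less obvious and slower than using a regular expression. When I test it against "Chemicals (chem)" on Ruby 1.9, it takes more than twice as long as NeX's or steenslag's solution. – 2011-06-08 18:11:11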
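For descriptions that contain spaces, a string-only variant has to split on the parenthesis rather than on the first space. A rough sketch using String#rpartition is given here; this is not the code from the answer above, and the helper name split_tag is invented for illustration.

```ruby
# Hypothetical helper: splits at the last " (" so that spaces inside the
# description are preserved. Returns nil if there is no " (" in the line.
def split_tag(line)
  description, separator, rest = line.chomp.rpartition(' (')
  return nil if separator.empty?
  [description, rest.chomp(')')]   # chomp(')') drops the closing parenthesis
end

p split_tag("Dyes and Intermediates (dyes)\n")
p split_tag("Chemicals (chem)")
p split_tag("no tag here")
```

Whether this is faster than a regular expression would need to be measured; the thread's own benchmark suggests the string-only route is not automatically quicker.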

There is no need to manipulate the string. Just grab the data and output it to the CSV file. Assuming you have data like this:

Chemicals (chem)
Electrical (elec)
Dyes & Intermediates (dyes)

This should work:

File.open('categories.txt', 'r') do |file|
    file.each_line do |line|
        csvfile << line.match(/^(.+)\s\((.+)\)$/) { |m| [m[1], m[2]] }
    end
end
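One caveat: when a line does not match (a blank line, for example), match returns nil, and appending nil to the CSV is unlikely to work. The sketch below skips such lines. It uses CSV.generate and an in-memory array of lines in place of the input and output files; both stand-ins, and the names LINE_PATTERN and lines_to_csv, are assumptions made to keep the example self-contained.

```ruby
require 'csv'

# Same pattern as in the answer above, stored in an illustrative constant.
LINE_PATTERN = /^(.+)\s\((.+)\)$/

# Hypothetical helper: builds CSV text from an array of input lines,
# silently skipping any line that does not match the pattern.
def lines_to_csv(lines)
  CSV.generate do |csv|
    lines.each do |line|
      m = line.match(LINE_PATTERN)
      csv << [m[1], m[2]] if m
    end
  end
end

puts lines_to_csv(["Chemicals (chem)\n", "\n", "Dyes & Intermediates (dyes)\n"])
```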

Source

2011-06-08 20:21:09 seph

Benchmarks relevant to the discussion under @hundredwatt's answer:

require 'benchmark' 

line = "Chemicals (chem)" 

# @hundredwatt
puts Benchmark.measure {
    100000.times do
        description, tag = line.split(' ', 2)
        tag = tag[1, (tag.length - 1) - 1]
        new_line = description << ", " << tag
    end
} # => 0.18

# NeX
puts Benchmark.measure {
    100000.times do
        line.sub!(/ \((\w+)\)$/, ', \1')
    end
} # => 0.08

# steenslag
mapping = { ' (' => ', ',
    ')' => ''}
puts Benchmark.measure {
    100000.times do
        line.gsub(/ \(|\)/, mapping)
    end
} # => 0.08
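One caveat about these figures: sub! modifies line in place, so after the first iteration the string no longer contains parentheses, and the remaining iterations (as well as the gsub run that follows) have nothing left to replace. A sketch of a variant that uses only non-destructive calls, so that every iteration sees the same input, is given here. The method and constant names are illustrative, and no timings are quoted because they depend on the machine and the Ruby version.

```ruby
require 'benchmark'

LINE = "Chemicals (chem)".freeze   # frozen, so accidental mutation raises
MAPPING = { ' (' => ', ', ')' => '' }.freeze
N = 100_000

# String-only approach; assumes a single-word description, as in the
# original benchmark.
def by_split(line)
  description, tag = line.split(' ', 2)
  description + ", " + tag[1..-2]
end

# Regex with a backreference (non-destructive sub).
def by_sub(line)
  line.sub(/ \((\w+)\)$/, ', \1')
end

# Regex with a hash of replacements (Ruby 1.9+).
def by_gsub(line)
  line.gsub(/ \(|\)/, MAPPING)
end

Benchmark.bm(10) do |x|
  x.report("split") { N.times { by_split(LINE) } }
  x.report("sub")   { N.times { by_sub(LINE) } }
  x.report("gsub")  { N.times { by_gsub(LINE) } }
end
```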

Source

2011-06-09 18:29:18

Learning something new every day! – 2011-06-12 17:00:55
