2016-04-27
0

Remove phone numbers from text

How can I remove phone numbers from a string when they come in different formats?

For example, I have:

text=' 
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
    Smart Functionality: Yes - xx TV Streaming Platform 
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78' 

Also, how can I remove these formats from the text:

09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222 

How do I remove these phone numbers?

(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 

I tried gsub, but it removed all the similar-looking numbers as well.

+1

Post what you have already tried. Are you using a regular expression? –

+1

Which phone number formats do you need to remove? [There are many.](https://en.wikipedia.org/wiki/National_conventions_for_writing_telephone_numbers) –

+0

There is no specific format; it can vary. – user

Answers

3

You can use:

text.gsub(/\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/, '') 

This removes phone numbers in the formats you gave in your text:

  • (XXX) XXX-XX-XX
  • (XXX) XXX XX XX

And whenever you write a regular expression, always try it out in Rubular first.

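  • \([0-9]*\) matches the digits inside the parentheses (...). Parentheses are special characters in regular expressions, so each one is escaped with a preceding \. [0-9] matches a single digit, and because there is more than one digit inside, * is added, meaning zero or more digits,

  • \s matches a whitespace character such as a space,

  • (-|\s) matches either a dash (-) or (|) a whitespace character (\s).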
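A minimal sketch of the first pattern in action, assuming a two-line excerpt of the sample text (FIRST_PATTERN is only an illustrative name). The three numbers disappear, while "(Native)" and "(Full Array)" survive because \( must be followed by digits and then \), and a letter follows ( in both of those.

```ruby
# Sketch: the first pattern from this answer applied to part of the sample text.
# FIRST_PATTERN is a hypothetical constant name used only for illustration.
FIRST_PATTERN = /\([0-9]*\)\s[0-9]*(-|\s)[0-9]*(-|\s)[0-9]*/

sample = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" \
         "Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n"

# Only the spaces that separated the three numbers are left on the first line;
# the parenthesised words on the second line are not matched because a letter,
# not a digit or ')', follows the opening parenthesis.
cleaned = sample.gsub(FIRST_PATTERN, '')
puts cleaned.inspect
```

Because * also accepts zero digits, the pattern matches some non-phone text too, for example the "() - " at the start of "() - -"; using + or fixed counts such as {3} is one way to tighten it.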

For the other formats, such as:

  • XXXXXXXXXX
  • XXX-XX-XX-XXX
  • (XXX)XXXXXXX

In addition to the pattern above, use the following:

text.gsub(/\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/, '') 
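A minimal sketch that checks the second pattern against one sample per format (SECOND_PATTERN is an illustrative name, and "0123456789" is a made-up ten-digit input). The 11-digit number from the question, 09414241441, keeps its last digit because [0-9]{10} matches exactly ten digits; widening it to [0-9]{10,11} is one possible adjustment, assuming eleven-digit numbers are expected.

```ruby
# Sketch: the second pattern from this answer, checked one format at a time.
# SECOND_PATTERN is a hypothetical constant name used only for illustration.
SECOND_PATTERN = /\(*[0-9]+(\)|-)+\s*[0-9]+(-|\s)*[0-9]+(-|\s)*[0-9]+|[0-9]{10}/

samples = ["0123456789", "095-41-41-441", "(096)4141441", "09414241441"]
samples.each do |s|
  # Print each input next to whatever gsub leaves behind.
  puts "#{s.inspect} -> #{s.gsub(SECOND_PATTERN, '').inspect}"
end
```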
+0

Regular expressions are very useful, but a bit complicated to understand. – user

+0

Also, how do I remove these formats from the text: '09414241441 095-41-41-441(096)4141441' – user

+0

Adding a new note to the post now, just a minute. –

1

Based on your formats, the following regex works:

/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/ 

Ruby code:

print text.gsub(/\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/, "") 

Ideone Demo
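A quick check of this pattern against the three formats from the question, written as a sketch (PHONE_PATTERN is an illustrative name). Each sample is removed completely, while "(096)4141441" from the follow-up comment is left alone because \s+ requires whitespace after the closing parenthesis.

```ruby
# Sketch: this answer's pattern checked against each format from the question.
# PHONE_PATTERN is a hypothetical constant name used only for illustration.
PHONE_PATTERN = /\(\d{3}\)\s+\d{3}[-\s]\d{2,3}[-\s]\d{2}/

["(093) 123-34-56", "(068) 123 45 67", "(095) 123 456 78"].each do |s|
  # Every sample should be consumed completely, leaving "".
  puts "#{s.inspect} -> #{s.gsub(PHONE_PATTERN, '').inspect}"
end

# \s+ needs whitespace after ")", so this format comes back unchanged.
puts "(096)4141441".gsub(PHONE_PATTERN, '')
```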

0

If your text has a fixed format and the numbers will always be on the first line of the block, then simply remove the first line:

text=' 
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 
    Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
    Smart Functionality: Yes - xx TV Streaming Platform 
    Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78' 

text.strip 
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 
text.strip.lines 
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"] 
text.strip.lines[1..-1].join 
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 

Or:

lines = text.strip.lines 
# => ["(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"] 
lines.shift 
# => "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" 
lines.join 
# => " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 

Using a regular expression with gsub can work, but it is also more likely to become a maintenance problem.

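If the phone numbers will always be on one line, but not necessarily the first, I'd still use lines to break the text into an array, but then use reject to check each line against a regular expression that matches the number pattern, rejecting the line that matches the phone-number-like regex. Not using strip here causes the leading "\n" to be kept: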

lines = text.lines 
lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] } 
# => ["\n", " Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n", " Smart Functionality: Yes - xx TV Streaming Platform\n", " Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78"] 

lines.reject{ |l| l[/\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/] }.join 
# => "\n Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n Smart Functionality: Yes - xx TV Streaming Platform\n Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, TV with stand (inches) : 28.98x18.68x7.78" 
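The lines-plus-reject idea can be wrapped in a small reusable helper. This is a sketch: remove_phone_lines and PHONE_LINE are hypothetical names, and the pattern is modeled on the one used above.

```ruby
# Sketch: a hypothetical helper wrapping the lines + reject approach.
PHONE_LINE = /\(\d{3}\) \d{3}[ -]\d{2,3}[ -]\d{2,3}/

def remove_phone_lines(text, pattern = PHONE_LINE)
  # Drop every line containing something shaped like a phone number and
  # keep all other lines, including their trailing newlines, unchanged.
  text.lines.reject { |line| line.match?(pattern) }.join
end

text = "(093) 123-34-56 (068) 123 45 67 (095) 123 456 78\n" \
       "Refresh Rate: 60Hz (Native). Backlight: LED (Full Array)\n" \
       "Smart Functionality: Yes - xx TV Streaming Platform\n"

puts remove_phone_lines(text)
```

A matching line is dropped in full, including any non-phone text it happens to contain.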

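Notes:

Using lines to turn the text into an array helps isolate the damage if something else happens to trigger the pattern match and the text gets mangled unintentionally.

This approach breaks down when the phone numbers are scattered throughout the text. Even so, I would probably still use it to split the text into separate lines, which also limits the possible damage if there is a false positive.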

0
phone_formats = [/\(\d{3}\) \d{3}-\d{4}/, 
       /\d{3}-\d{3}-\d{4}/, 
       /\d{3} \d{3} \d{4}/, 
       /\(\d{3}\) \d{3} \d{3} \d{2}/, 
       /\(\d{3}\) \d{3} \d{2} \d{2}/, 
       /\(\d{3}\) \d{3}-\d{2}-\d{2}/, 
       /\d{3}-\d{3}-\d{2}-\d{2}/] 

r = Regexp.union(phone_formats) 
    #=> /(?-mix:\(\d{3}\) \d{3}-\d{4})| 
    # (?-mix:\d{3}-\d{3}-\d{4})| 
    # (?-mix:\d{3} \d{3} \d{4})| 
    # (?-mix:\(\d{3}\) \d{3} \d{3} \d{2})| 
    # (?-mix:\(\d{3}\) \d{3} \d{2} \d{2})| 
    # (?-mix:\(\d{3}\) \d{3}-\d{2}-\d{2})| 
    # (?-mix:\d{3}-\d{3}-\d{2}-\d{2})/ 

(I have broken the return value of Regexp.union after each | to improve readability.)

text =<<_ 
(093) 123-34-56 (068) 123 45 67 (095) 123 456 78 
Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
Smart Functionality: Yes - xx TV Streaming Platform 
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, 
TV with stand (inches) : 28.98x18.68x7.78 
_ 

puts text.gsub(r,'') 

Refresh Rate: 60Hz (Native). Backlight: LED (Full Array) 
Smart Functionality: Yes - xx TV Streaming Platform 
Dimensions (W x H x D): TV without stand (inches) : 28.98x17x3.18, 
TV with stand (inches) : 28.98x18.68x7.78
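The same Regexp.union approach can be extended to the extra formats from the comments. A sketch, assuming one regex per format seen in the question (PHONE_FORMATS and PHONE are illustrative names); gsub leaves the separating spaces behind, so squeeze and strip tidy up the remainder.

```ruby
# Sketch: Regexp.union over one pattern per format seen in the question.
# The format list is an assumption based on the question's samples.
PHONE_FORMATS = [
  /\(\d{3}\) \d{3}-\d{2}-\d{2}/,   # (093) 123-34-56
  /\(\d{3}\) \d{3} \d{2} \d{2}/,   # (068) 123 45 67
  /\(\d{3}\) \d{3} \d{3} \d{2}/,   # (095) 123 456 78
  /\(\d{3}\)\d{7}/,                # (096)4141441
  /\d{3}-\d{3}-\d{2}-\d{2}/,       # 091-123-11-22
  /\d{3}-\d{2}-\d{2}-\d{3}/,       # 095-41-41-441
  /\d{3} \d{2} \d{3} \d{3}/,       # 094 00 111 222
  /\d{11}/                         # 09414241441
]
PHONE = Regexp.union(PHONE_FORMATS)

numbers = "09414241441 095-41-41-441 (096)4141441 091-123-11-22 094 00 111 222"
# Remove the numbers, then collapse and trim the leftover separators.
puts numbers.gsub(PHONE, '').squeeze(' ').strip.inspect
```

\d{11} would also match the first eleven digits of a longer digit run; adding lookarounds such as (?<!\d) and (?!\d) around that alternative is one way to guard against it.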