Split a string if a substring exists

"Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom"

"Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"

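I want to compare strings like these. As you can see, both describe the same product, but when I tokenize on spaces and compare each word, the space between 16 and MP in the second string causes a difference that does not really exist.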

Is there any way I can add a space in the first string where 16MP is written together, so that I can tokenize based on spaces?

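import scala.collection.mutable.ListBuffer

val productList = List("Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom", "Nikon Coolpix AW130 16 MP Point & Shoot Camera Black")
val tokens = ListBuffer[String]()
// List has no split method, so split each product string on its own
productList.foreach(product => product.split(" ").foreach(x => tokens += x))

val res = tokens.toList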
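A minimal sketch of what the question literally asks for, assuming the rule is "insert a space wherever a digit is immediately followed by a letter". The lookaround pattern `(?<=\d)(?=[A-Za-z])` and the object name `SplitUnits` are assumptions of this sketch, not something from the thread. Note that the rule also turns `5x` into `5 x`.

```scala
object SplitUnits {
  // Zero-width match at the position between a digit and a following letter,
  // e.g. between "16" and "MP". Replacing that empty position with " " inserts a space.
  private val DigitThenLetter = "(?<=\\d)(?=[A-Za-z])"

  def tokenize(product: String): List[String] =
    product.replaceAll(DigitThenLetter, " ").split("\\s+").toList

  def main(args: Array[String]): Unit = {
    val productList = List(
      "Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom",
      "Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"
    )
    // Both token lists now contain "16" and "MP" as separate tokens
    productList.map(tokenize).foreach(println)
  }
}
```

"AW130" is left intact because the letters come before the digits, not after them.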

2015-10-15 Naba

'replaceAll("\\b16 MP\\b", "16MP")'? Or 'replaceAll("\\b16MP\\b", "16 MP")'? –
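The two suggestions in this comment can be tried directly. This sketch (the object and method names are hypothetical) applies each one; both only handle the literal `16MP` token, which is the limitation raised in Dici's comment below.

```scala
object CommentSuggestion {
  // \b is a word boundary, so strings like "116 MP" or "16 MPX" are left alone
  def joinSixteenMp(s: String): String = s.replaceAll("\\b16 MP\\b", "16MP")
  def splitSixteenMp(s: String): String = s.replaceAll("\\b16MP\\b", "16 MP")

  def main(args: Array[String]): Unit = {
    println(joinSixteenMp("Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"))
    println(splitSixteenMp("Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom"))
  }
}
```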

What exactly do you want? To compare two strings regardless of spaces? – dsharew

Can you describe the format of these strings? I assume you don't want us to give you an answer that only works for these examples. – Dici

You can do this with a regex: search for both formats and replace them with one canonical format.

2015-10-15 12:18:30 Alexander

If you just want to remove the space between a number and the fixed string MP, you can use the following regex:

scala> "Nikon Coolpix AW130 16 MP Point & Shoot Camera Black".replaceAll("""(\d+) ?(MP)""", "$1$2") 
res13: String = Nikon Coolpix AW130 16MP Point & Shoot Camera Black

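The (\d+) part matches any number with at least one digit.
The " ?" part (note the space before the ?) matches zero or one space.
The (MP) part matches the string MP literally.
$1$2 outputs what the first group (\d+) matched, immediately followed by what the second group (MP) matched, so the space, if there was one, is dropped.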
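The replacement above can be combined with the space-based tokenizing from the question. This is a minimal sketch (object and method names are hypothetical) that normalizes both strings with the answer's regex, splits on spaces, and intersects the token sets.

```scala
object JoinMegapixels {
  // Same pattern as the answer: digits, an optional single space, then "MP"
  def normalize(s: String): String = s.replaceAll("""(\d+) ?(MP)""", "$1$2")

  def tokens(s: String): Set[String] = normalize(s).split(" ").toSet

  def main(args: Array[String]): Unit = {
    val s0 = "Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom"
    val s1 = "Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"
    // Tokens the two strings share after normalization; "16MP" is now among them
    println(tokens(s0) intersect tokens(s1))
  }
}
```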

After that, the 16MP tokens should be equal. You will still have the problem of and versus &, though.
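One way to handle the remaining `and` versus `&` difference, assuming `&` should always be read as the word `and`, is one more replacement before tokenizing. The extra `replace` and whitespace-collapsing steps are assumptions of this sketch, not part of the answer.

```scala
object AmpersandToAnd {
  def normalize(s: String): String =
    s.replaceAll("""(\d+) ?(MP)""", "$1$2") // "16 MP" -> "16MP", as in the answer
      .replace("&", " and ")                // "&" -> "and" (assumed to be equivalent)
      .replaceAll("\\s+", " ")              // collapse any double spaces introduced above
      .trim

  def main(args: Array[String]): Unit =
    println(normalize("Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"))
}
```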

2015-10-15 12:21:44

Great! This is what I wanted. I also added a few more patterns, like: str.replaceAll("""(\d+) ?(MP|GB|mm|cm)""" – Naba
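The comment is cut off before the replacement argument. This sketch assumes it is the same `"$1$2"` as in the answer above; the object name is hypothetical. The alternation `(MP|GB|mm|cm)` is case-sensitive and has no trailing `\b`, so a following word character (for example "3 GBs") would also be joined.

```scala
object JoinUnits {
  // Units from Naba's comment; the "$1$2" replacement is an assumption
  private val NumberThenUnit = """(\d+) ?(MP|GB|mm|cm)"""

  def normalize(s: String): String = s.replaceAll(NumberThenUnit, "$1$2")

  def main(args: Array[String]): Unit =
    println(normalize("16 MP camera, 32 GB card, 28 mm lens, 10cm strap"))
}
```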

You don't give enough detail about the format of these strings, but from the example I can infer something like this: (\w+) (\d+)\s*MP Point.*

You can then parse the strings and compare the products by reading the regex groups.

Here is an example:

import scala.util.matching.Regex

object Main {
    def main(args: Array[String]): Unit = {
        val s0 = "Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom"
        val s1 = "Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"
        println(Product.parse(s0) == Product.parse(s1)) // prints true
    }
}

// name holds the word just before the resolution (for these strings, the model "AW130")
case class Product(name: String, resolution: Int)
object Product {
    private val regex = new Regex("(\\w+) (\\d+)\\s*MP Point.*", "productName", "resolution")
    def parse(s: String): Product = regex.findFirstMatchIn(s) match {
        case Some(m) => Product(m.group("productName"), m.group("resolution").toInt)
        case None => throw new RuntimeException("Invalid format")
    }
}
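If throwing on unrecognized input is not wanted, the same pattern can be used as an extractor that returns an `Option`. This is a sketch of that alternative, not part of the answer; `ProductParser` is a hypothetical name. `.unanchored` makes the pattern match succeed when the regex is found anywhere in the string, like `findFirstMatchIn`.

```scala
object ProductParser {
  // Same pattern as the answer above, usable in a match expression
  private val Pattern = """(\w+) (\d+)\s*MP Point.*""".r.unanchored

  // Returns None instead of throwing when the format is not recognized
  def parse(s: String): Option[(String, Int)] = s match {
    case Pattern(model, megapixels) => Some((model, megapixels.toInt))
    case _                          => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom"))
    println(parse("Some other listing"))
  }
}
```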

2015-10-15 12:22:47 Dici

Instead of splitting, it is easy to apply a series of regex replacements, one after another.

public static boolean equivalent(String a, String b) {
    return normalize(a).equalsIgnoreCase(normalize(b));
}

private static String normalize(String s) {
    return s.replaceAll("(\\d+)", "$0 ") // At least one space after digits.
        .replaceAll("\\bLimited\\b", "Ltd") // Example.
        .replace("'", "") // Example.
        .replace("&", " and ")
        .replaceAll("\\s+", " ") // Multiple spaces to one.
        .trim();
}

Or split the normalized strings (to get keywords).
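A minimal sketch of splitting the normalized strings into keywords, written in Scala to match the question. The normalization mirrors the Java answer (a space after digit runs, `&` to `and`, collapsed whitespace, case-insensitive). The matching rule in the hypothetical `sameProduct` (every keyword of the shorter listing must appear in the longer one) is an assumption about what "the same product" means, not something from the thread.

```scala
object KeywordMatch {
  // Mirrors the Java normalize(): space after digit runs, "&" -> "and",
  // collapse whitespace; lower-cased to mimic equalsIgnoreCase
  def normalize(s: String): String =
    s.replaceAll("(\\d+)", "$0 ")
      .replace("&", " and ")
      .replaceAll("\\s+", " ")
      .trim
      .toLowerCase

  def keywords(s: String): Set[String] = normalize(s).split(" ").toSet

  // Assumed rule: the listing with fewer keywords must be contained in the other one
  def sameProduct(a: String, b: String): Boolean = {
    val (ka, kb) = (keywords(a), keywords(b))
    if (ka.size <= kb.size) ka.subsetOf(kb) else kb.subsetOf(ka)
  }

  def main(args: Array[String]): Unit = {
    val s0 = "Nikon Coolpix AW130 16MP Point and Shoot Digital Camera Black with 5x Optical Zoom"
    val s1 = "Nikon Coolpix AW130 16 MP Point & Shoot Camera Black"
    println(sameProduct(s0, s1)) // true
  }
}
```

A subset test tolerates the extra words in the longer listing ("Digital", "with 5x Optical Zoom"), but it will also accept listings that differ only by a missing model variant, so a stricter rule may be needed in practice.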

2015-10-15 12:29:09
