SQL Server 2012 string splitting

I have a table in my database that is filled with large strings, and I am trying to analyze the text in those strings.

I have something like this:

On August the third the pope talked on vatican square.... bla bla

What I want to get is something like this:

Word | Count 
ON  | 2 
August | 1 
the | 2 
third | 1

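And so on. I know I will have to break these strings apart by finding delimiters such as spaces, commas, periods and so on, so that the function knows that whatever comes before the delimiter is a word, for as long as the position is within the length of the string.

The result should be stored in a new table like the one shown above.

How exactly would I achieve this with a SQL function?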


2016-01-21 XinkZ

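First, SQL Server has its own full-text search *and* data mining services. You shouldn't try to parse the text yourself. Second, there are dozens of duplicate questions and several options for splitting a string, [as shown here](http://sqlperformance.com/2012/07/t-sql-queries/split-strings)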

You can split and count:

DECLARE @t NVARCHAR(400) = N'On August the third the pope talked on vatican square.';

-- tally: the numbers 1..1000, one per candidate character position
WITH tally AS
(
    SELECT TOP (1000) rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master..spt_values
), cte AS
(
    -- A word starts right after each space and runs up to the next space;
    -- the two REPLACE calls strip periods and commas from it.
    SELECT REPLACE(REPLACE(
               SUBSTRING(' ' + @t + ' ', rn + 1,
                         CHARINDEX(' ', ' ' + @t + ' ', rn + 1) - rn - 1),
               '.', ''), ',', '') AS word
    FROM tally
    WHERE rn <= LEN(' ' + @t + ' ') - 1
      AND SUBSTRING(' ' + @t + ' ', rn, 1) = ' '
)
SELECT word, COUNT(*) AS total
FROM cte
GROUP BY word;


Output:

╔═════════╦═══════╗
║ word    ║ total ║
╠═════════╬═══════╣
║ August  ║     1 ║
║ On      ║     2 ║
║ pope    ║     1 ║
║ square  ║     1 ║
║ talked  ║     1 ║
║ the     ║     2 ║
║ third   ║     1 ║
║ vatican ║     1 ║
╚═════════╩═══════╝
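The question asks for the counts to land in a new table, computed over strings stored in a table rather than in a single variable. The following is a minimal sketch of how the same tally-table split could be extended to do that. The table names `#Articles` and `#WordCounts`, the column names, and the sample rows are assumptions made up for illustration; they do not come from the original post.

```sql
-- Hypothetical source table standing in for the asker's real table.
CREATE TABLE #Articles
(
    id   INT IDENTITY(1,1) PRIMARY KEY,
    body NVARCHAR(400) NOT NULL
);

INSERT INTO #Articles (body)
VALUES (N'On August the third the pope talked on vatican square.'),
       (N'The pope left the square on Monday.');

WITH tally AS
(
    -- 402 positions cover NVARCHAR(400) plus the two padding spaces.
    SELECT TOP (402) rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master..spt_values
), words AS
(
    SELECT REPLACE(REPLACE(
               SUBSTRING(p.padded, t.rn + 1,
                         CHARINDEX(' ', p.padded, t.rn + 1) - t.rn - 1),
               '.', ''), ',', '') AS word
    FROM #Articles AS a
    CROSS APPLY (SELECT ' ' + a.body + ' ' AS padded) AS p  -- pad once per row
    JOIN tally AS t
      ON t.rn <= LEN(p.padded) - 1
     AND SUBSTRING(p.padded, t.rn, 1) = ' '
)
SELECT word, COUNT(*) AS total
INTO #WordCounts              -- creates the result table
FROM words
WHERE word <> ''              -- drop empty tokens left by repeated spaces
GROUP BY word;

SELECT word, total
FROM #WordCounts
ORDER BY total DESC, word;
```

Whether `On` and `on` are counted together depends on the collation in effect; under a case-insensitive collation they fall into the same group, as in the output above.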


2016-01-21 11:51:49 lad2025

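There are dozens of duplicate questions. This is just [one of the options](http://sqlperformance.com/2012/07/t-sql-queries/split-strings), and it is not the fastest. The fastest approach, by an order of magnitude, is to use a SQLCLR helper function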

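DECLARE @text NVARCHAR(MAX) = N'On August the third the pope talked on vatican square.... bla bla';

-- Let the full-text word breaker tokenize the text, then count each term.
-- Arguments: quoted phrase, LCID (1049 = Russian; 1033 would be US English),
-- stoplist id (NULL = no stoplist), accent sensitivity (1 = sensitive).
SELECT t.display_term, COUNT(*)
FROM sys.dm_fts_parser('"' + @text + '"', 1049, NULL, 1) t
WHERE t.special_term = 'Exact Match'   -- keep ordinary words only
GROUP BY t.display_term;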

Output:

--------------- -----------
august          1
bla             2
on              2
pope            1
square          1
talked          1
the             2
third           1
vatican         1
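To see why the query filters on `special_term`, it can help to look at what the parser returns before any grouping. This is only an exploratory sketch: it reuses the sample sentence from the answer, switches the LCID to 1033 (US English) on the assumption that the text is English, and selects the documented `occurrence`, `display_term` and `special_term` columns.

```sql
DECLARE @text NVARCHAR(MAX) = N'On August the third the pope talked on vatican square.... bla bla';

-- One row per token, in the order the word breaker produced it.
-- Rows whose special_term is not 'Exact Match' (sentence boundaries,
-- for example) are the ones the counting query filters out.
SELECT t.occurrence, t.display_term, t.special_term
FROM sys.dm_fts_parser(N'"' + @text + N'"', 1033, NULL, 0) AS t
ORDER BY t.occurrence;
```

The exact rows depend on the word breaker installed for the chosen language, so no expected output is shown here.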


2016-01-21 11:52:13 Devart

+1 for a technique that *isn't* posted in the duplicate questions. How does it perform, though?

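Good question ;) I actually tested 'dm_fts_parser' on a '2Gb' table (each row '~1Mb'). It was faster than 'CTE' and 'XML', but slower than 'CLR'. Also, the user must have 'sysadmin' permissions... – Devart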
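The permission requirement mentioned in the comment above can be checked before relying on `dm_fts_parser`. A minimal sketch using the built-in `IS_SRVROLEMEMBER` function; the column alias `is_sysadmin` is just an illustrative name.

```sql
-- 1 = the current login is a member of the sysadmin fixed server role,
-- 0 = it is not, NULL = the role name was not recognized.
SELECT IS_SRVROLEMEMBER('sysadmin') AS is_sysadmin;
```

If this returns 0, the tally-table split from the first answer remains an option that needs no elevated server role.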
