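How can I optimize this SQL query?
I am writing a piece of software that is meant to identify files that have been put on a web server (CMS) but are no longer needed and should or could be deleted.
To begin with, I am trying to reproduce all the required steps manually.
I am using a batch script executed in the webroot to identify all (relevant) files on the server. I then import the list into SQL Server, and the table looks like this: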
id filename
1 filename1.docx
2 files/file.pdf
3 files/filename2.docx
4 files/filename3.docx
5 files/file1.pdf
6 file2.pdf
7 file4.pdf
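For reference, a table with this shape could be created roughly as follows. This is only a sketch: the table name ListFileReport and the FileName column are taken from the query further down, while the column types and the IDENTITY key are assumptions.

```sql
-- Assumed shape of the imported file list; the column types are guesses.
CREATE TABLE ListFileReport (
    id         int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- 1, 2, 3, ... in insert order
    [FileName] nvarchar(255) NOT NULL                   -- path relative to the webroot
);

-- A few rows from the sample listing above
INSERT INTO ListFileReport ([FileName]) VALUES (N'filename1.docx');
INSERT INTO ListFileReport ([FileName]) VALUES (N'files/file.pdf');
INSERT INTO ListFileReport ([FileName]) VALUES (N'files/filename2.docx');
```

Gap-free ids matter here because the loop in the query walks the ids from 1 up to COUNT(*).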
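I also have a CMS database (Alterian / Immediacy CMC 6.X) in which page content is stored in 2 tables: page_data and PageXMLArchive.
I would like to scan the database to see whether the files from the first table are referenced anywhere in the site content: the p_content column in the page_data table and the PageXML column in the PageXMLArchive table.
So I have a loop that takes each filename and checks whether it is referenced in either of these tables. If it is, the loop skips it; if it is not, the loop adds it to a temporary table.
At the end, the query displays the temporary table.
The query is below: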
DECLARE @t as table (_fileName nvarchar(255))
DECLARE @row as int
DECLARE @result as nvarchar(255)
SET @row = 1
-- Walk ListFileReport one id at a time
WHILE(@row <= (SELECT COUNT(*) FROM ListFileReport))
BEGIN
    SET @result = (SELECT [FileName] FROM ListFileReport WHERE id = @row)
    -- Keep the file only when neither table references it
    IF ((SELECT TOP(1) p_content FROM page_data WHERE p_content LIKE '%' + LTRIM(RTRIM(@result)) + '%') IS NULL)
       AND ((SELECT TOP(1) PageXML FROM PageXMLArchive WHERE PageXML LIKE '%' + LTRIM(RTRIM(@result)) + '%') IS NULL)
    BEGIN
        INSERT INTO @t (_fileName) VALUES(@result)
    END
    SET @row = @row + 1
END
-- Files for which no reference was found
select * from @t
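Unfortunately, due to my poor SQL skills, the query takes more than 2 hours to execute and then times out.
How can I optimize this query, or change it to achieve similar functionality, without having to run thousands of WHERE x LIKE statements against an ntext field? I can't make any changes to the database; it has to remain untouched (otherwise it will no longer be supported, which is a big deal for our clients).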
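One way to avoid the row-by-row loop is a single set-based statement. The following is a minimal sketch rather than a tested solution: it assumes the ListFileReport table and the column names used in the query above, and it reports a file only when neither table references it. The leading-wildcard LIKE still has to read every ntext value, so this removes the loop overhead but not the scanning itself.

```sql
-- Set-based sketch: list files referenced in neither content table.
-- Only reads from the CMS tables; nothing in the CMS schema is modified.
SELECT f.[FileName]
FROM ListFileReport AS f
WHERE NOT EXISTS (
          SELECT 1
          FROM page_data AS p
          WHERE p.p_content LIKE N'%' + LTRIM(RTRIM(f.[FileName])) + N'%'
      )
  AND NOT EXISTS (
          SELECT 1
          FROM PageXMLArchive AS a
          WHERE a.PageXML LIKE N'%' + LTRIM(RTRIM(f.[FileName])) + N'%'
      );
```

As in the loop version, LIKE treats the characters `_`, `%` and `[` in a filename as wildcards, so names containing them may match more loosely than a literal substring search would.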
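Thanks
Edit: At the moment I am working around this problem by processing the results in batches of a few hundred at a time. It works, but it takes forever.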
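Edit:
Could I use full-text search to achieve this? I am willing to take a snapshot of the database and work on the copy, if there is a way of altering the schema to achieve the desired result.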
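A full-text approach on a copy of the database might look roughly like the sketch below. It is an assumption-laden outline, not a verified solution: the catalog name FileScanCatalog is hypothetical, the key index names are taken from the table scripts further down, and full-text matching is token-based, so results can differ from a LIKE substring search, especially for names containing punctuation.

```sql
-- Run against the COPY of the database only, never the supported original.
CREATE FULLTEXT CATALOG FileScanCatalog;   -- hypothetical catalog name
GO
CREATE FULLTEXT INDEX ON dbo.PageXMLArchive (PageXML)
    KEY INDEX PK_PageXMLArchive            -- unique, single-column, non-nullable key
    ON FileScanCatalog;
GO
CREATE FULLTEXT INDEX ON dbo.page_data (p_content)
    KEY INDEX aaaaapage_data_PK
    ON FileScanCatalog;
GO

-- Probe for one filename as a quoted phrase. CONTAINS takes a literal or a
-- variable as its search condition, not a column from another table.
DECLARE @term nvarchar(300);
SET @term = N'"files/file.pdf"';

SELECT TOP (1) ArchiveID
FROM dbo.PageXMLArchive
WHERE CONTAINS(PageXML, @term);
```

The indexes are populated in the background, so a probe issued immediately after creating them may return nothing until population has finished. Because the search condition cannot come from a joined column, each filename would still need its own probe, but each probe would use the full-text index instead of scanning every ntext value.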
page_data table:
USE [TD-VMB-01-STG]
GO
/****** Object: Table [dbo].[page_data] Script Date: 12/13/2011 13:19:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[page_data](
[p_page_id] [int] NOT NULL,
[p_title] [nvarchar](120) NULL,
[p_link] [nvarchar](250) NULL,
[p_content] [ntext] NULL,
[p_parent_id] [int] NULL,
[p_top_id] [int] NULL,
[p_stylesheet] [nvarchar](50) NULL,
[p_author] [nvarchar](50) NULL,
[p_last_update] [datetime] NULL,
[p_order] [smallint] NULL,
[p_window] [nvarchar](10) NULL,
[p_meta_keywords] [nvarchar](1000) NULL,
[p_meta_desc] [nvarchar](2000) NULL,
[p_type] [nvarchar](1) NULL,
[p_confirmed] [int] NOT NULL,
[p_changed] [int] NOT NULL,
[p_access] [int] NULL,
[p_errorlink] [nvarchar](255) NULL,
[p_noshow] [int] NOT NULL,
[p_edit_parent] [int] NULL,
[p_hidemenu] [int] NOT NULL,
[p_subscribe] [int] NOT NULL,
[p_StartDate] [datetime] NULL,
[p_EndDate] [datetime] NULL,
[p_pageEnSDate] [int] NOT NULL,
[p_pageEnEDate] [int] NOT NULL,
[p_hideexpiredPage] [int] NOT NULL,
[p_version] [float] NULL,
[p_edit_order] [float] NULL,
[p_order_change] [datetime] NOT NULL,
[p_created_date] [datetime] NOT NULL,
[p_short_title] [nvarchar](30) NULL,
[p_authentication] [tinyint] NOT NULL,
CONSTRAINT [aaaaapage_data_PK] PRIMARY KEY NONCLUSTERED
(
[p_page_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_order] DEFAULT (0) FOR [p_order]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_con__1CF15040] DEFAULT (0) FOR [p_confirmed]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_cha__1DE57479] DEFAULT (0) FOR [p_changed]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_acc__1ED998B2] DEFAULT (1) FOR [p_access]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_nos__1FCDBCEB] DEFAULT (0) FOR [p_noshow]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_edi__20C1E124] DEFAULT (0) FOR [p_edit_parent]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_hid__21B6055D] DEFAULT (0) FOR [p_hidemenu]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_subscribe] DEFAULT (0) FOR [p_subscribe]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_pageEnSDate] DEFAULT (0) FOR [p_pageEnSDate]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_pageEnEDate] DEFAULT (0) FOR [p_pageEnEDate]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_hideexpiredPage] DEFAULT (1) FOR [p_hideexpiredPage]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_version] DEFAULT (0) FOR [p_version]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_edit_order] DEFAULT (0) FOR [p_edit_order]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_order_change] DEFAULT (getdate()) FOR [p_order_change]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_created_date] DEFAULT (getdate()) FOR [p_created_date]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_authentication] DEFAULT ((0)) FOR [p_authentication]
GO
PageXMLArchive table:
USE [TD-VMB-01-STG]
GO
/****** Object: Table [dbo].[PageXMLArchive] Script Date: 12/13/2011 13:20:00 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PageXMLArchive](
[ArchiveID] [bigint] IDENTITY(1,1) NOT NULL,
[P_Page_ID] [int] NOT NULL,
[p_author] [nvarchar](100) NULL,
[p_title] [nvarchar](400) NULL,
[Version] [int] NOT NULL,
[PageXML] [ntext] NULL,
[ArchiveDate] [datetime] NOT NULL,
CONSTRAINT [PK_PageXMLArchive] PRIMARY KEY CLUSTERED
(
[ArchiveID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[PageXMLArchive] ADD CONSTRAINT [DF_PageXMLArchive_ArchiveDate] DEFAULT (getdate()) FOR [ArchiveDate]
GO
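Could you include the structure of your page_data and PageXMLArchive tables in the question? –
For starters, you should move the 'SELECT COUNT(*)' above the WHILE and store the result in a variable, so that the query is not executed for every row. – Johan
@MarkBannister Added the table structures. Thanks – LukeP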
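Johan's suggestion could be sketched as follows. This is an illustration under assumptions, not the asker's final code: the variables @total and @pattern are introduced here, EXISTS replaces the TOP(1) ... IS NULL test, and a file is reported only when neither table references it.

```sql
DECLARE @t TABLE (_fileName nvarchar(255));
DECLARE @row int, @total int, @pattern nvarchar(300);

-- Count the rows once instead of on every loop iteration
SELECT @total = COUNT(*) FROM ListFileReport;
SET @row = 1;

WHILE @row <= @total
BEGIN
    -- Build the LIKE pattern once per file; NULL if the id does not exist
    SET @pattern = (SELECT N'%' + LTRIM(RTRIM([FileName])) + N'%'
                    FROM ListFileReport
                    WHERE id = @row);

    IF NOT EXISTS (SELECT 1 FROM page_data WHERE p_content LIKE @pattern)
       AND NOT EXISTS (SELECT 1 FROM PageXMLArchive WHERE PageXML LIKE @pattern)
    BEGIN
        -- Inserts nothing when the id is missing from ListFileReport
        INSERT INTO @t (_fileName)
        SELECT [FileName] FROM ListFileReport WHERE id = @row;
    END

    SET @row = @row + 1;
END

SELECT _fileName FROM @t;
```

Hoisting the count saves one small query per iteration; the dominant cost is still the two LIKE scans over the ntext columns, so on its own this change is unlikely to bring the runtime down dramatically.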