2012-01-11

I have some HTML content stored in a SQL Server column, and I want to read values out of that HTML. How can I get data from HTML stored in a SQL Server column?

For example:

<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421"> 
    <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, 'Options are required.')" onclick="design_validate_choice(1, -1, this, 'Options are required.')" onblur="design_validate_choice(1, -1, this, 'Options are required.')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical"> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" /> 
     <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" /> 
     <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" /> 
     <label contenteditable="true" unselectable="off" for="ID477116">5-7</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" /> 
     <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label> 
    </li> 
    </ol> 
</ektdesignns_choices><input type="submit" value="Vote" /> 

I want to read all the labels in this HTML. Does anyone have an idea how I can do this?

Answers


If your HTML really is XHTML-compliant, and you store it in an XML column in a SQL Server table, then you can retrieve your labels from it in T-SQL using XQuery:

DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML) 

INSERT INTO @HtmlTbl(Html) VALUES('<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421"> 
    <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, ''Options are required.'')" onclick="design_validate_choice(1, -1, this, ''Options are required.'')" onblur="design_validate_choice(1, -1, this, ''Options are required.'')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical"> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" /> 
     <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" /> 
     <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" /> 
     <label contenteditable="true" unselectable="off" for="ID477116">5-7</label> 
    </li> 
    <li> 
     <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" /> 
     <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label> 
    </li> 
    </ol></ektdesignns_choices><input type="submit" value="Vote" />') 

This retrieves all <label> elements from your (X)HTML as a single XML string:

SELECT 
    Html.query('//label') 
FROM @HtmlTbl 
WHERE ID = 1 

Output:

<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label> 
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label> 
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label> 
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label> 

Or this selects the text content of every <label> tag, one per row:

SELECT 
    C.value('(.)[1]', 'varchar(1000)') 
FROM @HtmlTbl 
CROSS APPLY Html.nodes('//label') AS T(C) 
WHERE ID = 1 

Output:

1 or fewer 
2-4 
5-7 
8 or more 
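For comparison, the same extraction can be sketched outside SQL Server with Python's standard-library xml.etree.ElementTree. This is only an illustration, and it carries the same assumption as the answer: the HTML must be well-formed XML. The function name label_texts and the <root> wrapper are hypothetical. The wrapper is needed because the sample has two top-level elements (<ektdesignns_choices> and <input>), which SQL Server's untyped xml type accepts as a fragment but ElementTree does not.

```python
import xml.etree.ElementTree as ET


def label_texts(xhtml_fragment: str) -> list[str]:
    """Return the text of every <label> element, in document order.

    Assumes the fragment is well-formed XML (XHTML-compliant); otherwise
    ET.fromstring raises ParseError, much like CAST(... AS XML) fails in T-SQL.
    """
    # Hypothetical wrapper element so a multi-root fragment parses as one document
    root = ET.fromstring(f"<root>{xhtml_fragment}</root>")
    # root.iter("label") walks all descendants, like the XQuery path '//label';
    # itertext() joins all nested text, similar to the string value used by value('(.)[1]', ...)
    return ["".join(label.itertext()).strip() for label in root.iter("label")]


if __name__ == "__main__":
    # Shortened version of the poll markup from the question
    sample = (
        '<ektdesignns_choices><ol>'
        '<li><input type="radio" id="ID2504263" />'
        '<label for="ID2504263">1 or fewer</label></li>'
        '<li><input type="radio" id="ID5115606" />'
        '<label for="ID5115606">2-4</label></li>'
        '</ol></ektdesignns_choices><input type="submit" value="Vote" />'
    )
    for text in label_texts(sample):
        print(text)  # prints "1 or fewer", then "2-4"
```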

Pull the data out of the database, then use an HTML parser to extract the information you need. It will make your life a lot easier.

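Whatever you do, please don't try to match the data with regexes unless you are only looking for one simple, fixed pattern. (HTML is not a regular language, so regexes usually cause more problems than they solve.)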
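The HTML-parser approach can be sketched with Python's standard-library html.parser, assuming the column value has already been read into a Python string by whatever database driver you use (that step is not shown). Unlike the XML route, this parser tolerates markup that is not well-formed XML, such as unclosed <li> tags. LabelCollector and extract_labels are hypothetical names, and this is one possible parser choice, not the one the answer necessarily had in mind.

```python
from html.parser import HTMLParser


class LabelCollector(HTMLParser):
    """Collects the text content of every <label> element."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: list[str] = []
        self._in_label = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this
        if tag == "label":
            self._in_label = True
            self._buffer.clear()

    def handle_endtag(self, tag):
        if tag == "label" and self._in_label:
            self._in_label = False
            self.labels.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Text may arrive in several chunks, so buffer it until </label>
        if self._in_label:
            self._buffer.append(data)


def extract_labels(html: str) -> list[str]:
    """Return the text of each <label> in the given HTML string."""
    parser = LabelCollector()
    parser.feed(html)
    parser.close()
    return parser.labels


if __name__ == "__main__":
    # Not valid XML (unclosed <li>, no self-closing slash on <input>), but parses fine
    html = (
        '<ol><li><label for="ID2504263">1 or fewer</label>'
        '<li><label for="ID5115606">2-4</label></ol>'
        '<input type="submit" value="Vote">'
    )
    print(extract_labels(html))  # ['1 or fewer', '2-4']
```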


If all of your HTML has the same format as this, you can convert it to XML and use some XQuery to find the label nodes:

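-- YourTable and HtmlCol are placeholder names; the CAST fails on any row that is not well-formed XML
select T.N.value('.', 'nvarchar(100)')
from YourTable
    cross apply (select cast(HtmlCol as xml) as XMLCol) as X
    cross apply X.XMLCol.nodes('//label') as T(N)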