I've got data in SQL Server 2005 that contains HTML tags and I'd like to strip all that out, leaving just the text between the tags. Ideally also replacing things like <
with <
, etc.
Is there an easy way to do this or has someone already got some sample T-SQL code?
I don't have the ability to add extended stored procs and the like, so would prefer a pure T-SQL approach (preferably one backwards compatible with SQL 2000).
I just want to retrieve the data with stripped out HTML, not update it, so ideally it would be written as a user-defined function, to make for easy reuse.
So for example converting this:
<B>Some useful text</B>
<A onclick="return openInfo(this)"
href="http://there.com/3ce984e88d0531bac5349"
target=globalhelp>
<IMG title="Source Description" height=15 alt="Source Description"
src="/ri/new_info.gif" width=15 align=top border=0>
</A>> <b>more text</b></TD></TR>
to this:
Some useful text > more text
There is a UDF that will do that described here:
User Defined Function to Strip HTML
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO
Edit: note this is for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.