This algorithm can be used to strip out HTML tags too.
With reference to http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=89973
and http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=90000

CREATE FUNCTION dbo.fnParseRTF
(
    @rtf VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Stage TABLE
    (
        Chr CHAR(1),
        Pos INT
    )
    INSERT @Stage
    (
        Chr,
        Pos
    )
    SELECT SUBSTRING(@rtf, Number, 1),
           Number
    FROM master..spt_values
    WHERE Type = 'p'
      AND SUBSTRING(@rtf, Number, 1) IN ('{', '}')
    WHILE 1 = 1
    BEGIN
        SELECT TOP 1 @Pos1 = s1.Pos,
                     @Pos2 = s2.Pos
        FROM @Stage AS s1
        INNER JOIN @Stage AS s2 ON s2.Pos > s1.Pos
        WHERE s1.Chr = '{'
          AND s2.Chr = '}'
        ORDER BY s2.Pos - s1.Pos
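The function in the post is cut off in the middle of its WHILE loop, and the part shown never declares @Pos1/@Pos2. The sketch below is NOT the original fnParseRTF; it is a hypothetical, self-contained completion of just the brace-matching idea: repeatedly pick the closest '{' ... '}' pair (which cannot contain another brace, so it is always an innermost group), cut it out with STUFF, and shift the stored positions. The name dbo.fnStripBraceGroups is made up. Because an RTF document is itself wrapped in {\rtf1 ... }, running this on raw RTF would remove everything; a real RTF cleaner would also have to peel off that outer group and strip control words such as \par.

```sql
-- Hypothetical helper, NOT the original dbo.fnParseRTF: removes every {...}
-- group, innermost first, until no matching brace pair is left.
CREATE FUNCTION dbo.fnStripBraceGroups
(
    @text VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Stage TABLE (Chr CHAR(1), Pos INT);
    DECLARE @Pos1 INT, @Pos2 INT;

    -- Record the position of every brace. On SQL Server 2005+ spt_values
    -- type 'P' holds 0..2047, so only the first 2047 characters are scanned.
    INSERT @Stage (Chr, Pos)
    SELECT SUBSTRING(@text, Number, 1), Number
    FROM master..spt_values
    WHERE Type = 'P'
      AND Number BETWEEN 1 AND LEN(@text)
      AND SUBSTRING(@text, Number, 1) IN ('{', '}');

    WHILE 1 = 1
    BEGIN
        -- The closest '{' ... '}' pair cannot contain another brace,
        -- so it is always an innermost group.
        SELECT TOP (1) @Pos1 = s1.Pos, @Pos2 = s2.Pos
        FROM @Stage AS s1
        INNER JOIN @Stage AS s2 ON s2.Pos > s1.Pos
        WHERE s1.Chr = '{' AND s2.Chr = '}'
        ORDER BY s2.Pos - s1.Pos;

        IF @@ROWCOUNT = 0 BREAK;   -- no pairs left

        -- Cut the group out, forget its braces, and shift later positions left
        SET @text = STUFF(@text, @Pos1, @Pos2 - @Pos1 + 1, '');
        DELETE FROM @Stage WHERE Pos IN (@Pos1, @Pos2);
        UPDATE @Stage SET Pos = Pos - (@Pos2 - @Pos1 + 1) WHERE Pos > @Pos2;
    END

    RETURN @text;
END
```

For example, dbo.fnStripBraceGroups('a{b{c}d}e') first removes {c}, then {bd}, and returns 'ae'.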
I have a table with a column that holds HTML text. The column is pretty big (data type varchar(max)). I wanted to check whether any of you have a function I can use to strip out the HTML tags. I saw a couple of versions online, but they ran too slowly.
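A minimal sketch, assuming SQL Server 2005 or later (for VARCHAR(MAX)); dbo.fnStripHtml is a hypothetical name. It walks the string with CHARINDEX, removes each <...> span with STUFF, and resumes the search at the same position instead of rescanning from the start. It does not decode entities such as &nbsp;, does not drop the contents of <script>/<style> blocks, and has not been benchmarked against the versions mentioned above.

```sql
CREATE FUNCTION dbo.fnStripHtml   -- hypothetical name
(
    @html VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE @start INT, @end INT;

    SET @start = CHARINDEX('<', @html);
    WHILE @start > 0
    BEGIN
        -- Find the '>' that closes this tag
        SET @end = CHARINDEX('>', @html, @start);
        IF @end = 0 BREAK;   -- unmatched '<': leave the rest untouched

        -- Remove the tag, then continue searching from the same spot
        SET @html = STUFF(@html, @start, @end - @start + 1, '');
        SET @start = CHARINDEX('<', @html, @start);
    END

    RETURN @html;
END
```

Usage would look like SELECT dbo.fnStripHtml(HtmlColumn) FROM dbo.YourTable, where both names are placeholders. Since a scalar UDF runs once per row, a very large table may still be slow; cleaning the data once and storing the result is likely cheaper than stripping on every query.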
I have a problem with the ntext data type. I need to strip the HTML tags out of an ntext column. I have a sample query that works fine for strings (it uses STUFF, which is a string function), but what should I do for an ntext field?
======= The process is as follows =========
--**************************************
--
-- Name: A relational technique to strip the HTML tags out of a string
-- Description: A relational technique to strip the HTML tags out of a string.
-- This solution demonstrates how to use simple tables & search functions
-- effectively in SQL Server to solve procedural / iterative problems.
-- This table contains the tags to be replaced. The % in <head%>
-- will take care of any extra information in the tag that you needn't
-- worry about as a whole. In any case, this table contains all the tags
-- that need to be searched & replaced.
CREATE TABLE #html ( tag varchar(30) )
INSERT #html VALUES ( '<html>' )
INSERT #html VALUES ( '<head%>' )
INSERT #html VALUES ( '<title%>' )
INSERT #html VALUES ( '<link%>' )
INSERT #html VALUES ( '</title>' )
INSERT #html VALUES ( '</head>' )
INSERT #html VALUES ( '<body%>' )
INSERT #html VALUES ( '</html>' )
go
-- A simple table with the HTML strings
CREATE TABLE #t ( id tinyint IDENTITY , string varchar(255) )
INSERT #t VALUES ( '<HTML><HEAD><TITLE>Some Name</TITLE> <LINK REL="stylesheet" HREF="/style.css" TYPE="text/css" ></HEAD> <BODY BGCOLOR="FFFFFF" VLINK="#444444"> SOME HTML text after the body</HTML>' )
INSERT #t VALUES ( '<HTML><HEAD><TITLE>Another Name</TITLE> <LINK REL="stylesheet" HREF="/style.css"></HEAD> <BODY BGCOLOR="FFFFFF" VLINK="#444444">Another HTML text after the body</HTML>' )
go
-- This is the code to strip the tags out.
-- It finds the starting location of each tag in the HTML string,
-- then finds the length of the tag with the extra properties, if any. This is
-- done by locating the end of the tag, namely '>'. The same is done
-- in a loop till all tags are replaced.
BEGIN TRAN
WHILE exists(select * FROM #t JOIN #html on patindex('%' + tag + '%' , string ) > 0 )
    UPDATE #t
    SET string = stuff( string ,
                        patindex('%' + tag + '%' , string ) ,
                        charindex( '>' , string , patindex('%' + tag + '%' , string ) ) - patindex('%' + tag + '%' , string ) + 1 ,
                        '' )
    FROM #t JOIN #html ON patindex('%' + tag + '%' , string ) > 0
SELECT * FROM #t
rollback
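STUFF does not accept ntext, which is why the #t/#html query in the post only works on string columns. A sketch, assuming SQL Server 2005 or later: convert the column to NVARCHAR(MAX) (ntext is deprecated anyway), after which the string functions work. The #pages table and its row are hypothetical sample data, and this version removes any <...> tag rather than only the tags listed in #html. If the real column cannot be altered, the same UPDATE can instead wrap each reference to the column in CAST(... AS NVARCHAR(MAX)).

```sql
-- Hypothetical sample table with an NTEXT column
CREATE TABLE #pages (id INT IDENTITY PRIMARY KEY, body NTEXT);
INSERT #pages (body) VALUES (N'<p>Hello <b>world</b></p>');
GO

-- NVARCHAR(MAX) holds the same data and is accepted by STUFF/CHARINDEX
ALTER TABLE #pages ALTER COLUMN body NVARCHAR(MAX);
GO

-- Cut the first <...> out of every row that still has one; repeat until none do
WHILE EXISTS (SELECT * FROM #pages WHERE PATINDEX(N'%<%>%', body) > 0)
    UPDATE #pages
    SET body = STUFF(body,
                     CHARINDEX(N'<', body),
                     CHARINDEX(N'>', body, CHARINDEX(N'<', body)) - CHARINDEX(N'<', body) + 1,
                     N'')
    WHERE PATINDEX(N'%<%>%', body) > 0;

SELECT id, body FROM #pages;   -- 1, Hello world
```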
Does anyone know how to get rid of RTF tags stored in a table? I need to filter out the data and am wondering whether SQL Server has a utility that can do it.
I have generated a database for my website, and I intend to use software that will convert the database into static web pages. The big problem is that I am not a programmer, although I know a tiny bit about tags etc. for search engines. I want to create the meta description tag from a field in this database. The software I am about to use has an SQL builder; is there any way this could be done by highlighting the relevant field and using SQL? PLEASE, someone help. This problem has been driving me around the twist.
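A sketch in plain T-SQL, assuming the SQL builder accepts an ordinary SELECT; @Pages, PageName and Summary are hypothetical stand-ins for your table and field. It builds the whole tag as one string and escapes & and " so the field's text cannot break out of the content attribute. The 160-character cap is an arbitrary choice, not a requirement.

```sql
-- Hypothetical sample data standing in for the real table
DECLARE @Pages TABLE (PageName VARCHAR(100), Summary VARCHAR(1000));
INSERT @Pages (PageName, Summary) VALUES ('widgets', 'Blue & green "deluxe" widgets');

-- Escape & first, then the double quote, and wrap the result in the tag
SELECT PageName,
       '<meta name="description" content="'
       + REPLACE(REPLACE(LEFT(ISNULL(Summary, ''), 160), '&', '&amp;'), '"', '&quot;')
       + '">' AS MetaDescription
FROM @Pages;
-- <meta name="description" content="Blue &amp; green &quot;deluxe&quot; widgets">
```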
There are two tables, A and B, that both hold asset tags: table A stores one tag per row, while table B stores several tags in a single column.
For example, table A:
Asset Tag
SR-062009-00032966
SR-062009-00032962
SR-072009-00020572
SR-072009-00020571
SR-072009-00020585
HH-092009-00038342
Table B:
Field 1 (Asset Tag), record 1: SR-072009-00020572,SR-072009-00020571,SR-062009-00020685,SR-072009-00001592,SR-072009-00001376,SR-062009-00020683,SR-092009-00001617
Field 2 (Material Code): record 1 = 121, record 2 = 123
What query would match each asset tag in table A against every asset tag in table B, so that the output comes out as follows?
Output:
Asset Tag            Material Code
SR-062009-00032966
SR-062009-00032962
SR-072009-00020572   121
SR-072009-00020571   121
SR-072009-00020585
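A sketch, assuming SQL Server 2008 or later (for multi-row VALUES). @A and @B are hypothetical stand-ins for tables A and B, and only record 1 of table B is loaded because record 2's tags are not given. Wrapping both the list and the tag in commas makes LIKE match whole tags only, and the LEFT JOIN keeps the tags from A that appear in no list.

```sql
DECLARE @A TABLE (AssetTag VARCHAR(30));
DECLARE @B TABLE (AssetTags VARCHAR(1000), MaterialCode INT);

INSERT @A (AssetTag) VALUES
    ('SR-062009-00032966'), ('SR-062009-00032962'), ('SR-072009-00020572'),
    ('SR-072009-00020571'), ('SR-072009-00020585'), ('HH-092009-00038342');

INSERT @B (AssetTags, MaterialCode) VALUES
    ('SR-072009-00020572,SR-072009-00020571,SR-062009-00020685,SR-072009-00001592,'
     + 'SR-072009-00001376,SR-062009-00020683,SR-092009-00001617', 121);

-- ',tag,' can only match a complete entry of ',list,'
SELECT a.AssetTag, b.MaterialCode
FROM @A AS a
LEFT JOIN @B AS b
    ON ',' + REPLACE(b.AssetTags, ' ', '') + ',' LIKE '%,' + a.AssetTag + ',%';
```

This LIKE join has to scan table B for every tag in A; if the lists are large, splitting them into a one-tag-per-row table is likely the more scalable design.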
I'm currently using an Execute SQL Task to return XML data from a query into an SSIS string variable. In my FOR XML clause I'm specifying a name for my root tag, "Accounts". This works great in Management Studio; however, the Execute SQL Task adds a <ROOT> tag to the start and a </ROOT> tag to the end of the string, so now it looks like:
<ROOT><Accounts>...all my elements...</Accounts></ROOT>
I'd like to remove the ROOT tags so that the <Accounts> tags are actually the root for this doc. What would be the best way to remove the ROOT tags from the SSIS string variable?
declare @xmldoc as xml
select @xmldoc = '<Text>This is firstline<Break />This is second line<Break />This is third line</Text>'
select @xmldoc.value('(/Text)[1]','varchar(max)')
Result is: "This is firstlineThis is second lineThis is third line"
My problem is that the <Break /> tags within the text are removed in the conversion to varchar. How can I preserve such tags in the varchar output, or get the <Break /> tags "translated" to, e.g., CHAR(10)?
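A sketch, assuming SQL Server 2005 or later. value() returns the string value of the node, which concatenates the text and drops the elements. query('/Text/node()') returns the child nodes as XML instead, and casting that fragment to VARCHAR(MAX) gives a string that still contains the <Break /> tags, which REPLACE can then turn into CHAR(10). Both spellings of the empty tag are replaced in case the serializer omits the space. One caveat: characters such as & and < in the text come back entity-encoded (&amp;, &lt;) in that string.

```sql
DECLARE @xmldoc XML;
SET @xmldoc = '<Text>This is firstline<Break />This is second line<Break />This is third line</Text>';

-- Keep the child nodes (text plus <Break/> elements) as XML, then serialize
DECLARE @raw VARCHAR(MAX);
SET @raw = CAST(@xmldoc.query('/Text/node()') AS VARCHAR(MAX));

SELECT @raw AS WithTags,
       REPLACE(REPLACE(@raw, '<Break />', CHAR(10)), '<Break/>', CHAR(10)) AS WithLineFeeds;
```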
Hi, I was wondering if any SQL Server gurus out there could help me...
I have a table which contains text resources for my application. The text resources are multilingual, so I've read that if I add an HTML language indicator meta tag, e.g. <META NAME="MS.LOCALE" CONTENT="ES">, and store the text in a varbinary column with a supporting varchar(5) Document Type column containing ".html", then the full-text index service should be intelligent about the language word breakers it applies when indexing the text. (I hope this is the correct technique for the best multilingual support in a single table?)
However, when I come to query this data, the results always return 0 rows (no errors are encountered), e.g.:
DECLARE @SearchWord nvarchar(256)
SET @SearchWord = 'search' -- Yes, this word is definitely present in my resources.
SELECT * FROM Resource WHERE CONTAINS(Document, @SearchWord)
I'm a little puzzled as Full Text search is working fine on another table that employs an nvarchar column (just plain text, no html).
Does the filter used for full text indexing of html expect certain tags to be present as standard? E.g. <html> and <body> tags? At present the data I have stored might look like this (no html or body wrapping tags):
Example record 1 data: <META NAME="MS.LOCALE" CONTENT="EN">Search for keywords:
Example record 2 data: <META NAME="MS.LOCALE" CONTENT="EN">Sorry no results were found for your search.
etc.
Any pointers / suggestions would be greatly appreciated. Cheers, Gavin.
UPDATE: I have tried wrapping the text in more usual HTML tags and rebuilt the full-text index, but I still never get any rows returned for my query. Example of the wrapping tried: <HTML><HEAD><META NAME="MS.LOCALE" CONTENT="EN"></HEAD><BODY>Test text.</BODY></HTML>
I've also tried stripping all HTML tags from the content and setting the Document Type column to .txt, but I still get no rows returned?!?
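A diagnostic sketch, not a confirmed fix for this case. It assumes SQL Server 2005 or later and uses hypothetical names: key index PK_Resource, type column DocumentType, and a catalog called ResourceCatalog. For a varbinary column, the full-text index has to be told which column holds the document type (TYPE COLUMN), and population runs asynchronously, so a query issued before it finishes can return nothing. A table can have only one full-text index, so an existing one would have to be dropped before running the CREATE.

```sql
-- Which filters are installed for these extensions?
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type IN ('.htm', '.html', '.txt');

CREATE FULLTEXT CATALOG ResourceCatalog;

-- The varbinary column must name the column that holds its file extension
CREATE FULLTEXT INDEX ON dbo.Resource
(
    Document TYPE COLUMN DocumentType
)
KEY INDEX PK_Resource ON ResourceCatalog;

-- PopulateStatus 0 means idle; compare ItemCount with the rows in the table
SELECT FULLTEXTCATALOGPROPERTY('ResourceCatalog', 'PopulateStatus') AS PopulateStatus,
       FULLTEXTCATALOGPROPERTY('ResourceCatalog', 'ItemCount')      AS ItemCount,
       (SELECT COUNT(*) FROM dbo.Resource)                          AS RowsInTable;
```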
I have a column of 5 comma-separated-value strings:
stringA, stringB, stringC, stringD, stringE
The strings are GUIDs with the hyphens stripped and made all uppercase, so they are completely random. I need to be able to remove any one of the strings, including its comma, in a stored procedure, and I am not sure how to accomplish this.
SELECT tickets FROM users WHERE CONTAINS (tickets, @ticket)
IF @@ROWCOUNT > 0
BEGIN
    -- REMOVE STUFF HERE
    SET @valid = 1
END
ELSE
    SET @valid = 0
So if stringB gets passed in as @ticket, then the new value in the column would be:
stringA, stringC, stringD, stringE
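A sketch of the removal step, assuming tickets is a varchar column and that the GUIDs contain no spaces; dbo.usp_RemoveTicket is a hypothetical name. It normalises "A, B, C" to ",A,B,C,", replaces ",B," with a single comma, strips the outer commas and restores the ", " separators. LIKE is used instead of CONTAINS, so no full-text index is needed.

```sql
CREATE PROCEDURE dbo.usp_RemoveTicket   -- hypothetical name
    @ticket VARCHAR(32),
    @valid  BIT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE u
    SET tickets = CASE WHEN x.list = ',' THEN ''   -- it was the only ticket
                       ELSE REPLACE(SUBSTRING(x.list, 2, LEN(x.list) - 2), ',', ', ')
                  END
    FROM dbo.users AS u
    -- ",A,B,C," with ",B," collapsed to "," gives ",A,C,"
    CROSS APPLY (SELECT REPLACE(',' + REPLACE(u.tickets, ' ', '') + ',',
                                ',' + @ticket + ',', ',') AS list) AS x
    WHERE ',' + REPLACE(u.tickets, ' ', '') + ',' LIKE '%,' + @ticket + ',%';

    -- @@ROWCOUNT still refers to the UPDATE here
    SET @valid = CASE WHEN @@ROWCOUNT > 0 THEN 1 ELSE 0 END;
END
```

With tickets = 'stringA, stringB, stringC, stringD, stringE' and @ticket = 'stringB', the column becomes 'stringA, stringC, stringD, stringE' and @valid is 1.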
I have a phone number string, (416) 555-5555, in a table. I'd like to search on it so that the user can pass any number and the query returns all phone numbers like it. What I'd like to do is strip out the brackets and dashes and perform a LIKE search.
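A sketch using nested REPLACE calls on both the stored number and the search input, so the comparison is digits against digits. @Contacts and its rows are hypothetical sample data.

```sql
DECLARE @Contacts TABLE (Phone VARCHAR(20));
INSERT @Contacts (Phone) VALUES ('(416) 555-5555');
INSERT @Contacts (Phone) VALUES ('(905) 555-1234');

DECLARE @search VARCHAR(20);
SET @search = '416555';

-- Remove brackets, dashes and spaces from both sides, then do a LIKE search
SELECT Phone
FROM @Contacts
WHERE REPLACE(REPLACE(REPLACE(REPLACE(Phone, '(', ''), ')', ''), '-', ''), ' ', '')
      LIKE '%' + REPLACE(REPLACE(REPLACE(REPLACE(@search, '(', ''), ')', ''), '-', ''), ' ', '') + '%';
-- returns (416) 555-5555
```

Because of the leading % and the expression on the column, this always scans; if the query runs often, storing the digits-only form in its own (computed) column at least avoids redoing the REPLACE calls per row.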
Hi, I use RS 2005 SP2. I added a text item with the value " smth...". It looks fine in preview, but exporting to PDF strips out the whitespace characters, so my report in PDF looks like: "smth..."
Hey, what's the best way of stripping a list of characters out of a specified field in a table? E.g., if a first name is ABCD'E-FSA, we want to strip the ' and the -. There are about 15-20 characters like that. What's the best way of doing it, other than nesting the REPLACE function that many times? Thanks, Zoey
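A minimal sketch: keep the unwanted characters in one string and loop over it, calling REPLACE once per character, so the list can grow without more nesting. dbo.fnStripChars is a hypothetical name, and a single-byte VARCHAR collation is assumed.

```sql
CREATE FUNCTION dbo.fnStripChars   -- hypothetical name
(
    @value VARCHAR(8000),
    @chars VARCHAR(100)            -- every character in this list is removed
)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @i INT;
    SET @i = 1;

    -- DATALENGTH rather than LEN, so a trailing space in @chars still counts
    WHILE @i <= DATALENGTH(@chars)
    BEGIN
        SET @value = REPLACE(@value, SUBSTRING(@chars, @i, 1), '');
        SET @i = @i + 1;
    END

    RETURN @value;
END
```

For example, SELECT dbo.fnStripChars('ABCD''E-FSA', '''-') returns ABCDEFSA; an UPDATE such as SET FirstName = dbo.fnStripChars(FirstName, '''-.,') would clean the column in place.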
I have a table that I've imported into SQL Server. It has a date field that must have used Now() as its default value in Access.
So the values are something like:
01/11/2004 12:16:42
I need a way to change the data so that the time element is removed and the field just holds the date. Failing that, a way to insert this from the existing field into a new field, stripping the time off en route, would be a great help.
Is there any non-obnoxious way (e.g. without having to resort to using DATEPART a million times) to do this? For instance, Oracle provides a function called TRUNC which does it, but I cannot find an SQL Server equivalent. Anyone? TIA!!!
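A common SQL Server 2000/2005 idiom is DATEADD(DAY, DATEDIFF(DAY, 0, x), 0): DATEDIFF counts whole days since day 0 (1900-01-01), and adding them back to day 0 gives midnight of the same day, with no DATEPART calls. The @Imported table variable is a hypothetical stand-in for the imported table. On SQL Server 2008 and later, CAST(x AS DATE) or a DATE column is an alternative.

```sql
DECLARE @Imported TABLE (EntryDate DATETIME);
INSERT @Imported (EntryDate) VALUES ('2004-11-01T12:16:42');

-- Truncate the stored values to midnight in place
UPDATE @Imported
SET EntryDate = DATEADD(DAY, DATEDIFF(DAY, 0, EntryDate), 0);

SELECT EntryDate FROM @Imported;   -- 2004-11-01 00:00:00.000
```

The same expression works in an INSERT ... SELECT if the date-only value should go into a new column instead.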
I'm not sure whether this exists in MS SQL Server, but I need to return results in alphabetical order, ignoring "The" if it's the first word of a title...
So, for example, the title "The Cliffhanger" would be returned along with the other titles that start with the letter C; "The" must still be returned as part of the title, just ignored while alphabetizing.
I'm sure that I'm not the first one to ever need this and don't want to re-invent the wheel, so if you have any ideas as to the best way to accomplish this, help me out.
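A sketch: sort on a computed key that drops a leading "The " while still selecting the full title. @Titles and its rows are hypothetical sample data, and a case-insensitive collation (the SQL Server default) is assumed, so "the " also matches. The trailing space in the pattern keeps titles such as "Theatre" from being trimmed.

```sql
DECLARE @Titles TABLE (Title VARCHAR(200));
INSERT @Titles (Title) VALUES ('The Cliffhanger');
INSERT @Titles (Title) VALUES ('Avatar');
INSERT @Titles (Title) VALUES ('Dune');

SELECT Title
FROM @Titles
ORDER BY CASE WHEN Title LIKE 'The %'
              THEN SUBSTRING(Title, 5, LEN(Title))   -- drop "The "
              ELSE Title
         END;
-- Avatar, The Cliffhanger, Dune
```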
The SSIS expression language GETDATE() function returns the current date with the current time. I only need the current date, without the time, for example: 9/1/2006
How would I construct the proper expression to return this value?
I am trying to do string scrubbing in a SQL CLR function, including removing certain HTML formatting. I would like to use the HtmlDecode method, but my understanding is that System.Web is not available to SQL CLR without marking the code unsafe, which is not an option for me: this is for an application we sell externally, and unsafe calls would not go over well with customers. Is there any class that IS supported for SQL CLR that exposes this functionality? Thanks.