Best way to parse bbcode

Loïc Faure-Lacroix · Jan 28, 2009 · Viewed 12.5k times

I'd like to write a BBCode filter for a PHP website. (I'm using CakePHP, so it would be a BBCode helper.) I have a few requirements.

BBCodes can be nested, so something like this is valid:

[block]  
    [block]  
    [/block]  
    [block]  
        [block]  
        [/block]  
    [/block]  
[/block]  

BBCodes can have zero or more parameters.

Example:

[video: url="url", width="500", height="500"]Title[/video]
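A minimal sketch of how such an opening tag could be split into a name and a parameter map, assuming the `name: key="value", ...` syntax shown above and values that never contain a double quote. The function name `parseOpeningTag` is hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: split an opening tag such as
// [video: url="u", width="500"] into its name and parameters.
function parseOpeningTag(string $tag): ?array
{
    // Tag name, then an optional parameter list introduced by ":".
    if (!preg_match('/^\[(\w+)(?::\s*(.*))?\]$/s', $tag, $m)) {
        return null; // not an opening tag in the assumed syntax
    }

    $params = [];
    if (isset($m[2]) && $m[2] !== '') {
        // Assumption: each parameter is name="value" with no embedded quotes.
        preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $m[2], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            $params[$pair[1]] = $pair[2];
        }
    }

    return ['name' => $m[1], 'params' => $params];
}

$tag = parseOpeningTag('[video: url="url", width="500", height="500"]');
// $tag['name'] is 'video'; $tag['params'] maps url, width and height to strings.
```

Handling the parameters in a second pass like this keeps the tag-matching pattern itself simple, which is where the regex approach tends to get fragile.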

BBCodes might have multiple behaviours.

For example, [url]text[/url] would be transformed to [url:url="text"]text[/url], or the video BBCode would be able to choose between YouTube, Dailymotion, and so on.
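Both behaviours could be sketched roughly as follows. The function names and the list of provider domains are assumptions made for illustration only; a real helper would need its own list.

```php
<?php
// Hypothetical normaliser for the [url] shorthand: when no url parameter
// is given, the tag content doubles as the link target.
function normalizeUrlParams(array $params, string $content): array
{
    if (!isset($params['url'])) {
        $params['url'] = trim($content);
    }
    return $params;
}

// Hypothetical provider lookup for [video]; the domain list is an assumption.
function detectVideoProvider(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host component, or an unparsable URL
    }
    $host = strtolower($host);

    $providers = [
        'youtube'     => ['youtube.com', 'youtu.be'],
        'dailymotion' => ['dailymotion.com'],
    ];
    foreach ($providers as $name => $domains) {
        foreach ($domains as $domain) {
            // Accept the bare domain or any subdomain of it (www., m., ...).
            $suffix = '.' . $domain;
            if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
                return $name;
            }
        }
    }
    return null;
}
```

With this split, the renderer for each tag only ever sees normalised parameters, and adding a provider means adding one entry to the table.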

I think that covers most of my needs. I have already done something with regex, but my biggest problem was matching parameters. I got nested BBCode and BBCode with zero parameters to work, but when I added a regex match for parameters, it no longer matched nested BBCode correctly.

"\[($tag)(=.*)\"\](.*)\[\/\1\]" // It wasn't .* but the non-greedy matcher (.*?)

I don't have the complete regex with me right now, but I had something that looked like the one above.
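To illustrate why a single back-referenced pattern struggles with nesting, here is a small demonstration using a simplified version of the pattern (no parameters, one fixed tag name). It is not the original regex, only a reduced case with the same shape.

```php
<?php
$nested   = '[block]a[block]b[/block]c[/block]';
$siblings = '[block]a[/block][block]b[/block]';

// Non-greedy: the outer opening tag gets paired with the inner closing tag.
preg_match('~\[(block)\](.*?)\[/\1\]~s', $nested, $lazy);

// Greedy: pairs the first opening tag with the last closing tag,
// so two sibling blocks are swallowed as one.
preg_match('~\[(block)\](.*)\[/\1\]~s', $siblings, $greedy);

echo $lazy[2], "\n";   // a[block]b
echo $greedy[2], "\n"; // a[/block][block]b
```

Neither quantifier counts opening tags against closing tags, so one of the two inputs is always mismatched. That is the motivation for tokenizing the text and tracking open tags explicitly.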

So, is there a way to match BBCode efficiently with regex or something else? The only thing I can think of is to use the visitor pattern and split my text on each possible tag. That way I would have more control over the parsing, and I could probably validate the document: if the input text doesn't contain valid BBCode, I could notify the user with an error before saving anything.
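The split-and-validate idea could look roughly like the sketch below: split the text on the known tags, then use a stack to build a tree and reject mismatched tags. The function name `parseBbcode`, the node layout, and the restriction that parameter values contain no `]` are all assumptions made for this example. A visitor (or any renderer) could then walk the returned tree.

```php
<?php
// Hypothetical parser: returns a tree of nodes, or throws on invalid nesting.
// Assumes $allowedTags is non-empty and parameter values contain no "]".
function parseBbcode(string $text, array $allowedTags): array
{
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $allowedTags));

    // Split on known opening/closing tags, keeping the tags as tokens.
    $tokens = preg_split(
        '~(\[(?:' . $names . ')(?::[^\]]*)?\]|\[/(?:' . $names . ')\])~',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $openRe  = '~^\[(' . $names . ')(?::\s*([^\]]*))?\]$~';
    $closeRe = '~^\[/(' . $names . ')\]$~';

    // The first stack entry is an artificial root that collects top-level nodes.
    $stack = [['tag' => null, 'rawParams' => '', 'children' => []]];

    foreach ($tokens as $token) {
        if (preg_match($closeRe, $token, $m)) {
            // A closing tag must match the most recently opened tag.
            $node = array_pop($stack);
            if ($node['tag'] !== $m[1]) {
                throw new InvalidArgumentException("Unexpected [/{$m[1]}]");
            }
            $stack[count($stack) - 1]['children'][] = $node;
        } elseif (preg_match($openRe, $token, $m)) {
            $stack[] = ['tag' => $m[1], 'rawParams' => $m[2] ?? '', 'children' => []];
        } else {
            // Plain text, including unknown tags such as [foo].
            $stack[count($stack) - 1]['children'][] = $token;
        }
    }

    if (count($stack) !== 1) {
        $open = end($stack);
        throw new InvalidArgumentException("Missing [/{$open['tag']}]");
    }

    return $stack[0]['children'];
}

try {
    $tree = parseBbcode('[block]a[block]b[/block][/block]', ['block']);
} catch (InvalidArgumentException $e) {
    // Report the problem to the user before saving anything.
    echo $e->getMessage(), "\n";
}
```

Because every closing tag is checked against the top of the stack, invalid input is detected during parsing rather than after rendering, which covers the "notify the user before saving" requirement.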

I would use SableCC to create my text parser: http://sablecc.org/

Any better ideas? Or anything that could lead to an efficient, flexible BBCode parser?

Thank you, and sorry for my bad English.

Answer

Chad Birch · Jan 28, 2009

There are several existing libraries for parsing BBCode; it may be easier to look into those than to roll your own.

Here are a couple; I'm sure there are more if you look around:
PECL bbcode
PEAR HTML_BBCodeParser