How to preg_match a pattern having linebreaks?

Can you please help with matching a pattern that may contain line breaks?

On [PHPRegexLive](https://www.phpliveregex.com/), I use the regex pattern {{\s*IF(.+)}}(.+){{\s*ENDIF}} on the search string:

before if....{{IF !empty('')}} <div class='h6 mt-4 mb-2 edit-btn-container'>About</div> {{ENDIF}} after if....

The result is fine, array[0] = entire {{IF <condition>}}...{{ENDIF}} string, array[1] = <condition>, and array[2] = whatever between {{IF <con>}} and {{ENDIF}}.

The problem is when the entire {{IF <con>}}...{{ENDIF}} spans more than one line, such as


before if....{{IF !empty('')}}
<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>
{{ENDIF}} after if....

I tried different combinations of \n*, \n*\r*, etc., and the s and m modifiers, but cannot get it to work.
 
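Look into the preg_match ending modifiers 's' (single line, PCRE_DOTALL) and 'm' (multiline). The 's' modifier is the one that matters here: it makes the dot match newline characters as well. 'm' only changes how ^ and $ behave, so it will not help on its own.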
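To illustrate the modifier point, here is a minimal sketch using the multi-line string from the question. The braces are escaped to be safe, and the variable names are just placeholders:

```php
<?php
// Multi-line input from the question.
$text = "before if....{{IF !empty('')}}\n"
      . "<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>\n"
      . "{{ENDIF}} after if....";

// No modifier: "." does not match "\n", so the second (.+) cannot
// cross the line break and preg_match() returns 0.
$plain = preg_match('/\{\{\s*IF(.+)\}\}(.+)\{\{\s*ENDIF\}\}/', $text, $m1);

// With "s" (PCRE_DOTALL): "." also matches "\n", so the block is found.
$dotall = preg_match('/\{\{\s*IF(.+)\}\}(.+)\{\{\s*ENDIF\}\}/s', $text, $m2);

// $m2[1] holds the condition, $m2[2] the body (including the line breaks).
$condition = trim($m2[1]);
$body      = trim($m2[2]);
```

Note that the captured body keeps the surrounding newlines, hence the trim() calls.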

Also, instead of: IF(.+)}}
I would try: IF(.*?)}}
The reason is that with the first form the ending }} could match the }} in ENDIF}}.
.*? is a non-greedy match, i.e. it will stop at the first }} it finds.
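A quick sketch of the difference, assuming a made-up input with two IF blocks (the conditions a and b are only for illustration):

```php
<?php
// Hypothetical input with two separate IF blocks.
$text = "{{IF a}}one{{ENDIF}} mid {{IF b}}two{{ENDIF}}";

// Greedy: the first group runs as far as it can, swallowing the first
// ENDIF and the second IF before it settles on a closing "}}".
preg_match('/\{\{\s*IF(.+)\}\}(.+)\{\{\s*ENDIF\}\}/s', $text, $greedy);

// Non-greedy: each group stops at the first place the rest can match.
preg_match('/\{\{\s*IF(.+?)\}\}(.+?)\{\{\s*ENDIF\}\}/s', $text, $lazy);

// $greedy[1] is " a}}one{{ENDIF}} mid {{IF b", $greedy[2] is "two"
// $lazy[1]   is " a",                          $lazy[2]   is "one"
```

With preg_match_all() and the non-greedy pattern you would get both blocks as separate matches.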
 
Btw, try to avoid using "unlimited wildcards" like * and +. Use {1,n} instead of + and {0,n} instead of *, where n is the maximum number of characters you expect in a positive match. The reason is that an unbounded search can lead to very slow regex matches on big input data where a match is not guaranteed.

Also, instead of . you can use a definite set of characters or a stopper. It seems like you never want to go further than a { and/or } character, so you can write [^}] and [^{] instead of the dot. A negated character class also matches line breaks, so no modifier is needed for it.

And finally, it is better to escape { and }, as they are normally used for quantifier ranges (see above).

So the final regex might look a bit more obfuscated than yours, but it is much safer and faster to use:

Code:
\{\{\s*IF([^}]{1,100})\}\}([^{]{1,100})\{\{\s{0,100}ENDIF\}\}
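For what it's worth, this is roughly how that pattern behaves on the multi-line string from the question. It is a sketch only; the 100-character limits come from the regex above and the long test body is made up:

```php
<?php
$pattern = '/\{\{\s*IF([^}]{1,100})\}\}([^{]{1,100})\{\{\s{0,100}ENDIF\}\}/';

// Multi-line input from the question.
$text = "before if....{{IF !empty('')}}\n"
      . "<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>\n"
      . "{{ENDIF}} after if....";

// [^}] and [^{] match newlines, so this works without the "s" modifier.
$found = preg_match($pattern, $text, $m);
$condition = trim($m[1]);
$body      = trim($m[2]);

// Trade-off: a body longer than the upper bound no longer matches.
$long = "{{IF x}}" . str_repeat('x', 101) . "{{ENDIF}}";
$foundLong = preg_match($pattern, $long);
```

So pick the upper bounds with your real templates in mind, and remember that a body containing a { character will not match either.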
 
Since IF statements can be nested (as they can in PHP), a single regular expression that searches for an IF followed by an ENDIF cannot match arbitrarily nested IFs properly.

https://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
No. It's that easy. A finite automaton (which is the data structure underlying a regular expression) does not have memory apart from the state it's in, and if you have arbitrarily deep nesting, you need an arbitrarily large automaton, which collides with the notion of a finite automaton.

You can match nested/paired elements up to a fixed depth, where the depth is only limited by your memory, because the automaton gets very large. In practice, however, you should use a push-down automaton, i.e. a parser for a context-free grammar.
 
You might wanna look into compiler theory with lex and yacc. You have a lexer that identifies individual grammar tokens, in your case IF, ENDIF, "{" ..., and a parser that then uses a syntax specification to turn them into something meaningful (an abstract syntax tree in the compiler/interpreter case). Not sure what you're trying to achieve, but parsing grammars is not trivial, prepare for pain :P
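If you don't want to pull in a full parser generator, a small hand-written version of the same idea might be enough. The sketch below is only an assumption about what you need: parseTemplate() and parseNodes() are hypothetical names, and the node layout ('cond' / 'body') is made up. preg_split() acts as the lexer, and the recursion plays the role of the push-down stack:

```php
<?php
// Parser step: consume tokens from $pos onward and build a nested array.
function parseNodes(array $tokens, int &$pos, bool $insideIf): array
{
    $nodes = [];
    while ($pos < count($tokens)) {
        $tok = $tokens[$pos++];
        if (preg_match('/^\{\{\s*ENDIF\s*\}\}$/', $tok)) {
            if (!$insideIf) {
                throw new RuntimeException('ENDIF without matching IF');
            }
            return $nodes; // closes the innermost open IF
        }
        if (preg_match('/^\{\{\s*IF(.*)\}\}$/s', $tok, $m)) {
            // Recurse: everything up to the matching ENDIF is the body.
            $nodes[] = [
                'cond' => trim($m[1]),
                'body' => parseNodes($tokens, $pos, true),
            ];
        } else {
            $nodes[] = $tok; // plain text
        }
    }
    if ($insideIf) {
        throw new RuntimeException('IF without matching ENDIF');
    }
    return $nodes;
}

// Lexer step: split the source into text and {{IF ...}} / {{ENDIF}} tokens.
function parseTemplate(string $src): array
{
    $tokens = preg_split(
        '/(\{\{\s*(?:IF[^}]*|ENDIF)\s*\}\})/',
        $src,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    $pos = 0;
    return parseNodes($tokens, $pos, false);
}

$tree = parseTemplate("a{{IF x}}b{{IF y}}c{{ENDIF}}d{{ENDIF}}e");
```

Here $tree comes out as the text "a", one IF node for x whose body contains the nested IF node for y, and then the text "e". Evaluating the conditions is a separate problem and not covered here.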
 