Regex macros and capturing groups

DavidJCobb
If, during a regex operation, you attempt to access a capturing group that isn't there (either through <n.did_find group="x"/> or through "$x" references in a replacement string), NAML completely chokes. There's no way to catch or handle the resulting exception, which makes certain systems clunky to design (in my case, a profile customization system that uses a list of regexes to process different URLs for different profile options).
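For anyone who wants to see the underlying behavior outside NAML, here is a minimal plain-Java sketch of both failure paths. It assumes NAML's regex macros sit on top of java.util.regex, which the thread suggests but doesn't spell out; the class and method names are made up for illustration, and the mapping of group(n) to <n.did_find group="x"/> is a guess.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java reproduction of the two failure paths (no NAML involved).
// Assumption: NAML's regex macros delegate to java.util.regex.
class MissingGroupDemo {

    // Path 1: a "$n" reference in the replacement string with no matching group.
    static String replaceOrError(String regex, String replacement, String input) {
        try {
            return Pattern.compile(regex).matcher(input).replaceAll(replacement);
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException: " + e.getMessage();
        }
    }

    // Path 2: asking a successful match for a group number that doesn't exist
    // (presumably close to what <n.did_find group="x"/> does internally).
    static String groupOrError(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        try {
            return m.group(group);
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceOrError("(a)(b)(c)", "$1-$2-$3", "abc"));    // a-b-c
        System.out.println(replaceOrError("(a)(b)(c)", "$1-$2-$3-$4", "abc")); // IndexOutOfBoundsException: No group 4
        System.out.println(groupOrError("(a)(b)(c)", "abc", 3));               // c
        System.out.println(groupOrError("(a)(b)(c)", "abc", 4));               // IndexOutOfBoundsException: No group 4
    }
}
```

Note that the replacement string is only parsed once there is a match, so a bad "$4" goes unnoticed until some input actually matches.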

It's my understanding that this is because Java itself throws an exception, and I'm aware that you guys are working on an overhaul for the forum software, so I'll understand if this is left alone. Thought it'd be good to point it out, though.

On a related note, sorry about the recent very high bandwidth consumption on Testing Perfection. I was trying out some regex manipulation in an attempt to work around the problem above... Tip: never put "()" in a regex. Not even once. :\

EDIT: Apparently, ^(a)(b)(c)$ also causes it to choke, this time with a server crash rather than an exception. That's weird enough that I'm starting to think I'm not the problem... :\

EDIT2: I'm gonna leave it alone for a few days, and see if I can find a non-server-based Java app to jam some regexes into.

For those who are curious about the crashing business, the basic problem is this: if you run /(a)(b)(c)/ with the replacement string "$1$2$3$4", that's one group too many, so you get an exception. This is bad if you need to run a varied set of regexes through one piece of code.

I was trying to work around this by having my regex call append dummy capturing groups to the end of the regex variable, intending to create additional capturing groups without affecting what gets matched; that way, there would always be enough groups to prevent an exception. Unfortunately, regexes can be very dangerous when you're doing particularly... eccentric... things with them. I had forgotten about that, and was rather alarmingly reminded of it when Testing Perfection went down.
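A minimal sketch of the dummy-group idea in plain Java, assuming java.util.regex semantics. padGroups is a hypothetical helper, not part of NAML; it appends an alternative of the form ^()()()()()$ like the one used later in this thread. The point it illustrates: the group count goes up, and a "$4" that refers to a group which exists but didn't take part in the match is replaced with an empty string instead of throwing.

```java
import java.util.regex.Pattern;

// Sketch of the "append dummy capturing groups" workaround.
// padGroups is a hypothetical helper name, not a NAML or Nabble API.
class DummyGroupDemo {

    // Appends "|^" + extra empty groups + "$". Without MULTILINE mode this
    // alternative only matches an empty input (or a lone line terminator),
    // so it adds groups without changing matches on ordinary input.
    static String padGroups(String regex, int extra) {
        StringBuilder sb = new StringBuilder(regex).append("|^");
        for (int i = 0; i < extra; i++) {
            sb.append("()");
        }
        return sb.append("$").toString();
    }

    public static void main(String[] args) {
        String original = "(a)(b)(c)";
        String padded = padGroups(original, 5); // (a)(b)(c)|^()()()()()$

        System.out.println(Pattern.compile(original).matcher("").groupCount()); // 3
        System.out.println(Pattern.compile(padded).matcher("").groupCount());   // 8

        // Groups 4..8 exist but never participate when "abc" matches,
        // so "$4" contributes an empty string rather than throwing.
        System.out.println(Pattern.compile(padded).matcher("abc").replaceAll("$1$2$3$4")); // abc
    }
}
```

One trade-off of this approach: an empty input now "matches" where it previously didn't. Checking Matcher.groupCount() against the highest "$n" before replacing avoids altering the pattern at all.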

Re: Regex macros and capturing groups

Hugo <Nabble>
Thanks for reporting this, David. I am trying to reproduce the problem, but I am not sure I understand the whole issue. For example, consider this code:
<macro name="test">
    <n.regex_replace_all. pattern="(a)(b)(c)" replacement="$1-$2-$3">
        begin abc end
    </n.regex_replace_all.>    
</macro>
This code can be easily tested with a URL like this:
<your-app-domain>/template/NamlServlet.jtp?macro=test

The code above works fine (it prints "begin a-b-c end"). If the replacement has a fourth group ($1-$2-$3-$4) we do get an exception: "No group 4", caused by an IndexOutOfBoundsException. It should be easy to catch these exceptions in the Java code and re-throw them as template exceptions that can be handled with NAML. If you think this makes sense, we can do this pretty quickly.
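The catch-and-rethrow idea could look roughly like this in Java. This is only a sketch: SafeRegex and TemplateException are hypothetical stand-ins, since the template engine's real exception type isn't shown in this thread. Besides the missing-group IndexOutOfBoundsException, it also wraps IllegalArgumentException, which Java throws for malformed replacements such as a trailing "$", and PatternSyntaxException for invalid patterns.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Wraps regex runtime exceptions in one checked exception that a caller
// (e.g. a template engine) can handle. Names here are hypothetical.
class SafeRegex {

    static String replaceAll(String regex, String replacement, String input)
            throws TemplateException {
        try {
            return Pattern.compile(regex).matcher(input).replaceAll(replacement);
        } catch (PatternSyntaxException e) {
            // Must come before IllegalArgumentException, its superclass.
            throw new TemplateException("invalid regex: " + e.getDescription(), e);
        } catch (IndexOutOfBoundsException | IllegalArgumentException e) {
            // e.g. "No group 4" or "Illegal group reference"
            throw new TemplateException("bad replacement: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(replaceAll("(a)(b)(c)", "$1-$2-$3", "begin abc end"));    // begin a-b-c end
            System.out.println(replaceAll("(a)(b)(c)", "$1-$2-$3-$4", "begin abc end")); // throws
        } catch (TemplateException e) {
            System.out.println("handled: " + e.getMessage()); // handled: bad replacement: No group 4
        }
    }
}

// Hypothetical checked exception standing in for whatever NAML can catch.
class TemplateException extends Exception {
    TemplateException(String message, Throwable cause) {
        super(message, cause);
    }
}
```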

Are there other examples we should consider? I believe you can test the regex code with small macros like the one above. If the server goes down, then we should really fix the problem. I am interested in all examples you have available. Just let us know and we can fix the problems. This experience will also be important for the Lua-based implementation to be released in the near future. Thanks!

Re: Regex macros and capturing groups

DavidJCobb
Hugo <Nabble> wrote
If the replacement has a fourth group ($1-$2-$3-$4) we do get an exception: "No group 4", caused by an IndexOutOfBoundsException. It should be easy to catch these exceptions in the java code and re-throw them as template exceptions that can be handled with NAML. If you think this makes sense, we can do this pretty quickly.
That makes sense and would be helpful, though I think I may have found a (non-crashing) workaround for the issue.

Hugo <Nabble> wrote
Are there other examples we should consider? I believe you can test the regex code with small macros like the one above. If the server goes down, then we should really fix the problem. I am interested in all examples you have available. Just let us know and we can fix the problems. This experience will also be important for the Lua-based implementation to be released in the near future. Thanks!
The weird thing is, I can no longer reproduce the crash. Not through regex manipulation, anyway.

The crashes occurred when I tried to work around the capturing-group exception by adding dummy capturing groups. The regular expression that led to a crash was (assuming that my NAML code was supplying the values properly):

(?i)^(?:https?\:\/\/)?(?:www\.|admin\.|halo\.)?bungie\.net\/Account\/Profile\.aspx\?(?:.*?&)*?(memberID|uid)(=\d+)(?:&.*?|)$|^()()()()()$

Where the original regex, without my workarounds, was:

(?i)^(?:https?\:\/\/)?(?:www\.|admin\.|halo\.)?bungie\.net\/Account\/Profile\.aspx\?(?:.*?&)*?(memberID|uid)(=\d+)(?:&.*?|)$

Although this appears to be what led to the earlier crash, it now seems to work just fine. I don't have the original code that caused the crash (I didn't know the server would be down for as long as it was, so I tried to fix the NAML mid-crash), but code that should be functionally identical doesn't break anything. So maybe it isn't regex-related after all. (This might be the worst bug report I've ever submitted. :P)
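To check the two profile-URL regexes outside the server, a small Java sketch like this one can help. It assumes NAML hands the pattern to java.util.regex unchanged; the sample URLs are invented. Escapes such as \: and \/ are legal in Java regexes, since a backslash may precede any non-alphabetic character. The sketch confirms the original pattern has two capturing groups and the padded one has seven.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Checks the two profile-URL regexes from the post in plain Java.
// The sample URLs are made up for illustration.
class ProfileRegexCheck {

    static final String ORIGINAL =
        "(?i)^(?:https?\\:\\/\\/)?(?:www\\.|admin\\.|halo\\.)?bungie\\.net\\/Account\\/Profile\\.aspx\\?"
        + "(?:.*?&)*?(memberID|uid)(=\\d+)(?:&.*?|)$";

    // Same pattern with the five dummy groups appended.
    static final String PADDED = ORIGINAL + "|^()()()()()$";

    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns "group1 group2" if the whole URL matches, otherwise null.
    static String extract(String regex, String url) {
        Matcher m = Pattern.compile(regex).matcher(url);
        return m.matches() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount(ORIGINAL)); // 2
        System.out.println(groupCount(PADDED));   // 7
        System.out.println(extract(ORIGINAL, "http://www.bungie.net/Account/Profile.aspx?memberID=12345")); // memberID =12345
        System.out.println(extract(PADDED, "bungie.net/Account/Profile.aspx?page=2&uid=42&x=1"));           // uid =42
    }
}
```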

Well, if I see another crash, I'll leave the NAML untouched, copy down the names of the relevant macros, and post another thread here. :\

Re: Regex macros and capturing groups

Hugo <Nabble>
Thanks, and don't worry. Just let us know when you have something that we should know about.