Re: Perl 5.18 and Regexp::Grammars
From: Damian Conway
Date: July 15, 2013 12:58
Subject: Re: Perl 5.18 and Regexp::Grammars
Message ID: CAATtAp7WiYyhRxAifLMhrVfy1j1bUwBqsW-c+F6yPCM0ObSo-w@mail.gmail.com
Two weeks ago Dave Mitchell wrote:
> (Please respond to this only when you have the time).
I now have time. :-)
> Suggested workaround
> --------------------
The workaround Dave demonstrated recreated precisely the technique that
Regexp::Grammars currently uses. All the problems I reported occur when
using this very workaround. See below.
> I think the following code demonstrates all the above working. It relies
> on the fact that concatenation overload triggers disabling of compile-time
> code blocks, and forces everything to run-time. So it relies on
> 'use re eval' being in scope at the caller.
The code Dave offered did indeed work for the examples he included.
But the code does not work for everything that Regexp::Grammars does.
The problems are as follows...
>> * Specifically any 'qr' overloading that returns an object that
>> stringifies to a pattern "text" that contains (?{...}) or
>> (??{...}) will now *sometimes* trigger the dreaded 'use re
>> "eval"' warning, even if there is a 'use re "eval"' in the
>> scope where the pattern was originally defined.
>
> I can't reproduce this; I would need sample code.
This is the simplest example I could find. It works correctly under 5.14
(i.e. it prints "DONE", then prints "matched"). Under 5.18 it generates
a compile-time error:
    Eval-group not allowed at runtime, use re 'eval' in regex
    m/ [^#]+ (?{ say 'DONE'; }) /
    at (eval 1) line 6.
Note that the RegexProcessor code here is identical to Dave's original
workaround code, except that the commented line has been added to the
stringification operator to simulate the types of (much more complex)
code-block injections that Regexp::Grammars actually carries out:
-----cut----------cut----------cut----------cut----------cut----------cut-----
# Ensure 'eval' is allowed throughout the entire example code...
use re 'eval';

# Assumed addition (not in the listing): say() needs the 'say'
# feature to be enabled before this code will compile...
use feature 'say';

package RegexProcessor;
use overload (
    q{""} => sub {
        my ($pat) = @_;
        my $complete_pattern = $pat->[0];

        # This is an extremely simplified version of the type of
        # code block injection that Regexp::Grammars performs...
        return qq{
            $complete_pattern
            (?{ say 'DONE'; })
        };
    },

    q{.} => sub {
        my ($a1, $a2) = @_;
        $a1 = $a1->[0] if ref $a1;
        $a2 = $a2->[0] if ref $a2;
        return bless [ "$a1$a2" ], 'RegexProcessor';
    },
);

package main;

BEGIN {
    overload::constant qr => sub {
        return bless [ $_[0] ], 'RegexProcessor';
    };
}

my $bracketed = qr{
    [^#]+
}xms;

say 'matched' if '#foobar#' =~ $bracketed && $& eq 'foobar';
-----end----------end----------end----------end----------end----------end-----
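As a point of reference, here is a minimal sketch of the baseline rule that
the error message above refers to, with no overloading involved: a code
block that reaches the regex compiler through run-time interpolation is
rejected unless 'use re "eval"' is in scope where the pattern is compiled.
The variable names ($pattern_text, $done, and so on) are illustrative and
are not part of Regexp::Grammars or of Dave's workaround.

```perl
use strict;
use warnings;

# Set by the code block when (and if) it runs. Illustrative name.
our $done = 0;

# Pattern text containing a code block, held in an ordinary string...
my $pattern_text = q{ [^#]+ (?{ $main::done = 1 }) };

# Without 'use re "eval"' in scope, interpolating that text is fatal...
my $compiled_without = eval { my $re = qr/$pattern_text/x; 1 } ? 1 : 0;
my $error_without    = $@;

# With 'use re "eval"' in scope, the same text compiles and the block runs...
my $matched_with = do {
    use re 'eval';
    my $re = qr/$pattern_text/x;
    '#foobar#' =~ $re ? 1 : 0;
};

print "without: compiled=$compiled_without\n";
print "with:    matched=$matched_with, done=$done\n";
```

The reproduction above differs from this sketch in that 'use re "eval"' is
already in scope at the top of the file, yet the same error is still raised.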
>> * The second problem that has arisen in 5.18 is that variables
>> that appear in (?{...}) or (??{...}) blocks are now checked
>> for 'use strict' compliance *before* the 'qr' overloading is
>> triggered, making it impossible to provide rewritings that
>> sanitize such variables.
>
> Yep, you can't rewrite code blocks any more, unless you can force them to
> become run-time, then overload-concatenate them, as shown above.
Even when they are forced to become run-time (using your workaround code),
'use strict' compliance seems to be tested too early (i.e. before the
qr-overloading has a chance to "vanish" the variable in question).
For example, the following code works as expected under 5.14
(i.e. the post-processed regex correctly matches), but under 5.18
it generates an odd "double fatality" compile-time error:
    Global symbol "$MAGIC_VAR" requires explicit package name at demo.pl line 32.
    Global symbol "$MAGIC_VAR" requires explicit package name at (eval 1) line 1.
Once again, the RegexProcessor code is identical to Dave's workaround
code, except that this time the commented line has been added to the
qr-overloading in order to replace $MAGIC_VAR in the source with 'foo'
(this is a minimal version of the various kinds of much more complex
manipulations that Regexp::Grammars actually does):
-----cut----------cut----------cut----------cut----------cut----------cut-----
package RegexProcessor;
use overload (
    q{""} => sub {
        my ($pat) = @_;
        return $pat->[0];
    },

    q{.} => sub {
        my ($a1, $a2) = @_;
        $a1 = $a1->[0] if ref $a1;
        $a2 = $a2->[0] if ref $a2;
        return bless [ "$a1$a2" ], 'RegexProcessor';
    },
);

package main;
use re 'eval';

# Assumed addition (not in the listing): say() needs the 'say'
# feature to be enabled before this code will compile...
use feature 'say';

BEGIN {
    overload::constant qr => sub {
        my ($regex_pattern) = @_;

        # Replace raw $MAGIC_VAR with 'foo'...
        # (A greatly simplified version of what Regexp::Grammars does)
        $regex_pattern =~ s/\$MAGIC_VAR/'foo'/g;

        return bless [ $regex_pattern ], 'RegexProcessor'
    };
}

use strict;

say 'matched' if "foobar" =~ m{ (??{ $MAGIC_VAR }) bar }xms;
-----end----------end----------end----------end----------end----------end-----
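For contrast, a minimal sketch of the same style of rewriting applied to
ordinary pattern text rather than to a variable inside a code block. Nothing
in the placeholder is parsed as a Perl variable, so there is nothing for
'use strict' to object to before the handler runs. The <MAGIC> placeholder
is hypothetical, invented for this illustration, and is not Regexp::Grammars
syntax; the handler returns a plain string rather than a blessed object to
keep the sketch short.

```perl
use strict;
use warnings;
use overload;

BEGIN {
    # The handler receives the constant text of each regex literal
    # and returns the text to be compiled in its place...
    overload::constant qr => sub {
        my ($regex_pattern) = @_;

        # Replace the (hypothetical) <MAGIC> placeholder with 'foo'...
        $regex_pattern =~ s/<MAGIC>/foo/g;

        return $regex_pattern;
    };
}

# The literal below is rewritten to / foo bar /x before it is compiled...
my $matched = "foobar" =~ m{ <MAGIC> bar }xms ? 1 : 0;

print "matched=$matched\n";
```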
>> * The third problem that has arisen in 5.18 is when the module
>> injects a code block that accesses an in-scope lexical
>> variable. Those blocks, when compiled, appear to
>> *sometimes* be failing to close over the correct variable.
>>
>> * For example, the R::G <%hash> construct is rewritten into a
>> block like so:
>>
>>     (??{
>>         exists $hash{$^N} ? q{} : q{(?!)}
>>     })
>>
>> But, when matching, the lexical variable %hash appears to be
>> empty inside the code block, even though it is not definitely
>> empty in the enclosing lexical scope.
>
> Again, I'd need sample code that reproduces this.
The following code demonstrates the problem. It works
as expected under 5.14 (i.e. prints the three hash keys)
but under 5.18 it generates a compile-time error:
    Global symbol "%hash" requires explicit package name at (eval 1) line 1.
Once again the RegexProcessor code is just Dave's workaround:
-----cut----------cut----------cut----------cut----------cut----------cut-----
package RegexProcessor;
use overload (
    q{""} => sub {
        my ($pat) = @_;
        return $pat->[0];
    },

    q{.} => sub {
        my ($a1, $a2) = @_;
        $a1 = $a1->[0] if ref $a1;
        $a2 = $a2->[0] if ref $a2;
        return bless [ "$a1$a2" ], 'RegexProcessor';
    },
);

package main;
use re 'eval';

# Assumed addition (not in the listing): say() needs the 'say'
# feature to be enabled before this code will compile...
use feature 'say';

BEGIN {
    overload::constant qr => sub {
        return bless [ $_[0] ], 'RegexProcessor';
    };
}

my %hash = ( a => 1, b => 2, c => 3);

"" =~ m{ (?{say join ', ', keys %hash}) }xms;
-----end----------end----------end----------end----------end----------end-----
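For comparison, a sketch of the kind of block quoted above (the <%hash>
rewriting) written directly as a literal in the source, with no qr
overloading or injection involved. In this form the block is compiled
together with the surrounding code, so it would be expected to see the
enclosing lexical %hash. The hash contents and the test words are
illustrative only.

```perl
use strict;
use warnings;

# Illustrative data; any lexical hash would do...
my %hash = ( cat => 1, dog => 1 );

# Record, for each word, whether the whole word is a key of %hash...
my %is_key;
for my $word (qw( cat cow cats )) {
    $is_key{$word} = $word =~ m{
        \A (\w+)
        (??{ exists $hash{$^N} ? q{} : q{(?!)} })
        \z
    }xms ? 1 : 0;
}

print "$_ => $is_key{$_}\n" for sort keys %is_key;
```

The (??{...}) block returns an empty pattern when the most recent capture is
a key of %hash, and the always-failing (?!) otherwise.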
I believe that these three examples now cover all the remaining
failures that are preventing Regexp::Grammars from working correctly
under Perl 5.18 (apart from the segfaulting issue that has already been
dealt with).
Thanks,
Damian