[perl #127617] /n regexp modifier and backreferences to previous groups
From:
Ed Avis
Date:
February 26, 2016 12:14
Subject:
[perl #127617] /n regexp modifier and backreferences to previous groups
Message ID:
rt-4.0.18-7812-1456488822-41.127617-75-0@perl.org
# New Ticket Created by "Ed Avis"
# Please include the string: [perl #127617]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/Ticket/Display.html?id=127617 >
This is a bug report for perl from eda@waniasset.com,
generated with the help of perlbug 1.40 running under perl 5.22.1.
-----------------------------------------------------------------
[Please describe your issue here]
The /n regexp modifier, according to perlre, 'will stop $1, $2,
etc... from being filled in'. However, it has another behaviour which
is not documented and, in my opinion, is not helpful: it also stops
the group from being referenced by (?-1) and similar constructs within
the same regexp.
So for example, with the current behaviour:
    % perl -E '$_ = "aa"; /(a)(?-1)/ or die; say $1 // "undef"'
    a
    % perl -E '$_ = "aa"; /(a)(?-1)/n or die; say $1 // "undef"'
    Reference to nonexistent group in regex...
The same applies when the modifier is set for only part of the regexp:
    % perl -E '$_ = "aa"; /(?n:(a)(?-1))/ or die; say $1 // "undef"'
    Reference to nonexistent group in regex...
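The cases above can be checked in one script. This is only a minimal sketch: probe() is a hypothetical helper, not part of perl, and the middle pattern, (?n:(a)a), is added here purely to show /n suppressing $1 when no recursion is involved. Patterns are compiled from strings inside eval so that a compile-time failure is reported rather than aborting the script. The report above implies that the last pattern fails to compile on perl 5.22.1; other versions may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $pattern at run time, match it against "aa",
# and describe what happened.
sub probe {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };      # regexp compile errors are fatal, so trap them
    return 'compile error' unless defined $re;
    return 'no match' unless 'aa' =~ $re;
    return defined $1 ? "matched, \$1=$1" : 'matched, $1 undef';
}

for my $pattern ('(a)(?-1)', '(?n:(a)a)', '(?n:(a)(?-1))') {
    printf "%-16s %s\n", $pattern, probe($pattern);
}
```

The behaviour requested in this ticket would make the last line read "matched, $1 undef" instead of "compile error".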
I would prefer it to still allow referring to the group within the
regexp itself, even if the external effect of setting $1, etc does not
happen. So my preferred behaviour would be:
    % perl -E '$_ = "aa"; /(?n:(a)(?-1))/ or die; say $1 // "undef"'
    undef
Although this would be a change to the current semantics, it is more
closely in line with what perlre currently documents, so it might be
considered more of a bug fix than an incompatible change.
Now I will give a bit of background about why I think this would be useful.
Suppose I have a regular expression matching a simple regular
language. Strings in the language are sequences of one or more 'a'.
    $lang_re = qr/a+/;
I may define this regexp in a library and then use it in client code
which matches a string in the language followed by a digit:
    /\A ($lang_re) ([0-9]) \z/x or die;
    my ($lang_str, $digit) = ($1, $2);
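As a self-contained sketch of that client code, the following runs the match against a sample input. The input string 'aaa7' is an assumption chosen for illustration; it does not come from the original code.

```perl
use strict;
use warnings;

my $lang_re = qr/a+/;          # library side: one or more 'a'

my $input = 'aaa7';            # sample input, chosen for illustration
$input =~ /\A ($lang_re) ([0-9]) \z/x or die "no match";
my ($lang_str, $digit) = ($1, $2);

print "lang_str=$lang_str digit=$digit\n";   # lang_str=aaa digit=7
```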
Now suppose I change the definition of the language so that valid
strings are now either a sequence of 'a' as before, or <X> where
X is a valid string.
    $lang_re = qr/ ( a+ | < (?-1) > ) /x;
(For this trivially simple language there may be other ways to do it
but in general a recursively defined language requires recursive
subpatterns in the regexp.)
The modified $lang_re works but now it has a side effect of setting a
capturing group. The existing client code that expected to include
$lang_re in a larger regexp and then get ($1, $2) will be broken by
this change.
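The breakage can be reproduced with a short sketch. The input '<<aa>>7' is an assumption chosen for illustration. Because $lang_re now contributes a capturing group of its own, the digit moves from $2 to $3, and unchanged client code silently picks up the wrong value.

```perl
use strict;
use warnings;

my $lang_re = qr/ ( a+ | < (?-1) > ) /x;   # recursive definition, adds a group

my $input = '<<aa>>7';                     # sample input, chosen for illustration
$input =~ /\A ($lang_re) ([0-9]) \z/x or die "no match";

# Unchanged client code still reads $1 and $2 ...
my ($lang_str, $digit) = ($1, $2);
# ... but the digit has moved to $3.
my $third = $3;

print "lang_str=$lang_str digit=$digit third=$third\n";
# lang_str=<<aa>> digit=<<aa>> third=7
```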
To avoid adding a new externally visible capturing group I would like
to use the /n modifier:
    $lang_re = qr/ (?n: ( a+ | < (?-1) > ) ) /x;
The intention is that while $lang_re may use a recursive subpattern
internally, it does not expose a new capturing group to the outside
world. So it can be used as a building block in a larger pattern
without bumping around the $1,$2,$3 results whenever the
implementation of $lang_re changes.
Although using named captures everywhere mitigates the problem, it does
not solve it, since there is no guarantee that the names of
capturing groups will be globally unique. And if $lang_re
is provided by a regexp library, the library author cannot know that
all client code always uses named captures rather than $1,$2,$3.
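For comparison, here is a minimal sketch of the named-capture mitigation on the client side. The group names lang and digit and the input '<a>7' are assumptions made for illustration. The named groups keep working when $lang_re gains an internal group, but the numbered variables still shift, and nothing prevents $lang_re from using the same names internally.

```perl
use strict;
use warnings;

my $lang_re = qr/ ( a+ | < (?-1) > ) /x;   # recursive definition from above

my $input = '<a>7';                        # sample input, chosen for illustration
$input =~ /\A (?<lang> $lang_re) (?<digit> [0-9]) \z/x or die "no match";

# Named captures are unaffected by the extra group inside $lang_re.
my $lang_str = $+{lang};
my $digit    = $+{digit};

print "lang_str=$lang_str digit=$digit\n";   # lang_str=<a> digit=7
```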
I think that changing the semantics of /n, so that it stops
*capturing*, but still allows the group to be referenced with
recursive subpatterns, would make it much more useful and would more
closely match the documentation.
(There may also be room for a regexp modifier X which hides groups
from recursive subpattern matches *outside* the (?X: ... ) but allows
them to be visible *inside*. This would be a further improvement for
building reusable, composable regexps. The letter X is just an
example, of course. Possibly this could even be the behaviour
of (?n: ... ). But I would not want this to distract from the more
important issue of making /n's current behaviour match the docs.)
(FWIW, the real code which prompted this is a regexp library to match a
'simple arithmetic expression', being numbers with operators like +
and - and parentheses. Such a 'simple expression' is in some sense
safe to evaluate using eval(STRING) to get a number.)
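To give a flavour of such a pattern, here is a hypothetical sketch. It is not the actual library code: the grammar, the name $expr_re and the helper is_simple_expr() are all assumptions made for illustration. It accepts non-negative integers combined with + and - and balanced parentheses, using the same (?-1) recursion as $lang_re, and it only validates the string without evaluating it.

```perl
use strict;
use warnings;

# Hypothetical grammar, chosen for illustration:
#   expr := term ( [+-] term )*
#   term := number | "(" expr ")"
my $expr_re = qr{
    (                                                    # group 1 is expr
        \s* (?: [0-9]+ | \( (?-1) \) ) \s*               # first term
        (?: [-+] \s* (?: [0-9]+ | \( (?-1) \) ) \s* )*   # further terms
    )
}x;

# Hypothetical helper: true if the whole string is a simple expression.
sub is_simple_expr {
    my ($string) = @_;
    return $string =~ /\A $expr_re \z/x ? 1 : 0;
}

for my $candidate ('1 + (2 - 3)', '((4))', '(1', '1 + system("x")') {
    printf "%-18s %s\n", $candidate, is_simple_expr($candidate) ? 'ok' : 'rejected';
}
```

Note that $expr_re has exactly the problem described in this report: it adds a capturing group to any larger pattern that embeds it.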
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl 5.22.1:
Configured by Red Hat, Inc. at Mon Dec 14 11:14:02 UTC 2015.
Summary of my perl5 (revision 5 version 22 subversion 1) configuration:
Platform:
osname=linux, osvers=4.3.0-1.fc24.x86_64, archname=x86_64-linux-thread-multi
uname='linux buildvm-04-nfs.phx2.fedoraproject.org 4.3.0-1.fc24.x86_64 #1 smp mon nov 2 16:27:20 utc 2015 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Doptimize=none -Dccflags=-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Dldflags=-Wl,-z,relro -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.22.1 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize=' -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fwrapv -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='5.3.1 20151207 (Red Hat 5.3.1-2)', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678, doublekind=3
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16, longdblkind=3
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='gcc', ldflags ='-Wl,-z,relro -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib
libs=-lpthread -lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
perllibs=-lpthread -lresolv -lnsl -ldl -lm -lcrypt -lutil -lc
libc=libc-2.22.so, so=so, useshrplib=true, libperl=libperl.so
gnulibc_version='2.22'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro '
cccdlflags='-fPIC', lddlflags='-shared -Wl,-z,relro -L/usr/local/lib -fstack-protector-strong'
Locally applied patches:
Fedora Patch1: Removes date check, Fedora/RHEL specific
Fedora Patch3: support for libdir64
Fedora Patch4: use libresolv instead of libbind
Fedora Patch5: USE_MM_LD_RUN_PATH
Fedora Patch6: Skip hostname tests, due to builders not being network capable
Fedora Patch7: Dont run one io test due to random builder failures
Fedora Patch15: Define SONAME for libperl.so
Fedora Patch16: Install libperl.so to -Dshrpdir value
Fedora Patch22: Document Math::BigInt::CalcEmu requires Math::BigInt (CPAN RT#85015)
Fedora Patch26: Make *DBM_File desctructors thread-safe (RT#61912)
Fedora Patch27: Make PadlistNAMES() lvalue again (CPAN RT#101063)
Fedora Patch28: Make magic vtable writable as a work-around for Coro (CPAN RT#101063)
Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux
---
@INC for perl 5.22.1:
/usr/local/lib64/perl5
/usr/local/share/perl5
/usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl
/usr/lib64/perl5
/usr/share/perl5
.
---
Environment for perl 5.22.1:
HOME=/home/eda
LANG=en_GB.UTF-8
LANGUAGE (unset)
LC_COLLATE=C
LC_CTYPE=en_GB.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=en_GB.UTF-8
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/eda/bin:/home/eda/bin:/usr/local/bin:/usr/bin:/sbin:/usr/sbin:/sbin:/usr/sbin
PERL_BADLANG (unset)
SHELL=/bin/bash