NWCLARK TPF grant March report
From: Nicholas Clark
Date: April 22, 2013 15:28
Subject: NWCLARK TPF grant March report
Message ID: 20130422152748.GP4940@plum.flirble.org
As per my grant conditions, here is a report for the March period.
gcc 4.8.0 was released on 22nd March. This version of gcc has integrated
Address Sanitizer, a Google project to provide a fast runtime heap and stack
validation tool. I set it off building gcc from source, which pleasingly
worked first time on the system I chose for the task. In turn blead built
with it without problems, which is good in itself, and a known good starting
point for building with Address Sanitizer enabled.
I didn't expect to find many problems with gcc's ASAN, as I believe we're
currently clean when running under valgrind, and when building with the ASAN
in clang 3.1. Fortunately I was right - I didn't find many. Unfortunately the
count wasn't zero: I found three problems, all of which are now fixed in blead.
Firstly, PerlIO_find_layer() was using the wrong length for memEQ(). So if
it was looking for "perlio", it would compare 6 bytes, even if the current
layer was named "unix". I believe that it's unlikely to be a problem in
production (in this example, as "u" != "p", the comparison will fail without
needing data from beyond the end of the shorter name). It would only show up
if one layer's name were an initial substring of another's and the memory just
beyond the end of the shorter name happened to match the rest of the longer
name; if so, the shorter-named layer would be used where the longer one should
have been found. It is now fixed in blead and will be in v5.18.0.
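
To illustrate the class of bug (a minimal sketch in C, not the actual
perlio.c code): a counted-string comparison has to take both lengths into
account, otherwise it can read past the end of the shorter name.

#include <string.h>

/* Hypothetical helper: does the layer named (name, namelen) match the
   layer we are searching for, named (want, wantlen)? */
static int
layer_name_matches(const char *name, size_t namelen,
                   const char *want, size_t wantlen)
{
    /* The buggy pattern was effectively memcmp(name, want, wantlen),
       which reads wantlen bytes of name even when namelen < wantlen,
       i.e. beyond the end of the shorter buffer.  Checking the lengths
       first avoids the over-read entirely. */
    return namelen == wantlen && memcmp(name, want, wantlen) == 0;
}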
Secondly, a test for heredoc terminators was sometimes reading beyond the
end in a memNE(). The failing test case was an unterminated heredoc used
inside a string eval, something which actually used to SEGV until Father
Chrysostomos fixed it after v5.16.0. I think that the bug isn't triggered
without something similarly obscure, but it is now fixed.
Thirdly, the memory referenced by the CV for &DynaLoader::boot_DynaLoader to
record information about the C code used to generate it was pointing to the
C stack, and so anything reading from it would pick up garbage. The tests
for B::Deparse happened to read this name while searching for a different
CV. The worst case would be that code using B would end up with a Perl string
containing some of the contents of the C stack, when it was expecting
something sensible. This is only going to cause bugs if that code was very
interested in &DynaLoader::boot_DynaLoader specifically, and it's fixed now.
Something which might seem to be simple, but is actually an inherently
complex problem, is the build system. The distribution needs to bootstrap
itself up from source files to a complete working tested perl, without
assuming that there is already a perl installed on the system (else you have
a chicken and egg problem), but also without getting confused if there *is*
a perl installed on the system (as it is probably a different version, or
has incompatible build options). Moreover, most of the build system is
written in Perl (being the same code as the CPAN tool chain), so that has to
run against an uninstalled perl, and without touching files outside the
build directory (so it can't use ~/.cpan) or talking to the Internet.
Finally, there's no single tool you can rely on existing with which to start
the bootstrap (DCL on Win32? Shell scripts on a vanilla VMS install? Batch
files on *nix?), so there's more than one bootstrap path to get your head
around.
[For completeness, I should also note that
1) there's a clear distinction between the steps for configuration, building
and testing*,
2) that all tests, core and modules, are run as one stage with the results
reported together
3) that where possible, the build and tests run in parallel, which speeds
up the build & test cycle by a factor of at least 8 on decent current
hardware, which is a significant productivity boost if you're doing this
on a daily or hourly basis
]
Anyway, the reason this is relevant is that Zefram submitted a patch which
provides XS implementations for some of the key methods in File::Spec.
Whilst we agreed that for something as fundamental as File::Spec it's too
close to v5.18.0 to safely put it in, I did try making a branch of blead to
test that it worked there.
This turned out to be rather more involved than I thought it would be.
I expected it to be potentially interesting, because File::Spec and Cwd are
used early during the bootstrap of the toolchain, which means that a bunch
of scripts early in the build process need to be able to run it (pure-Perl)
from dist/.
But it proved to be a bit more interesting than just that. For starters, I
hit a failure mode that I wasn't expecting. We have this rule to create
lib/buildcustomize.pl, the setup file which primes miniperl to be able to run
uninstalled and to change directory (which you need for the build):
lib/buildcustomize.pl: $(MINIPERL_EXE) write_buildcustomize.pl
	$(MINIPERL) write_buildcustomize.pl >lib/buildcustomize.pl
The problem is that the upgrade caused a compile-time error in
write_buildcustomize.pl, because Cwd now requires constant and re.
It's roughly akin to this:
$ git diff
diff --git a/write_buildcustomize.pl b/write_buildcustomize.pl
index ec3b36e..e8efec1 100644
--- a/write_buildcustomize.pl
+++ b/write_buildcustomize.pl
@@ -1,5 +1,6 @@
 #!./miniperl -w
+die;
 use strict;
 if (@ARGV) {
     my $dir = shift;
$ make lib/buildcustomize.pl
./miniperl -Ilib write_buildcustomize.pl >lib/buildcustomize.pl
Died at write_buildcustomize.pl line 3.
make: *** [lib/buildcustomize.pl] Error 255
$ echo $?
2
$ ls -l lib/buildcustomize.pl
-rw-r--r-- 1 nick nick 0 Mar 2 16:15 lib/buildcustomize.pl
ie the build fails, but the generated file is not removed. So if you attempt
to continue by running make again, that file is assumed to be good, and
something ugly fails soon after for all the wrong reasons, spewing error
messages that aren't related to the original problem.
So, having figured out that there is a second bug obscuring the real bug, it
becomes easier to fix the actual causes :-) Although the bad news is that
this means changes to the Win32 and VMS makefiles too. I pushed a tentative
fix to the Makefile bootstrapping logic in
smoke-me/nicholas/build-bootstrap, which George Greer's Win32 smoker seems
fine with. However, I think I can see how to improve it by more use of
buildcustomize (removing -Ilib), but things have been sufficiently busy that
I've not had a chance to look further. In particular, we don't have any VMS
smokers, so I'll have to test things manually on VMS, which makes it all
more time consuming.
I also worked on a couple of things which ultimately didn't make the cut for
the code freeze. Partly this was due to not being around for a chunk of March
because of a trip to my parents and then a trip to Berlin for the German Perl
Workshop.
[The German Perl Workshop was excellent. It had probably the best ever venue
for a conference social event - the Computer Games Museum:
http://www.computerspielemuseum.de/1210_Home.htm
]
gcc provides a function named __builtin_expect(), which is intended to
permit the programmer to give an optimisation hint to the compiler's
code generator about which code path is more common, so that the faster path
through the generated code is for the more common case.
We added a probe for this a while back, and some macros (LIKELY and UNLIKELY)
in perl.h, which expand to use __builtin_expect() when compiling with a gcc
which supports it, and (obviously enough) expand to plain expressions
everywhere else.
However, we didn't then use it very much, as it turned out to be very hard
to measure that these actually helped by speeding up code, and it seemed a
bad use of time churning the source code, possibly introducing errors or
inefficiencies, for no measurable gain.
Steffen Mueller recently added UNLIKELY() and LIKELY() (mostly UNLIKELY) to
a few places in the code where one can be confident that they hold true.
(And, as much, the design and code presume that this is the case, so the
macros also document assumptions.) I had a look at the generated code on
ARM, and it's quite different with the use of the macros. There is rather
too much to figure out, but as best I could tell from the first diff hunk,
it's inverting the sense of branches. I don't know how much branch
prediction current ARMs have (15 years ago they had none), so this is going
to make a difference of a cycle per branch avoided. Which isn't going to be
much, but is in the right direction *provided* the annotation is more
correct than the compiler's guesswork. Which is probably going to be true
for the cases that are easy to reason about.
This caused Zefram to notice that the macros, as added, are actually subtly
wrong:
#ifdef HAS_BUILTIN_EXPECT
# define EXPECT(expr,val) __builtin_expect(expr,val)
#else
# define EXPECT(expr,val) (expr)
#endif
#define LIKELY(cond) EXPECT(cond,1)
#define UNLIKELY(cond) EXPECT(cond,0)
The intent is that one passes them an expression that is likely true, or
something that is likely false, and by implication that that expression is
considered to be a boolean. However, the documentation of __builtin_expect()
is
long __builtin_expect (long exp, long c)
The return value is the value of exp, which should be an integral
expression. The semantics of the built-in are that it is expected that
exp == c.
ie, that long integer comparison is used for the evaluation. This raises the
potential of bugs. In particular, in C, NULL pointers are false, non-NULL
pointers are true. So, one would expect to write LIKELY(pointer) to indicate
that one expects that pointer is non-NULL. However, the way the code currently
is, that will compare the pointer with the integer 1 (cast to a pointer), and
declare to the code generator that this comparison is likely to be true.
Which, of course, it almost certainly is not. So for that case it's completely
counter-productive.
Those macros *should* change to this:
#define LIKELY(cond) EXPECT(cBOOL(cond),TRUE)
#define UNLIKELY(cond) EXPECT(cBOOL(cond),FALSE)
(using our cast-to-bool macro, to avoid *other* problems some C99 compilers
have with the bool type), and they are now pushed to a smoke-me branch to
verify that they are OK, and ensure that they don't get lost.
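
As a concrete illustration of the pointer case described above (a sketch
with made-up function names, not code from the perl source; !!(cond) stands
in here for perl's cBOOL() cast):

#ifdef HAS_BUILTIN_EXPECT
#  define EXPECT(expr,val) __builtin_expect(expr,val)
#else
#  define EXPECT(expr,val) (expr)
#endif
#define OLD_LIKELY(cond) EXPECT(cond,1)     /* the original definition      */
#define NEW_LIKELY(cond) EXPECT(!!(cond),1) /* the boolean-cast replacement */

char *get_buffer(void);                     /* hypothetical */
void use_buffer(char *p);                   /* hypothetical */

void example(void) {
    char *p = get_buffer();
    /* OLD_LIKELY(p) would tell gcc to expect p == 1 (as a long), which is
       almost never true, so the hint misleads the code generator.
       NEW_LIKELY(p) tells gcc to expect !!(p) == 1, i.e. that p is
       non-NULL, which is what the programmer meant. */
    if (NEW_LIKELY(p))
        use_buffer(p);
}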
However, that also changes the return value of LIKELY() and UNLIKELY() -
before it was the value passed in, now it's TRUE or FALSE. This might just
matter, so I searched CPAN to ensure that nothing was using the return value
of our macros. All seemed clear, although the search was complicated by the
fact that several XS modules define their own macros. However, as it's not a
regression, and it is a potential behaviour change, it didn't seem the right
time to change them on blead with the code freeze and release imminent.
The other thing I worked on which didn't make the code freeze was a fix for
RT #114576. The bug is an unintended performance regression introduced as
a side effect of a space optimisation. It's not new - it dates from three
years ago, and shipped with v5.12.0. The desire to optimise (and hence
make trade-offs) comes about because there is only one "hash" type in Perl 5,
and it's horribly overloaded. It's used for 3 arguably disjoint purposes
1) the program's associative arrays
2) the internals of objects
3) the interpreter's symbol tables
and sadly those three have differing needs. If a program is storing data in
a few associative arrays, the fixed overhead of each is not significant, but
access and manipulation costs are. If a program is creating lots of objects,
then the fixed overheads rapidly start to mount up, and can be comparable in
size to the object members themselves. Plus, almost no code does things like
iterate over the members of an object, but plenty of code does just that
with an associative array.
The second tension is that the value of a hash in scalar context is documented
as being 0 if the hash is empty, otherwise a true value that is fiddly
to format and requires overhead to calculate. Almost no code actually uses
the details of that true value, but it's documented, longstanding behaviour,
and we try to avoid changing things that would break working code.
So the optimisation assumed that nearly all boolean cases were spotted by
the peephole optimiser, which would permit the implementation to "cheat" and
return 0 or "truth", rather than the documented value. This would mean that
it was viable to avoid storing information on the number of hash chains
populated in every hash, shrinking every hash (and hence every object) by
one integer.
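
The trade-off, sketched in C (a hypothetical illustration, not perl's actual
hash implementation): either every hash carries a count of populated chains,
kept up to date on every insert and delete (one extra integer per hash, and
hence per object), or the count has to be recomputed on demand by walking the
bucket array, which is cheap for small hashes but costs time proportional to
the number of buckets for large ones.

struct entry;                              /* one chained hash entry        */
struct bucket { struct entry *first; };    /* head of one hash chain        */

/* On-demand counting: O(nbuckets) work each time the real value is needed. */
static unsigned int
count_filled_buckets(const struct bucket *buckets, unsigned int nbuckets)
{
    unsigned int filled = 0, i;
    for (i = 0; i < nbuckets; i++)
        if (buckets[i].first)
            filled++;
    return filled;
}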
The regression is that there are cases where this doesn't work out.
I have a fix (which avoids increasing the size of objects, but restores
performance for large hashes) pushed to a smoke-me branch, but as I was
away and busy just before the code freeze, and with the priority being on
security related hash fixes, it didn't get considered for merging.
I believe that it's technically viable to backport it later as a bug fix, if
it meets the other review criteria.
Sadly not all progress is forwards.
I worked on RT #116943, "deprecating ** and friends", attempting to get
it ready for Ricardo to approve it going into blead before v5.17.10.
The magic variable $* was used to enable multi-line matching, but has been
deprecated since 5.000, superseded by the /s and /m modifiers. The magic was
removed in v5.10.0, and the (then default-off) deprecation warning when
using it replaced by a default-on warning that it no longer does anything:
$ ~/Sandpit/5000/bin/perl -lwe '$* = 1'
Use of $* is deprecated at -e line 1.
$ ~/Sandpit/5100/bin/perl -e '$* = 1'
$* is no longer supported at -e line 1.
The other variables in the typeglob, @*, &*, ** and %*, have never had any
special meaning.
The goal of the changes is to permit the parser to be modified in future to
stop treating $* etc as magic variables, and instead enable * to be used as
part of new future syntax, for example $*foo, @*bar and %*baz. Searching
CPAN for existing users is tricky, as strings such as @* and %* seem to
produce a lot of false positives, but after weeding out non-Perl code, and
format strings, it seemed that there are virtually no users, hence the
trade-off seemed worthwhile - remove something unused by virtually all
production code, to enable new features that will be of use to production
code.
So I set off to replicate the existing code for warning on use of $* (the
SCALAR in the typeglob) into the other users of the typeglob. On the way I
discovered that several regression tests are using @*. Some use it because
it's a convenient punctuation variable to test, and others were using it to
exploit a bug - mention @* first, and $* doesn't warn:
$ ~/Sandpit/5000/bin/perl -lwe '$* = 1; print $]'
Use of $* is deprecated at -e line 1.
5.000
$ ~/Sandpit/5000/bin/perl -lwe '0 if @*; $* = 1; print $]'
5.000
So I fixed that bug, verified that the other tests could be rewritten using
other punctuation variables, and so was now ready to "deprecate" the rest,
as there were now no use cases outside the tests for warnings.
This I did, pushed the code as a smoke-me, and then when that exposed no
problems, checked with Ricardo and merged it to blead in time for v5.17.10.
Unfortunately, that then revealed the actual problems with this approach.
Which, I guess at least proves me right on one thing - I wanted to get it
into a blead release to be confident that it wasn't going to cause
surprises. It's much better to get your surprises now, rather than in a
release candidate, or even worse, after you ship and you can't turn the
clock back. But the problems are for a future report.
At the start of March, because it was fresh in my head, I continued working
on Unicode name lookup, finishing off the Perl prototype code by
implementing the algorithmic lookup for CJK ideographs and Hangul
syllables. CJK ideographs are easy, Hangul syllables less so. Currently Perl
5 uses a regex for them, but accessing the regex engine from C is, well,
harder work than setting up the call stack and calling back into Perl. Plus
neither approach is pretty or simple. Instead I found a way to implement
them in fairly simple C, partly by calling back into the trie lookup code
used for all the other names, and completed a prototype of this in Perl,
Perl written to be easy to transcode to C. With this milestone completed,
I stopped.
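
For reference, the names of both groups are generated mechanically from the
code point, which is why an algorithmic lookup is possible at all. CJK
unified ideographs are trivially named "CJK UNIFIED IDEOGRAPH-" followed by
the code point in hex. For Hangul syllables, here is a minimal C sketch of
the forward direction (code point to name), following the algorithm in the
Unicode Standard (just an illustration, not the prototype code); the actual
lookup has to run it in reverse, from name to code point:

#include <stdio.h>

/* Jamo short names, from the Unicode Character Database (Jamo.txt). */
static const char *jamo_l[19] = {
    "G","GG","N","D","DD","R","M","B","BB","S","SS","","J","JJ","C","K",
    "T","P","H"
};
static const char *jamo_v[21] = {
    "A","AE","YA","YAE","EO","E","YEO","YE","O","WA","WAE","OE","YO","U",
    "WEO","WE","WI","YU","EU","YI","I"
};
static const char *jamo_t[28] = {
    "","G","GG","GS","N","NJ","NH","D","L","LG","LM","LB","LS","LT","LP",
    "LH","M","B","BS","S","SS","NG","J","C","K","T","P","H"
};

/* Write the name of the Hangul syllable at code point cp (U+AC00..U+D7A3). */
static void hangul_syllable_name(unsigned int cp, char *buf, size_t buflen) {
    unsigned int s = cp - 0xAC00;           /* offset from SBase         */
    unsigned int l = s / (21 * 28);         /* leading consonant index   */
    unsigned int v = (s % (21 * 28)) / 28;  /* vowel index               */
    unsigned int t = s % 28;                /* trailing consonant index  */
    snprintf(buf, buflen, "HANGUL SYLLABLE %s%s%s",
             jamo_l[l], jamo_v[v], jamo_t[t]);
}

int main(void) {
    char name[64];
    hangul_syllable_name(0xAC00, name, sizeof name);
    printf("%s\n", name);                   /* prints "HANGUL SYLLABLE GA" */
    return 0;
}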
Three weeks later, needing a change after head-hurting C debugging, I
returned to it. A Perl prototype is all very well, but it had no C data
structures [to validate that my estimates of size were correct], and no C
code to read them [to validate that it would work at all! :-)]. So I fixed
this, by writing code to generate C data structures for the needed tables,
and then by transcoding the prototype lookup code from Perl to C. This
approach worked particularly well - as I had all this done in two days, and
the C code worked *almost* first time (the only error was actually in one of
the constants in the code for Hangul syllables). The trick is to write your
Perl like C - obviously no regular expressions, but maybe less obviously,
write really clunky code that chews through strings using ord, s/\A.// and
substr, and expresses all comparisons using nothing more complex than == or
eq.
I cross-checked the generated C code against a set of "Golden Results"
generated from Perl, and it performed perfectly. (I was hoping for this, as
the Perl prototype code also performed perfectly, but a lot more slowly.)
As I was on a roll, I continued by implementing some space optimisations for
the data representation which cut the size of the tables down from about 900K
to 400K. At this point I have a small stand-alone executable which can look
up Unicode code points by name. All that's left is the small matter of
integrating it into toke.c, and the build system. All...
I also tried to build and test blead on my Raspberry Pi. This used to work
(it takes a few hours, but runs to completion). It doesn't now - it aborts
somewhere near the end of the tests, seemingly due to running out of
memory. So what changed? I set the pi off smoking to figure it out:
export TEST_JOBS=1; bisect.pl -j1 --start 9f68b0f7acd1bcb04e7baa4bdb7cfec8e5d985c8 --end blead^ -Doptimize=-g -Dusethreads -Duse64bitint -- sh -c 'cd t && ./perl harness; echo $?; ls -l test_state'
ie bisect.pl builds up to C<test_prep> (the default), and then the test case is
to run the entire test suite under t/harness, and see whether t/harness runs
to completion and writes out the test state in t/test_state.
Given that each iteration of the bisection will take something like 7¼ hours,
this was going to take a while...
It *eventually* produced this, after more than 3 days non-stop on the job:
HEAD is now at 3283393 Restrict the valid identifier syntax, fix some identifier bugs.
bad - non-zero exit from sh -c cd t && ./perl harness; echo $?; ls -l test_state
32833930e32dc619abdaaab54e88de2a2765fb86 is the first bad commit
commit 32833930e32dc619abdaaab54e88de2a2765fb86
Author: Brian Fraser <fraserbn@gmail.com>
Date: Tue Mar 5 18:18:49 2013 -0300
Restrict the valid identifier syntax, fix some identifier bugs.
Fixes:
* Length-one identifiers are now restricted to
[\p{XIDS}\p{POSIX_Punct}\p{POSIX_Digit}\p{POSIX_Cntrl}]
plus, if under 'no utf8', the 128 non-ASCII characters in the
Latin1 range.
* Identifiers that start with ASCII letters can be followed with
XIDC characters
(The committer made some small edits in the pod)
:100644 100644 e8f5402a77dd011f065f7d91db354fdfd6da6830 8ac08abf50107cbfba3a8296df34556449134ba6 M gv.c
:040000 040000 9ae3a108394119259ca10d38ef44ef23c237fb91 3a55dc5feeea97da7346fc74df0601d3200e32e3 M pod
:040000 040000 695729610bd917237944558cfe729d529dfb6514 714c16587ae882fc635c17fd82362e5ff804f16b M t
:100644 100644 27485462ffca6939c895961092a743775401fa14 1ea0da7fde5aa50521ffe68378a87980048578cd M toke.c
bisect run success
That took 274968 seconds
If I reverse just the test cases added by that commit then the tests all
pass:
Test Summary Report
-------------------
uni/variables.t (Wstat: 0 Tests: 5 Failed: 0)
TODO passed: 5
Files=2388, Tests=620014, 18222 wallclock secs (2020.18 usr 55.35 sys + 13406.55 cusr 290.67 csys = 15772.75 CPU)
Result: PASS
whereas with that test restored again, the harness fails:
../lib/charnames.t ................................................ ok
Could not execute (./perl -I.. -MTestInit=U1 ../lib/diagnostics.t): open3: fork failed: Cannot allocate memory at ../lib/TAP/Parser/Iterator/Process.pm line 168.
make: *** [test_harness] Error 12
If I double the amount of swap available, everything passes once more:
All tests successful.
Files=2388, Tests=685775, 18699 wallclock secs (2216.11 usr 60.91 sys + 14095.49 cusr 300.45 csys = 16672.96 CPU)
Result: PASS
So, yes, it's to do with the amount of memory available. Test::Builder is
innocent. That test itself is innocent. All it does wrong is to add a *lot*
of testcases:
-plan (tests => 5);
+plan (tests => 65850);
Which makes me wonder - just how much memory is TAP::Parser using as it
remembers the results from all the test scripts run?
I've sent my findings to the perl-qa list, in the hope that someone finds it
an interesting problem to dig into, and produces a patch to the code that
reduces it for the common case. Source code being here:
https://github.com/Perl-Toolchain-Gang/Test-Harness
A more detailed breakdown summarised from the weekly reports. In these:
16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/
BBC is "bleadperl breaks CPAN" - Andreas König's test reports for CPAN modules
[Hours]         [Activity]
  1.25          ASAN
                EBCDIC
  2.75          File::Spec & bootstrap
  2.00          LIKELY/UNLIKELY
  0.50          PL_stderrgv
  6.50          RT #114576
  1.50          RT #116477, RT #116615
  0.75          RT #116833
 14.00          RT #116943 -- deprecating ** and friends
  1.00          RT #116971
  0.25          RT #117141
  0.25          RT #117315
 37.00          Unicode
                Unicode Names
  0.75          bison 2.7
  1.00          clang ASAN
  9.75          gcc 4.8.0
                gcc 4.8.0 ASAN
  4.25          pahole interpreter struct
  2.00          process, scalability, mentoring
  0.75          randomised iteration
 34.00          reading/responding to list mail
  0.25          regcomp/setjmp
  0.50          smoke pi
                smoke pi (when did blead exhaust swap)
======
121.00 hours total
Nicholas Clark
* ie you don't end up with the *build* finishing with "oh, I couldn't
actually build this extension. See some buried log for what you need to do
before cleaning and restarting". Which was a source of much merriment with
one other language which I found myself iteratively re-building until I
could get it to build the extension that was a requirement for the tool I
ultimately needed.