develooper Front page | perl.perl5.porters | Postings from October 2017

announcing a new op, OP_MULTICONCAT

Thread Next
From:
Dave Mitchell
Date:
October 25, 2017 12:15
Subject:
announcing a new op, OP_MULTICONCAT
Message ID:
20171025121518.GP3083@iabyn.com
TL;DR: the branch I've just pushed for smoking,

    smoke-me/davem/mconcat

allows multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN
or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
make things a *lot* faster: 4x or more. I'll merge it into blead in a few
days, barring any issues.

This work has taken up most of my time for the last 3 months; its been
kindly funded by Booking.com.

In more detail: it will optimise into a single OP_MULTICONCAT, most
expressions of the form

    LHS RHS

where LHS is one of

    (empty)
    my $lexical =
    $lexical    =
    $lexical   .=
    expression  =
    expression .=

and RHS is one of

    (A . B . C . ...)            where A,B,C etc are expressions and/or
                                 string constants

    "aAbBc..."                   where a,A,b,B etc are expressions and/or
                                 string constants

    sprintf "..%s..%s..", A,B,.. where the format is a constant string
                                 containing only '%s' and '%%' elements,
                                 and A,B, etc are scalar expressions (so
                                 only a fixed, compile-time-known number of
                                 args: no arrays or list context function
                                 calls etc)

It doesn't optimise other forms, such as

    ($a . $b) . ($c. $d)

    ((($a .= $b) .= $c) .= $d);

(although sub-parts of those expressions might be converted to an
OP_MULTICONCAT). This is partly because it would be hard to maintain the
correct ordering of tie or overload calls.

The compiler uses heuristics to determine when to convert: in general,
expressions involving a single OP_CONCAT aren't converted, unless some
other saving can be made, for example if an OP_CONST can be eliminated, or
in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply
OPpTARGET_MY to, but OP_CONST can't.

The multiconcat op is of type UNOP_AUX, with the op_aux structure directly
holding a pointer to a single constant char* string plus a list of segment
lengths. So for

    "a=$a b=$b\n";

the constant string is "a= b=\n", and the segment lengths are (2,3,1).
If the constant string has different non-utf8 and utf8 representations
(such as "\x80") then both variants are pre-computed and stored in the aux
struct, along with two sets of segment lengths.

For all the above LHS types, any SASSIGN op is optimised away. For a LHS
of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.

For example where $a and $b are lexical vars, this statement:

    my $c = "a=$a, b=$b\n";

formerly compiled to

    const[PV "a="] s
    padsv[$a:1,3] s
    concat[t4] sK/2
    const[PV ", b="] s
    concat[t5] sKS/2
    padsv[$b:1,3] s
    concat[t6] sKS/2
    const[PV "\n"] s
    concat[t7] sKS/2
    padsv[$c:2,3] sRM*/LVINTRO
    sassign vKS/2

and now compiles to:

    padsv[$a:1,3] s
    padsv[$b:1,3] s
    multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY

In terms of how much faster it is, this code:

    my $a = "the quick brown fox jumps over the lazy dog";
    my $b = "to be, or not to be; sorry, what was the question again?";

    for my $i (1..10_000_000) {
        my $c = "a=$a, b=$b\n";
    }

runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
better. This loop runs 4 times faster:

    my $s;
    my $a = "ab\x{100}cde";
    my $b = "fghij";
    my $c = "\x{101}klmn";

    for my $i (1..10_000_000) {
        $s = "\x{100}wxyz";
        $s .= "foo=$a bar=$b baz=$c";
    }


The main ways in which OP_MULTICONCAT gains its speed are:

* any OP_CONSTs are eliminated, and the constant bits (already in the
  right encoding) are copied directly from the constant string attached to
  the op's aux structure.

* It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in
  all cases; OP_CONCAT only did this in very limited circumstances.

* Because it has a holistic view of the entire concatenation expression,
  it can do the whole thing in one efficient go, rather than creating and
  copying intermediate results. pp_multiconcat() goes to considerable
  efforts to avoid inefficiencies. For example it will only SvGROW() the
  target once, and to the exact size needed, no matter what mix of utf8
  and non-utf8 appear on the LHS and RHS.  It never allocates any
  temporary SVs except possibly in the case of tie or overloading.

* It does all its own appending and utf8 handling rather than calling
  out to functions like sv_catsv().

* It's very good at handling the LHS appearing on the RHS; for example in

    $x = "abcd";
    $x = "-$x-$x-";

  It will do roughly the equivalent of the following (where targ is $x);

    SvPV_force(targ);
    SvGROW(targ, 11);
    p = SvPVX(targ);
    Move(p,   p+1,  4, char);
    Copy("-", p,    1, char);
    Copy("-", p+5,  1, char);
    Copy(p+1, p+6,  4, char);
    Copy("-", p+10, 1, char);
    SvCUR(targ) = 11;
    p[11] = '\0';

  Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to
  handle situations like that.

The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
(the main compile-time and runtime parts of the implementation) are over
700 lines each. It turns out that when you combine multiple ops, the
number of edge cases grows exponentially ;-)

Here are some benchmarks, sorted by improving conditional branch count.

The numbers represent relative counts per loop iteration, compared to
blead at 100.0%.  Higher is better: for example, using half as many
instructions gives 200%, while using twice as many gives 50%.

     Ir     Dr     Dw   COND    IND 
 ------ ------ ------ ------ ------ 
  88.52 121.28 104.76 104.00 140.00  $lex1 .= $lex2
  96.19 105.26 101.16 102.35 115.38  (($lex1 .= $lex2) .= $lex3) .= $lex4
 100.00 100.00 100.00 100.00 100.00  $pkg .= $lex1
 100.00 100.00 100.00 100.00 100.00  my $x = "abc"
 100.00 100.00 100.00 100.00 100.00  $lex = $lex1 . $lex2
 100.00 100.00 100.00 100.00 100.00  $lex1 . $lex2
 100.00 100.00 100.00 100.00 100.00  $lex1 = $lex1 . $lex1
 100.00 100.00 100.00 100.00 100.00  $lex = join "", $lex1, $lex2
 100.00 100.00 100.00 100.00 100.00  $lex1 = $lex1 . $lex2
 101.45 128.57 111.90 107.69 175.00  $lex . "foo"
 101.45 128.57 111.90 107.69 175.00  "foo" . $lex
 103.68 128.57 111.90 108.11 175.00  $lex = "const$lex1"
 105.63 136.84 110.53 108.70 140.00  $pkg .= "foo"
 128.33 176.67 161.54 123.81 175.00  $lex .= "foo"
 128.65 145.74 131.91 155.10 155.56  $pkg = $lex1 . $pkg
 142.32 183.75 168.42 165.00 162.50  $pkg .= $lex1 . $lex2
 143.78 159.50 143.08 176.36 216.67  my $lex = $lex1 . $lex2
 145.95 159.22 141.82 182.61 200.00  $pkg = $lex1 . $lex2
 149.90 199.21 191.23 148.72 188.89  $s .= $a.$b.$c where all RH args are utf8
 157.46 208.45 200.00 178.38 185.71  $lex1 .= $lex2 . $lex3
 162.47 225.24 214.89 168.42 188.89  $s .= $a.$b.$c where all args are utf8
 163.09 198.39 180.95 182.81 228.57  my $s = $a.$b.$c where all args are utf8
 164.27 185.83 168.18 198.28 200.00  $pkg = $lex1 . $lex2 . $lex3
 164.55 177.92 167.57 187.80 200.00  $pkg = $pkg . $pkg
 164.79 173.42 177.14 200.00 200.00  $pkg = $pkg . $lex2
 170.03 185.06 162.50 210.00 240.00  $pkg = "const$lex1"
 175.12 202.56 185.00 212.73 228.57  $lex = $lex1 . $lex2 . $lex3
 175.35 231.58 225.58 197.96 188.89  $lex1 .= $lex2 . $lex3 . $lex4
 175.81 214.29 207.89 173.25 200.00  sprintf "%s", $lex1
 184.59 216.55 201.32 222.39 187.50  my $lex = $lex1 . $lex2 . $lex3
 194.78 232.12 218.29 229.57 218.18  $lex = $lex1 . $lex2 . $lex3 . $lex4 . $lex5
 195.58 227.62 206.90 244.90 220.00  my $lex = "const$lex1"
 214.41 244.90 229.63 248.89 300.00  my $lex = sprintf "%s", $lex1
 224.89 261.14 209.52 255.95 277.78  my $lex = "foooooooooo=$lex1 baaaaaaaaar=$lex2\n" where lex1/2 are 400 chars
 230.62 275.76 236.36 227.78 300.00  $lex = "foooooooooo=$lex1 baaaaaaaaar=$lex2\n" where lex1/2 are 400 chars
 231.97 302.65 272.41 238.33 275.00  $lex = "foo=$lex1 bar=$lex2\n"
 233.23 244.21 289.80 259.57 220.00  sprintf "%s%s", $lex1, $lex2
 238.61 261.90 273.17 288.89 187.50  $lex1 = $lex2 . $lex1
 239.64 272.57 234.48 253.33 288.89  $pkg = "foooooooooo=$lex1 baaaaaaaaar=$lex2\n" where lex1/2 are 400 chars
 240.98 263.75 247.73 272.22 275.00  $pkg = sprintf "%s", $lex1
 243.99 295.93 265.62 271.43 266.67  $pkg = "foo=$lex1 bar=$lex2\n"
 250.85 313.48 286.49 283.33 255.56  my $lex = "foo=$lex1 bar=$lex2\n"
 255.30 345.54 329.17 274.55 227.27  $pkg .= "-foo-$lex1-foo-$lex2-foo-"
 260.09 340.35 351.85 263.33 200.00  $pkg .= sprintf "%s", $lex1
 267.63 302.86 286.84 300.00 366.67  $lex = sprintf "%s", $lex1
 270.29 272.00 315.62 317.91 211.11  $pkg = sprintf "foo=%s bar=%s\n", $lex1, $lex2
 272.19 273.86 293.75 321.43 260.00  $pkg = sprintf "foo=%s", $lex1
 272.73 292.31 329.73 325.00 200.00  my $lex = sprintf "foo=%s bar=%s\n", $lex1, $lex2
 272.90 356.57 337.21 278.26 271.43  $lex = "foo1=$lex1 foo2=$lex2 foo3=$lex3 foo4=$lex4\n"
 273.48 379.35 373.81 292.31 250.00  $lex1 .= "-foo-$lex2-foo-$lex3-foo"
 275.38 301.89 317.24 333.33 240.00  my $lex = sprintf "foo=%s", $lex1
 279.21 279.05 312.73 330.00 266.67  $pkg = sprintf "%s%s", $lex1, $lex2
 280.63 302.44 330.77 338.98 250.00  my $lex = sprintf "%s%s", $lex1, $lex2
 287.32 296.52 348.28 332.81 237.50  $lex = sprintf "foo=%s bar=%s\n", $lex1, $lex2
 296.81 406.25 452.38 296.30 240.00  $lex .= sprintf "%s", $lex1
 297.69 336.59 415.79 329.55 212.50  $pkg .= sprintf "%s%s", $lex1, $lex2
 298.19 310.26 335.71 348.72 325.00  $lex = sprintf "foo=%s", $lex1
 301.21 309.47 351.02 355.32 320.00  $lex = sprintf "%s%s", $lex1, $lex2
 325.90 379.45 493.75 356.10 242.86  $lex .= sprintf "%s%s", $lex1, $lex2
 333.93 419.48 454.55 359.60 291.67  $s .= "foo=$a bar=$b baz=$c" where $b,$c are utf8
 337.37 406.58 406.94 383.72 320.00  my $s = "foo=$a bar=$b baz=$c" where $b,$c are utf8
 375.00 478.63 521.43 427.85 291.67  $s .= "foo=$a bar=$b baz=$c" where $s,$b,$c are utf8
 391.79 408.33 490.41 461.96 270.00  $s = sprintf("foo=%s bar=%s baz=%s", $a, $b, $c) where $a,$c are utf8
 528.36 623.38 721.21 571.00 391.67  $s .= "foo=$a bar=$b baz=$c" where $a,$c are utf8
 545.23 613.16 651.39 630.23 440.00  my $s = "foo=$a bar=$b baz=$c" where $a,$c are utf8
 581.81 723.31 830.56 622.22 500.00  $s .= "foo=$a bar=$b baz=$c" where $a is utf8, $c has 0x80
 599.55 714.91 757.69 678.64 577.78  my $s = "foo=$a bar=$b baz=$c" where $a is utf8, $c has 0x80
 603.04 718.32 835.71 693.67 391.67  $s .= "foo=$a bar=$b baz=$c" where $s,$a,$c are utf8
 654.05 828.57 951.61 727.84 500.00  $s .= "foo=$a bar=$b baz=$c" where $s,$a are utf8, $c has 0x80
 184.58 211.78 204.43 201.60 202.52  AVERAGE

-- 
You never really learn to swear until you learn to drive.

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About