Front page | perl.perl5.porters |
Postings from October 2017
announcing a new op, OP_MULTICONCAT
Thread Next
From:
Dave Mitchell
Date:
October 25, 2017 12:15
Subject:
announcing a new op, OP_MULTICONCAT
Message ID:
20171025121518.GP3083@iabyn.com
TL;DR: the branch I've just pushed for smoking,
smoke-me/davem/mconcat
allows multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN
or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
make things a *lot* faster: 4x or more. I'll merge it into blead in a few
days, barring any issues.
This work has taken up most of my time for the last 3 months; its been
kindly funded by Booking.com.
In more detail: it will optimise into a single OP_MULTICONCAT, most
expressions of the form
LHS RHS
where LHS is one of
(empty)
my $lexical =
$lexical =
$lexical .=
expression =
expression .=
and RHS is one of
(A . B . C . ...) where A,B,C etc are expressions and/or
string constants
"aAbBc..." where a,A,b,B etc are expressions and/or
string constants
sprintf "..%s..%s..", A,B,.. where the format is a constant string
containing only '%s' and '%%' elements,
and A,B, etc are scalar expressions (so
only a fixed, compile-time-known number of
args: no arrays or list context function
calls etc)
It doesn't optimise other forms, such as
($a . $b) . ($c. $d)
((($a .= $b) .= $c) .= $d);
(although sub-parts of those expressions might be converted to an
OP_MULTICONCAT). This is partly because it would be hard to maintain the
correct ordering of tie or overload calls.
The compiler uses heuristics to determine when to convert: in general,
expressions involving a single OP_CONCAT aren't converted, unless some
other saving can be made, for example if an OP_CONST can be eliminated, or
in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply
OPpTARGET_MY to, but OP_CONST can't.
The multiconcat op is of type UNOP_AUX, with the op_aux structure directly
holding a pointer to a single constant char* string plus a list of segment
lengths. So for
"a=$a b=$b\n";
the constant string is "a= b=\n", and the segment lengths are (2,3,1).
If the constant string has different non-utf8 and utf8 representations
(such as "\x80") then both variants are pre-computed and stored in the aux
struct, along with two sets of segment lengths.
For all the above LHS types, any SASSIGN op is optimised away. For a LHS
of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.
For example where $a and $b are lexical vars, this statement:
my $c = "a=$a, b=$b\n";
formerly compiled to
const[PV "a="] s
padsv[$a:1,3] s
concat[t4] sK/2
const[PV ", b="] s
concat[t5] sKS/2
padsv[$b:1,3] s
concat[t6] sKS/2
const[PV "\n"] s
concat[t7] sKS/2
padsv[$c:2,3] sRM*/LVINTRO
sassign vKS/2
and now compiles to:
padsv[$a:1,3] s
padsv[$b:1,3] s
multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY
In terms of how much faster it is, this code:
my $a = "the quick brown fox jumps over the lazy dog";
my $b = "to be, or not to be; sorry, what was the question again?";
for my $i (1..10_000_000) {
my $c = "a=$a, b=$b\n";
}
runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
better. This loop runs 4 times faster:
my $s;
my $a = "ab\x{100}cde";
my $b = "fghij";
my $c = "\x{101}klmn";
for my $i (1..10_000_000) {
$s = "\x{100}wxyz";
$s .= "foo=$a bar=$b baz=$c";
}
The main ways in which OP_MULTICONCAT gains its speed are:
* any OP_CONSTs are eliminated, and the constant bits (already in the
right encoding) are copied directly from the constant string attached to
the op's aux structure.
* It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in
all cases; OP_CONCAT only did this in very limited circumstances.
* Because it has a holistic view of the entire concatenation expression,
it can do the whole thing in one efficient go, rather than creating and
copying intermediate results. pp_multiconcat() goes to considerable
efforts to avoid inefficiencies. For example it will only SvGROW() the
target once, and to the exact size needed, no matter what mix of utf8
and non-utf8 appear on the LHS and RHS. It never allocates any
temporary SVs except possibly in the case of tie or overloading.
* It does all its own appending and utf8 handling rather than calling
out to functions like sv_catsv().
* It's very good at handling the LHS appearing on the RHS; for example in
$x = "abcd";
$x = "-$x-$x-";
It will do roughly the equivalent of the following (where targ is $x);
SvPV_force(targ);
SvGROW(targ, 11);
p = SvPVX(targ);
Move(p, p+1, 4, char);
Copy("-", p, 1, char);
Copy("-", p+5, 1, char);
Copy(p+1, p+6, 4, char);
Copy("-", p+10, 1, char);
SvCUR(targ) = 11;
p[11] = '\0';
Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to
handle situations like that.
The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
(the main compile-time and runtime parts of the implementation) are over
700 lines each. It turns out that when you combine multiple ops, the
number of edge cases grows exponentially ;-)
Here are some benchmarks, sorted by improving conditional branch count.
The numbers represent relative counts per loop iteration, compared to
blead at 100.0%. Higher is better: for example, using half as many
instructions gives 200%, while using twice as many gives 50%.
Ir Dr Dw COND IND
------ ------ ------ ------ ------
88.52 121.28 104.76 104.00 140.00 $lex1 .= $lex2
96.19 105.26 101.16 102.35 115.38 (($lex1 .= $lex2) .= $lex3) .= $lex4
100.00 100.00 100.00 100.00 100.00 $pkg .= $lex1
100.00 100.00 100.00 100.00 100.00 my $x = "abc"
100.00 100.00 100.00 100.00 100.00 $lex = $lex1 . $lex2
100.00 100.00 100.00 100.00 100.00 $lex1 . $lex2
100.00 100.00 100.00 100.00 100.00 $lex1 = $lex1 . $lex1
100.00 100.00 100.00 100.00 100.00 $lex = join "", $lex1, $lex2
100.00 100.00 100.00 100.00 100.00 $lex1 = $lex1 . $lex2
101.45 128.57 111.90 107.69 175.00 $lex . "foo"
101.45 128.57 111.90 107.69 175.00 "foo" . $lex
103.68 128.57 111.90 108.11 175.00 $lex = "const$lex1"
105.63 136.84 110.53 108.70 140.00 $pkg .= "foo"
128.33 176.67 161.54 123.81 175.00 $lex .= "foo"
128.65 145.74 131.91 155.10 155.56 $pkg = $lex1 . $pkg
142.32 183.75 168.42 165.00 162.50 $pkg .= $lex1 . $lex2
143.78 159.50 143.08 176.36 216.67 my $lex = $lex1 . $lex2
145.95 159.22 141.82 182.61 200.00 $pkg = $lex1 . $lex2
149.90 199.21 191.23 148.72 188.89 $s .= $a.$b.$c where all RH args are utf8
157.46 208.45 200.00 178.38 185.71 $lex1 .= $lex2 . $lex3
162.47 225.24 214.89 168.42 188.89 $s .= $a.$b.$c where all args are utf8
163.09 198.39 180.95 182.81 228.57 my $s = $a.$b.$c where all args are utf8
164.27 185.83 168.18 198.28 200.00 $pkg = $lex1 . $lex2 . $lex3
164.55 177.92 167.57 187.80 200.00 $pkg = $pkg . $pkg
164.79 173.42 177.14 200.00 200.00 $pkg = $pkg . $lex2
170.03 185.06 162.50 210.00 240.00 $pkg = "const$lex1"
175.12 202.56 185.00 212.73 228.57 $lex = $lex1 . $lex2 . $lex3
175.35 231.58 225.58 197.96 188.89 $lex1 .= $lex2 . $lex3 . $lex4
175.81 214.29 207.89 173.25 200.00 sprintf "%s", $lex1
184.59 216.55 201.32 222.39 187.50 my $lex = $lex1 . $lex2 . $lex3
194.78 232.12 218.29 229.57 218.18 $lex = $lex1 . $lex2 . $lex3 . $lex4 . $lex5
195.58 227.62 206.90 244.90 220.00 my $lex = "const$lex1"
214.41 244.90 229.63 248.89 300.00 my $lex = sprintf "%s", $lex1
224.89 261.14 209.52 255.95 277.78 my $lex = "foooooooooo=$lex1 baaaaaaaaar=$lex2\n" where lex1/2 are 400 chars
230.62 275.76 236.36 227.78 300.00 $lex = "foooooooooo=$lex1 baaaaaaaaar=$lex2\n" where lex1/2 are 400 chars
231.97 302.65 272.41 238.33 275.00 $lex = "foo=$lex1 bar=$lex2\n"
233.23 244.21 289.80 259.57 220.00 sprintf "%s%s", $lex1, $lex2
238.61 261.90 273.17 288.89 187.50 $lex1 = $lex2 . $lex1
239.64 272.57 234.48 253.33 288.89 $pkg = "foooooooooo=$lex1 baaaaaaaaar=$lex2\n" where lex1/2 are 400 chars
240.98 263.75 247.73 272.22 275.00 $pkg = sprintf "%s", $lex1
243.99 295.93 265.62 271.43 266.67 $pkg = "foo=$lex1 bar=$lex2\n"
250.85 313.48 286.49 283.33 255.56 my $lex = "foo=$lex1 bar=$lex2\n"
255.30 345.54 329.17 274.55 227.27 $pkg .= "-foo-$lex1-foo-$lex2-foo-"
260.09 340.35 351.85 263.33 200.00 $pkg .= sprintf "%s", $lex1
267.63 302.86 286.84 300.00 366.67 $lex = sprintf "%s", $lex1
270.29 272.00 315.62 317.91 211.11 $pkg = sprintf "foo=%s bar=%s\n", $lex1, $lex2
272.19 273.86 293.75 321.43 260.00 $pkg = sprintf "foo=%s", $lex1
272.73 292.31 329.73 325.00 200.00 my $lex = sprintf "foo=%s bar=%s\n", $lex1, $lex2
272.90 356.57 337.21 278.26 271.43 $lex = "foo1=$lex1 foo2=$lex2 foo3=$lex3 foo4=$lex4\n"
273.48 379.35 373.81 292.31 250.00 $lex1 .= "-foo-$lex2-foo-$lex3-foo"
275.38 301.89 317.24 333.33 240.00 my $lex = sprintf "foo=%s", $lex1
279.21 279.05 312.73 330.00 266.67 $pkg = sprintf "%s%s", $lex1, $lex2
280.63 302.44 330.77 338.98 250.00 my $lex = sprintf "%s%s", $lex1, $lex2
287.32 296.52 348.28 332.81 237.50 $lex = sprintf "foo=%s bar=%s\n", $lex1, $lex2
296.81 406.25 452.38 296.30 240.00 $lex .= sprintf "%s", $lex1
297.69 336.59 415.79 329.55 212.50 $pkg .= sprintf "%s%s", $lex1, $lex2
298.19 310.26 335.71 348.72 325.00 $lex = sprintf "foo=%s", $lex1
301.21 309.47 351.02 355.32 320.00 $lex = sprintf "%s%s", $lex1, $lex2
325.90 379.45 493.75 356.10 242.86 $lex .= sprintf "%s%s", $lex1, $lex2
333.93 419.48 454.55 359.60 291.67 $s .= "foo=$a bar=$b baz=$c" where $b,$c are utf8
337.37 406.58 406.94 383.72 320.00 my $s = "foo=$a bar=$b baz=$c" where $b,$c are utf8
375.00 478.63 521.43 427.85 291.67 $s .= "foo=$a bar=$b baz=$c" where $s,$b,$c are utf8
391.79 408.33 490.41 461.96 270.00 $s = sprintf("foo=%s bar=%s baz=%s", $a, $b, $c) where $a,$c are utf8
528.36 623.38 721.21 571.00 391.67 $s .= "foo=$a bar=$b baz=$c" where $a,$c are utf8
545.23 613.16 651.39 630.23 440.00 my $s = "foo=$a bar=$b baz=$c" where $a,$c are utf8
581.81 723.31 830.56 622.22 500.00 $s .= "foo=$a bar=$b baz=$c" where $a is utf8, $c has 0x80
599.55 714.91 757.69 678.64 577.78 my $s = "foo=$a bar=$b baz=$c" where $a is utf8, $c has 0x80
603.04 718.32 835.71 693.67 391.67 $s .= "foo=$a bar=$b baz=$c" where $s,$a,$c are utf8
654.05 828.57 951.61 727.84 500.00 $s .= "foo=$a bar=$b baz=$c" where $s,$a are utf8, $c has 0x80
184.58 211.78 204.43 201.60 202.52 AVERAGE
--
You never really learn to swear until you learn to drive.
Thread Next
-
announcing a new op, OP_MULTICONCAT
by Dave Mitchell