perl.perl5.porters | Postings from October 2015

about branch smoke-me/davem/fast_arith2

From: bulk 88
Date: October 29, 2015 06:50
Subject: about branch smoke-me/davem/fast_arith2
Message ID: BAY182-W29191C8EBE40368B1020D6DF200@phx.gbl

* make SETi/u/n, (X)PUSHi/u/n more efficient

diff --git a/pp.h b/pp.h
index b497085..2f376e0 100644
--- a/pp.h
+++ b/pp.h
@@ -340,19 +340,59 @@ Does not use C<TARG>.  See also C<L</XPUSHu>>, C<L</mPUSHu>> and C<L</PUSHu>>.
                          } } STMT_END
 #endif
 
+/* set TARG to the IV value i */
+#define TARGi(i)    STMT_START { \
+                        IV TARGi_iv = i;                                \
+                        if (LIKELY((SvTYPE(TARG) == SVt_IV)             \
+                                && !(TARG->sv_flags & SVf_THINKFIRST))) \
^^^^^^ That should be written without && so that it is only one branch on less-than-perfect C compilers, something like (SvFLAGS(TARG) & (SVTYPEMASK|SVf_THINKFIRST)) == SVt_IV
+                        {                                               \
+                            (void)SvIOK_only(TARG);                     \
+                            SvIV_set(TARG, TARGi_iv);                   \
^^^^^^^^^^^ SvIV_set goes through sv_any. Since the SVt_IV test just proved that the TARG SV is bodyless, remove the extra memory read and write directly to targ->sv_u.svu_iv; leave SvIV_set in a comment for grepability.
+                        }                                               \
+                        else                                            \
+                            sv_setiv_mg(targ, TARGi_iv);                \
+                    } STMT_END
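
The two inline suggestions above (one masked compare instead of two tests, and a direct write to the head) can be sketched outside of perl roughly as follows. Everything named demo_* or DEMO_* is a hypothetical stand-in, and the flag values are illustrative rather than copied from sv.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for perl's SV head; values are illustrative only. */
#define DEMO_TYPEMASK    0x00ffu   /* low bits hold the SV type */
#define DEMO_T_IV        0x0001u   /* bodyless integer type */
#define DEMO_THINKFIRST  0x0100u   /* read-only, COW and similar cases */
#define DEMO_IOK         0x0200u   /* integer slot holds a valid value */

typedef struct demo_sv {
    uint32_t flags;
    long     iv;      /* kept in the head, as for a bodyless IV */
} demo_sv;

int demo_slow_calls = 0;

/* Stand-in for the general setter that copes with every case. */
void demo_setiv_slow(demo_sv *sv, long iv)
{
    demo_slow_calls++;
    sv->flags |= DEMO_IOK;
    sv->iv = iv;
}

void demo_targ_i(demo_sv *sv, long iv)
{
    /* One mask and one compare: the type must be IV and THINKFIRST clear. */
    if ((sv->flags & (DEMO_TYPEMASK | DEMO_THINKFIRST)) == DEMO_T_IV) {
        sv->flags |= DEMO_IOK;
        sv->iv = iv;          /* direct write, no body pointer to follow */
    } else {
        demo_setiv_slow(sv, iv);
    }
}
```

Note the parentheses around the masked value: in C, == binds tighter than &, so the unparenthesized form would compare first and mask afterwards.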

These changes will cause huge code bloat. Why don't you put this shortcut logic at the start of sv_setiv_mg and friends (sv_set*_mg), or use it only in the math PP opcode functions via PUSHQi, with Q for quick? For ppport.h purposes, PUSHQi would be a define to PUSHi if nobody goes to the trouble of backporting this improvement. Don't expose a gazillion XSUBs to this bloat.
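
A minimal sketch of the fallback idea, assuming a quick variant named PUSHQi as proposed above. PUSHQi does not exist in perl; the PUSHi defined here is a toy stand-in so the fragment compiles on its own, not the real macro from pp.h.

```c
#include <assert.h>

/* Toy stack and toy PUSHi; the real PUSHi sets TARG and pushes it. */
long demo_stack[8];
int  demo_sp = 0;
#define PUSHi(i)  (demo_stack[demo_sp++] = (long)(i))

/* Compatibility shim: where no quick variant is provided,
 * PUSHQi (hypothetical name) simply falls back to PUSHi. */
#ifndef PUSHQi
#  define PUSHQi(i) PUSHi(i)
#endif

/* A math-style op opts in to the quick variant by name. */
long demo_push_sum(long a, long b)
{
    PUSHQi(a + b);
    return demo_stack[demo_sp - 1];
}
```

With this shape only the callers that spell PUSHQi get the inlined fast path, while existing users of PUSHi keep the small call-out form.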

Also

#define SETn(n)        STMT_START { sv_setnv(TARG, (NV)(n)); SETTARG; } STMT_END
#define SETi(i)        STMT_START { sv_setiv(TARG, (IV)(i)); SETTARG; } STMT_END
#define SETu(u)        STMT_START { sv_setuv(TARG, (UV)(u)); SETTARG; } STMT_END

These already exist in pp.h and should be reused; the TARGi, TARGn and TARGu names look redundant.

For commit

* faster add, subtract, multiply

I will also point out this commit, https://github.com/perl11/cperl/commit/8b6139a5cbb3307067daf51503fdee2771595afb, which deals with the mess that is perl's math logic forest. It is too big to call a tree: after -O1 it takes ~20 branches to add 2 numbers in p5p perl with pp_add. Every commercial CPU has an overflow/underflow status bit; it just isn't accessible from ANSI C.
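
As an illustration of reaching the overflow bit, here is a sketch that assumes a GCC or Clang compiler and uses the __builtin_add_overflow extension. This is not ANSI C, and it is not claimed to be what the linked commit does; demo_num and demo_add are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical result type: an exact integer, or a floating-point fallback. */
typedef struct demo_num {
    bool   is_int;
    long   i;
    double n;
} demo_num;

demo_num demo_add(long a, long b)
{
    demo_num r = { true, 0, 0.0 };
    /* GCC/Clang extension: returns true if the exact sum does not fit. */
    if (__builtin_add_overflow(a, b, &r.i)) {
        r.is_int = false;
        r.i = 0;
        r.n = (double)a + (double)b;   /* fall back to floating point */
    }
    return r;
}
```

The integer path is a single add followed by one test of the overflow result, rather than a chain of range checks done before the addition.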

* split pp_predec() from pp_preinc() and improve


diff --git a/pp_hot.c b/pp_hot.c
index 7936be3..8d0af22 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -464,25 +464,44 @@ PP(pp_eq)
 }
 
 
-/* also used for: pp_i_predec() pp_i_preinc() pp_predec() */
+/* also used for: pp_i_preinc() */
 
 PP(pp_preinc)
 {
-    dSP;
-    const bool inc =
-    PL_op->op_type == OP_PREINC || PL_op->op_type == OP_I_PREINC;
-    if (UNLIKELY(SvTYPE(TOPs)>= SVt_PVAV || (isGV_with_GP(TOPs) && !SvFAKE(TOPs))))
-    Perl_croak_no_modify();
-    if (LIKELY(!SvREADONLY(TOPs) && !SvGMAGICAL(TOPs) && SvIOK_notUV(TOPs) && !SvNOK(TOPs) && !SvPOK(TOPs))
-        && SvIVX(TOPs) != (inc ? IV_MAX : IV_MIN))
+    SV *sv = *PL_stack_sp;
+
+    if (LIKELY(((sv->sv_flags &
+                        (SVf_THINKFIRST|SVs_GMG|SVf_IVisUV|
+                         SVf_IOK|SVf_NOK|SVf_POK|SVp_NOK|SVp_POK|SVf_ROK))
+                == SVf_IOK))
+        && SvIVX(sv) != IV_MAX)
+    {
+    SvIV_set(sv, SvIVX(sv) + 1);
+    }
+    else /* Do all the PERL_PRESERVE_IVUV and hard cases in sv_inc */
+    sv_inc(sv);
+    SvSETMAGIC(sv);
+    return NORMAL;
+}
+
+
+/* also used for: pp_i_predec() */
+
+PP(pp_predec)
+{
+    SV *sv = *PL_stack_sp;
+
+    if (LIKELY(((sv->sv_flags &
+                        (SVf_THINKFIRST|SVs_GMG|SVf_IVisUV|
+                         SVf_IOK|SVf_NOK|SVf_POK|SVp_NOK|SVp_POK|SVf_ROK))
+                == SVf_IOK))
+        && SvIVX(sv) != IV_MIN)
     {
-    SvIV_set(TOPs, SvIVX(TOPs) + (inc ? 1 : -1));
-    SvFLAGS(TOPs) &= ~(SVp_NOK|SVp_POK);
+    SvIV_set(sv, SvIVX(sv) - 1);
     }
-    else /* Do all the PERL_PRESERVE_IVUV conditionals in sv_inc */
-    if (inc) sv_inc(TOPs);
-    else sv_dec(TOPs);
-    SvSETMAGIC(TOPs);
+    else /* Do all the PERL_PRESERVE_IVUV and hard cases  in sv_dec */
+    sv_dec(sv);
+    SvSETMAGIC(sv);
     return NORMAL;
 }

I am against splitting them into 2 pp funcs. It is bad for the CPU cache.

If you look at the value of op_type

    OP_PREINC     = 47,
    OP_I_PREINC     = 48,
    OP_PREDEC     = 49,
    OP_I_PREDEC     = 50,
    OP_POSTINC     = 51,
    OP_I_POSTINC     = 52,
    OP_POSTDEC     = 53,
    OP_I_POSTDEC     = 54,

Write a static assert for these numbers. As long as they stay in order (OP_PREINC needs to be ahead of OP_PREDEC numerically), the step can be computed directly from op_type.

Just write  SvIV_set(sv, SvIVX(sv) + (1 - ((PL_op->op_type - OP_PREINC) & 2)));

That is branch free. Same explanation goes for the pp_postdec commit (* split pp_postdec() from pp_postinc() and improve). I think there are free bits in the op struct if you can't use op_type (I suggest op_type).
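
A minimal sketch of the static assert plus a branch-free step, assuming the opcode values quoted above and a C11 compiler. The DEMO_* enum is a hypothetical mirror of those values, not perl's opnames.h. The mapping uses bit 1 of the offset from the first opcode: offsets 0 and 1 (the increments) give +1, offsets 2 and 3 (the decrements) give -1.

```c
#include <assert.h>

/* Hypothetical mirror of the opcode values quoted above. */
enum demo_op {
    DEMO_OP_PREINC   = 47,
    DEMO_OP_I_PREINC = 48,
    DEMO_OP_PREDEC   = 49,
    DEMO_OP_I_PREDEC = 50
};

/* Fail the build if the opcodes are ever renumbered out of this order. */
_Static_assert(DEMO_OP_I_PREINC == DEMO_OP_PREINC + 1, "preinc ops must be adjacent");
_Static_assert(DEMO_OP_PREDEC   == DEMO_OP_PREINC + 2, "predec must follow the preinc ops");
_Static_assert(DEMO_OP_I_PREDEC == DEMO_OP_PREINC + 3, "i_predec must follow predec");

/* +1 for the two increment ops, -1 for the two decrement ops, no branch. */
long demo_step(int op_type)
{
    return 1 - ((op_type - DEMO_OP_PREINC) & 2);
}
```

A shared pp function would still need a direction-dependent overflow bound (IV_MAX when incrementing, IV_MIN when decrementing); that part is not shown here.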

C:\p523\src\win32>perl -e"$bit = 0; printf '%d', 1-(2 & $bit-1)"
-1
C:\p523\src\win32>perl -e"$bit = 1; printf '%d', 1-(2 & $bit-1)"
1
C:\p523\src\win32>

or if you have 2 bits, even less arithmetic

C:\p523\src\win32>perl -e"$twobits = 0; printf '%d', 1-$twobits"
1
C:\p523\src\win32>perl -e"$twobits = 2; printf '%d', 1-$twobits"
-1
C:\p523\src\win32>

In the first example, "(2 & $bit-1)" is effectively a multiply of 2 by zero or by 1, but it doesn't use the super slow CPU integer multiply instruction. According to http://www.agner.org/optimize/instruction_tables.pdf, on an Atom a 64-bit IMUL has a latency of 14 and AND has a latency of 1, so IMUL takes 14 times longer than an AND; on Haswell, IMUL r,r is 3 and AND is 1. I don't trust C compilers to turn /2 and *0 or *2 into shifts, ANDs or simpler ops, so I'll do it by hand explicitly.
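
The two one-liners translate to C roughly as below. This is a sketch with hypothetical function names; unsigned arithmetic is used so that bit - 1 wraps to all ones in a well-defined way.

```c
#include <assert.h>

/* bit must be 0 or 1: returns -1 for 0 and +1 for 1 using only SUB and AND. */
int demo_sign_from_bit(unsigned bit)
{
    /* bit - 1u is all ones when bit == 0 and zero when bit == 1 */
    return 1 - (int)(2u & (bit - 1u));
}

/* twobits must be 0 or 2: a single subtraction is enough. */
int demo_sign_from_twobits(unsigned twobits)
{
    return 1 - (int)twobits;
}

/* Reference version that spells the same mapping with a multiply. */
int demo_sign_mul(unsigned bit)
{
    return 1 - 2 * (int)(1u - bit);
}
```

Whether a given compiler already lowers the multiply form to the same instructions is not something this sketch settles; it only shows that the two forms agree.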


 		 	   		  