Wednesday, February 25, 2015

Playing with SMT solvers — ALIVe

I have lately seen many references to projects using SMT solvers to do cool things, and I'm planning to look at some of these to see what they can do/how they work/what restrictions they have.

ALIVe (Automatic LLVM InstCombine Verifier) is such a project. John Regehr has a longish description of what it does and the benefits of doing that in his blog, but the idea is that you create a LLVM peephole optimization pass by writing rules transforming LLVM IR, such as
%1 = shl i32 %x, C                    ; shift left
%2 = lshr i32 %1, C                   ; logical shift right
  => 
%2 = and i32 %x, (1<<(32-C)-1)
ALIVe will prove that the transformation is valid using the Z3 SMT solver, and generate C++ code for a LLVM pass that implements it. This should eliminate bugs in this kind of optimizations.

The interesting thing here is how ALIVe can use the SMT solver to prove that the transformation is valid... The aim of SMT solvers is to determine if it is possible or not to find values for variables so that logical expressions are satisfied. The operations available for the expressions have roughly the same functionality as the operations in the LLVM IR, so it is often easy to transform the IR directly to something the solver can work with (essentially by transforming the LLVM IR to LISP syntax, and to prefix each operation by "bv" meaning "bit vector"). For the example above, we first need to define a 32-bit variable X
(declare-const X (_ BitVec 32))
The syntax "declare-const" makes sense as the variables cannot be updated (i.e. they act in the same way as variables in mathematics). We also need to define C in the same way, but 32-bit shifts are only defined for values 0–31 in the LLVM IR, so we need to limit the range of C
(declare-const C (_ BitVec 32))
(assert (bvult C #x00000020))
The shift sequence is translated to
(bvlshr (bvshl X C) C)
and the and statement to
(bvand X (bvsub (bvshl #x00000001 (bvsub #x00000020 C)) #x00000001))
We now want to prove that they evaluate to the same value for all valid X and C, i.e. that
(=
  (bvlshr (bvshl X C) C)
  (bvand X (bvsub (bvshl #x00000001 (bvsub #x00000020 C)) #x00000001))
)
is true for all valid X and C. But the SMT solver can only tell us if it exists values for which the expression is true — not that it is true for all values. The trick is to instead tell the solver to find values for X and C where this is not true — if it says this is impossible, then we know it is true for all X and C.

So the full query to the SMT solver looks like
(declare-const X (_ BitVec 32))

(declare-const C (_ BitVec 32))
(assert (bvult C #x00000020))

(assert (not (=
  (bvlshr (bvshl X C) C)
  (bvand X (bvsub (bvshl #x00000001 (bvsub #x00000020 C)) #x00000001))
)))
Running the solver gives the result "unsat", that is, it is not possible to find values for X and C that satisfies all assertions. Which means that we have proved that this optimization is valid!