3. iranalysis¶
3.1. Abstract Domains in libAbstraIR¶
For details about the abstract interpretation module see Abstrair.
3.2. Relevant papers¶
The Ant and the Grasshopper: Fast and Accurate Pointer Analysis for Millions of Lines of Code (Hardekopf and Lin, PLDI 2007) (Download): This paper deals with the BDD-based pointer analysis which is the starting point for the analysis currently used in libFIPTA.
Constant Propagation with Conditional Branches (Wegman and Zadeck, ACM Transactions on Programming Languages and Systems, Volume 13, Issue 2, April 1991) <https://dl.acm.org/doi/10.1145/103135.103136>
Collected requirements from MISRA and Cert rules: iranalysis_requirements.xlsx
Slides from meeting: iranalysis_slides-2017-07-27.pptx
3.3. Jour Fixe Protocols¶
https://bitbucket.tz.axivion.com/projects/DOC/repos/documents/browse/Iranalysis_JF
3.4. Sparse Conditional Value Propagation¶
After SSI form has been created and after pointer analysis (FIPTA) has been executed, an algorithm is executed that attempts to compute values for each supported SSI node (some cases, like floats, are not supported yet). This is intermingled with the detection of dead/alive branches in the code. The foundations for this algorithm are known as (sparse) conditional constant propagation. In our case, we extend this to not only handle constant values, but also intervals, and to improve the pointer targets.
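The extension from constants to intervals can be sketched as follows. This is a hypothetical illustration, not the actual lattice implementation: values are intervals joined by taking the convex hull, and branch conditions restrict the interval on each outgoing edge (here for a `<` comparison).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: int  # inclusive lower bound
    hi: int  # inclusive upper bound

    def join(self, other):
        # Least upper bound: the smallest interval covering both inputs.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def is_constant(self):
        # The classic SCCP "constant" case is an interval of width zero.
        return self.lo == self.hi

def restrict_lt(val, bound):
    # Value of x on the "true" edge of (x < bound); None means that
    # edge is unreachable for this value, i.e. the branch is dead.
    if val.lo >= bound:
        return None
    return Interval(val.lo, min(val.hi, bound - 1))
```

With `restrict_lt(Interval(0, 20), 10)` the true edge carries `Interval(0, 9)`, while `restrict_lt(Interval(10, 20), 10)` returns `None`, which is exactly the information the aliveness detection needs for conditional successors.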
The algorithm initially marks all basic blocks as dead except those that can be reached from outside (entry blocks of entry functions), plus some other blocks that are known to be alive (catch blocks for which exception propagation has determined some potentially caught exception). During the algorithm, more and more blocks are detected as potentially alive. The obvious cases are unconditional successors of blocks already marked as alive (that includes interprocedural successors, e.g. entry blocks of callees at a call, and post-call blocks in callers after an exit block). For conditional successors, however, the algorithm looks at the values computed for the condition so far, and only marks those conditional successors as alive for which a corresponding value of the condition has been determined.
The need for a condition’s values combines the control-flow traversal with the value computation; that computation, in turn, depends on the aliveness of basic blocks, as we try to avoid propagating values out of dead branches into alive successors. This means that we have to execute the aliveness detection in lockstep with the value computation. In the best case, we compute the values for all SSI nodes inside an alive basic block before proceeding to the next block.
With loops, however, it is necessary to revisit blocks as we do not have all required incoming values at hand when we first visit the loop. Additionally, we may have to revisit blocks outside loops as well, if due to some change before that block the incoming values for SSI nodes inside the block change. In the latter case, we can benefit from the data-flow connections and only visit those SSI nodes affected by the updated incoming values, instead of recomputing all the values for all the SSI nodes in the block (sparsity). A fixpoint must be reached, and for that we ensure monotonic changes for the aliveness of basic blocks (only changing from dead to alive), and for the computed values (only increasing the range of values). The latter requires some widening especially for loops in order to be scalable.
worklist = {entry(f) | f is an entry function};
add catch blocks with caught exceptions to the worklist;

for item in worklist:
    if item is a block:
        visit_block(item);
    else: // must be an SSI node then
        visit_ssi_node(item, false);

void visit_block(block): // so the block is visited for the first time
    mark block as alive;
    for node in ssi_nodes(block):
        visit_ssi_node(node, true);
    for succ in unconditional_successors(block):
        if succ is dead:
            add succ to the worklist;

void visit_ssi_node(node, omit_in_same_block):
    compute value of node; // also: widening for phis at loop entries
    if value has changed:
        if node is decision_node(block(node)):
            for succ in conditional_successors(block(node)):
                if succ is dead and value_allows_branch(node, succ):
                    add succ to the worklist;
        for succ in data_flow_successors(node):
            // Note: if the block of this successor is dead, it has first to be
            // detected as alive before values should be computed therein; and it
            // will then visit all its nodes
            if succ not on the worklist and block(succ) is alive:
                // Note: when processing all nodes of a block, we do not need
                // to add the individual nodes within the same block to the worklist;
                // we however have to add those in alive successors, e.g. if the
                // successor is a join block in the loop head
                if not (block(succ) == block(node) and omit_in_same_block):
                    add succ to the worklist;
When visiting a block for the first time, all of the SSI nodes in it have to be processed, and that should be done in control-flow order (with some choice of ordering for nodes at the very same position, such as phi nodes for different variables at the beginning of the block). If we order all the SSI nodes globally, a single array index per basic block into that SSI node array is sufficient to implement ssi_nodes(block).
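The start-index scheme for `ssi_nodes(block)` can be sketched like this (block and node names are made up for illustration): with all SSI nodes in one global array in control-flow order, each block only stores its start index, and the node range of a block ends where the next block's range begins.

```python
# Hypothetical global SSI node array, ordered in control-flow order.
ssi_nodes_array = ["phi_i", "phi_j", "cmp", "add", "store", "ret"]

# Per-block data: just the start index into the global array.
block_order = ["head", "body", "exit"]
block_start = {"head": 0, "body": 2, "exit": 5}

def ssi_nodes(block):
    # The block's nodes run from its own start index up to the start
    # index of the next block (or the end of the array for the last one).
    i = block_order.index(block)
    start = block_start[block]
    if i + 1 < len(block_order):
        end = block_start[block_order[i + 1]]
    else:
        end = len(ssi_nodes_array)
    return ssi_nodes_array[start:end]
```

This keeps the per-block overhead at a single integer while still giving contiguous, cache-friendly iteration when a block is visited for the first time.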
Revisiting a block is only ever triggered by data-flow connections, when incoming values have changed, and we then only revisit the SSI nodes whose incoming values changed. The new value for such an SSI node is only computed when we visit that node: when we see the updated incoming value, we do not yet trigger the value computation for the data-flow successors; we only add them to the worklist. This way, when computing the value of an SSI node, as much as possible about the incoming values and relevant branch conditions is already known. The computation of a node’s value includes restricting the possible values by looking at preceding conditions.
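The sparse revisit can be sketched as follows; the node names, maps, and `on_value_changed` helper are illustrative, not the actual implementation. Only data-flow successors in already-alive blocks are queued; nodes in dead blocks are skipped, because a dead block gets all of its nodes visited anyway once aliveness detection reaches it.

```python
from collections import deque

# Hypothetical data-flow graph: node -> list of data-flow successors.
data_flow_succs = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": [], "n4": []}
block_of = {"n1": "A", "n2": "B", "n3": "C", "n4": "B"}
alive_blocks = {"A", "B"}  # block C is still dead

worklist = deque()

def on_value_changed(node):
    # Queue only the affected successors instead of reprocessing
    # every node of the successor's block (sparsity).
    for succ in data_flow_succs[node]:
        # Successors in dead blocks are not queued: they are computed
        # later, if and when their block is detected as alive.
        if block_of[succ] in alive_blocks and succ not in worklist:
            worklist.append(succ)

on_value_changed("n1")
# only n2 is queued; n3 sits in the dead block C
```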
(Re)processing the data-flow successors also handles the case that some part of a condition receives a new value, and that therefore the restricted values inside the branches controlled by the condition have to be updated to the new condition.
A loop is typically processed as follows: First, the initialization block before the loop is handled, and it pushes the loop head onto the worklist. That loop head is processed and sets values for all nodes in it. That includes phi nodes for the loop counter, for example, where only the initialization side is known yet, so the values here will be underestimated. After that, the loop condition block is handled, and if the loop can be entered based on the initial loop counter value, then the loop body block is pushed onto the worklist. The loop body is processed, all with only the initial loop counter value for now, and it ends with processing the back-jump block. That block has an alive successor (the loop head) and so will not push any block onto the worklist. However, the SSI nodes in the loop body and in the back-jump push the phi nodes in the loop head onto the worklist, because one of their input values has changed. Therefore visit_ssi_node is called again for the phi nodes in the loop head. This time it should detect the cyclic case (e.g. being at a loop head with values for all phi arguments) and perform some widening based on the loop detection, as otherwise the algorithm would essentially unroll the loop.
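A minimal sketch of the widening step for such a loop-head phi, under assumed 32-bit integer bounds (the actual widening strategy may differ): instead of re-joining the phi value once per simulated iteration, any bound that is still growing when the cyclic case is detected is jumped directly to the type's limit.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumed 32-bit value range

def widen(old, new):
    # old/new are (lo, hi) interval tuples; keep stable bounds,
    # jump any still-growing bound to the type limit.
    lo = old[0] if new[0] >= old[0] else INT_MIN
    hi = old[1] if new[1] <= old[1] else INT_MAX
    return (lo, hi)

# for (i = 0; i < n; ++i): the phi first sees (0, 0) from the init
# edge; the back edge then contributes i+1, raising the join to (0, 1).
v = (0, 0)
v = widen(v, (0, 1))  # upper bound grew -> widened to (0, INT_MAX)
```

After widening, the loop condition (`i < n`) restricts the value again inside the body, so precision is regained where a bounding condition exists, while the fixpoint is reached in a bounded number of steps.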
The above algorithm only shows the outline for context-insensitive analysis. The implementation distinguishes some contexts (direct callers of a function), causing the actual algorithm to carry the context along with the block or SSI node in the worklist. The same so-far dead block can for example be visited multiple times, once for each context of the function it is in.
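The context handling can be sketched as follows (names are illustrative): worklist items pair a block or SSI node with a context, and aliveness is tracked per (block, context) pair, so the same block is visited once per calling context.

```python
from collections import deque

alive = set()       # set of (block, context) pairs
worklist = deque()  # items are (block, context) pairs as well

def mark_alive(block, context):
    # Monotonic as in the context-insensitive case, but keyed on the
    # pair: a block already alive in one context can still become
    # alive, and thus be visited, in another context.
    if (block, context) not in alive:
        alive.add((block, context))
        worklist.append((block, context))

mark_alive("callee_entry", "caller_A")
mark_alive("callee_entry", "caller_B")  # same block, new context: queued again
mark_alive("callee_entry", "caller_A")  # already alive in this context: no-op
```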