RANSAC FEATURE MATCHING (ransac)

Introduction

Corners are an example of a feature which should be stable between images, independent of illumination, camera, or scene composition.  This means we should be able to use them to align different images, for stitching them together in a panorama or matching up objects or comparing their content.  If we can identify matching corners, we can determine how to map one image onto another, accounting for scaling, translation, and rotation.  But with hundreds of corners an exhaustive search for the best fit is impossible, and there are certainly many that do not have corresponding matches and which, if used, would give a bad mapping.

RANSAC, short for Random Sample Consensus, solves these problems by sampling the solution space using a figure of merit that ignores mismatches and outliers.  It forms a hypothesis, a proposed coordinate transform, by picking a random subset of corners and assuming they match.  It then evaluates the hypothesis by counting the number of close pairs, nearer than some given distance.  It makes many trials and picks the one with the most matches, then refines the result with the pairs.

Say that we have a linear model accounting for scale and translation between the coordinates (x, y) of image1 and (v, w) of image2

    v = sx * x + dx

    w = sy * y + dy

where sx and sy are scale factors and dx and dy offsets.  We need two pairs of corners to determine these four unknowns, and once we have them we can transform the corners in image1 into the image2 plane.  If they end up closer to a corner in image2 than some threshold, we count that pair as a successful match.  When done comparing all corners we can refine the model by calculating the best fit of its parameters over the matches using a linear regression

The RANSAC algorithm does assume there is a reasonably large overlap between corners in the two images and that noise (aka false corners) and outliers do not dominate.  If the ideal or actual transform does not have many matches, then most trials will fail or give poor results and more will have to be performed.  We haven't used this technique before, though, so part of the reason for this exercise is to see how well the approach works.

In this section we implement RANSAC for two models, one with the shift-translation (ST) model above and one with an affine transform for rotation-shift-translation (RST).  We won't be needing any parts of Chapel that we haven't already seen, but we'll find we use most of the features we've talked about.

The directory for this section is ransac.  The programs will use the C functions in img_png_v3.[ch] for reading and writing PNG images and the color library ip_color_v3.chpl from the Color Conversion section.  The FAST analysis has also been pulled out into a separate library, ip_corner.chpl.  Use make name to build the programs, leaving off the '.chpl' suffix.  Executables get put in the bin sub-directory, intermediate files in build.  This exercise uses a set of transformed images, with bonds1.png the original, base image.

The programs and images are in a tarball here (as zip file).  Here is a PDF version of this section.

Aside: kd-Trees (kdtree, test_kdtree)

The key step in the RANSAC algorithm is to identify matching pairs of corners after transforming the coordinates of one to the other's plane.  This means searching a 2D plane for the nearest match, which is not well supported by normal data structures.  A kd-tree partitions the plane efficiently.  It is a binary tree where the coordinate used to divide the plane (volume, hypervolume) changes from level to level: in our case the root node splits along x, the first layer by y, the second by x, the third by y, and so forth.  This allows searching in O(log n) time on average (O(n) worst case).  Building the tree can be done in O(n log n) if a median search in linear time is available, as we sa at the start of thetutorial (An Example).  Such a median search does exist, but is a bit complicated to implement; a new method has just appeared with the same order of complexity.

Construction

Brown has recently published an new way to construct the tree, claiming that although it takes O(kn log n) time, where k is the number of dimensions, it is somewhat more efficient than the median search for k <= 4.  [Russel Brown, "Building a Balanced k-d Tree in O(kn log n) Time", Journal of Computer Graphics Techniques, Vol. 4, No. 1, 2015.]  It makes a sort of the corner coordinates along each axis at the start, and then shuffles the indices for each level of the tree, never doing a second sort.

The sorting can be done by any technique.  Ties are resolved in a cyclic order through the coordinates; if we have three, then sort one first by x with ties resolved by y then z (xyz), the second by y with z then x breaking ties (yzx), the third by z using x then y (zxy).  In Brown's paper and our implementation it is enough to store indices to the corners in an array, rather than moving the corner data structures around.  Note that because we are sorting by index and not value the standard methods in the Sort module cannot be used; we'll modify the QuickSort example from the Parallel Programming section to work with indices.

Having sorted the corners by coordinate we then start building the tree layer by layer.  We pick the middle, ie. median, element of the array as the node. We shuffle each sorted array so that those elements smaller than the chosen are in one half and those larger are in the other.  What this means in practice is that we do nothing for the array whose primary sort coordinate is the one we've just used for the tree node (x, then y, then x, ...) because it's already in order.  We scan the other arrays from start to finish, which guarantees we keep the same sort order, but use a comparison function based on the primary sort coordinate to determine if they go in the lower or upper half.  We then recur on each of these halves; like the implementation in the Example, this can be done in parallel.

Here's an example of building a twelve node tree from 3D points.  The initial sort gives

   index     pt            xyz            yzx            zxy

     1   (6, 4, 2)     12 (1, 9, 4)   10 (5, 1, 1)   10 (5, 1, 1)

     2   (4, 7, 5)      3 (3, 1, 8)    3 (3, 1, 8)    6 (8, 9, 1)

     3   (3, 1, 8)      8 (3, 4, 6)    4 (7, 2, 2)    9 (6, 3, 2)

     4   (7, 2, 2)      5 (4, 3, 5)    9 (6, 3, 2)    1 (6, 4, 2)

     5   (4, 3, 5)      2 (4, 7, 5)    5 (4, 3, 5)    4 (7, 2, 2)

     6   (8, 9, 1)     10 (5, 1, 1)    1 (6, 4, 2)    7 (6, 4, 3)

     7   (6, 4, 3)      9 (6, 3, 2)    7 (6, 4, 3)   12 (1, 9, 4)

     8   (3, 4, 6)      1 (6, 4, 2)    8 (3, 4, 6)   11 (8, 6, 4)

     9   (6, 3, 2)      7 (6, 4, 3)   11 (8, 6, 4)    5 (4, 3, 5)

    10   (5, 1, 1)      4 (7, 2, 2)    2 (4, 7, 5)    2 (4, 7, 5)

    11   (8, 6, 4)     11 (8, 6, 4)    6 (8, 9, 1)    8 (3, 4, 6)

    12   (1, 9, 4)      6 (8, 9, 1)   12 (1, 9, 4)    3 (3, 1, 8)

In the zxy column, notice that (6, 3, 2) comes before (6, 4, 2) because the z and x values are the same, leaving y to resolve the difference.

The first median is sixth in the sorted list for x (we can just take the lower of the two midpoints), or point 10 (5, 1, 1).  We now have to shuffle yzx and zxy.  The indices in all three arrays after the shuffle will be the same above and below the median, without disturbing their sorted orders.  From the xyz list we find 12, 3, 8, 5, and 2 below the median and 9, 1, 7, 4, 11, 6 above.  In the yzx array these are ordered 3, 5, 8, 2, 12 and 4, 9, 1, 7, 11, 6, and in the zxy array 12, 5, 2, 3, 8, 3 and 6, 9, 1, 4, 7, 11.  The shuffling is done by scanning the array from top to bottom, copying the index into the appropriate half of a temporary array, then copying back when done.

       xyz            yzx            zxy

   12 (1, 9, 4)    3 (3, 1, 8)   12 (1, 9, 4)

    3 (3, 1, 8)    5 (4, 3, 5)    5 (4, 3, 5)

    8 (3, 4, 6)    8 (3, 4, 6)    2 (4, 7, 5)

    5 (4, 3, 5)    2 (4, 7, 5)    8 (3, 4, 6)

    2 (4, 7, 5)   12 (1, 9, 4)    3 (3, 1, 8)

   10 (5, 1, 1)    

    9 (6, 3, 2)    4 (7, 2, 2)    6 (8, 9, 1)

    1 (6, 4, 2)    9 (6, 3, 2)    9 (6, 3, 2)

    7 (6, 4, 3)    1 (6, 4, 2)    1 (6, 4, 2)

    4 (7, 2, 2)    7 (6, 4, 3)    4 (7, 2, 2)

   11 (8, 6, 4)   11 (8, 6, 4)    7 (6, 4, 3)

    6 (8, 9, 1)    6 (8, 9, 1)   11 (8, 6, 4)

The middle of the top sub-array is the left child of the first node, and the middle of the bottom is the right child.  We pick medians in each half of the y column now, giving 8 (3, 4, 6) as the left child and 1 (6, 4, 2) as the right (again rounding the midpoint down).  We now reorder the x and z columns.

       xyz            yzx            zxy

    3 (3, 1, 8)    3 (3, 1, 8)    5 (4, 3, 5)

    5 (4, 3, 5)    5 (4, 3, 5)    3 (3, 1, 8)

                   8 (3, 4, 6)

   12 (1, 9, 4)    2 (4, 7, 5)   12 (1, 9, 4)

    2 (4, 7, 5)   12 (1, 9, 4)    2 (4, 7, 5)

   10 (5, 1, 1)

    9 (6, 3, 2)    4 (7, 2, 2)    9 (6, 3, 2)

    4 (7, 2, 2)    9 (6, 3, 2)    4 (7, 2, 2)

                   1 (6, 4, 2)

    7 (6, 4, 3)    7 (6, 4, 3)    6 (8, 9, 1)

   11 (8, 6, 4)   11 (8, 6, 4)    7 (6, 4, 3)

    6 (8, 9, 1)    6 (8, 9, 1)   11 (8, 6, 4)

When we have 3 nodes left in a sub-array, as we do on the next pass, then the middle becomes the root, the first its left child and the third its right.  For the last sub-array, 7 (6, 4, 3) is placed in the third (z) layer with 6 (8, 9, 1) to its left and 11 (8, 6, 4) to its right.  If there are only 2 nodes then we take the second as the root and the first as its left child, so 3 (3, 1, 8) goes as the left child of 8 (3, 4, 6), and 5 (4, 3, 5) as its left child.  The complete tree looks like

 

Search

We follow the algorithm in "Algorithms in a Nutshell" by Heineman, Pollice and Selkow, O'Reilly, 2008 to use a kd-tree to find the node nearest a point.  It begins by determining the leaf node where the point would be inserted (or, if the point matches a node, we have our answer).  That is, traverse the tree from the root, going left if the point is less than the node in the cyclic sense we used building the tree, else right.  There's no guarantee, however, that it must be closest, only that we've probably excluded half the tree.  Now we have to check all parents of that node.  Starting from the root, we measure the distance to the point, remembering it if it less than the best distance found so far.  We then ask if the perpendicular distance, which is the difference in x coordinates if it's an x node or in y for a y node, is less than this minimum distance.  If not, then we go down the tree in the direction of the point: the other side cannot be closer than the minimum.  If it is, then we need to check both children.

    find leaf node n under which point would be placed

    dmin = distance between point, root

    better = nearest(pt, root, dmin)

    if better exists, return better

      else return n

 

    nearest(pt, root, dmin)

      result = <none>

      if distance from root to pt < dmin, dmin = distance, result = root

      dp = perpendicular distance from pt to root

      if (dmin < dp) then

        if pt < root then better = nearest(pt, root.left, dmin)

          else better = neareset(pt, root.right, dmin)

        if better exists return better

      else  

        better = nearest(pt, root.left, dmin)

        if better exists then result = better, dmin = distance pt to better

        better = nearest(pt, root.right, dmin)

        if better exists then result = better, dmin = distance pt to better

      return result

In the best and average case this is an O(log n) operation as we make two visits down the tree, one to find the leaf node and the second to confirm that no parents are closer.  In the worst case it becomes O(n) when we have to check both left and right children at every level when the perpendicular distance is less than the minimum.  For small numbers of dimensions the kd-tree search is better than exhaustively checking all nodes, but as the dimensionality of the space increases to somewhere between 10 and 20, the chance of hitting the worst case quickly grows.  Timing tests with the RANSAC program show a factor of 5 improvement, from 41.1 s to 8.3 s per trial compared to exhaustively searching the plane.

Implementation (kdtree, test_kdtree)

We'll implement the kd-tree with a generic class, using a generic record for each node.  It will take a parameter for the number of dimensions (so the tree can be used outside this application) and a type for some piece of data stored at each node; nodes will store this payload, the point, and some flags to indicate if they're leaf nodes.  The tree will be built by first loading data and then constructing it.  Since we cannot add data after the construction we'll need to use an internal flag to prevent this.

The class will have five public methods:

    kdtree() - constructor for the generic elements and number of points

    add_node() - stage a point and its payload to put in the tree

    assemble_tree() - sort the nodes into the kd-tree

    find_nearest_point() - return payload in the node closest to a point

    dump() - print a table with the tree, meant for debug

The node record is

    record kdpoint {

      type eltType;

      param ndim : int;

      var flags : uint(8);

      var pt : ndim * int;

      var val : eltType;

    }

where val is the payload.  The type and parameter are set when we use the node in the class

    class kdtree {

      type dataType;

      param ndim : int;

      var Ltree : domain(rank=1);

      var tree : [Ltree] kdpoint(eltType=dataType, ndim=ndim);

      var Ldata : domain(rank=1);

      var data : [Ldata] kdpoint(eltType=dataType, ndim=ndim);

      var sortind : [1..ndim][Ldata] int;

      var nstored : int;

      var storable : bool = true;

    }

Here we've separated the nodes into a unsorted array data and the kd-tree tree.  The domains for each are defined separately so we can change them when we know the number of points.  We use a flat array to store the tree where the left child of node p is at index 2*p and the right at 2*p+1.  nstored will track how many points we've added, or where to store the next point that comes in.  The storable flag will be set false when we build the tree; we'll use it to tell whether we can add more points.  The sortind array stores the sorted indices we need.

You'll find the code implementing the five public methods in kdtree.chpl. We don't intend to go through it line by line, but do want to make several comments.

In the constructor we make sure there are at least two levels to the tree, even if it stores only one element.  The routine that builds the tree assumes this (place_node() when the size of the array is 1).  The tree otherwise is full, ie. has 2 ** depth - 1 nodes, where depth is the log2 of the number of elements, rounded up.  In the Chapel standard math library log2(val:int) returns the largest power-of-two smaller than the value; it does not automatically cast to real.

The add_node() method simply checks that we haven't assembled the tree yet and copies the new point and value into the staging area, the data array.

assemble_tree() begins by setting up and sorting the indices of the points.  It then calls place_node() which will determine where to split the array, place that point in the tree, shuffle the points and recur on each half.  Unlike our QuickSort example, and the kd-tree example at the start, we pass only the range and not a sliced array because sortind is a member.  But the core logic – shuffle and recur in parallel using cobegin – is the same.

    /* Lsort is subrange of data to process, cbase the primary sort

       coordinate, and pos the index in tree to put the point. */

    proc place_node(Lsort : range, cbase : int, pos : int) {

      if (3 == Lsort.size) {

        /* first element to left child, middle to pos, last to right */

      } else if (2 == Lsort.size) {

        /* first element to left child, middle to pos, right is empty */

      } else if (1 == Lsort.size) {

        /* element to pos, left and right empty */

      } else {

         const midpt = (Lsort.first + Lsort.last) / 2;

         const cnext = if (cbase == ndim) then 1 else (cbase + 1);

         shuffle_sortind(Lsort, cbase, midpt);

         tree(pos) = data(sortind(cbase)(midpt));

         tree(pos).flags = cbase : uint(8) | HAS_L | HAS_R;

         cobegin {

           place_node(Lsort[..midpt-1], cnext, 2*pos);

           place_node(Lsort[midpt+1..], cnext, 2*pos+1);

         }

       }

     }

Since we're using a flat array for the tree, we need flags in the kdpoint record to mark if the node is a leaf (IS_LEAF), is empty (IS_NIL), or if it has a left or right child (HAS_L and HAS_R).  We also store the primary sort coordinate.  These flags are defined as constants inside the class.  We don't use an enumeration because of the problems using the constant as an int (Chapel can't figure out the type and you would need to cast every usage).  Although shuffle_sortind() modifies the array, it is safe to do so in parallel during the recursion because the ranges will not overlap.

The sorting method sort_kdpoints() is a modified QuickSort.  For convenience we've defined three sets of comparison functions, with point_<comparison>() being the fundamental group.  They use a loop through the coordinates, resolving the comparison with the first unequal pair.

    inline proc point_lt(pt1 : ndim * int, pt2 : ndim * int, cbase : int,

                         noeq = false) : bool {

      for craw in cbase..(cbase+ndim-1) {

        const coff = if (craw > ndim) then (craw - ndim) else craw;

        if (pt1(coff) != pt(coff)) then

          return pt1(coff) < pt2(coff);

      }

      if (noeq) then halt("kd-tree does not accept equal points");

      else return true;

    }

Here coff is wrapped to the range 1..ndim.  Since cbase varies, we cannot use a for param to unwrap the loop.  This is somewhat inefficient for a small number of dimensions, but it is a general solution that works for all ndim.  The noeq flag is needed because we don't want duplicate points while building the tree, but we do want to allow them when searching for the nearest node.  The other two sets are wrappers around this, dataind_<comparison>() using the base coordinate and array index to get points from the data array via sortind, and kdpoint_<comparison>() that pulls the points from the nodes directly.

In the shuffle_sortind() procedure we've made one optimization by noting that the indices in all coordinate arrays are the same below or above the midpoint, only their order differs.  We don't need to compare the points with the median while shuffling, because we already know that the primary coordinate array is in the correct order and tells us which half to place the index in. We use an intermediate array lt to mark them, and the decision to shuffle becomes a look up into lt.

The searching procedure find_nearest_point() follows the algorithm described above.  It is straightforward code.  Perhaps the only two points to make about Chapel is the use of a for param in the distance-squared calculation dist2(), and a clean definition of the perpendicular distance given the coordinate cbase used by the tree node

    proc perpdist2(pt : ndim * int, node : int) : int {

      const cbase = tree(node).flags & COORD;

      return (pt(cbase) - tree(node).pt(cbase)) ** 2;

    }

Here COORD is a bit mask to extract the coordinate we packed into the flags member.

The program test_kdtree.chpl is a self-checking test bench for several functions in kdtree.  Compile and run it with

    make test_kdtree

    bin/test_kdtree

Aside: Transforms and Best Fits

RANSAC matching is a three-step process, beginning with an estimate of the transform between two images, then checking if it is a good mapping, and finally refining it with the pairs of matching features.  There are two transforms we'll want to use, a linear coordinate mapping accounting for scaling and translation (ST) and a mapping that also includes rotation (RST).  The ST model is simpler and was used to develop and test the program first.  The math behind the steps involves solving a linear system for the first, evaluating the transform for the second, and a best-fit error minimization regression for the third.

Repeating the ST transform from (x, y) coordinates to (v, w),

        v = sx x + dx
 
        w = sy y + dy
 

Note that the order of the operations is important; a TS model would give v = sx * (y + dx) which has a different interpretation for the shift dx.  This is just a linear equation which needs two pairs of points to fix the constants, (x1, y1), (v1, w1) and (x2, y2), (v2, w2)

        v1 = sx x1 + dx
        v2 = sx x2 + dx
        w1 = sy y1 + dy
        w2 = sy y2 + dy

Solving

        sx = ( v1 v2 ) / ( x1 x2 )
 
        dx = v1 sx x1 = v2 sx x2
 
        sy = ( w1 w2 ) / ( y1 y2 )
 
        dy = w1 sy y1 = w2 sy y2
 

Refining these values given many matching pairs is the same as performing a linear regression which minimizes the square error function

        L = i = 1 n ( v i sx x i dx ) 2

Take the partial derivative of L for each constant

        L sx = i = 1 n 2 x i ( v i sx x i dx )
 
        L dx = i = 1 n 2 ( v i sx x i dx )

and set them to zero.  Making the sum index i implicit in the notation,

        2 x ( v sx x dx ) = 0
        2 ( v sx x dx ) = 0

or

        xv sx x 2 dx x = 0
 
        v sx x n dx = 0

Solving for sx and dx gives

        dx = ( v sx x ) / n
 
        sx = ( n xv x v ) / ( n x 2 ( x ) 2 )        

The solution for y and w is the same, just canging variables

        dy = ( w sy y ) / n
 
        sy = ( n yw y w ) / ( n y 2 ( y ) 2 )

 

The RST transform is a specific case of an affine transform which preserves lines and the ratios between distances.  The general equations are

        v = sxx x + sxy y + dx
        w = syx x + syy y + dy
 

and the specific values of the coefficients are

        v = sx cos ( θ ) x + sx sin ( θ ) y + dx
 
        w = sy sin ( θ ) x + sy cos ( θ ) y + dy

That is,

        sxx = sx cos ( θ )
 
        sxy = sx sin ( θ )
 
        syx = sy sin ( θ )
 
        syy = sy cos ( θ )

This is four equations with three unknowns, so we have the additional requirement that

        sxx syx = sxy syy
 

with the inverse relationships

        tan ( θ ) = sxy / sxx = syx / syy
 
        sy = ( syy / sxx ) sx = ( syx / sxy ) sx

The RST mapping is a rigid rotation of the plane when this condition holds. The affine transform also includes a skewing or slanting which this model does not permit.  Again, the order of the three operations is important.

This system has five unknowns, which we can get from three pairs of matching points that also satisfy the additional requirement.

        v1 = sxx x1 + sxy y1 + dx
 
        v2 = sxx x2 + sxy y2 + dx
 
        v3 = sxx x3 + sxy y3 + dx
 
        w1 = syx x1 + syy y1 + dy
 
        w2 = syx x2 + syy y2 + dy
 
        w3 = syx x3 + syy y3 + dy

The solution of the first three is

        dx = ( v3 p3 q3 q2 ( v2 p2 ) ) / ( r3 q3 q2 r2 )
        sxy = ( v2 p2 r2 dx ) / q2
        sxx = ( v1 sxy y1 dx ) / x1

where the intermediate values p, q, and r are

        p2 = x2 v1 / x1
 
        q2 = y2 ( y1 x2 / x1 )
 
        r2 = 1 ( x2 / x1 )
 
        p3 = x3 v1 / x1
 
        p3 = y3 ( y1 x3 / x1 )
 
        r3 = 1 ( x3 / x1 )

For the second set of three

        dy = ( w3 s3 q3 q2 ( w2 s2 ) ) / ( r3 q3 q2 r2 )
        syy = ( w2 s2 r2 dy ) / q2
        syx = ( w1 syy y1 dy ) / x1

using q and r from above and

        s2 = x2 w1 / x1
 
        s3 = x3 w1 / x1

Refining these values can also be done by minimizing the error function

        L = i = 1 n ( v i sxx x i sxy y i dx ) 2

which has partial derivatives to the constants

        L sxx = i = 1 n 2 x i ( v i sxx x i sxy y i dx )
 
        L sxy = i = 1 n 2 y i ( v i sxx x i sxy y i dx )
 
        L dx = i = 1 n 2 ( v i sxx x i sxy y i dx )

Setting these to zero

        2 x ( v sxx x sxy y dx ) = 0
 
        2 y ( v sxx x sxy y dx ) = 0
 
        2 ( v sxx x sxy y dx ) = 0
 

or

        xv sxx x 2 sxy xy dx x = 0
 
        yv sxx xy sxy y 2 dx y = 0
 
        v sxx x sxy y n dx = 0        

Solving for dx and inserting it in the other two equations reduces it to two unknowns

        k + sxx l + sxy m = 0
 
        p + syx q + syy r = 0

The solution to this, again using intermediate values to help, is

        sxx = ( r k + m p ) / ( m 2 r l )
 
        sxy = ( k + sxx l ) / m
 
        dx = ( v sxx x sxy y ) / n

where

        k = n xv x v
 
        l = ( x ) 2 n x 2
 
        m = x y n xy = q
 
        p = n yv y v
 
        r = ( y ) 2 n y 2

and q turns out to be equal to m.  syx, syy, and dy have the same solution except that w is used in place of v.

Implementation (ransac_st, ransac_rst)

Like the kd-tree, implementing the RANSAC algorithm doesn't require any new Chapel features.  The ransac_st program implements the scaling-translation mapping, ransac_rst rotation-scaling-translation.  To compile them

    make ransac_st

or  make ransac_rst

The first procedure called once we have lists of corners in each image is align_corners().  In pseudocode it does

    proc align_corners(corners1 : [?Lcnr1] corner, corners2 : [?Lcnr2] corner,

                       out bestmap : mapinfo) {

      build kdtree for corners2

      for i in 1..ntry {

        do {

          pick seeds and determine mapping

          if mapping valid {

            map corners1 onto corners2 plane

            match up corners

            if have enough matches, break out of while (try is done)

          }

        } until too many failures to get a match (try as a whole fails)

      }

      select best try

      re-create mapped corners and matching pairs

      refine mapping based on pairs

      return mapping

    }

When picking seeds we do not do so blindly; this would cause too many failed tries.  At a minimum we require the corners be similar: they must have about the same length and orientation (as determined by the sequence's start point) and must both be brighter or darker than the corner.  We also require that the mapping make sense, as checked in valid_map().  This means the scaling in x and y must be about the same and bounded, and for RST that the extra condition on the affine coefficients holds.  The bounds on sx and sy are necessary because there is a pathological situation where a large part of one image scales to a small, dense cluster of corners in the other, generating many false matches.  This consistency check is done after picking both pairs in ST, but must be used to screen the third pair of seeds for RST.  This requirement has a substantial cost, since we must calculate and validate the mapping for each corner (O(n)), but is necessary because the chance of finding a valid corner at random is too small.  These screenings take the form of checking each potential corner and adding it to a list if it's acceptable, then picking which to use at random from the list.

Although the corner detection and validation parameters will change with the mapping, we ignore the effect.  For example, limiting how much the start point of the ring can shift will ultimately restrict the rotations that will be accepted, and this parameter must be looser in RST than in ST.  Our assumption is that the features are stable over the mapping values we allow, but part of the experimentation with this algorithm is to see if that is true.

We use the formulas in the previous section to convert our seeds into the transform, and the mapping of corners1 just applies the transform.  Matching is done by taking the transformed corners1 and finding the nearest point in corners2 using the kdtree.  If this node has not yet been paired, then we record it and increment the number of matches.  This is a little sub-optimal in that there may be multiple potential pairings, and we're just taking the first, not the best.  The assumptions that mapsep is small and corner suppression separates the corners should reduce the chance of have multiple matches, but if the scaling differs very much from 1 then this may not be true.  Checking this is also part of our evaluation of the algorithm.

Once done, if this number is more than a given fraction matchfrac of the number of corners, then we declare the try a success, otherwise we record a failure and make another attempt.  Once all tries are done, we pick the one with the greatest number of matches as the overall winner.  We re-create the list of matching pairs and run the regression fit on them to generate the final mapping.

In addition to the command line arguments for changing the FAST detector, the RANSAC algorithm adds many more

    --ntry=<int>           number of times to try to get a match

    --nfail=<int>          number of failed seedings before abandoning try

    --mapsep=<int>         max distance between matches after mapping 1 to 2

    --seedsep=<int>        (RST) smallest distance between seeds

    --matchfrac=<real>     fraction of corners that must match to accept

    --dlen=<int>           difference in corner sequence length to be similar

    --dst<int>             difference in start point coords to be similar

    --outname=<string>     (RST) if provided, file name of image showing matches

There are also two parameters that define the maximum ratio between sx/sy or sy/sx (SAME_S_RATIO) and the absolute value for sx or sy or their inverses (MAXSCALE).  The RST version also adds a limits on the mismatch between the affine scaling coefficients (SAME_RIGIDITY).

You'll find three test programs provided.  test_map_rst is a self-checking test bench for the conversion between affine coefficients and RST description.  With test_fit_[r]st you provide the mapping on the command line and the program runs just the corner mapping and matching, reporting the number of matches found.  These programs require the full ransac_[r]st program, substituting a new main() procedure.  Because there are now two source files with main(), you'll notice the compilation command includes the --main-module compiler flag to resolve the ambiguity.

    chpl --main-module test_fit_st -o bin/test_fit_st test_fit_st.chpl

         img_png_v3.h build/img_png_v3.o -lpng

To run test_fit_[r]st, use the appropriate input files and threshold and provide the mapping you wish to try.  The programs will report the number of corners that match.

Operation

We've manually rotated, scaled, and shifted the base image to test the programs.  The images have a moderate number of corners, and because they are based on the same image there should be good agreement between the features.  The images are

 

 

rot

sx

sy

dx

dy

# corners

after suppress

bonds1

base

0

1.00

1.00

0

0

2770

747

bonds2_s

S

0

1.25

1.30

0

0

1568

562

bonds2_t

T

0

1.00

1.00

150

300

2772

748

bonds2_st

ST

0

1.25

1.30

150

300

1575

565

bonds2_rt

RT

30

1.00

1.00

0

700

2278

651

bonds2_rst

RST

30

1.25

1.30

150

1210

983

398

Either program should be run with --thr=10, for example

    bin/ransac_st --inname1=bonds1.png --inname2=bonds2_st.png --thr=10

    bin/ransac_rst --inname1=bonds1.png --inname2=bonds2_rst.png --thr=10

The FAST detector yields counts between 400 and 750 corners after suppression.  The translation does not affect the total, as it should not, but the scaling and rotation do.  The detail images show that the RST corners are a subset of the base; with one or two exceptions new corners have not been created.

Base image bonds1.png, FAST corners with thr=10
 
 
 
 
 
 

We ran both RANSAC programs for 20 trials on each test image against the base, with a threshold of 10 and all other parameters default.  The tables present the average and one-sigma deviation for the transform, as well as the number of tries that successfully found a mapping (good or bad), the average number of re-seedings per try (including failures), and the time in milliseconds per try.  The programs have been compiled with --fast.

ransac_st     --thr=10 --ntry=50 (default)     differences to actual in red

 

sx

dx

sy

dy

S

1.25

±0.0

0

-0.1

±0.0

-0.1

1.30

±0.0

0

0

±0.0

0

T

1.00

±0.0

0

150.0

±0.0

0

1.00

±0.0

0

300.0

±0.0

0

ST

1.25

±0.0

0

149.9

±0.0

-0.1

1.30

±0.0

0

300.0

±0.0

0

RT

-- no successes in any trial --

RST

-- no successes in any trial --

 

ransac_st     --thr=10 --ntry=50 (default)

 

success [%]

re-seeds

time/try [ms]

S

49 ±8

1087 ±92

186 ± 13

T

99 ±0

212 ±36

60 ± 9

ST

53 ±6

1063 ±60

188 ± 2

RT

-- no successes --

RST

-- no successes --

ransac_st performs very well with the unrotated test cases, producing accurate estimates of the scaling and translation.  It has more trouble with scaling than shifting, as the success rate drops to 50% and the number of trial seeds needed for each match grows strongly, which is reflected in the timing.  All mappings are accurate.  As might be expected it had no luck with the rotated images.

ransac_rst     --thr=10 --ntry=250 (default)     differences to actual in red

 

sx

dx

sy

dy

theta (deg)

S

0.88

±0.4

-0.38

214

±352

+214

0.87

±0.4

-0.42

435

±454

+435

-14.3

±53

-14.3

T

1.00

±0.0

0

152

±22

+2

1.00

±0.1

0

306

±50

+6

0.0

±2.0

0

ST

1.03

±0.3

-0.25

195

±225

+45

1.04

±0.4

-0.26

652

±528

+352

7.7

±52

+7.7

RT

1.00

±0.0

0

0

±0

0

1.00

±0.0

0

700

±0.0

0

30.0

±0.0

0

RST

1.25

±0.0

0

146

±146

-4

1.30

±0.0

0

1215

±22

+5

30.3

±1.2

+0.3

/

ransac_rst     --thr=10 --ntry=250 (default)

 

success [%]

re-seeds

time/try [ms]

S

23 ±2

1322 ±22

312 ± 6

T

58 ±3

944 ±42

307 ± 11

ST

23 ±2

1318 ±22

312 ± 5

RT

76 ±2

792 ±39

210 ± 9

RST

45 ±3

1121 ±32

185 ± 4

ransac_rst performs best on the rotated and translated images, but very badly with scaling without rotation.  The S and ST tests have large errors, and the high standard deviations indicate the measurements are not stable.  The number of successful tries is small, about half the rate of the good results, and this is reflected in the large number of seed trials for each and the corresponding run time.  Still, the results on the rotated images, and the RST test in particular are good.

Does the performance of the RST version get better with more tries?  Yes.  We made 20 runs with ntry between 50 and 500 in steps of 50.  The chart shows the percentage of the 20 runs that were accurate to within 1%, 5%, or 10% of the actual values.  All five parameters had to meet the accuracy threshold for the run to count.

 

For 450 and 500 tries, 19 of the 20 runs had sx within 1.25 ± 0.125, sy within 1.30 ± 0.13, dx within 150 ± 15, dy within 1210 ± 121, and theta within 30.0 ± 3.0 degrees.  14 of 20 runs had an sx of 1.25 ± 0.0125, sy 1.30 ± 0.013, dx 150 ± 1.5, dy 1210 ± 12.1, and theta 30.0 ± 0.3.  Plotting the ratio of the standard deviation to the average for each parameter gives an indication of the stability of the measurement.  There is a sharp decline at 150 tries for the scaling factors, and then a slow settlement.  This means there is a broad maximum in the parameter space that is likely to be found given enough tries (150), but that the true alignment is much smaller, as the accuracy results imply.

 

We can see what is happening by placing the input images side-by-side and drawing lines between matching corners.  (This has only been implemented in the RST version.  The image is generated if --outname is given.)  When the matching is poor the lines can clump, with large scaling factors that compress one image into a dense area of corners in another.  You can see this in the bad image, where everything maps into the middle tree, including the corner in the slide at the middle of the left edge and the tree in the lower left corner.  There doesn't seem to be a clear decision rule for separating bad and good results: the parameters of a bad match are still in the ranges we allow, and other tests have shown that adding requirements such as needing to have a wide distribution of matching corners over an image results in other pathologically bad matches.

 

 
.
 

Wrap-Up

The RANSAC example offered nothing in the way of new language features.  Instead it is a summary of what we have learned, using the C interface to read and write PNG images, the color conversion library to convert them to greyscale, and the FAST corner detector with its custom iterator and chunkarray generic class to store a list of corner features.  We added a generic class kdtree to efficiently search the plane for nearest corners.  We have used arrays, ranges, and slices extensively to process the images and lists.  The program contains both task-based parallelism in the kdtree (cobegin) and data parallelism (forall) in the RANSAC analysis.

Whether we put the RANSAC algorithm in our toolkit is still uncertain.  Its performance on real images, of natural scenes taken at different times by different people with different cameras, teases us with occasionally good results, but not consistently.  Scaling in particular gives it difficulty.  We have tried another approach with different features than corners, and the behavior is similar to what we see here: a rough alignment is reached, but there are many local maxima in the matching figure of merit (here the number of matching corners) that generate more matching pairs than the actual and give false results.  The evaluation of the algorithm is still going on, but it leads us outside the scope of this tutorial.

 

Files

The programs and images for this section are here.  A zip file is here.

A PDF copy of this text is here.