We have implicitly used Chapel's concept of an iterator in our loops. Iterators are constructs that step through a sequence one item at a time. A loop processes these steps until there are no more.  Iterators are provided for ranges, but we can also create our own.  To demonstrate this, we'll program a circular iterator for a FAST corner detector.

The directory for this chapter is fast.  You'll be using the C functions for reading and writing PNG images in img_png_v3.[ch], and the color library ip_color_v3.chpl.  Use make name to build the programs, leaving off the '.chpl' suffix.  Executables get put in the bin sub-directory, intermediate files in build.  The sample image for this chapter is owl.png.

The files needed for this chapter are here (as zip file).  A PDF copy of this text is here.

Aside: Corner Detectors

A corner is a 2D feature in an image, a point where there are strong gradients in orthogonal directions.  If a block is aligned to the pixel grid then the vertical edge has a strong gradient over x but not y, and the horizontal edge has a strong gradient in y, not x.  As you approach the point where the edges meet the opposite gradient begins to increase, until at the corner both are present.  Alternatively we can look for patterns of "on" and "off" pixels in the neighborhood of a corner, where the difference between on and off is some threshold.  Corner detectors are run on greyscale images.

The Harris corner detector tests the gradients.  Using a differential edge detector to measure them, we smooth the data with a low-pass filter to reduce noise and then calculate the determinant and trace of the matrix of the second-order derivatives


dxxI   dxyI

dyxI   dyyI


where we approximate the second-order derivatives by applying the first derivatives twice. The metric

    det - kappa * trace * trace

gives the strength of the corner detector.  kappa is usually about 0.04 and smaller values pass more corners.  You can threshold this value to mark corners in the image.

The SUSAN corner detector sums the difference between each pixel within a circle to the center (xc, yc).  It uses a smoothing function (blue) instead of an abrupt threshold (red)


where t is a threshold parameter.  Note that this is inverted in the sense that similar intensities contribute the most.  If this sum is below half the maximum possible over the spot, then the detector verifies it has found a corner by looking at the pixels that contribute to it.  If their centroid is far from the center and all pixels along the line between the center and the centroid out to the edge of the circle belong to the set, then (xc,yc) is declared a corner.  Because the function is independent of the direction of the intensity change the detector works equally well for a bright wedge on a dark background as for a dark wedge in light.

SUSAN members in cyan, centroid in red

The FAST corner detector scans the perimeter of a circle about its center, marking pixels if they are above or below the center by a threshold, or considering them the same.  The detector requires that a significant fraction of the periphery be a continuous stretch above or below.  For example, in a circle of radius 3 there are 16 pixels in the circumference and we may require 9 or 11 in a row be strongly different in the same direction.  You may further add that the remaining pixels be near the center's intensity, although we will not do so.


An iterator is defined like a procedure but with the iter keyword..  It may take arguments.  It cannot overload another operator.  Inside the body you can use return without any expression, and may also use the yield statement to give the next value.  The iterator will be called repeatedly by the loop and will execute until it hits either a return and or yield.  At the return it ends.  At the yield it pauses, returning the yield's expression.  The next time the iterator is called it resumes from the point it paused with the same context as before; that is, the iterator operates as a closure.

An example duplicating a simple range with a stride is

   iter whilerange(lo : int, hi : int, stride = 1) {

     var cnt = lo;


     /* Check validity of range. */

     if (((hi < lo) && (0 < stride)) || ((lo < hi) && (stride < 0)) ||

         (stride == 0)) then return;

     if (0 < stride) {

       while (cnt <= hi) {

         yield cnt;

         cnt += stride;


     } else {

       while (hi <= cnt) {

         yield cnt;

         cnt += stride;




   for i in whilerange(1, 5) do writeln(i);

   > 1

   > 2

   > 3

   > 4

   > 5

   for i in whilerange(-1, -5, -1) do writeln(i);

   > -1

   > -2

   > -3

   > -4

   > -5

   for i in whilerange(1, 10, 2) do writeln(i);

   > 1

   > 3

   > 5

   > 7

   > 9

   for i in whilerange(1, 5, -1) do writeln(i);

   (no output, iterator not executed)

   for i in whilerange(-1, -5, 1) do writeln(i);

   (no output, iterator not executed)

   for i in whilerange(1, 5, 0) do writeln(i);

   (no output, iterator not executed)

An iterator that is evaluated, either as an expression or during assignment to a variable or array, generates its entire sequence stored in a one-dimensional array.


    > 1 3 5 7 9

This example is found in ex_iterator.chpl.  Compile and run it with

    make ex_iterator


Custom Iterators (fast_v1, test_fast_v1)

Our approach to implementing a new iterator evolved while working on this exercise.  The original plan was to have a 'circumference' that would step around a circle with a given radius at a central point.  For small radii, say r < 7, we could just list the points in a series of yield statements; the calculation wouldn't get any more efficient.  For large radii we could calculate the points, which involves squaring and a square root, taking advantage of the symmetry of the problem by only doing this in the first quadrant and reflecting into the other three.

Ah, but what if want to do this at every point in an image?  This calculation could be cached, of course, which means that in addition to the iterator we need to store the intermediate results.  A class would seem the natural way to do this.  For consistency we treat all cases the same, even for small radii, although this will be a little less efficient because the offsets to get the points will be stored in an array rather than hard-coded.

Class methods this() and these() (ex_class.chpl)

Both classes and records support two inherent methods we haven't talked much about yet.  The this() procedure is called if you pass arguments to an instance.

    var circle = new circumference(radius);

    circle(center_x, center_y);

The method call on the instance circle is to this(), which we would define inside the class as

    class circumference {

      proc this(center_x : int, center_y : int) : int { ... }


The these() procedure is called by a loop to iterate over the class.  The iterator takes no arguments, and follows the normal construction rules.

    class circumference {

      iter these() { yield pt1; yield pt2; ... }


these() is what we want to use, and we can abuse this() to get the behavior we want.  Our class will offer two ways of calling it.  First, we can completely specify the iteration – its radius, center point, and whether we want the circle to close by repeating the first point as the last (otherwise it stops just before) – when we construct the instance.  Giving the instance a these() will allow us to use it on the loop line

    for (x, y) in new circumference(radius, center_x, center_y, closed) { ... }

As written, this is a one-shot approach since there is no way to change these parameters outside the constructor.  The second way will take advantage of caching by providing a set_radius() method to pre-calculate the quadrant and then use the this() method to return an iterator for a given point.  The radius is fixed while the iterator moves about the image so that the cost of caching is paid once.  This means we will have to store the radius with the class.

    var circle = new circumference(radius);

    for (x, y) in circle(center_x, center_y, closed) { ... }


    for (x, y) in circle(center_x2, center_y2) { ... }

For the stand-alone mode we will need to implement a constructor and these().

    class circumference {

      var r : int;

      var x_center, y_center : int;

      var close_circle : bool;

      var Lquad : domain(1);         /* range for 1st quadrant points */

      var quad : [Lquad] 2 * int;    /* the points, as an (x,y) tuple */

      var quadcnt : int;             /* number of points in quad (incl.) */


      /* closed is an optional argument. */

      proc circumference(radius : int, xc : int, yc : int, closed = true) {


        x_center = xc;

        y_center = yc;

        close_circle = closed;



      proc set_radius(radius : int) {

        if (r != radius) {

          r = radius;





      iter these() {

        const xc = x_center;

        const yc = y_center;

        const closed = close_circle;


        for i in 0..quadcnt {

          var (x, y) = quad(i);

          yield (xc + x, yc + y);


        /* Second quadrant.  quadcnt-1 to not duplicate first point. */

        for i in 0..quadcnt-1 by -1 {

          var (x, y) = quad(i);

          yield (xc - x, yc + y);


        /* Third quadrant.  Start at one to not duplicate point. */

        for i in 1..quadcnt {

          var (x, y) = quad(i);

          yield (xc - x, yc - y);


        /* Fourth quadrant. */

        for i in 1..quadcnt-1 {

          var (x, y) = quad(i);

          yield (xc + x, yc - y);


        if (closed) {

          var (x, y) = quad(0);

          yield (xc + x, yc + y);




      proc prep_quadrant() {

        var mid = 0;            /* start of second half of quadrant */

        var x, y : int;


        Lquad = 0..2*r;         /* slightly oversize the array */


        /* The actual routine sets up the quadrant automatically for

           radii between 3 and 7. */


        quadcnt = 0;

        x = r;

        y = 0;

        do {

          quad(quadcnt) = (x, y);

          quadcnt += 1;

          y += 1;

          x = nearbyint(sqrt(r**2 - y**2)) : int;

        } while (y < x);

        do {

          y = nearbyint(sqrt(r**2 - x**2)) : int;

          quad(quadcnt) = (x, y);

          quadcnt += 1;

          x -= 1;

        } while (0 <= x);

        /* quadcnt is inclusive. */

        quadcnt -= 1;



The quad array is the cache.  prep_quadrant() handles each octant in the quadrant separately to ensure the ring is continuous, looping over y from 0 to 45° and x from 45° to 90°.  The domain Lquad of the array is slightly bigger than necessary because we don't have a way to calculate the number of points in advance.  The iterator steps through the cache for each quadrant, using the ring's symmetry to add or subtract the peripheral point from the center.  The only tricky part is skipping the first or last point from the range to avoid overlapping the last point of the previous quadrant.

For the cached version we need a different constructor taking only the radius, this(), and an iterator over the cached quad.  These get added to the class definition above.

      proc circumference(radius : int) {




      /* closed is an optional argument */

      proc this(xc : int, yc : int, closed = false) {

        return circum_iter(xc, yc, closed);



      iter circum_iter(xc : int, yc : int, closed : bool) {


        /* Same as the 'these' function above, except it uses the arguments

           and not the members. */


The program ex_class.chpl shows this framework, which we used to figure out how to use this() and these().  Compile and run it with

    make ex_class


It has some differences to the final approach.  In the example the this() method stores the center point as a class member and then returns the these() iterator.  This would not work in a parallel loop because the center for each task would overwrite the others and the loop would be unpredictable.  We had to define circum_iter() so that all variables are local to it.  Each task sees a separate version and can operate independently.

There is one glaring bit of ugliness with this approach.  If we cannot call these() from this() and be safe, we also cannot call circum_iter() from these().  Being an iterator, these() must contain a yield statement.  It cannot call out to another iterator because it must return void (no value), unlike this() which can return the the iterator.  Compilation would fail if you were to try this.  So we are stuck with two nearly identical copies of the iterator; in fact, circum_iter() is just these() without the first three assignments.

Also ugly, but less glaring, is using this() as an iteration method.  If you were willing to have a method call in the loop, you could delete it and directly use

    for (x, y) in circle.circum_iter(xc, yc) { ... }

Also ugly is that the stand-alone version will leak memory because there's no way to call delete on the instance when the loop is done and the instance goes out of scope.  We wrote this before learning that classes require manual management.

FAST Corner Detector (fast_v1.chpl, test_fast_v1.chpl)

The corner detector program takes a color PNG image, converts it to a LAB or YUV greyscale (ie. uses the L or Y planes), and then checks each pixel to see if it meets the FAST criteria.  If so, it marks the pixel on a copy of the greyscale image.  main() calls mark_corners() to make the copy and perform the check; each can be done in parallel forall loops.  Using forall is all that needs to be done to have the program running in parallel.

The FAST test is done in the is_corner() procedure.  It classifies each pixel in the circumference as being more than a threshold amount above (MORE) or below (LESS) the center, or within the threshold (SAME), using the pixel_thrdir() procedure.  It then counts the number of consecutive MORE or LESS pixels, taking care to wrap around at the first point.  If this count is within the acceptable range, then the routine returns true, the pixel is at a corner.

Compile the program with

    make fast_v1

You can adjust the analysis parameters on the command line:

    --inname=<file>        input file

    --outname=<file>       greyscale copy with corners marked in red

    --space=[LAB|YUV]      which greyscale plane to use, L or Y

    --radius=#             radius of the detector

    --minlen=#             minimum count of consecutive MORE or LESS pixels

    --maxlen=#             maximum count of consecutive MORE or LESS pixels

    --thr=#                greyscale difference to label as MORE or LESS

The FAST detector normally uses a ring of radius 3, which gives 16 pixels in the circumference.  There's 20 for r=4, 28 for r=5, 32 for r=6, and 40 for r=7 using the patterns found in fast_v1.chpl.  minlen and maxlen will therefore change depending on the radius, and are normally close to 3/4 of the circumference, representing a right-angle.  We use minlen=10 and maxlen=13 as our defaults.  Remember too that the data range for L is 0-100 and Y 16-235.  The threshold should change with the color space.  You'll find that varying this parameter will change the sensitivity of the detector: too low and there will be many false positives, too high and you will mark only the strongest corners.

To run the program on the test image do

    bin/fast_v1 --inname=owl.png --outname=owl_corners.png



You can also run color_checker in kmeans to generate a rectangle grid of differently colored squares and use this as the input image.


    bin/fast_v1 --inname=colors.png --outname=color_corners.png

The artificial scene is accurately marked.  Natural scenes tend to be quite a bit noisier.  Try different thresholds. For example, --thr=20 on owl.png gives 181 corners, --thr=15 724, and --thr=25 36.  What threshold in the YUV space gives a result similar to the default LAB?


The program test_fast_v1.chpl is a self-checking test bench for the circumference iterator and the is_corner() classification.  Since it uses fast_v1.chpl as a module, we have the problem that there are two modules with main() routines.  To compile you'll need the extra compiler argument to pick one

    chpl --main-module test_fast_v1 -o bin/test_fast_v1 test_fast_v1.chpl


or  make test_fast_v1

    bin/test_fast_v1 --minlen=9

Corner Suppression (fast_v2, test_fast_v2)

If you run the detector on a natural scene, you'll notice the number of corners is very sensitive to the threshold, as we saw with the owl picture.  Trees and plants in general create problems.  The literature contains ideas of how to reduce the count.  For example, a comment in Klette (Concise Computer Vision) mentions using a figure of merit, the largest difference in intensity between the FAST sequence and ring center, to rank the corners, dropping those below some threshold.  Szeliski (Computer Vision: Algorithms and Applications) talks about adding a spatial separation criteria to the decision.  We can combine these ideas to suppress corners that aren't locally maximum in the figure of merit sense.  We will add a suppression distance parameter that will chain together all corners whose x or y coordinates are within this distance of another member of the group, keeping the one with the highest figure of merit.  The others will be dropped.

We'll need an expandable, sortable list to hold the details about the corners, which we'll implement as a generic class.  This will give us the chance to cover Chapel's support for generics.

Generic Classes and Records

Chapel considers a class, record, or function to be generic if you can customize the type of its members or arguments, or the size with parameters.

You can make a generic class or record in three ways.  First, you can create a type member

    class chunkarray {

      type eltType;


This identifier can be used in any position that a type normally would, including argument lists of methods, declarations, and return types.

    class chunkarray {

      type eltType;

      var data : [Lalloc] eltType;           /* our payload */


      /* This allows array-like access to the data, ie. instance(index).

         It returns an element at the index ind with the type of our

         payload. */

      proc this(ind : int) : eltType {

        return data(ind);



      /* This will store an element of the generic type in the array. */

      proc append(const in val : eltType) {

        /* ... */



If you need a procedure to return a type, by the way, then put the intent keyword type after the declaration, without a colon

    proc get_type() type {

      return eltType;


Arguments can also take a type intent, in which case the type, not the value, in the caller is passed to the procedure.  

The second way of creating a generic class or record is to use a parameter member

    class chunkarray {

      type eltType;

      param initalloc : int;

      var data : [1..initalloc] eltType;


Third, you can omit a type when you declare a member.  The type will be assigned when the class or record is instantiated.  A node in a linked list might be

    class listnode {

      var val;

      var next : listnode(val.type) = nil;


The generic component to the class is set up with a type constructor.  The compiler will generate one for each of these three cases with the usual rules: one argument in the order listed in the class with its name for keyword access.  It does not include arguments for non-generic members.  The constructor when called returns an instance type with the types of the members set correctly.  Use this instance type in the declaration of a variable.  If you initialize the type or param members, then these values become the defaults in the constructor; otherwise the argument must be provided.

As an example of a generic class, we'll develop a chunked array that we'll use to store the corners we find.  We don't know in advance how many corners there are, so the data structure needs to grow.  We'll want to access elements randomly, so a linked list will be too inefficient.  We'll use an array list.  To avoid having to re-size the array with each insertion once it fills, we over-size the growth using chunks.  Re-allocations are needed only when the chunk fills, which amortizes the cost of the growth.  Often array lists double in size when they grow.  We'll use a fixed increment to keep the overhead small, at a cost of having to re-size more often.

For our example, we'll keep one array of data with two ranges, the first representing the allocated elements and the second the subset of actual data that's been stored.  We'll have two parameters that set the initial array size and the amount to grow when full.  Finally, we'll need a synchronized variable to guard access when adding elements; we want to be able to use the array safely within a parallel block.  The declarations look like

    class chunkarray {

      type eltType;

      param INIT_ALLOC = 2;     /* values for demo */

      param GROW_ALLOC = 2;

      var Lalloc : domain(rank=1) = 1..INIT_ALLOC;

      var Ldata : domain(rank=1) = 1..0;

      var data : [Lalloc] eltType;

      var lock$ : sync int = 1;


The class is generic because we have a type member; we'll need to provide that when instantiating.  Notice that the Ldata range is initially empty because the end index is less than the beginning.  We've also initialized lock$ so its flag starts in the full state.

Adding an element means checking that we have room; if not, then we increase the Lalloc range which automatically re-sizes the data array.  There is no method to expand a range on just one side, so we need to slice off the expansion on the lower end.  Alternatively, we could avoid expand() by tracking the size ourselves, but we want the range to manage itself.  The lock$ variable guarantees that only one call is active at a time; we use the explicit readFE() and writeEF() procedures to make this clear.  We use a const in intent for the value so we make a copy.

      proc append(const in val : eltType) {



        /* Slice the expansion to trim negative indices. */

        if (Ldata.high == Lalloc.high) {

          Lalloc = Lalloc.expand(GROW_ALLOC)[1..];

          /* For demo to show we're re-allocating. */

          writeln("grew array to ", Lalloc);


        /* Now there's room for the new element. */

        Ldata = Ldata.expand(1)[1..];

        data(Ldata.high) = val;




We'll also provide this() and these() methods so we can use this class like an array.  this() will return a copy of the valid data.  You must provide parentheses, there is no version of the method without.  It is not possible to return a reference to a slice in an array, so to access the actual array you would have to use the members directly.  this(int) returns the element at an index. these() will step over just the data, not the entire allocation.  Finally, we'll add a size() method to return the number of data points, which is also the largest index (inclusive).

       proc this() {

         return data[Ldata];


       proc this(ind : int) : eltType {

         return data(ind);


       proc these() {

         for d in data[Ldata] do yield d;


       proc size() {

         return Ldata.high;


To demonstrate this class, let's define a simple record to hold some details about a corner, the center point xc, yc and the length len of the consecutively differing pixels

     record corner {

       var xc, yc : int;

       var len : int;


We create a new chunkarray to hold corners, and an instance of the record to show that the values will be copied into the array, not a pointer to the record. The chunkarray instantiation requires the generic type be passed as an argument.

     var corners = new chunkarray(corner);

     var details : corner;

We add a few data points

     details.xc = 100; details.yc =  50; details.len = 10;


     details.xc = 150; details.yc =  75; details.len = 13;


     details.xc = 125; details.yc = 100; details.len = 12;


     > grew array to {1..4}

before we can show how to access the data

     writeln("corners data in " , corners().domain, "   size ",


     > corners data in {1..3}   size 3

     writeln("corner #2 ", corners(2));

     > corner #2 (xc = 150, yc = 75, len = 13)

     for c in corners do

       writeln("corner at " c.xc, ", ", c.yc, "   len ", c.len);

     > corner at 100, 50   len 10

     > corner at 150, 75   len 13

     > corner at 125, 100   len 12

The full example is in ex_genclass.chpl.  Compile and run it with

    make ex_genclass


The default values for the parameters will of course be much higher in the actual program.  We'll see another example of a generic class in the next section when we build a kd-tree.

Generic Procedures

A generic procedure is created through its argument list in one of five ways. You can also add clauses that restrict which generic function is called when the compiler resolves overloads.  First, the intent of an argument can be type or param.  If type, then you must pass a type for that argument and the argument can be used as a type in variable declarations and casts.

    proc castto(type t, val : real) {

      return val : t;


    writeln(castto(int(32), 314.159));

    > 314

    writeln(castto(uint(8), 314.159));

    > 58

If param, then you must pass a parameter expression and compilation occurs with that value.  Second, you can omit the type of an argument in which case the value provided by the caller determines it.  The language spec (Formal Arguments without Types) gives a good example of a procedure to construct an arbitrarily sized tuple filled with a value of any type.

    proc filltuple(param cnt: int, x) {

      var result : cnt * x.type;

      for param i in 1..cnt do result(i) = x;

      return result;


    var ex1 = filltuple(4, 2.178);

    writeln("  fill 4-tuple ", ex1);

    >   fill 4-tuple (2.178, 2.178, 2.178, 2.178)

    writeln("  type is ", ex1.type : string);

    >   type is 4*real(64)

To print the type we only need to cast it to a string.  The parameter sets the size of the tuple, and the unspecified type of the second argument determines the type of each element.  You'll find this example in ex_genproc.chpl. The third way to define a generic procedure is with a query for a type.  Like an argument without a type, the procedure call will determine which type is substituted for the query.  Instead of getting the type through the variable, you can then use the type name directly.  A queried type is indicated by a question mark preceding an identifier and that identifier is used in place of a type.  The filltuple() example could be written

    proc filltuple(param cnt : int, x : ?t) {

      var result : cnt * t;

      for param i in 1..cnt do result(i) = x;

      return result;


Queried arguments are used extensively in the Chapel source code.  Here's the safeAdd() procedure in CHPL_HOME/modules/internal/ChapelUtil.chpl, which checks if it's possible to add two integers of the same type without overflow.  The query on the first argument sets a requirement on the second – both must have the same type.  The procedure isIntegralType() is defined in CHPL_HOME/modules/standard/Types.chpl; it is generic and returns true for int or uint types.

    proc safeAdd(a : ?t, b : t) {

      if !isIntegralType(t) then

        compilerError("Values must be of integral type.");

      if a < b {

        if b >=0 {

          return true;

        } else {

          if b < min(t) - a {

            return false;

          } else {

            return true;



      } else {

        if b <= 0 {

          return true;

        } else {

          if b > max(t) - a {

            return false;

          } else {

            return true;






    proc isIntegralType(type t) param

      return isIntType(t) || isUintType(t);

The fourth way to specific a generic procedure is to pass a generic class or record as an argument type.  We will have a procedure that takes a chunkarray of corners to suppress, which makes this generic.

    proc suppress_corners(rawcnr : chunkarray(corner)) : chunkarray(corner) {

      /* ... */


You can query details about the generic type.  If we had written

    proc suppress_corners(rawcnr : chunkarray(?t)) : chunkarray(t) {

      /* ... */


then Chapel would assign t to corner during compilation and create a version of this procedure with the specific type chunkarray(corner).  The definition of the &= operator in CHPL_HOME/modules/internal/ChapelBase.chpl provides an example of this detailed query.  If the width of the integer arguments is the same, then the primitive low-level function can be called.  If not, then casting is needed and Chapel uses the explicit form.

    inline proc &=(ref lhs : int(?w), rhs : int(w)) {

      __primitive("&=", lhs, rhs);


    inline proc &=(ref lhs : uint(?w), rhs : uint(w)) {

      __primitive("&=", lhs, rhs);


    inline proc &=(ref lhs, rhs) {

      lhs = lhs & rhs;


This detailed query also applies to the fifth type of generic procedure, which has an argument with an array of a generic type.

To help with overloading, generic procedures may have a where clause between the argument list/return type and block of code.  The conditions in the clause must be true for Chapel to pick that version of the procedure to use.  This technique is used extensively in the built-in modules.  Consider the prototypes for the sorting functions found in CHPL_HOME/modules/packages/Sort.chpl.

    proc heapSort(Data : [?Dom] ?elType, comparator : ?rec=defaultComparator) {

         where Dom.rank != 1 {

      compilerError("heapSort() requires 1-D array");


The clause restricts HeapSort to one dimensional arrays.  The queries are used to extract the domain from the array to be sorted, the type of the array's elements, and the comparison function/record.  In CHPL_HOME/modules/internal/ChapelTuple.chpl you'll find as one of the addition operators

    inline proc +(x : _tuple, y : x(1).type) where isHomogeneousTuple(x) {

      /* ... */


Here the function isHomogeneousTuple() returns true if all the members of the tuple x have the same type.  The type of y says that it too must be the same by matching the first tuple member, so that we can add the values safely.  As a final example, the fillRandom procedure in CHPL_HOME/modules/standard/Random.chpl which fills an array with random values has the prototype

    proc fillRandom(arr : [], seed : int(64) = SeedGenerator.oddCurrentTime,

                    param algorithm = defaultRNG)

         where isSupportedNumericType(arr.eltType) {

      /* ... */


Implementation (fast_v2, test_fast_v2)

Our suppression of corners begins by modifying fast_v1 to generate details about each corner, namely its location and the maximum pixel difference figure of merit.  We'll sort them first by their x position, then their y, so we can quickly abandon pairs whose separation along either coordinate is more than the parameter suppsep, which will typically be between one and two times the FAST radius (if one then the points must be within each other's FAST circle, if two then the circles will touch).  We'll store all the close corners in a group, using a modified priority heap to keep track of the biggest, with ties resolved randomly.  When all pairs have been processed we take the biggest and mark them on the greyscale image.

Now the details of the implementation, which is found in fast_v2.chpl.  Compile it with

    make fast_v2

The program takes the same command line options as fast_v1, plus --suppsep=# for the maximum difference between the x or y coordinates of the pair.

The procedure is_corner_with_details() has been added to return a corner record with the extra information we need.  In addition to the center and figure of merit, called dpix, we track the starting point and length of the consecutive run of differing pixels.  The procedure is_corner() is now a wrapper around this function that throws away the details.

find_corners() takes the details and stores them in a chunkarray.  The corner detection is done in parallel, which is why the append() method in the chunkarray needs to be synchronized.   It calls down to suppress_corners() to reduce the list, then cleans up before returning it.

    proc find_corners(img : clrimage, spec : fastspec) : chunkarray(corner) {

      const circle = new circumference(spec.radius);

      const Ainside = img.area.expand(-spec.radius, -spec.radius);

      var corners = new chunkarray(corner);


      forall (y,x) in Ainside {

        var details : corner;

        if (is_corner_with_details(img, x, y, spec, circle, details)) {





      var retval = suppress_corners(corners);


      delete circle;

      delete corners;

      return retval;


The suppress_corners() procedure uses a standard sort module, defining the comparison relationships between corners so that sorting is first by x, then y.  It looks at all pairs that are close enough.  The corner with the better figure of merit is registered as the parent of the other.  We use a flat array whose elements represent the index of the parent, or -1 if the corner is a root (or hasn't yet been assigned), to hold the tree.  Since we don't care about maintaining a proper priority heap through all levels of the tree – only the root needs to be the biggest – we go up from each of the two corners to their roots (which may well be the corners themselves if they haven't been placed yet in a tree) and put the smaller under the bigger; this is done in the set_parent() procedure.  After processing all corners we then assemble those without any parents, in other words the best of each group.

    proc suppress_corners(rawcnr : chunkarray(corner)) : chunkarray(corner) {

      var cnr = rawcnr();

      var suppcnr = new chunkarray(corner);

      var parent : [rawcnr.Ldata] int;


      /* Comparison function for sort, first by center x, then y. */

      proc : corner, c2 : corner) : int {

        if (c1.xc == c2.xc) then return (c1.yc - c2.yc);

        else return (c1.xc - c2.xc);





      /* Remember, sets all elements to -1. */

      parent = -1;


      for c1 in cnr.domain {

        for c2 in cnr.domain[c1+1..] {

          /* Sorted by x, so we don't have to check c2.xc < c1.xc. */

          if ((cnr(c1).xc + suppsep) < cnr(c2).xc) then break;

          if (abs(cnr(c1).yc - cnr(c2).yc) <= suppsep) {

            set_parent(cnr, c2, c1, parent);





      for c1 in cnr.domain do

        if (parent(c1) < 0) then suppcnr.append(cnr(c1));


      return suppcnr;


mark_corners() no longer does the FAST analysis.  It now takes a list of corners, which could be the raw or suppressed sets, and places a small colored cross in a copy of the greyscale image.

If you run this version on the test image with the default threshold, the number of corners drops from 181 to 123.  The suppression is clearest in the trees at the top, although there are doubles removed around the owl as the detail image shows.  (For --thr=15 the number drops from 724 to 361, for --thr=25 from 36 to 33.)



Like the Gabor filters, a corner detector is a per-pixel operation and therefore one that's very easy to have run in parallel.  We've seen how an iterator works in Chapel, and how to implement a custom one.  We've also seen how standard class methods allow it to be used to drive a loop.  We've written a generic class to store the corners as they're generated, so we can eliminate many of the groupings that appear in natural scenes.  We've learned:

A corner detector is an example of a feature detector that highlights unique aspects of an image.  We'll use it to look for common points in two different images by randomly matching points and looking for the best agreement.


1. Create a “spot” detector that covers the circumference and its interior.  This can be used to implement a SUSAN corner detector (see for the original paper describing the detector and its implementation).  How does the SUSAN compare with the FAST?

2. Using a square kernel for a Gaussian low-pass filter leads to some distortion of the underlying circular symmetry.  Write a “spot” convolver that takes a circular kernel.


The programs and image for this chapter can be found here.  A zip file is here.

A PDF version of this text is here.