geometry/doc/design_rationale.qbk

[/==============================================================================
    Copyright (c) 2007-2012 Barend Gehrels, Amsterdam, the Netherlands.
    Copyright (c) 2008-2012 Bruno Lalande, Paris, France.
    Copyright (c) 2009-2012 Mateusz Loskot, London, UK., London, UK

    Use, modification and distribution is subject to the Boost Software License,
    Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
    http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[/ note the source code in this QBK is the only not (yet) checked by a compiler]

[section:design Design Rationale]

Suppose you need a C++ program to calculate the distance between two points.
You might define a struct:

    struct mypoint
    {
        double x, y;
    };

and a function, containing the algorithm:

    double distance(mypoint const& a, mypoint const& b)
    {
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return sqrt(dx * dx + dy * dy);
    }

Quite simple, and it is usable, but not generic. For a library it has to be
designed way further. The design above can only be used for 2D points, for the
struct [*mypoint] (and no other struct), in a Cartesian coordinate system. A
generic library should be able to calculate the distance:

* for any point class or struct, not on just this [*mypoint] type
* in more than two dimensions
* for other coordinate systems, e.g. over the earth or on a sphere
* between a point and a line or between other geometry combinations
* in higher precision than ['double]
* avoiding the square root: often we don't want to do that because it is a relatively expensive
  function, and for comparing distances it is not necessary

In this and following sections we will make the design step by step more generic.

[heading Using Templates]

The distance function can be changed into a template function. This is trivial and allows
calculating the distance between other point types than just [*mypoint]. We add two template
parameters, allowing input of two different point types.

    template <typename P1, typename P2>
    double distance(P1 const& a, P2 const& b)
    {
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return std::sqrt(dx * dx + dy * dy);
    }

This template version is slightly better, but not much.

Consider a C++ class where member variables are protected...
Such a class does not allow to access `x` and `y` members directly. So, this paragraph is short
and we just move on.

[heading Using Traits]

We need to take a generic approach and allow any point type as input to the distance function.
Instead of accessing `x` and `y` members, we will add a few levels of indirection, using a
traits system. The function then becomes:

    template <typename P1, typename P2>
    double distance(P1 const& a, P2 const& b)
    {
        double dx = get<0>(a) - get<0>(b);
        double dy = get<1>(a) - get<1>(b);
        return std::sqrt(dx * dx + dy * dy);
    }

This adapted distance function uses a generic get function, with dimension as a template parameter,
to access the coordinates of a point. This get forwards to the traits system, defined as following:

    namespace traits
    {
        template <typename P, int D>
        struct access {};
    }

which is then specialized for our [*mypoint] type, implementing a static method called `get`:

    namespace traits
    {
        template <>
        struct access<mypoint, 0>
        {
            static double get(mypoint const& p)
            {
                return p.x;
            }
        };
        // same for 1: p.y
        ...
    }

Calling `traits::access<mypoint, 0>::get(a)` now returns us our `x` coordinate. Nice, isn't it?
It is too verbose for a function like this, used so often in the library. We can shorten the syntax
by adding an extra free function:

    template <int D, typename P>
    inline double get(P const& p)
    {
        return traits::access<P, D>::get(p);
    }

This enables us to call `get<0>(a)`, for any point having the traits::access specialization,
as shown in the distance algorithm at the start of this paragraph. So we wanted to enable classes
with methods like `x()`, and they are supported as long as there is a specialization of the access
`struct` with a static `get` function returning `x()` for dimension 0, and similar for 1 and `y()`.

[heading Dimension Agnosticism]

Now we can calculate the distance between points in 2D, points of any structure or class.
However, we wanted to have 3D as well. So we have to make it dimension agnostic. This complicates
our distance function. We can use a `for` loop to walk through dimensions, but for loops have
another performance than the straightforward coordinate addition which was there originally.
However, we can make more usage of templates and make the distance algorithm as following,
more complex but attractive for template fans:

    template <typename P1, typename P2, int D>
    struct pythagoras
    {
        static double apply(P1 const& a, P2 const& b)
        {
            double d = get<D-1>(a) - get<D-1>(b);
            return d * d + pythagoras<P1, P2, D-1>::apply(a, b);
        }
    };

    template <typename P1, typename P2 >
    struct pythagoras<P1, P2, 0>
    {
        static double apply(P1 const&, P2 const&)
        {
            return 0;
        }
    };

The distance function is calling that `pythagoras` structure, specifying the number of dimensions:

    template <typename P1, typename P2>
    double distance(P1 const& a, P2 const& b)
    {
        BOOST_STATIC_ASSERT(( dimension<P1>::value == dimension<P2>::value ));

        return sqrt(pythagoras<P1, P2, dimension<P1>::value>::apply(a, b));
    }

The dimension which is referred to is defined using another traits class:

``
    namespace traits
    {
        template <typename P>
        struct dimension {};
    }
``

which has to be specialized again for the `struct mypoint`.

Because it only has to publish a value, we conveniently derive it from the
__boost_mpl__ `class boost::mpl::int_`:

``
namespace traits
{
    template <>
    struct dimension<mypoint> : boost::mpl::int_<2>
    {};
}
``

Like the free get function, the library also contains a dimension meta-function.

    template <typename P>
    struct dimension : traits::dimension<P>
    {};

Below is explained why the extra declaration is useful. Now we have agnosticism in the number of
dimensions. Our more generic distance function now accepts points of three or more dimensions.
The compile-time assertion will prevent point a having two dimension and point b having
three dimensions.

[heading Coordinate Type]

We assumed double above. What if our points are in integer?

We can easily add a traits class, and we will do that. However, the distance between two integer
coordinates can still be a fractionized value. Besides that, a design goal was to avoid square
roots. We handle these cases below, in another paragraph. For the moment we keep returning double,
but we allow integer coordinates for our point types. To define the coordinate type, we add
another traits class, `coordinate_type`, which should be specialized by the library user:

    namespace traits
    {
        template <typename P>
        struct coordinate_type{};

        // specialization for our mypoint
        template <>
        struct coordinate_type<mypoint>
        {
            typedef double type;
        };
    }

Like the access function, where we had a free get function, we add a proxy here as well.
A longer version is presented later on, the short function would look like this:

    template <typename P>
    struct coordinate_type :    traits::coordinate_type<P> {};

We now can modify our distance algorithm again. Because it still returns double, we only
modify the `pythagoras` computation class. It should return the coordinate type of its input.
But, it has two input, possibly different, point types. They might also differ in their
coordinate types. Not that that is very likely, but we’re designing a generic library and we
should handle those strange cases. We have to choose one of the coordinate types and of course
we select the one with the highest precision. This is not worked out here, it would be too long,
and it is not related to geometry. We just assume that there is a meta-function `select_most_precise`
selecting the best type.

So our computation class becomes:

    template <typename P1, typename P2, int D>
    struct pythagoras
    {
        typedef typename select_most_precise
            <
                typename coordinate_type<P1>::type,
                typename coordinate_type<P2>::type
            >::type computation_type;

        static computation_type apply(P1 const& a, P2 const& b)
        {
            computation_type d = get<D-1>(a) - get<D-1>(b);
            return d * d + pythagoras <P1, P2, D-1> ::apply(a, b);
        }
    };

[heading Different Geometries]

We have designed a dimension agnostic system supporting any point type of any
coordinate type. There are still some tweaks but they will be worked out later.
Now we will see how we calculate the distance between a point and a polygon, or
between a point and a line-segment. These formulae are more complex, and the
influence on design is even larger. We don’t want to add a function with another
name:

    template <typename P, typename S>
    double distance_point_segment(P const& p, S const& s)

We want to be generic, the distance function has to be called from code not
knowing the type of geometry it handles, so it has to be named distance. We also
cannot create an overload because that would be ambiguous, having the same
template signature. There are two solutions:

* tag dispatching
* SFINAE

We select tag dispatching because it fits into the traits system. The earlier
versions (previews) of Boost.Geometry used SFINAE but we found it had several
drawbacks for such a big design, so the switch to tag dispatching was made.

With tag dispatching the distance algorithm inspects the type of geometry of the
input parameters. The distance function will be changed into this:

    template <typename G1, typename G2>
    double distance(G1 const& g1, G2 const& g2)
    {
        return dispatch::distance
            <
                typename tag<G1>::type,
                typename tag<G2>::type,
                G1, G2
            >::apply(g1, g2);
    }

It is referring to the tag meta-function and forwarding the call to the [*apply] method of
a ['dispatch::distance] structure. The [*tag] meta-function is another traits class, and should
be specialized for per point type, both shown here:

    namespace traits
    {
        template <typename G>
        struct tag {};

        // specialization
        template <>
        struct tag<mypoint>
        {
            typedef point_tag type;
        };
    }

Free meta-function, like coordinate_system and get:

    template <typename G>
    struct tag : traits::tag<G> {};

[*Tags] (`point_tag`, `segment_tag`, etc) are empty structures with the purpose to specialize a
dispatch struct. The dispatch struct for distance, and its specializations, are all defined
in a separate namespace and look like the following:

    namespace dispatch {
        template < typename Tag1, typename Tag2, typename G1, typename G2 >
        struct distance
        {};

        template <typename P1, typename P2>
        struct distance < point_tag, point_tag, P1, P2 >
        {
            static double apply(P1 const& a, P2 const& b)
            {
                // here we call pythagoras
                // exactly like we did before
                ...
            }
        };

        template <typename P, typename S>
        struct distance
        <
            point_tag, segment_tag, P, S
        >
        {
            static double apply(P const& p, S const& s)
            {
                // here we refer to another function
                // implementing point-segment
                // calculations in 2 or 3
                // dimensions...
                ...
            }
        };

        // here we might have many more
        // specializations,
        // for point-polygon, box-circle, etc.

    } // namespace

So yes, it is possible; the distance algorithm is generic now in the sense that it also
supports different geometry types. One drawback: we have to define two dispatch specializations
for point - segment and for segment - point separately. That will also be solved, in the paragraph
reversibility below. The example below shows where we are now: different point types,
geometry types, dimensions.

    point a(1,1);
    point b(2,2);
    std::cout << distance(a,b) << std::endl;
    segment s1(0,0,5,3);
    std::cout << distance(a, s1) << std::endl;
    rgb red(255, 0, 0);
    rbc orange(255, 128, 0);
    std::cout << "color distance: " << distance(red, orange) << std::endl;

[heading Kernel Revisited]

We described above that we had a traits class `coordinate_type`, defined in namespace traits,
and defined a separate `coordinate_type` class as well. This was actually not really necessary
before, because the only difference was the namespace clause. But now that we have another
geometry type, a segment in this case, it is essential. We can call the `coordinate_type`
meta-function for any geometry type, point, segment, polygon, etc, implemented again by tag dispatching:

    template <typename G>
    struct coordinate_type
    {
        typedef typename dispatch::coordinate_type
            <
                typename tag<G>::type, G
            >::type type;
    };

Inside the dispatch namespace this meta-function is implemented twice: a generic version and
one specialization for points. The specialization for points calls the traits class.
The generic version calls the point specialization, as a sort of recursive meta-function definition:

    namespace dispatch
    {

        // Version for any geometry:
        template <typename GeometryTag, typename G>
        struct coordinate_type
        {
            typedef typename point_type
                <
                    GeometryTag, G
                >::type point_type;

            // Call specialization on point-tag
            typedef typename coordinate_type < point_tag, point_type >::type type;
        };

        // Specialization for point-type:
        template <typename P>
        struct coordinate_type<point_tag, P>
        {
            typedef typename
                traits::coordinate_type<P>::type
                type;
        };
    }

So it calls another meta-function point_type. This is not elaborated in here but realize that it
is available for all geometry types, and typedefs the point type which makes up the geometry,
calling it type.

The same applies for the meta-function dimension and for the upcoming meta-function coordinate system.

[heading Coordinate System]

Until here we assumed a Cartesian system. But we know that the Earth is not flat.
Calculating a distance between two GPS-points with the system above would result in nonsense.
So we again extend our design. We define for each point type a coordinate system type
using the traits system again. Then we make the calculation dependant on that coordinate system.

Coordinate system is similar to coordinate type, a meta-function, calling a dispatch function
to have it for any geometry-type, forwarding to its point specialization, and finally calling
a traits class, defining a typedef type with a coordinate system. We don’t show that all here again.
We only show the definition of a few coordinate systems:

    struct cartesian {};

    template<typename DegreeOrRadian>
    struct geographic
    {
        typedef DegreeOrRadian units;
    };

So Cartesian is simple, for geographic we can also select if its coordinates are stored in degrees
or in radians.

The distance function will now change: it will select the computation method for the corresponding
coordinate system and then call the dispatch struct for distance. We call the computation method
specialized for coordinate systems a strategy. So the new version of the distance function is:

    template <typename G1, typename G2>
    double distance(G1 const& g1, G2 const& g2)
    {
        typedef typename strategy_distance
            <
                typename coordinate_system<G1>::type,
                typename coordinate_system<G2>::type,
                typename point_type<G1>::type,
                typename point_type<G2>::type,
                dimension<G1>::value
            >::type strategy;

        return dispatch::distance
            <
                typename tag<G1>::type,
                typename tag<G2>::type,
                G1, G2, strategy
            >::apply(g1, g2, strategy());
    }

The strategy_distance mentioned here is a struct with specializations for different coordinate
systems.

    template <typename T1, typename T2, typename P1, typename P2, int D>
    struct strategy_distance
    {
        typedef void type;
    };

    template <typename P1, typename P2, int D>
    struct strategy_distance<cartesian, cartesian, P1, P2, D>
    {
        typedef pythagoras<P1, P2, D> type;
    };

So, here is our `pythagoras` again, now defined as a strategy. The distance dispatch function just
calls its apply method.

So this is an important step: for spherical or geographical coordinate systems, another
strategy (computation method) can be implemented. For spherical coordinate systems
have the haversine formula. So the dispatching traits struct is specialized like this

    template <typename P1, typename P2, int D = 2>
    struct strategy_distance<spherical, spherical, P1, P2, D>
    {
        typedef haversine<P1, P2> type;
    };

    // struct haversine with apply function
    // is omitted here

For geography, we have some alternatives for distance calculation. There is the Andoyer method,
fast and precise, and there is the Vincenty method, slower and more precise, and there are some
less precise approaches as well.

Per coordinate system, one strategy is defined as the default strategy. To be able to use
another strategy as well, we modify our design again and add an overload for the distance
algorithm, taking a strategy object as a third parameter.

This new overload distance function also has the advantage that the strategy can be constructed
outside the distance function. Because it was constructed inside above, it could not have
construction parameters. But for Andoyer or Vincenty, or the haversine formula, it certainly
makes sense to have a constructor taking the radius of the earth as a parameter.

So, the distance overloaded function is:

    template <typename G1, typename G2, typename S>
    double distance(G1 const& g1, G2 const& g2, S const& strategy)
    {
        return dispatch::distance
            <
                typename tag<G1>::type,
                typename tag<G2>::type,
                G1, G2, S
            >::apply(g1, g2, strategy);
    }

The strategy has to have a method apply taking two points as arguments (for points). It is not
required that it is a static method. A strategy might define a constructor, where a configuration
value is passed and stored as a member variable. In those cases a static method would be
inconvenient. It can be implemented as a normal method (with the const qualifier).

We do not list all implementations here, Vincenty would cover half a page of mathematics,
but you will understand the idea. We can call distance like this:

    distance(c1, c2)

where `c1` and `c2` are Cartesian points, or like this:

    distance(g1, g2)

where `g1` and `g2` are Geographic points, calling the default strategy for Geographic
points (e.g. Andoyer), and like this:

    distance(g1, g2, vincenty<G1, G2>(6275))

where a strategy is specified explicitly and constructed with a radius.

[/ 'TODO: What to do with this section? We have concepts already --mloskot]
[heading Point Concept]
The five traits classes mentioned in the previous sections form together the
Point Concept. Any point type for which specializations are implemented in
the traits namespace should be accepted as a valid type. So the Point Concept
consists of:

* a specialization for `traits::tag`
* a specialization for `traits::coordinate_system`
* a specialization for `traits::coordinate_type`
* a specialization for `traits::dimension`
* a specialization for `traits::access`

The last one is a class, containing the method get and the (optional) method
set, the first four are metafunctions, either defining type or declaring a
value (conform MPL conventions).

So we now have agnosticism for the number of dimensions, agnosticism for
coordinate systems; the design can handle any coordinate type, and it can
handle different geometry types. Furthermore we can specify our own
strategies, the code will not compile in case of two points with different
dimensions (because of the assertion), and it will not compile for two points
with different coordinate systems (because there is no specialization).
A library can check if a point type fulfills the requirements imposed by the
concepts. This is handled in the upcoming section Concept Checking.

[heading Return Type]

We promised that calling `std::sqrt` was not always necessary. So we define a distance result `struct`
that contains the squared value and is convertible to a double value. This, however, only has to
be done for `pythagoras`. The spherical distance functions do not take the square root so for them
it is not necessary to avoid the expensive square root call; they can just return their distance.

So the distance result struct is dependant on strategy, therefore made a member type of
the strategy. The result struct looks like this:

    template<typename T = double>
    struct cartesian_distance
    {
        T sq;
        explicit cartesian_distance(T const& v) : sq (v) {}

        inline operator T() const
        {
            return std::sqrt(sq);
        }
    };

It also has operators defined to compare itself to other results without taking the square root.

Each strategy should define its return type, within the strategy class, for example:

    typedef cartesian_distance<T> return_type;

or:

    typedef double return_type;

for cartesian (pythagoras) and spherical, respectively.

Again our distance function will be modified, as expected, to reflect the new return type.
For the overload with a strategy it is not complex:

    template < typename G1, typename G2, typename Strategy >
    typename Strategy::return_type distance( G1 const& G1 , G2 const& G2 , S const& strategy)

But for the one without strategy we have to select strategy, coordinate type, etc.
It would be spacious to do it in one line so we add a separate meta-function:

    template <typename G1, typename G2 = G1>
    struct distance_result
    {
        typedef typename point_type<G1>::type P1;
        typedef typename point_type<G2>::type P2;
        typedef typename strategy_distance
            <
                typename cs_tag<P1>::type,
                typename cs_tag<P2>::type,
                P1, P2
            >::type S;

        typedef typename S::return_type type;
    };

and modify our distance function:

    template <typename G1, typename G2>
    inline typename distance_result<G1, G2>::type distance(G1 const& g1, G2 const& g2)
    {
        // ...
    }

Of course also the apply functions in the dispatch specializations will return a result like this.
They have a strategy as a template parameter everywhere, making the less verbose version possible.

[heading Axis Order]

This is an important note for users who develop GIS applications, require compliance with OGC
standards or use variety of coordinate reference systems (CRS).

Boost.Geometry convention is that coordinate values of 2D tuple is always given according to
mathematical axis order: X,Y. Considering GIS applications, that always means easting,northing or,
in terms of geographic coordinate system: longitude,latitude.
Boost.Geometry does not allow or respect coordinate values listed in the axis order as
specified by coordinate reference system (CRS).

In practice, users may easily adapt point types with alternate axis ordering and
specify coordinate access in order as expected by Boost.Geometry.

[heading Summary]

In this design rationale, __boost_geometry__ is step by step designed using tag dispatching,
concepts, traits, and metaprogramming. We used the well-known distance function
to show the design.

__boost_geometry__ is designed like described here, with
some more techniques as automatically reversing template arguments, tag casting,
and reusing implementation classes or dispatch classes as policies in other
dispatch classes.

[endsect] [/ end of section Design Rationale]