Tuesday, 10 December 2013

Money in Brazil is Real

Nice! So you are going to Brazil to see the World Cup or the Olympics, or maybe just to enjoy 6,000 km of wonderful sunny coastline?

In that case, you have probably heard that money in Brazil is the Real. Yeah... that is the name of the currency: Real. Its ISO code is BRL, which stands for Brazilian Real (the written symbol is R$).

In Brazilian Portuguese (yes: in Brazil we speak Portuguese, not Spanish!), the word real has two meanings:

* like real in English
* like royal in English

So now you know that Brazilians, despite having kept away from monarchy for almost two centuries now, have a currency which is royal. Yes, royal!

I know, I know... English speakers make jokes about Brazilian money being real, not imaginary, don't they? ... LOL ... not a problem... Brazilians are easy-going and we love jokes, any kind of joke, lots of them: politically correct or not, racially correct or not, sexually correct or not, religiously correct or not... because this is how the world is: made of all sorts of things. So, regardless of your political orientation, sexual orientation, race, religion or anything else... it's time for a joke.

By the way, prepare your mood for Brazilians, in particular in Rio. Cariocas (plural of Carioca: people who were born in Rio) are known for their quick thinking whenever a situation might lead to a joke. You may miss a joke, I may miss a joke, but Cariocas never miss one. So do not be offended if you suddenly become the victim of a joke... better learn quickly how to make others the victims of your jokes too. :)

Living in England, I know that many jokes are not acceptable here. To be more realistic, Brits make lots of jokes too, about everything... but only in the privacy of their homes, where they are sure they will not be prosecuted for being naughty about someone else's race, religion or sexual orientation.

Things are different in Brazil. Bullying is national culture, all over the place. Or rather, it still is, to be more realistic. Unfortunately, the British mood and lifestyle are slowly contaminating Brazilians. Unfortunately, bullying is becoming unacceptable. Maybe globalization is responsible for this. This is pretty sad and absolutely unacceptable!

You may find the previous paragraph nasty. But be sure: it's your fault and only your fault. It's your mindset, which is shamefully narrow and has been tuned only to your own culture for decades, unable to see anything outside your small island. When you go to another country, you have to adapt yourself to the other culture's mindset. It's that simple! :)

Well, anyway... even though humour in Brazil is not as good as it was a decade ago, Brazilians are still very far from the day when only jokes about the weather will be socially acceptable.

Sorry... I couldn't resist this joke... LOL

Sunday, 8 December 2013

Using TypeTokens to retrieve generic parameters

Note: I've recovered this article from http://archive.org.
The original article is presented here mostly untouched.



Super Type Tokens, also known as the Typesafe Heterogeneous Container pattern (or simply THC), are very well described in an article by Neal Gafter, who explains how Super Type Tokens can be used to retrieve Run-Time Type Information (RTTI) which would otherwise be discarded in a process known as type erasure.


Overview

There are circumstances where you'd like to have a class which behaves in different ways depending on generic parameters.

Contrary to what is widely accepted, type erasure can be avoided, which means that the callee has the ability to know which generic parameters were employed during the call.

For example, imagine a List which would rely not on the Java Collections Framework but on arrays of primitive types, because performance would be much better than with JCF classes. You'd like to tell List that it should internally allocate an array of ints or an array of doubles, depending on a generic parameter you specify. Something like this:
List<Integer> myList = new PrimitiveList<Integer>();
... would be backed by an int[], whilst
List<Double> myList = new PrimitiveList<Double>();
... would be backed by a double[].
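
Jumping ahead a little, here is a minimal sketch of how such a class might pick its backing store. PrimitiveList is hypothetical, and TypeTokenTree is the helper explained later in this article; as we will see, the trick only works when the caller instantiates an anonymous subclass:

import org.jquantlib.lang.reflect.TypeTokenTree;

// Hypothetical sketch: choose a primitive backing array based on the
// generic parameter resolved at run time via the super type token.
public abstract class PrimitiveList<T extends Number> {
    protected final Object data; // holds an int[] or a double[]

    protected PrimitiveList(final int capacity) {
        final Class<?> typeT = new TypeTokenTree(this.getClass()).getElement(0);
        if (typeT == Integer.class) {
            this.data = new int[capacity];
        } else if (typeT == Double.class) {
            this.data = new double[capacity];
        } else {
            throw new UnsupportedOperationException("unsupported element type: " + typeT);
        }
    }
}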

The problem: Type Erasure

When generics were implemented in Java 5, it was decided that this feature would be offered by javac (the Java compiler) and that only very minimal changes would be made to other components of the architecture. The big benefit of this decision is that the implementation was relatively simple and imposed relatively little risk on existing Java applications, guaranteeing compatibility and stability of the Java ecosystem as a whole.

The problem with this implementation is that only javac knows about the generic types you specified in your source code. This knowledge exists only at compile time. At run time, your callee class has no clue which generic parameters were employed during the call. This happens because information about generic types is lost in a process known as type erasure, which basically means that javac does not put the type information it has at compile time into the bytecode, which ultimately means that your running application knows nothing about the type information you defined in your source code.

Confused? Well ... it basically means that the code below is not possible:
class MyClass<T> {
    private final T o;

    public MyClass() {
        this.o = new T();
    }
}
... because at run time MyClass does not actually know anything about the type of generic parameter T. Although javac is able to perform syntax and semantic validation of your source code at compile time, at run time all information regarding the generic type T is thoroughly lost.

Actually, the previous statement may not be 100% correct under certain circumstances. This is what we will see in the next topic.

How type erasure can be avoided

When generics were implemented in Java 5, the type system was reviewed and, long story short, information about generic types can be made available at run time under specific circumstances. This is a very important concept for our discussion here:
Generic types are available to anonymous classes.

Anonymous classes

Let's discuss a little what an anonymous class is and also what it is not.
Let's suppose we are instantiating MyClass like this:
MyClass<Double> myInstance = new MyClass<Double>() {
        //
        // Some code here
        //
        // In this block we are adding functionality to MyClass
        //
};

We are actually creating an instance of MyClass, but also we are adding some additional code to it, which is enclosed by curly braces.

What this means is that we are creating an object myInstance of an anonymous subclass of MyClass. It does not mean that MyClass is itself an anonymous class! MyClass is definitely not anonymous, because you have declared it somewhere else, correct?

In the snippet of code above we are using something which is an extended thing made from our original definition of MyClass plus some more logic. This extended thing is the anonymous class we are talking about. In other words, the class of myInstance was never explicitly declared, which means it is anonymous.

How javac handles anonymous classes

When javac finds an anonymous class, it creates data structures in the bytecode (which are available at run time) which hold the actual generic type parameters employed during the call. So we have another very important concept here:
The Java compiler employs type erasure when objects are instantiated
except when objects are instantiated from anonymous classes.

In other words, our aforementioned MyClass does not know any type information when it is called like this:
MyClass<Double> myClass = new MyClass<Double>();
but it does know generic type information when it is called like this:
MyClass<Double> myClass = new MyClass<Double>() { /* something here */ };

In order to obtain generic type information at run time, you do have to change the call so that it employs an anonymous subclass made of your original class, rather than your original class directly. In the next topic we will cover what needs to be done in your implementation of MyClass in order to retrieve generic type information, but a very specific point is that it will not work unless you instantiate an anonymous subclass of your defined class. So:
MyClass<Double> myClass1 = new MyClass<Double>();     // type erasure DOES happen
MyClass<Double> myClass2 = new MyClass<Double>() { }; // type erasure DOES NOT happen!

Notice that you only need an anonymous class; you don't have to add any logic to it if you don't need anything additional. As you can see where object myClass2 is created, the anonymous block is absolutely empty in this example.

Classical solution

Let's review what we are interested in here: generic types, which are types. Observe that types are ultimately class definitions. So, we would like to give our class MyClass<T> the ability to know that its generic parameter T is actually a T.class.

In the classical solution described here, this is done very easily by simply passing what we need during the call. Something like this:
MyClass<Double> myClass = new MyClass<Double>(Double.class);

Observe that this is not a very good solution, because you have to say Double three times: (1) when you declare the type, (2) when you pass the generic parameter and (3) when you pass the formal parameter Double.class. It looks too verbose and too repetitive, doesn't it?

Anyway, this is what the great majority of developers do. They say that Double is the generic parameter and then they say Double.class just after, as a formal parameter during the call. Although it works, the code does not look optimal, and it may even lead to bugs later, when your application grows and you start to refactor things.
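
For reference, a minimal sketch of the classical approach, which also shows one thing the stored token buys you: the ability to do what new T() cannot. Factory is a hypothetical name here, and newInstance assumes T has a public no-arg constructor:

public class Factory<T> {
    private final Class<T> typeT;

    // classical solution: the caller repeats the type as a formal parameter
    public Factory(final Class<T> typeT) {
        this.typeT = typeT;
    }

    // works around 'new T()', which does not compile due to type erasure
    public T create() throws InstantiationException, IllegalAccessException {
        return typeT.newInstance();
    }
}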

More flexible solution

We have already visited the classical solution for the problem of type erasure, and we have already seen how an anonymous call can be done. Now we need to understand how generic types can be retrieved at run time without having to pass Double as many times as we did in the classical solution.

Going straight to the point, let's define a skeleton of our MyClass which does the job we need, joining some ideas from the classical solution and using some incantation offered by a class called TypeTokenTree. Below we explain the general concept:
import org.jquantlib.lang.reflect.TypeTokenTree;

public class MyClass<T> {

    private final Class<?> typeT;

    public MyClass(final Class<?> typeT) {
        this.typeT = typeT;
        init();
    }

    public MyClass() {
        this.typeT = new TypeTokenTree(this.getClass()).getElement(0);
        init();
    }

    private void init() {
        // perform initializations here
    }
}

The code above allows you to call MyClass employing 2 different strategies:
MyClass<Double> myClass1 = new MyClass<Double>(Double.class); // classical solution
MyClass<Double> myClass2 = new MyClass<Double>() { };         // only sorcerers do this

Notice that object myClass1 employs the classical solution we described, which is what the great majority of developers do. The object myClass2 was created using the incantation explained in this article and we will explain it better below.

Digging the solution

Class TypeTokenTree is a helper class which returns the Class of the n-th generic parameter. In the line
this.typeT = new TypeTokenTree(this.getClass()).getElement(0);

We are building an instance of TypeTokenTree, passing the actual class of the current instance and asking for the 0-th generic type parameter.

Please observe: the actual class of the current instance may or may not be MyClass. Got it? The actual class of the current instance will not be MyClass if you employed an anonymous call. In that case, i.e. when you have an anonymous call, javac generates code which keeps generic type information available in the bytecode. Notice that:
TypeTokenTree fails when a non-anonymous call is done!

This is OK. Actually, there's no way it could be otherwise! It's the application's responsibility to recover from such a situation.

In the References section below you can find links to class TypeTokenTree and another class it depends on: TypeToken. These files are implemented as part of JQuantLib and contain code which is specific to JQuantLib and may not be convenient for everyone. For this reason, below we present modified versions of these classes, which aim to be independent of JQuantLib and to explain in detail how the aforementioned incantation works.

First of all, you need to have a look at the method getGenericSuperclass from the JDK. This method is basically the root of the incantation: it traverses data structures created in the bytecode by javac, which provide type information regarding the generic types you employed. In general, getGenericSuperclass returns a plain Class (not a ParameterizedType), which means that the current instance belongs to a non-anonymous class. In the rare circumstances where you employ anonymous classes, getGenericSuperclass returns a ParameterizedType, which carries the actual type arguments. And this is how we do this magic.

When getGenericSuperclass returns a ParameterizedType, you have the opportunity to traverse the data structure javac created in the bytecode and discover what was available at compile time (finally!), effectively getting rid of type erasure:
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

static public Type getType(final Class<?> klass, final int pos) {
    // obtain the generic superclass, which carries actual type arguments
    // only when an anonymous subclass was employed during the call
    final Type superclass = klass.getGenericSuperclass();

    // test if an anonymous class was employed during the call
    if ( !(superclass instanceof ParameterizedType) ) {
        throw new RuntimeException("This instance should belong to an anonymous class");
    }

    // obtain RTTI of all generic parameters
    final Type[] types = ((ParameterizedType) superclass).getActualTypeArguments();

    // test if enough generic parameters were passed
    if ( pos >= types.length ) {
        throw new RuntimeException(String.format(
           "Could not find generic parameter %d because only %d parameters were passed",
              pos, types.length));
    }

    // return the type descriptor of the requested generic parameter
    return types[pos];
}
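
On top of getType, a TypeTokenTree-like helper which resolves the n-th generic parameter to a Class is short. This is a simplified sketch, not the actual JQuantLib implementation, and it assumes the getType method above is in scope (e.g. in the same class):

import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public final class SimpleTypeToken {
    // resolves the generic parameter at position 'pos' to a Class, if possible
    public static Class<?> getClazz(final Class<?> klass, final int pos) {
        final Type type = getType(klass, pos);
        if (type instanceof Class) {
            return (Class<?>) type;
        }
        if (type instanceof ParameterizedType) {
            return (Class<?>) ((ParameterizedType) type).getRawType();
        }
        throw new RuntimeException("could not obtain a Class from " + type);
    }
}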

Pros and cons

The big benefit of employing Type Tokens is that the code becomes less redundant, I mean:
MyClass<Double> myClass = new MyClass<Double>() { };
... is absolutely enough. You don't need anything like this:
MyClass<Double> myClass = new MyClass<Double>(Double.class);
On the other hand, the code also becomes obscure, because failing to remember to add the anonymous block will end up in an exception thrown at run time:
MyClass<Double> myClass = new MyClass<Double>() { }; // succeeds
MyClass<Double> myClass = new MyClass<Double>();     // TypeTokenTree throws an Exception

The point is: this technique is not widely advertised and most developers have never heard that this could be done. If you are sharing your code with peers, contributors or clients, chances are that you will have to spend some time explaining the magic the code does. In general, developers forget to make the call properly, which leads to failures at run time, as just explained above.

There's also a small performance penalty imposed when TypeToken is called, since this information could be resolved at compile time and javac could simply write it down straight away when you call:
MyClass<Double> myClass = new MyClass<Double>(Double.class);

Test Cases

OK. Now that you have visited the theory, you'd like to see how this thing really works. Below you can find some test cases which exercise classes TypeToken and TypeTokenTree. These test cases cover some varied scenarios and should be enough to illustrate how the techniques explained here can be used in the real world.
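
The original test cases did not survive the recovery from archive.org, so here is an illustrative pair instead, assuming JUnit 4 and the getType helper shown above (in scope, e.g. statically imported):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class TypeTokenTest {

    private static class Box<T> { /* empty on purpose */ }

    @Test
    public void retrievesGenericParameterFromAnonymousSubclass() {
        final Box<Double> box = new Box<Double>() { }; // anonymous: token survives
        assertEquals(Double.class, getType(box.getClass(), 0));
    }

    @Test(expected = RuntimeException.class)
    public void failsWithoutAnonymousSubclass() {
        final Box<Double> box = new Box<Double>();     // erased: nothing to retrieve
        getType(box.getClass(), 0);
    }
}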


References





If you found this article useful, it would be much appreciated if you created a link to this article somewhere on your website.

Thanks

Richard Gomes 20:16, 3 January 2011 (GMT) [Date of the original article ]

Tuesday, 3 December 2013

Configuring 2 static IPs with fibre PPPoE with Eclipse Internet : MTU size issue


THIS IS A DRAFT *  THIS IS A DRAFT * THIS IS A DRAFT * THIS IS A DRAFT


This blog entry describes difficulties and solutions related to PPPoE, in particular issues related to MTU size and iptables configurations.



I've recently moved from BeThere to Eclipse Internet. It was a long marriage with Be, for 5 years, but I had to leave for technical reasons.

In a matter of a few days I got a new Zyxel NBG4604 from Eclipse. The router is IPv6 capable, which was a surprise to me, since I had researched the modem and found nothing mentioning IPv6. Eclipse still does not offer IPv6, but you already have IPv6 even without knowing it, if you have static IPs.

IPv6 apart, I stumbled upon a much simpler thing: an annoying issue which happens on certain websites, leading to sluggish performance. So, what's the point of migrating to fibre if navigation is seriously impacted?

But what is the issue? And what causes it?


The sluggish fibre connection

The issue is that some websites fail to load properly in the browser. Here is an example: github.com employs avatars from gravatar.com. It happens that github seems to "work fine", whilst gravatar "fails" to load. The browser keeps trying to load something from gravatar and stays there, trying and trying, and the request never completes. This jeopardizes navigation on github, which is a primary source of concern to me.

Long story short, the issue is related to PPPoE (PPP over Ethernet), which is basically the authentication layer employed by many ISPs, including Eclipse. In slightly deeper detail: when you have PPPoE, it is necessary to adjust a parameter called MTU size, due to technicalities you can read more about in the References section below.

I opened a ticket with Eclipse Internet asking for the recommended MTU size, which they promptly provided. But there are some more details involved, as I explain below.

If I were using my Zyxel router like any regular end user, behind the fibre modem... I suppose I would not have any trouble. But I've decided to employ a Debian box as my main router / firewall. It basically means that I need to configure it properly and understand some technicalities I wouldn't care about otherwise.


Path to solution

Long story short, I've configured these things:
  •  PPPoE
  •  /etc/network/interfaces with 2 static IPs (or multiple static IPs)
  •  MTU size
  •  iptables rules related to MTU size
I've connected the fibre modem to my NIC eth0 whilst the other NIC eth1 is connected to the LAN.

The NIC which faces the Internet via the fibre modem has to acquire 2 IPs from the ISP. The first IP is acquired when the PPPoE layer authenticates and negotiates with the ISP side. The additional IP address needs to be configured after the first one, not requiring any special negotiation by the PPPoE layer.

OK. See below my /etc/network/interfaces (with some fake addresses):

auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual
    address   192.168.2.2
    netmask   255.255.255.0

auto dsl-provider
iface dsl-provider inet ppp
    pre-up    /sbin/ifconfig eth0 up
    post-down /sbin/ifconfig eth0 down
    provider  dsl-provider
    address   82.111.111.111
    netmask   255.255.255.252
    post-up   sleep 7 ; \
              gw=$( /sbin/ifconfig ppp0 | \
                    head -2 | tail -1 | \
                    sed -E 's/(.*P-t-P:)([0-9.]+)( .*)/\2/' ) ; \
              echo "Define default gateway $gw" ; \
              /sbin/route add default gw $gw ppp0 ; \
              echo Bringing up ppp0:1 ; \
              /sbin/ifconfig ppp0:1 82.111.111.112 netmask 255.255.255.252; \
              echo Bringing up eth1 ; \
              /sbin/ifconfig eth1 up; \
              echo Disable IP forward for security reasons; \
              echo 0 > /proc/sys/net/ipv4/ip_forward
    pre-down  echo Bringing down eth1 ; \
              /sbin/ifconfig eth1 down ; \
              echo Bringing down ppp0:1 ; \
              /sbin/ifconfig ppp0:1 down


The important bits are:

* eth0 must be left without any IP configuration, because it will be employed by PPPoE in order to talk to the fibre modem.
* eth1 can be configured with a LAN IP, but you should not bring it up until you define the default gateway, which will be some IP address on the ISP side.
* you really don't know what the default gateway is until the moment the connection is established with your ISP, because this IP can change and probably will change every time you disconnect and connect again.
* interface ppp0, despite not being configured by you, will be configured for you when dsl-provider comes up.


More details

Make a backup copy of /etc/network/interfaces as presented in the section above. When you install PPPoE, it will change your configuration, but I have already told you how it should look. So, make a backup copy!

$ sudo cp /etc/network/interfaces /etc/network/interfaces.BACKUP
$ sudo apt-get install pppoe pppoeconf -y

During the installation, pppoeconf runs and tries to find your fibre modem. Make sure eth0 is connected to your modem.

It will ask for your username and password, required to authenticate with the ISP. Eclipse sent me a letter with this information, but it is also available on the Connection Manager page.

When pppoeconf runs, it changes your /etc/network/interfaces. Have a look at it and see what happened. If you are following this recipe the way I describe, you will see that we had already configured everything needed in our version of /etc/network/interfaces. Simply restore the backup copy:

$ sudo cp /etc/network/interfaces.BACKUP /etc/network/interfaces

Just a reminder: make sure you put your 2 (or more) IPs, as listed on the Connection Manager page, into your /etc/network/interfaces, on interfaces ppp0:1, ppp0:2, ... as many as you have static IPs, remembering that your first IP must go to ppp0 itself.

In other words, I've put the line below somewhere into my /etc/network/interfaces:

  /sbin/ifconfig ppp0:1 82.111.111.112 netmask 255.255.255.252

If you have more than 2 static IPs, you will be interested in configuring additional virtual interfaces.


Now try to connect to your ISP:

$ sudo ifup dsl-provider

You should see something like this:

$ sudo ifconfig ppp0
ppp0      Link encap:Point-to-Point Protocol 
          inet addr:82.111.111.111 P-t-P:82.153.1.65  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1478  Metric:1
          RX packets:1200866 errors:0 dropped:0 overruns:0 frame:0
          TX packets:748261 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:1633001082 (1.5 GiB)  TX bytes:59749711 (56.9 MiB)


Now verify if your second IP is connected:

$ ifconfig ppp0:1
ppp0:1    Link encap:Point-to-Point Protocol 
          inet addr:82.111.111.112  P-t-P:82.111.111.112  Mask:255.255.255.252
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1478  Metric:1

There are 2 important aspects to be noted at this point:

1. You may see that the netmask of ppp0 is 255.255.255.255, which means our IP is connected one-to-one to an IP on the ISP side. Well, this is what point-to-point means, and it makes sense! But the netmask should honour what we have configured in /etc/network/interfaces, which does not seem to be the case. We will address this issue later.

2. The MTU size is 1478, which is a recommended value I've got from Eclipse. Chances are that you are seeing some other value. No worries, we will address this issue later.

Let's dive a bit into these aspects in the next sections.


Interface configuration

The interface ppp0 happens to be wired to "P-t-P:82.153.1.65" in this particular case. Actually, every time you connect, you may potentially connect to a different IP on the ISP side. It means that you cannot assume that a certain IP is permanently your default gateway. In our /etc/network/interfaces we dynamically find out which default gateway we need to configure:

      ...
      gw=$( /sbin/ifconfig ppp0 | \
            head -2 | tail -1 | \
            sed -E 's/(.*P-t-P:)([0-9.]+)( .*)/\2/' ) ; \
      /sbin/route add default gw $gw ppp0 ; \

      ...

The netmask of ppp0 should actually be 255.255.255.252. This is the value I said it should be in /etc/network/interfaces, but ppp is stubborn and insists on 255.255.255.255. It's possibly an issue I still need to fix in the PPPoE configuration.
PENDING: I said before we would be addressing this issue. Well, not yet :( ... I still need to figure out how this can be done.

My ppp0:1 is a virtual interface which is configured with my second static IP address. Observe that it is connected to itself ("P-t-P:82.111.111.112"), which does not look correct. It should be connected to some IP in the vicinity of 82.153.1.65, which is the IP ppp0 is currently connected to.
PENDING: I still need to fix this!


The netmask of ppp0:1 is already 255.255.255.252, which honours the configuration I've put in /etc/network/interfaces. This is good.

Note: despite the pending items, lots of things are working just fine here.


Static routes

Now have a look at the static routing table:

$ sudo route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         82.153.1.65     0.0.0.0         UG    0      0        0 ppp0
82.111.111.111  0.0.0.0         255.255.255.252 U     0      0        0 ppp0
82.153.1.65     0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1

PENDING: I still need to fix the netmask issue on ppp0 mentioned above.
PENDING: I still need to accommodate ppp0:1 in the routing table!

The important bits are:

1. Flag UG means that this route is the default gateway. This route must be associated with interface ppp0 and with the IP address given by the ISP at PPPoE negotiation time. This was already explained in the section above.

2. Flag UH means that a given route is a host route, i.e. a route for talking to one single host. In this case, interface ppp0 is responsible for talking to the IP address given by the ISP at PPPoE negotiation time. These flags are set by ppp, since it creates a point-to-point connection to a specific host on the ISP side.

3. I still need to configure ppp0:1 and make it appear in the routing table. The way it is at the moment "works" and I can even ping this address from outside, but traffic actually routes via ppp0, which is an additional hop, adding some latency.
PENDING: I still need to accommodate ppp0:1 in the routing table!


MTU size configuration

Long story short, it's necessary to configure the MTU size in order to accommodate some overhead which is attached to each packet of data when you are using PPPoE. The actual MTU size may vary under different circumstances and may even depend on what you have on your side of the connection. But let's keep it simple at this point and stick to the value Eclipse told me to employ, which is 1478.
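
If you want to double-check the value your ISP gives you, you can probe the path MTU empirically. A rough test, assuming GNU ping: -M do forbids fragmentation, and 1450 = 1478 minus 28 bytes of IP and ICMP headers:

$ ping -c 3 -M do -s 1450 8.8.8.8   # should succeed with MTU 1478
$ ping -c 3 -M do -s 1451 8.8.8.8   # should fail with 'Message too long'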

Keeping it simple, all you have to do is edit your /etc/ppp/peers/dsl-provider and make sure you have a block like this:

    connect /bin/true
    noauth
    persist
    mtu 1478


Then reconnect, making sure you release everything before connecting again:

$ sudo ifdown dsl-provider; sudo poff ; \
   sudo ifconfig eth1 down; sudo ifconfig eth0 down; \
   sudo ifup dsl-provider


Try to navigate to websites like http://github.com and see if the browser successfully retrieves everything, completing the request in a few seconds. If the icon keeps rolling and rolling in the browser's location bar... this is not a very good sign.

Note: Actually, chances are that this test will not work very well if you have a configuration similar to mine, I mean: if you are using your Linux box as a router and/or firewall. This leads us to the next section.


iptables configuration

I have a firewall based on iptables running on my Debian box. I'm definitely not a network engineer and I'm not willing to become one, but I managed to configure my firewall relatively easily using a tool called fwbuilder. It took me some time to get used to how things work... but, as I said, it can be done relatively easily if you know some basics of TCP/IP. No need to hire a network engineer ;-)

Long story short, if you put the block below in the epilog script of your firewall configuration, you will be telling iptables to clamp MSS to MTU.


echo "Running epilog script"
# This is needed for NAT on ppp0
$IPTABLES --table nat --append POSTROUTING --out-interface ppp0 -j MASQUERADE

# This is needed for hosts in DMZ-10 to accept requests
$IPTABLES --append FORWARD --in-interface virbr1 -j ACCEPT

# http://adsl.cutw.net/mtu.html
# http://www.tldp.org/HOWTO/IP-Masquerade-HOWTO/mtu-issues.html
# http://www.cisco.com/en/US/tech/tk175/tk15/technologies_tech_note09186a0080093bc7.shtml

$IPTABLES -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
$IPTABLES -A OUTPUT  -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
$IPTABLES -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu


# Now it's safe to enable IP forward
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "Epilog script done"



At this point, you must be curious about what virbr1 is in the block above, since it appeared out of nowhere in this discussion. It's explained in the next section.

You may also be curious about what on earth clamping MSS to MTU means. In short, it rewrites the MSS option in TCP SYN packets so that endpoints never try to send segments larger than the path MTU allows. Look for more information in the References section below.


Bootstrapping virtual servers

My Debian box is at the same time the router, the firewall and it hosts several virtual machines, mimicking a typical DMZ scenario, with virtual machines facing the Internet and virtual machines facing the LAN.

When my server comes up, it first brings up the interface dsl-provider, because it's marked as auto in /etc/network/interfaces:


      ...
      auto dsl-provider
      iface dsl-provider inet ppp

      ...

... then it starts the virtual machines, then it starts the firewall.

For virtual machines I use virt-manager, which provides, among other things, the service /etc/init.d/libvirt-guests, which is responsible for bringing up all virtual machines. This process also creates subnets for the virtual machines facing the Internet and for those facing the LAN.

When I start the firewall, all static routing is already defined, whether the interfaces involved are physical, related to ppp or related to virt-manager. All the firewall needs to do is enforce security on these routes and make sure that certain requests arriving from the Internet are properly routed to certain virtual servers sitting on the subnet which faces the Internet, which in my case is virbr1. It's also necessary to remember to enable ip_forward in the kernel.

But we still need to make sure that the firewall starts after the services provided by virt-manager. The way to do this is to put something like the block below in your firewall configuration, in the editor tab of your firewall definition:

#### BEGIN INIT INFO
#Provides:       firewall
#Required-Start: $network $remote_fs $syslog libvirt-bin libvirtd libvirt-guests
#Required-Stop:  $network $remote_fs $syslog
#Default-Start:  2 3 4 5
#Default-Stop:   0 1 6
#Description:    firewall rules
#### END INIT INFO


When fwbuilder deploys the firewall rules onto your server, it will create /etc/init.d/firewall with a header like the one shown below, which is what is needed in order to provide dependency information between services. Some spaces are stubbornly added by fwbuilder, but it works without needing to edit anything by hand, which is great.

# ### BEGIN INIT INFO
# Provides:       firewall
# Required-Start: $network $remote_fs $syslog libvirt-bin libvirtd libvirt-guests
# Required-Stop:  $network $remote_fs $syslog
# Default-Start:  2 3 4 5
# Default-Stop:   0 1 6
# Description:    firewall rules
# ### END INIT INFO



Conclusion

If you followed this article, you will probably be able to connect to Eclipse Internet without any router between your Debian box and your modem. You will also be able to tackle the slowness caused by misconfigured parameters related to ppp and iptables.

I hope this article is useful. Please let me know if you find it incomplete, misleading or wrong, or if you have suggestions or something to add :)


References

http://adsl.cutw.net/mtu.html
http://www.tldp.org/HOWTO/IP-Masquerade-HOWTO/mtu-issues.html
http://www.cisco.com/en/US/tech/tk175/tk15/technologies_tech_note09186a0080093bc7.shtml




-------

If you found this article useful, please consider sharing it or linking to it.



THIS IS A DRAFT *  THIS IS A DRAFT * THIS IS A DRAFT * THIS IS A DRAFT
 

Friday, 29 November 2013

Ready for Python development with Emacs in just 60 seconds

This post demonstrates how you can configure a very decent environment for Python development with Emacs, in just 60 seconds.

A dream has come true: the first time you start Emacs, it automagically downloads and configures all the plugins you need. Emacs is then ready to work and you can start typing code immediately.

For the impatient


1. Back up any configuration files you may have!

  $ cd $HOME 
  $ tar czpf dot-emacs.ORIGINAL.tar.gz .emacs .emacs.d
  $ mv .emacs   dot-emacs.ORIGINAL
  $ mv .emacs.d dot-emacs.d.ORIGINAL

2. Remove any Emacs configuration files that may remain.

  $ rm -r -f .emacs .emacs.d

3. Install Python libraries

This should be done preferably inside a virtual environment.

   $ workon py276 #-- py276 is a virtualenv I'm using
   $ pip install epc
   $ pip install jedi
   $ pip install elpy

4. Download my .emacs file into your home folder.

  $ cd $HOME 
  $ wget https://raw.github.com/frgomes/dot-emacs/master/dot-emacs.el
  $ ln -s dot-emacs.el .emacs

5. Start Emacs. It will configure itself the first time it runs!

  $ emacs test.py

Features in a nutshell

  • python-mode, cython-mode and nxml-mode: ditto
  • jedi: provides auto completion
  • flymake: highlight syntax errors as you type

Contribute


Please let me know if you find issues. In particular, I don't have Windoze boxes, so the automagic configuration thing was never tested on them.

This script was designed to run on Emacs 23 onwards, but it was only tested on Emacs 24.

Please point out typos and bad English.

You can also suggest plugins or tools I missed. This is very much appreciated and may benefit my workflow as well :)
So... thanks a lot for your suggestions!

Known Issues

If you are behind a firewall, you may (or may not) face download problems involving the HTTPS protocol. As far as I know, this is a bug in a third-party library which Emacs depends on.

If Emacs opens the message window and vomits hundreds of errors coming from file cython-mode.el... that's because your proxy server refused the HTTPS request and returned an error message in HTML. It's easy to fix this issue:

  $ cd $HOME/.emacs.d/plugins
  $ rm cython-mode.el
  $ wget https://raw.github.com/cython/cython/master/Tools/cython-mode.el

Chances are that cython-mode.el is now OK, since wget performs the request the way it needs to be done in order to work properly.


That's it: ready for coding in 60 seconds :)

Cheers

-- Richard Gomes

Versioning your /etc under Bazaar in Debian Wheezy

Suppose you've installed some packages on your Linux box, made some configurations and messed up your system. Pretty bad! You would like to revert all changes to a previous working state, wouldn't you? This post explains how you can employ Bazaar on Debian boxes to do just that.

The idea is pretty simple: put your /etc into some sort of source control system, like git, bzr, hg or darcs. There's a tool which does just that: etckeeper.

We will also create two other repositories, for high availability purposes: one in the same computer and another remotely.

Let's start: install etckeeper. You will see that /etc/.git is created, because git is the default choice for etckeeper.

Install etckeeper

    $ apt-get install etckeeper -y
    $ ls -ald /etc/.git

I prefer Bazaar. So, let's get rid of this .git repository:

    $ etckeeper uninit
    $ ls -ald /etc/.git

Using bzr instead of git

Now it's time to reconfigure etckeeper so that it will use Bazaar instead. Simply edit /etc/etckeeper/etckeeper.conf so that it looks like this:

    $ head -5 /etc/etckeeper/etckeeper.conf
    # The VCS to use.
    #VCS="hg"
    #VCS="git"
    VCS="bzr"
    #VCS="darcs"

Let's now install Bazaar and create the repository again, this time using Bazaar.

    $ apt-get install bzr -y
    $ etckeeper init
    $ ls -ald /etc/.bzr

Your first commit

Review what was done and perform your first commit.

    $ cd /etc
    $ bzr status | less
    $ bzr commit -m 'first commit'
    $ bzr status
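
And here is the payoff: if a later change messes up your /etc, you can roll back to any committed revision. A sketch, assuming revision 1 was the last known-good state:

    $ cd /etc
    $ bzr log --line     # find the last known-good revision, say revision 1
    $ bzr revert -r 1    # bring the working tree back to that revision
    $ bzr status         # review what is about to be committed
    $ bzr commit -m 'revert to known-good state'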

Create a second repository on a separate hard drive

It's now time to create a copy on the same computer, preferably on a separate hard drive:

    $ cd /srv
    $ bzr init --no-tree etckeeper
    $ cd /etc
    $ bzr config push_location=/srv/etckeeper
    $ bzr push

Create a third repository on a server box

It's also a good idea to create a remote repository, like the example below:

    $ ssh myself@server
    $ COMPUTER=penguim
    $ mkdir -p /srv/bzr/etckeeper
    $ bzr init --no-tree /srv/bzr/etckeeper/${COMPUTER}
    $ exit


Back to your local computer, push a copy from your second repository (the one on your second hard drive) onto the remote server:

    $ cd /srv/etckeeper
    $ COMPUTER=$( hostname )
    $ bzr config push_location=bzr+ssh://myself@server/srv/bzr/etckeeper/${COMPUTER}
    $ bzr push


That's it :)

-- Richard Gomes

Monday, 28 January 2013

A better implementation of Visitor pattern

The Visitor pattern is possibly the most complicated design pattern you will face. Not only are the explanations, implementations and examples you find on the Internet generally confusing and often divergent from one another, but the definition of what the Visitor Pattern is can also be obscure, and it is rarely explained properly, with examples and real-world applications.

Visitor Pattern: which one?



In fact, several variations of what we call the Visitor Pattern exist. Many times these variations are simply called "Visitor Pattern", without the suffixes which would be necessary to differentiate them. This certainly causes confusion. In addition, lots of articles about this pattern fail to explain properly what it is, how it works and how it should be employed in real-world applications. Add to that the intrinsic complexity this pattern involves, and the confusion sometimes caused by the nomenclature or definitions adopted.

This article aims to present a clear, non-ambiguous and non-misleading definition of what the Visitor Pattern is, with real-world examples. We will demonstrate that a pair of interfaces is quite enough for solving a vast range of problems, without any need to modify this pair of interfaces to accommodate unforeseen situations. Complex use cases can be addressed by defining additional interfaces which extend the original concept, but without changing the original pair of interfaces in any way.

Definition

From Wikipedia we read: In object-oriented programming and software engineering, the Visitor Design Pattern is a way of separating an algorithm from an object structure on which it operates. A practical result of this separation is the ability to add new operations to existing object structures without modifying those structures.

Let's clarify a little bit this definition: In object-oriented programming and software engineering, the Visitor Pattern is a way of separating a computation from the data structure it operates on.

In particular, we will employ the word computation instead of algorithm, because the former is more restrictive and more specific than the latter. This article shows that obtaining elements from a data structure also involves an algorithm, which could cause confusion. For this reason, it's better to simply avoid the word algorithm entirely.

Interpretation

The definition says that the Visitor Pattern separates concerns, i.e:
  •  on one hand you have the data access, which knows how a data structure can be traversed;
  •  on the other hand you have a certain computation you are interested in executing.

Besides, there's a very important restriction: you cannot change the original data structure. In other words:
  •  the data access knows the data structure but does not know anything about the computation to be performed;
  •  the computation knows how to operate on a certain element of the data structure but doesn't know anything about the data access needed to traverse the data structure in order to obtain the next element, for example;
  •  you are not allowed to amend the data structure in any way: imagine a data structure which belongs to a third-party library whose source code you don't have.

A key point here is the introduction of the concept of data access. In other words, we are saying that the data structure is not accessed directly, but via some sort of data access logic. This will become clear later.

Use cases

Let's invent scenarios which slowly evolve in complexity. Our aim is to demonstrate why the Visitor Pattern is needed and how it should behave and operate.

Scenario 1: simple array


Suppose we have a very simple array of Integers. We need to visit all elements and calculate the sum of all these elements.

Anyone will immediately think of something like the code below, won't they?

public int sumArray(Integer[] array) {
    int sum = 0;
    for (int i=0; i<array.length; i++) sum += array[i];
    return sum;
}


Yes, it solves the problem. Actually, the problem is so simple that you must be asking yourself why we would need anything more complicated than a simple one-liner, aren't you?

OK. Let's go ahead to another use case.

Scenario 2: different data structures


This scenario is pretty similar to the previous one, but now we can have either an Integer[], a List<Integer> or a Map<T,Integer>. In this case, one immediately thinks of something like this:

public int sumList(List<Integer> list) {
    int sum = 0;
    for (int i=0; i<list.size(); i++) sum += list.get(i);
    return sum;
}

public <T> int sumMap(Map<T, Integer> map) {
    int sum = 0;
    for (Entry<T, Integer> entry : map.entrySet()) {
        sum += entry.getValue();
    }
    return sum;
}


OK. In this use case we start to understand that there are different structures: Integer[], List<Integer> and Map<T,Integer>, but in the end all you need to do is obtain an individual Integer from whatever data structure it is and add it to the variable sum.

Let's complicate things a little more now.

Scenario 3: different data structures and several computations


This scenario is pretty similar to the previous one, but now we have several different computations: sum all elements, calculate their mean, calculate the standard deviation, tell whether they form an arithmetic progression, tell whether they form a geometric progression, tell whether they are all prime numbers, etc.

Well, in this case you can figure out what will happen: you will face a proliferation of methods. Multiply the number of computations you are willing to provide by the number of different data structures you are willing to support. This proliferation of methods may not be desirable.

The point is: it does not make sense to implement the same logic over and over again just because the data structure is slightly different. Even when data structures are very different, what matters is that we are willing to perform a certain computation on all elements stored in them, no matter whether it is a plain array, a map, a tree, a graph or whatever.

The computation only needs to know what needs to be done once an element is at hand; the computation does not need to know how an element can be obtained.

Another aspect is that it may not be desirable or may not be convenient to touch existing code, or existing data structures, to be more precise.

Yet another aspect: imagine that a certain computation requires additional variables, possibly associated with each element of the original data structure (or several data structures). These additional variables are only needed while you are performing the computations we described; they are not needed anywhere else. In certain constrained scenarios, you will not be allowed to waste memory like this.

Scenario 4: do not touch data structures

Imagine now that you have the problem proposed by scenario 3, but you cannot touch any data structures. Imagine that it's a legacy library in binary format; imagine that you don't have the source. You simply cannot add a single field anywhere in order to keep partial computations or response values. You cannot change the provided source code in any way, simply because you don't have it. You obviously cannot add any methods intended to perform the computation.

What would be a solution in this case? Well... there's only one: you will have to implement something, whatever it is, outside the provided legacy library.

Conclusion of Use Cases

You may not be allowed to change the original source code of the data structure (or data structures) you have. Or it may not be desirable, as in the case of the proliferation of methods we mentioned before. Or it may not even be possible, as in the case of the provided legacy library. This is definitely the most important aspect of the Visitor Pattern: you are not allowed to change the original data structures in any way.

This aspect is commonly misunderstood by many people posting on the Internet and even in many academic works. It's very common to see implementations where data structures are changed, such as when the object model is changed or when data structures suddenly start to implement new interfaces, etc.

If you find articles proposing changes to the original data structures, no matter how, you can be sure that those articles misunderstand what the Visitor Pattern really means.

Introducing the Visitor Pattern

It's time to present how the Visitor Pattern works.

The Visitor Pattern is defined as a pair of interfaces: a Visitor and a Visitable. There's also some additional expedient your code needs to perform when you call implementations of these interfaces.

Let's learn by example, showing how it can be applied to some real world situations.

A naive Visitor

Remember: a visitor defines a computation

The idea here is that we need an interface which defines how a computation works with respect to a single data element. At this point, it does not matter how data elements are organised in the data structure.

Once the interface is defined, we then define a class which performs the computation. Because the interface does not know anything about how data elements are organised in data structures, the implementing class does not have any dependency on how data is organised. The class only cares about details strictly related to how the computation needs to be done, not about what needs to be done in order to obtain data elements.

For the sake of brevity, let's present the Visitor interface and demonstrate only one of the several computations we may be interested in:

public interface Visitor<T> {
    public void visit(T element);
}

class Sum implements Visitor<Integer> {
    private int sum = 0;

    @Override
    public void visit(final Integer element) {
        sum += element;
    }

    public int value() {
        return sum;
    }
}


The interface Visitor defines a method which is responsible for receiving a single element from the data structure, whatever data structure it is. Classes which implement the Visitor interface are responsible for defining the computation, receiving one element at a time, without caring about how such an element was obtained in the first place.

A naive Visitable

Remember: a Visitable defines data access

The idea is that we need an interface which defines how data can be obtained, in general terms.

Once the interface is defined, we then define a class which knows how data can be obtained from one data structure in particular. This means that, if we have several different data structures, we will need several classes implementing the Visitable interface: one class for each data structure involved.

Notice that nothing was said about the computation. Nothing involving the computation is the responsibility of interface Visitable: it only cares about how individual data elements can be obtained from a given data structure.

For the sake of brevity, let's present the Visitable interface and demonstrate only one of the several possible data access classes we may eventually need:

public interface Visitable<T> {
    public void accept(Visitor<T> v);
}

class VisitableArray<T> implements Visitable<T> {
    private final T[] array;

    public VisitableArray(final T[] array) {
        this.array = array;
    }

    @Override
    public void accept(final Visitor<T> v) {
        for (int i = 0; i < array.length; i++) {
            v.visit(array[i]);
        }
    }
}


The interface Visitable defines a method which is responsible for accepting an object which implements interface Visitor: a Visitable does not know anything about the computation, but it knows how a single data element must be passed to another class which is, in turn, responsible for performing the computation.

Classes implementing the Visitable interface are responsible for traversing the data structure, obtaining a single data element at a time. Also, once a single element is obtained, the class knows how another step of the computation can be triggered for the current data element at hand.

Putting it all together

Your code must now perform the following steps:
  •  instantiate a Visitor;
  •  instantiate a Visitable;
  •  bootstrap the Visitable with the Visitor;
  •  request the result of the computation.
It looks like this:

public int computeSumArray(final Integer[] array) {
    // instantiate a Visitor (i.e.: the computation)
    final Visitor<Integer>   visitor   = new Sum();

    // instantiate a Visitable (i.e.: the data access to the data structure)
    final Visitable<Integer> visitable = new VisitableArray<Integer>(array);

    // bootstrap the Visitable with the Visitor
    visitable.accept(visitor);

    // return the value computed by the Visitor
    return ((Sum) visitor).value();
}


Now imagine you'd like to calculate the mean of an array of Integers. All you need to do is define a class called Mean and reuse everything else, like this:

public int computeMeanArray(final Integer[] array) {
    // instantiate a Visitor (i.e.: the computation)
    final Visitor<Integer>   visitor   = new Mean();

    // instantiate a Visitable (i.e.: the data access to the data structure)
    final Visitable<Integer> visitable = new VisitableArray<Integer>(array);

    // bootstrap the Visitable with the Visitor
    visitable.accept(visitor);

    // return the value computed by the Visitor
    return ((Mean) visitor).value();
}
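
For completeness, the Mean visitor could look like this. A minimal sketch; it assumes integer division is acceptable for the example:

class Mean implements Visitor<Integer> {
    private int sum = 0;
    private int count = 0;

    @Override
    public void visit(final Integer element) {
        sum += element;
        count++;
    }

    public int value() {
        return (count == 0) ? 0 : sum / count;
    }
}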


Now imagine you'd like to obtain a mean from a List<Integer> instead. All you need to do is define the specific data access for this data structure and reuse everything else, like this:

public int computeMeanList(final List<Integer> list) {
    // instantiate a Visitor (i.e.: the computation)
    final Visitor<Integer>   visitor   = new Mean();

    // instantiate a Visitable (i.e.: the data access to the data structure)
    final Visitable<Integer> visitable = new VisitableList<Integer>(list);

    // bootstrap the Visitable with the Visitor
    visitable.accept(visitor);

    // return the value computed by the Visitor
    return ((Mean) visitor).value();
}
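
For completeness again, the VisitableList data access is analogous to VisitableArray; a minimal sketch:

import java.util.List;

class VisitableList<T> implements Visitable<T> {
    private final List<T> list;

    public VisitableList(final List<T> list) {
        this.list = list;
    }

    @Override
    public void accept(final Visitor<T> v) {
        for (final T element : list) {
            v.visit(element);
        }
    }
}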

Extended data structures

Now imagine a situation where you need to aggregate fields to a provided legacy data structure, where you don't have access to the source code.

For the sake of simplicity, let's assume there's a way to unmistakably identify data elements: for arrays it could be an index; for maps, trees and more complex data structures it could be some sort of key.

In the example below we'd like to obtain the coordinates of the capitals of countries, whilst our data structure only contains country names and no coordinates. The solution consists in retrieving coordinates as part of the method accept. In the real world, it may be useful to keep such additional fields in a data structure which holds complementary information relative to the original data structure. This is what we show below:

class VisitableCapitalCoordinates implements Visitable<Coordinate> {
    private final String[] countries;
    private final Map<String, Coordinate> capitals; // for storage/caching, if required

    public VisitableCapitalCoordinates(
            final String[] countries,
            final /* @Mutable */ Map<String, Coordinate> capitals) {
        this.countries = countries;
        this.capitals  = capitals;
    }

    @Override
    public void accept(final Visitor<Coordinate> v) {
        for (int i = 0; i < countries.length; i++) {
            final String country = countries[i];
            Coordinate coord = capitals.get(country);
            if (coord == null) {
                coord = coord(country);       // obtain the complementary data
                capitals.put(country, coord); // ... and cache it
            }
            v.visit(coord);
        }
    }

    //
    // private method: not defined in the interface
    //

    private Coordinate coord(final String country) {
        return obtainCoordinateOfCapitalSomehow(country);
    }
}


In the example above, the most important changes happen in the constructor and in private methods. The method accept keeps its original signature and simply passes an element of the extended data structure instead of an element of the original one.

In the constructor, we receive two data structures: the original one and the extended one, which must be created beforehand. We show this in the example below:

    final String[] countries = { "USA", "United Kingdom", "France", "Holland", "Belgium" };
    final Map<String, Coordinate> capitals = new HashMap<String, Coordinate>();

    // instantiate a Visitor (i.e.: the computation)
    final Visitor<Coordinate>   visitor   = new PlotCoordinate();

    // instantiate a Visitable (i.e.: the data access to the data structure)
    final Visitable<Coordinate> visitable = new VisitableCapitalCoordinates(countries, capitals);

    // bootstrap the Visitable with the Visitor
    visitable.accept(visitor);


We've seen in the previous section that our naive implementation of the Visitor Pattern
  •  has an interface Visitor which provides the computation;
  •  has an interface Visitable which provides the data access;
  •  does not require any modification on existing data structures.

Now let's complicate a little bit more this scenario.

Adding polymorphism

Let's imagine that an investor has an investment portfolio composed of several financial instruments, such as assets, options, bonds, swaps and more exotic financial instruments. Let's suppose we would like to forecast what would be the payoff of this portfolio given certain conditions.

The problem we are facing now is that the data structure is not uniform anymore, as it was previously. Now we have a portfolio which contains several different kinds of objects. We can think of a portfolio as a tree made of several different kinds of nodes. Several nodes are leaf nodes, such as an asset, an option or a bond. Other nodes are not leaf nodes: they aggregate other nodes under them, such as swaps and exotic instruments.

What this use case adds to the initial problem is the fact that there's no longer a single type <T> which represents all nodes. To be more precise: even if all nodes derive from a base type, we still need to make sure we are handling each node properly, according to its type, and also that we are computing its value properly, according to the type hierarchy. For example:

Object
    Payoff
        ForwardTypePayoff
        NullPayoff
        TypePayoff
            StrikedTypePayoff
                AssetOrNothingPayoff
                CashOrNothingPayoff
                GapPayoff
                PlainVanillaPayoff


At first glance, we might think that we could define a pair of classes PolymorphicVisitor and PolymorphicVisitable which take the node type as a generic parameter. Unfortunately, this does not work, even though it makes a lot of sense.

Since Java generics applies type erasure, our application will have trouble deciding between a PolymorphicVisitor<ForwardTypePayoff> and a PolymorphicVisitor<NullPayoff>, for example, because both will have their types erased and will effectively become PolymorphicVisitor<Object>. Going straight to the solution, without spending any more time explaining all the intricacies, we need something like this:

public interface PolymorphicVisitor {
    public <T> Visitor<T> visitor(Class<? extends T> element);
}


The interface above provides a method visitor which is responsible for returning the actual Visitor for a given element type. So, once you have the actual Visitor, you can delegate to it. If the actual visitor was not found, you can try again using the PolymorphicVisitor of the base class, and so on, like this:

@Override
public void accept(final PolymorphicVisitor pv) {
    final Visitor<FixedRateBondHelper> v = (pv!=null) ? pv.visitor(this.getClass()) : null;
    if (v != null) {
        v.visit(this);
    } else {
        super.accept(pv);
    }
}


We also need to define interface PolymorphicVisitable, which works in conjunction with PolymorphicVisitor, like this:
public interface PolymorphicVisitable {
    public void accept(PolymorphicVisitor pv);
}
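
For completeness, this is how the bootstrap looks from the caller's perspective. FixedRateBondHelper appears in the snippet above, whilst CashFlowPolymorphicVisitor is a hypothetical implementation of PolymorphicVisitor, used here for illustration only:

    final PolymorphicVisitable instrument = new FixedRateBondHelper();          // data structure
    final PolymorphicVisitor   pv         = new CashFlowPolymorphicVisitor();   // hypothetical: implements visitor(Class)

    // accept() obtains the proper Visitor from pv and delegates to it
    instrument.accept(pv);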


Notice that the actual visitor needs to be obtained from the class type passed in, as shown below:

@Override
public <CashFlow> Visitor<CashFlow> visitor(Class<? extends CashFlow> klass) {

    if (klass == org.jquantlib.cashflow.CashFlow.class) {
        return (Visitor<CashFlow>) new CashFlowVisitor();
    }
    if (klass == SimpleCashFlow.class) {
        return (Visitor<CashFlow>) new SimpleCashFlowVisitor();
    }
    if (klass == Coupon.class) {
        return (Visitor<CashFlow>) new CouponVisitor();
    }
    if (klass == IborCoupon.class) {
        return (Visitor<CashFlow>) new IborCouponVisitor();
    }
    if (klass == CmsCoupon.class) {
        return (Visitor<CashFlow>) new CmsCouponVisitor();
    }
    ...
}

Sample Code

Please branch project JQuantLib from Launchpad:

    $ bzr branch lp:jquantlib


or, alternatively clone from Github:

    $ git clone http://github.com/frgomes/jquantlib


In particular, you are interested in the CashFlow hierarchy, under folder jquantlib/src/main/java:

org/jquantlib/util/Visitable.java
org/jquantlib/util/PolymorphicVisitable.java
org/jquantlib/util/PolymorphicVisitor.java
org/jquantlib/util/Visitor.java
org/jquantlib/cashflow/Event.java
org/jquantlib/cashflow/FloatingRateCoupon.java
org/jquantlib/cashflow/Coupon.java
org/jquantlib/cashflow/CmsCoupon.java
org/jquantlib/cashflow/CashFlows.java
org/jquantlib/cashflow/FixedRateCoupon.java
org/jquantlib/cashflow/CashFlow.java
org/jquantlib/cashflow/AverageBMACoupon.java
org/jquantlib/cashflow/SimpleCashFlow.java
org/jquantlib/cashflow/CappedFlooredCoupon.java
org/jquantlib/cashflow/Dividend.java
org/jquantlib/cashflow/IborCoupon.java

Conclusion

The implementation of the Visitor Pattern we present here is scalable and flexible. It does not require any interfaces or class methods crafted for one particular application.

We showed that specific interfaces and classes are necessary only when polymorphism is involved. We purposely defined PolymorphicVisitor and PolymorphicVisitable in order to decide which is the correct Visitable for traversing a specific data structure and which is the correct Visitor once we have a certain data element in our hands. These decisions are not the responsibility of the simple Visitor and Visitable interfaces. For this reason, the additional interfaces are definitely needed.

The Visitor Pattern as defined here conforms to the definition presented on the top of this article and does not require any change in existing data structures.

On the other hand, even with the advantages gained with our implementation, the Visitor Pattern still keeps its relatively high complexity. In general, code which employs the Visitor Pattern tends to become obscure and difficult to understand, which means that documentation is key to keeping the code organised and reasonably easy to follow.

-----------------------------------------------------

If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website. Thanks

[ First published by Richard Gomes on 17:16, 29 January 2011 (GMT) ]

Saturday, 12 January 2013

Implementation of multiple inheritance in Java

In this article we demonstrate how multiple inheritance can be implemented easily in Java.

Sometimes you need a certain class to behave much like two or more base classes. As Java does not provide multiple inheritance, we have to circumvent this restriction somehow. The idea consists of:
  • Create an interface for exposing the public methods of a certain class;
  • Make your class implement your newly created interface;
At this point, subclasses of your given class can implement the interface you have created, instead of extending your class. Some more steps are necessary:
  • Supposing you have an application class which needs multiple inheritance, make it implement several interfaces. Each interface was created as explained above and each interface has a given class which implements the interface.
  • Using the delegation pattern, make your application class implement all interfaces you need.
For example, imagine that you have
  • interface A and its implementation ClassA
  • interface B and its implementation ClassB
  • class ClassC which needs multiple inheritance from classes ClassA and ClassB
interface A {
    void methodA();
}

interface B {
    void methodB();
}

class ClassA implements A {
    public void methodA() { /* do something A */ }
}

class ClassB implements B {
    public void methodB() { /* do something B */ }
}

class ClassC implements A, B {

    // initialize delegates
    private final A delegateA = new ClassA();
    private final B delegateB = new ClassB();

    // implements interface A
    @Override
    public void methodA() {
        delegateA.methodA();
    }

    // implements interface B
    @Override
    public void methodB() {
        delegateB.methodB();
    }
}
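
A ClassC instance can now be used wherever an A or a B is expected. A minimal usage sketch:

    A a = new ClassC();
    B b = new ClassC();
    a.methodA();   // delegates internally to ClassA
    b.methodB();   // delegates internally to ClassB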


If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website. Thanks

[ First published by Richard Gomes on 00:22, 11 March 2008 (UTC) ]

Strong type checking with JSR-308

In this article we demonstrate that strong type checking sometimes is not enough for the level of correctness some applications need. We will explore a situation where semantic checking is needed to achieve higher levels of correctness. Then we will explore how something like this can be done in C++ and how future improvements in the JDK will allow us to do it in Java, obtaining better results than C++ can offer.

Problem

Imagine the following code snippet:
  double calc(double rate, double year) {
    return Math.pow(1 + rate, year);
  }

  double rate = 0.45;
  double year = 0.5;

  double result1 = calc(rate, year);
  double result2 = calc(year, rate);
Notice the two calls of the method calc. Though both are syntactically and semantically correct to the compiler, a human can easily tell that the second call will probably go wrong. Ideally, we'd like the compiler to warn us about the error in order to shorten the development cycle.

The C++ solution

In C++ we can mitigate such mistakes with typedefs:
typedef double Rate;
typedef double Year;

double calc(Rate rate, Year year);
If you are using a powerful IDE, it will tell you the correct order of arguments and you will be able to avoid mistakes.
Important: The C++ compiler still does not prevent you from passing arguments in the wrong order. A typedef is only an alias, so you will not get a compilation error if you pass arguments in the wrong order!

The Java solution

In Java you don't have typedefs. One possible alternative would be defining wrapper objects intended to hold primitive types, but this solution is very inefficient, as sketched below.
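Just for illustration, a minimal sketch of that rejected alternative (the Rate and Year wrapper classes are hypothetical):

    // hypothetical wrapper classes: they document intent, but every value
    // costs an object allocation and an extra indirection
    final class Rate { final double value; Rate(double v) { this.value = v; } }
    final class Year { final double value; Year(double v) { this.value = v; } }

    double calc(Rate rate, Year year) {
        return Math.pow(1 + rate.value, year.value);
    }

I will avoid spending your time visiting all the other candidate solutions only to show that they are not adequate for our needs. Let's go directly to what we need: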
In the upcoming JDK7, we will have the possibility to use annotations in accordance with JSR-308. In a nutshell, it allows you to use annotations wherever you have a type in your code, and not only on declarations of classes, methods, parameters, fields and local variables. Let's examine an example:
private Double calc(@Rate double rate, @Time double time) {
  return new Double( Math.pow(1 + rate, time) );
}

public void test() {
  @Rate double rate = 0.45;
  @Time double time = 0.5;

  // This call passes
  Double result1 = calc(rate, time);

  // This call *should* give us a compiler error
  Double result2 = calc(time, rate);
}
In this specific example, the code compiles fine in JDK6 but does not offer the strong type checking that will become possible with JDK7.
If you have a nice IDE, it will tell you the correct order of parameters, showing you @Rate double instead of simply double.

Much like the C++ compiler, which was not able to detect the wrong order of parameters, javac suffers from the same illness: it will not detect the wrong order of parameters, because these annotations are meaningless to javac itself.

On the other hand, javac is able to execute annotation processors you specify on the command line. In particular, we can write an annotation processor which verifies whether you are passing a @Rate where a @Rate is expected, and so on. It means that JDK7, with the help of JSR-308 annotation processors, is able to detect the mistake we pointed out at the beginning of this article.
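
For illustration, a minimal sketch of how such an annotation can be declared, using ElementType.TYPE_USE, the type-annotation target that JSR-308 eventually introduced:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // applicable to any use of a type, so an annotation processor can check it
    @Target(ElementType.TYPE_USE)
    @Retention(RetentionPolicy.CLASS)
    public @interface Rate { }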

Going further, with JDK7 we can produce code with all the very strong type checks we need in order to obtain very robust code. See the example below:
private @NonNull Double calc(@Rate double rate, @Time double time) @ReadOnly {
  // The following statement will fail because the return type cannot be null
  if (condition) return null;
  // The following statement passes
  return new Double( Math.pow(1 + rate, time) );
}

public void test() {
  @Rate double rate = 0.45;
  @Time double time = 0.5;

  // This call passes
  Double result1 = calc(rate, time);

  // This statement will fail because the result of calc became read-only
  result1 = 1.0;

  // This call *should* give us a compiler error
  Double result2 = calc(time, rate);
}

Conclusion

Thanks to these upcoming features of JDK7, programmers can obtain robust code with very strong type checks in Java, comparable to the quality of code written in C++.


If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website. Thanks

[ First published by Richard Gomes on 21:22, 27 January 2008 (UTC) ]

Performance of numerical calculations

In this article we explore the controversial subject of the performance of Java applications compared with C++ applications. Without any passion, and certainly not willing to promote another flame war, we try to demonstrate that most comparisons are simply wrong, because they essentially compare very different things. We also present alternative implementations which address most of the issues.


Problem

Performance of numerical applications written in Java can be poor.


Solution

First of all, performance is a subject that becomes impossible to debate if we insist on precision to the last millisecond. Even a piece of code written in assembly, running on the same computer, can present different execution times when run several times, due to a number of factors outside the focus of this article.

Talking about code written in Java, the subject becomes more complex because the same Java code will eventually end up as different assembly code on different platforms. In addition, there are different JVMs and different JIT compilers for different platforms.

Running the same Java code on a home PC and on an IBM AIX server can result in huge differences in performance, not only because we expect a server to be much faster than a home PC, but mainly because IBM's JVM contains a number of optimizations specifically targeting that hardware platform which are not available in a stock JVM for a generic home PC.

The tagline "Java programs are very slow when compared with C++ programs" is even less precise than what we just exposed. In general, what happens is that very skilled C++ programmers compare their best algorithms and very optimized implementations against some Java code copied from Java tutorials. Very skilled Java developers know that tutorial code works as intended, but is certainly not meant to perform well.

Another point to consider is that the default runtime libraries provided by Java cannot be expected to perform well. The same applies to other languages, including C/C++. It is true that C/C++ has some widely accepted libraries which perform very well, but the point is that you can potentially find something which performs better for the specific hardware platform you have and for the specific problem domain you have.

In the specific situation of Java applications, there are lots of techniques intended to improve the performance of the underlying JVM, and there are several techniques which can be applied to Java programs in order to avoid operations which are not needed. In order to compare a benchmark written in C++ against one written in Java, these aspects must be considered; otherwise we will be comparing code written by C++ gurus against code written by Java newbies.

When researching this subject, I found a number of published benchmarks, but I haven't spent any time copying their source code and eventually changing it in order to perform the comparison the way I'd like to. I preferred to adopt a certain "threshold" of, I'd say, 30%, which means that a difference of 30% or less implies we don't have enough precision. It does not mean that the involved parties perform equally; it only means we don't have enough elements to compare with accuracy.

Taking the previous paragraphs into consideration, Java code can be compared with more or less equivalent C++ code. On the other hand, big differences in performance tell us that we should try to identify the reasons for the poor performance and eventually propose something better. Analyzing the benchmarks, we can identify that:
  •  Data structures perform badly: list operations (one-dimensional data structures) are 2 times slower, whilst matrices can be even 3 times slower.
  •  Sort algorithms involve data structures and are certainly affected by the slowness of those data structures.
  •  Trigonometric functions are not a major concern for our specific problem domain.
  •  Nested loops were not analyzed.

In order to address the major issues, we evaluated these technologies:
  •  FastUtil package, which offers collections of primitive types (see the sketch after this list). This is where Java code can beat C++ code, due to the way C++ handles pointers as opposed to the way Java handles arrays.
  •  JAL package, which mimics most of the C++ STL functionality, providing fast algorithms on arrays of primitive types, which perform much faster than arrays of Objects.
  •  Colt package, which is an excellent class library used by CERN. In particular, Colt contains a modified version of JAL.
  •  Parallel Colt package, which is a re-implementation of the Colt package and aims to take advantage of modern multi-core CPUs.
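
As an illustration of the first item, a minimal sketch using FastUtil's DoubleArrayList, assuming the fastutil jar is on the classpath; an ArrayList<Double> would box every element, whilst the primitive-type list stores raw doubles:

    import it.unimi.dsi.fastutil.doubles.DoubleArrayList;

    DoubleArrayList prices = new DoubleArrayList();
    for (int i = 0; i < 1000000; i++) {
        prices.add(i * 0.01);          // stored as a raw double: no boxing
    }
    double sum = 0.0;
    for (int i = 0; i < prices.size(); i++) {
        sum += prices.getDouble(i);    // retrieved as a raw double: no unboxing
    }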


If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website. Thanks

[ First published by Richard Gomes on 14:24, 10 February 2008 (UTC) ]

Handling of floating point rounding errors

In this article we demonstrate that a very elementary mathematical statement can raise big concerns for numerical applications.

Reasoning


Try this simple yet tricky code:
    System.out.println(0.1 + 0.1 + 0.1);

Oh well... the result is: 0.30000000000000004
What??? So... 0.1 + 0.1 + 0.1 is not 0.3 ???

Problem


This problem arises from the way floating point numbers are represented in the processor and from how they participate in floating point calculations.
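For example, a minimal sketch of the comparison discussed below:

    double x = 0.1 + 0.1 + 0.1;
    if (x == 0.3) {
        System.out.println("x is 0.3");
    } else {
        System.out.println("x is " + x);   // this branch is taken: 0.30000000000000004
    }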
Notice that the comparison if (x == 0.3) jumped to the else branch. After the sum, the result was 0.30000000000000004 and not 0.3 as we expected. The behaviour of the if statement is correct. In fact, the error is located between the keyboard and the chair: you simply cannot do such a comparison!

You have to:
  •  remember that floating point errors may happen;
  •  evaluate the epsilon associated with the operations you have previously done;
  •  consider a certain range in your comparisons,
like this:
    double epsilon = blah blah blah; // calculate epsilon somehow
    if ((x >= 0.3 - epsilon) && (x <= 0.3 + epsilon)) ...

Solution


JQuantLib takes the same approach as QuantLib: it calculates an epsilon after a sequence of mathematical operations, which gives us the order of magnitude of the accumulated error.
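
The actual code is more elaborate, but a minimal sketch of the idea might look like this (closeEnough and maxUlps are names made up for this example, not JQuantLib's API):

    static boolean closeEnough(final double x, final double y, final int maxUlps) {
        // the unit in the last place of the larger operand gives us the
        // order of magnitude of the representation error
        final double epsilon = maxUlps * Math.ulp(Math.max(Math.abs(x), Math.abs(y)));
        return Math.abs(x - y) <= epsilon;
    }

With it, closeEnough(0.1 + 0.1 + 0.1, 0.3, 4) returns true, whilst the plain == comparison above fails.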


If you found this article useful, it will be much appreciated if you create a link to this article somewhere in your website. Thanks

[ First published by Richard Gomes on 21:47, 27 January 2008 (UTC) ]

Use CachedRowSet and get rid of low level SQL

The interface CachedRowSet provides interesting mechanisms for anyone willing to work with JDBC with ease. A CachedRowSet is basically an in-memory matrix which maps to objects in your database, like this:

  • you have as many columns as there are fields in a given database table
  • you have as many rows as there are records in a given database table
  • the idea applies to views and joined tables as well

Instead of sending SQL commands to the database, you change cells in this matrix and, when you have finished all your changes in memory, you ask the CachedRowSet to do the dirty work for you. Example:

    // update current row
    crs.updateShort("Age", 58);
    crs.updateInt("Salary", 150000);
    crs.updateRow();

    // insert new row
    crs.moveToInsertRow();
    crs.updateString("Name", "John Smith");
    crs.updateInt("Age", 42);
    crs.updateShort("Salary", 78000);
    crs.insertRow();
    crs.moveToCurrentRow();

    // update the underlying database
    crs.acceptChanges();

The CachedRowSet implementation is a refined piece of software which takes care of all the details related to JDBC programming. Not only do you get something more convenient than coding SQL commands by hand, you also have stable software at your service, doing quality work the way it should be done.

In addition, a CachedRowSet is meant to be used disconnected from the database most of the time. After you load a CachedRowSet with data from the database, you can serialize it, send it to the web browser, get it back modified, and only after that call acceptChanges. You can also build a CachedRowSet from scratch, populate it by hand and call acceptChanges.

More info: CachedRowSet Javadoc API


CachedRowSet on Java7


Java7 ships with a RI (reference implementation) of the interface CachedRowSet. Just use it as you would any other class or interface. Have fun!
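
For example, a minimal sketch using the standard factory; the JDBC URL and the PEOPLE table are hypothetical:

    import javax.sql.rowset.CachedRowSet;
    import javax.sql.rowset.RowSetProvider;

    CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
    crs.setUrl("jdbc:h2:mem:test");                           // hypothetical JDBC URL
    crs.setCommand("SELECT Name, Age, Salary FROM PEOPLE");   // hypothetical table
    crs.execute();   // connects, populates the rowset, then disconnects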


CachedRowSet on Java5 and Java6

On Java5 and Java6, only the interfaces are part of the JDK. You will have to download the RI (reference implementation) by hand from here:
JDBC Rowset 1.0.1 JWSDP 1.4 Co-Bundle MREL

Uncompress and upload rowset.jar into your artifact repository exactly like this: javax.sql.rowset:rowset:1.0.1

$ ls jdbc_rowset_tiger-1_0_1-mrel-jwsdp.zip
$ unzip jdbc_rowset_tiger-1_0_1-mrel-jwsdp.zip
$ ls jdbc-rowset/lib/rowset.jar
$ firefox jdbc-rowset/docs/index.html

Note: You can find "rowset" in Jarvana and the like, but that dependency will not help you, since it refers to a pom.xml file, not the rowset.jar file you are interested in. There's no rowset.jar file stored in Maven Central due to licensing issues.