One of the main reasons for modernizing the parser for Gekko 3.0 was that it was deemed important to handle timeseries as objects in Gekko. Timeseries are already objects in Gekko 2.0/2.2, but they are not always handled as such during calculations. To understand this, think of two timeseries x1 and x2 as vectors of data. We may say that x1 = (1, 2, 3) and x2 = (2, 3, 4) over the period 2001-2003. If we need to create the timeseries y = x1 + x2 over the same period, we can just do a vector addition: (3, 5, 7) = (1, 2, 3) + (2, 3, 4). But this is not how it is done in Gekko 2.0/2.2. In those versions, a time loop is performed, setting t = 2001, t = 2002 and t = 2003 in turn. So first Gekko asks what the values of x1 and x2 are in 2001, these values are summed, and the result is put into y (in 2001). This is repeated for 2002 and 2003. This may sound innocent enough, but there are drawbacks to this strategy. For instance, it becomes less transparent how timeseries are transferred in and out of functions, and the use of lags in particular becomes a bit of a headache. You might have a function f(x) returning 2*x[-1], and that function might be called as f(x1+x2). To do this, a temporary series storing the result of x1+x2 would need to be generated (in a time loop) before running the function, and all this becomes a bit messy. The pinnacle of messiness was implementing moving averages and moving sums via such time loops, because the time loops become nested if, for instance, the moving average is nested (like movavg(movavg(x, 2), 3)), but even movavg(pch(x), 2) became complicated. Such problems are probably the reason that AREMOS does not allow lags in user functions.
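The difference between the two strategies described above can be sketched in a few lines of Python. This is only an illustration, not Gekko's internal code: the timeseries are modelled as plain dicts mapping year to value, and the function names `add_by_time_loop` and `add_as_vectors` are made up for the example.

```python
# Hypothetical stand-ins for Gekko timeseries: dicts mapping year -> value.
x1 = {2001: 1.0, 2002: 2.0, 2003: 3.0}
x2 = {2001: 2.0, 2002: 3.0, 2003: 4.0}
years = [2001, 2002, 2003]


def add_by_time_loop(a, b, years):
    """Time-loop style: visit each period, sum two scalars, store the result."""
    y = {}
    for t in years:
        y[t] = a[t] + b[t]  # one observation at a time
    return y


def add_as_vectors(a, b, years):
    """Vector style: build the whole right-hand side first, then assign it."""
    rhs = [a[t] + b[t] for t in years]  # the full sequence of values
    return dict(zip(years, rhs))        # only now is it put into y


y_loop = add_by_time_loop(x1, x2, years)
y_vec = add_as_vectors(x1, x2, years)
```

Both functions give the same numbers for a simple sum. The difference only starts to matter once the right-hand side is passed into a function or lagged, because in the vector style the intermediate result exists as a complete object.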

Instead, it was decided to evaluate timeseries expressions in a more encapsulated way, like vector algebra. To return to the expression y = x1 + x2, Gekko 3.0 will not calculate the first element (3 = 1 + 2) and put this into y, then the next element (5 = 2 + 3) and put this into y, and so on. Instead, Gekko 3.0 will first calculate the right-hand side as a whole, and find that it is equal to (3, 5, 7). Only afterwards will Gekko decide what to do with that sequence of values (in this case, put it into the timeseries y, over the period 2001-2003).

Handling timeseries expressions as a kind of vector algebra provides clarity and robustness to the internals of Gekko, but this approach raises some issues, the most pressing being how to deal with periods and lags (or leads). Imagine two large x1 and x2 vectors, containing 50 annual observations, and imagine that the user wants to “PRINT <2001 2003> x1 + x2;”. If x1 and x2 both run from 1954 to 2003, adding them vector-wise creates a 50-element timeseries running from 1954 to 2003. This is a waste of space and effort if in the end we only want to print the three observations 2001-2003. So, the adding of x1 and x2 in Gekko should be smarter than that, exploiting the fact that only a small part of the result is going to be used. An obvious fix would be to just operate on the 2001-2003 parts of x1 and x2, omitting the rest. This is fine, but complexities creep in regarding the setting of this “relevancy period”. How about “PRINT <2001 2003>  (x1+x2)[-10];”? This time, only calculating x1+x2 over the period 2001-2003 will fail, since the result is going to be lagged 10 periods. So we should rather calculate x1+x2 over the period 1991-1993, which would, in the next step, conform with the 10-period lag. So the sum x1+x2 needs to know what is going to happen to it afterwards: whether it is going to be lagged or led, whether some observation is going to be picked out (for instance, (x1+x2)[1991]), etc. So there are some dynamics regarding the time period in play, and in principle this is a hard problem. Ideally, the addition x1 + x2 should not be performed until actually needed, which is called lazy or deferred evaluation in computer languages (R, for instance, evaluates function arguments lazily by default). Lazy evaluation is possible in C# (the language used for Gekko), but it is probably overkill for this purpose.
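The shifting of the "relevancy period" in the lag example above can be illustrated with a small Python sketch. It is not Gekko code: the series are plain dicts with invented data, and the helpers `add` and `lag` are hypothetical names used only for this illustration.

```python
def add(a, b, start, end):
    """Add two series (dicts year -> value), but only over start..end."""
    return {t: a[t] + b[t] for t in range(start, end + 1)}


def lag(series, k, start, end):
    """Shift a series by k periods (k = -10 is a 10-period lag).

    The value reported for year t is the value the series had in year t + k.
    """
    return {t: series[t + k] for t in range(start, end + 1)}


# Two made-up annual series with 50 observations each, 1954..2003.
x1 = {t: float(t - 1953) for t in range(1954, 2004)}
x2 = {t: 2.0 * (t - 1953) for t in range(1954, 2004)}

# Mimics: PRINT <2001 2003> (x1+x2)[-10];
start, end, k = 2001, 2003, -10

# The sum is only needed over 1991..1993, because of the lag applied afterwards.
total = add(x1, x2, start + k, end + k)
result = lag(total, k, start, end)
```

Here `total` holds three observations (1991-1993) rather than fifty, and `result` reports them under the years 2001-2003. The catch is that the addition had to be told about the lag before it ran.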

In Gekko 3.0, the following will be implemented. When performing “PRINT <2001 2003>  (x1+x2)[-10];”, Gekko will tell the addition component that a 10-period lag is going to be applied afterwards, and the time period used for the addition will be adjusted accordingly. Such calls may be nested, like “PRINT <2001 2003> (x1+x2)[-10][-1];”, and the logic needs to be able to deal with that, too. In 95% of cases, such logic is simple enough, but it is not watertight. What to do with this function: randomlag(x1+x2)? We could imagine that the randomlag function lags the argument with a random lag between -1 and -50, so there would be no way to know in advance which period the addition component should calculate x1 + x2 over. The only real solution to this would be using lazy evaluation (hard), or reverting to the old looping of Gekko 2.0/2.2 (messy).
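One way to picture how nested lags can pass the adjusted period down to the addition is a small expression tree in which every node is asked for a window and forwards a shifted window to its child. The Python sketch below is an assumption about how such logic could look, not a description of Gekko's actual implementation. The classes `Var`, `Add` and `Lag`, the method `evaluate`, and the list `add_windows` are all invented for the example.

```python
from dataclasses import dataclass

add_windows = []  # records every window the addition is asked to compute


@dataclass
class Var:
    data: dict  # year -> value

    def evaluate(self, start, end):
        return {t: self.data[t] for t in range(start, end + 1)}


@dataclass
class Add:
    left: object
    right: object

    def evaluate(self, start, end):
        add_windows.append((start, end))
        a = self.left.evaluate(start, end)
        b = self.right.evaluate(start, end)
        return {t: a[t] + b[t] for t in range(start, end + 1)}


@dataclass
class Lag:
    child: object
    k: int  # -10 means a 10-period lag

    def evaluate(self, start, end):
        # Ask the child for the shifted window, then relabel the years.
        shifted = self.child.evaluate(start + self.k, end + self.k)
        return {t: shifted[t + self.k] for t in range(start, end + 1)}


# Made-up data, 1954..2003.
x1 = Var({t: float(t - 1953) for t in range(1954, 2004)})
x2 = Var({t: 2.0 * (t - 1953) for t in range(1954, 2004)})

# Mimics: PRINT <2001 2003> (x1+x2)[-10][-1];
expr = Lag(Lag(Add(x1, x2), -10), -1)
result = expr.evaluate(2001, 2003)
```

After the call, `add_windows` contains the single window (1990, 1992): the two lags have accumulated to 11 periods before the addition runs. The scheme works because each `Lag` node knows its own shift in advance, which is exactly what a function like randomlag cannot promise.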

What would probably happen in such a case in Gekko 3.0 is that Gekko would issue an error, and advise the user to decorate the randomlag function with an indicator of the maximal lag/lead that may be applied inside the function. If the user does not feel up to this task, there will be an option to perform full-sample calculations of x1+x2, that is, having Gekko simply calculate the vector sum over the full common sample of the two timeseries. This will cost some performance, but Gekko will run.
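The decision described above (use a declared maximal lag/lead if there is one, otherwise either fail or fall back to the full sample) can be sketched as follows. This is purely illustrative Python. The decorator `max_shift`, the helper `argument_window`, the flag `allow_full_sample` and the function `plainlag` are hypothetical names, and nothing here reflects actual Gekko syntax.

```python
import random


def max_shift(lags=0, leads=0):
    """Hypothetical decorator: declare the largest lag/lead a function may apply."""
    def wrap(fn):
        fn.max_lag, fn.max_lead = lags, leads
        return fn
    return wrap


@max_shift(lags=50)
def randomlag(series, start, end, rng):
    """Lag the argument by a random number of periods between 1 and 50."""
    k = rng.randint(1, 50)
    return {t: series[t - k] for t in range(start, end + 1)}


def plainlag(series, start, end, rng):
    """Same behaviour, but without any declaration of its maximal lag."""
    return randomlag(series, start, end, rng)


def argument_window(fn, start, end, full_sample, allow_full_sample=False):
    """Period over which an expression passed to fn must be calculated."""
    if hasattr(fn, "max_lag"):
        return (start - fn.max_lag, end + fn.max_lead)
    if allow_full_sample:
        return full_sample  # slower, but always safe
    raise ValueError("unknown lag/lead: declare it or allow full-sample mode")


# Made-up data, 1900..2003, so that a 50-period lag from 2001 is available.
x = {t: float(t) for t in range(1900, 2004)}
full = (1900, 2003)

lo, hi = argument_window(randomlag, 2001, 2003, full)
arg = {t: x[t] for t in range(lo, hi + 1)}  # the pre-computed argument
out = randomlag(arg, 2001, 2003, random.Random(0))
```

With the declaration, the argument is calculated over 1951-2003 instead of the full 1900-2003 sample. Calling `argument_window(plainlag, 2001, 2003, full)` raises an error, while passing `allow_full_sample=True` returns the full sample.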

As noted, cases where a function performs unknown lagging/leading are probably very rare. Even then, randomlag(x1) or randomlag(x1[-2]) would not be a problem; the problem only arises when such a function is fed with an expression. The lag problem caused some delays in the deployment of Gekko 3.0, but the issue seems fixed for now. The benefits of treating timeseries as a kind of encapsulated vectors will be evident in the long run, both in Gekko command files and in the internal Gekko source code (which can be simplified quite a lot).