COMPARE


COMPARE compares variables in the first-position and Ref (reference) databanks. The comparison is only done for timeseries of the same frequency as the global frequency setting. The comparison is done over the given period (or the global period if a period is not provided), and the user may provide a list of series that are checked (if no list is given, all series are checked).

 

By default, COMPARE writes its output to the file compare_databanks.txt (the filename can be changed with FILE=). You may set thresholds on absolute or relative differences (options ABS, REL and PCH), and you may dump a list #dif containing the names of the series that differ (cf. DUMP).

 

The COMPARE statement is similar to the menu item Utilities --> Compare two databanks... in the Gekko user interface.

 


 

Syntax

 

compare < period  ABS=...  DUMP  MISSING=...  REL=...  SORT=...  PCH=...  TYPE=... > variables  FILE=... ;

 

period

(Optional). Local period, for instance 2010 2020, 2010q1 2020q4 or %per1 %per2+1.

ABS=

(Optional). Absolute differences smaller than the value are not shown, for instance <abs = 150>.

MISSING=

(Optional). Choose m (default) or zero. If zero, any missing values (NaN, shown in PRT as M) are treated as if they were 0. The zero setting can be useful if you are comparing two databanks and do not want a missing value in one databank and a 0 in the other databank to count as a difference. (If both databanks have a missing value, or if both databanks have a 0 value, this is never counted as a difference, regardless of the setting.)

REL=

(Optional). Relative differences smaller than the value are not shown, for instance <rel = 0.01> equivalent to 1%. You may alternatively use PCH for the same purpose.

SORT=

(Optional). Choose between alpha (default), abs or rel. The first sorts alphabetically, the second sorts by absolute differences, and the last sorts by relative differences. The sorting and the use of ABS=, REL=, and PCH= are independent of each other.

PCH=

(Optional). Percentage differences smaller than the value are not shown, for instance <pch = 1.0> corresponding to 1%. You may alternatively use REL for the same purpose.

TYPE=

(Optional). Choose between type=normal (default) and type=hist (history). The latter computes relative differences differently. If x is a timeseries from the first-position databank and @x is the same timeseries from the reference databank, type=normal computes the relative difference as rel = abs(x - @x) / @x. In contrast, type=hist computes it as rel = abs(x - @x) / ((abs(@x - @x[-1]) + abs(@x[-1] - @x[-2])) / 2). The numerator is the same, but the denominator is the average of the absolute time-changes of @x in the current period (abs(@x - @x[-1])) and the previous period (abs(@x[-1] - @x[-2])). So with type=hist, the variability of @x indicates what a "large" difference means, and the denominator has some similarities with a standard deviation measure. The two methods will normally return different results for some variables. (The SIM Gauss-Seidel relative convergence check uses a procedure almost identical to type=hist.)

DUMP

(Optional). If this option is set, a list #dif is constructed, containing the names of the timeseries that differ.

variables

A list of variable names. If no variables are given, the full databanks are compared. The names are separated by commas (like x, y, z), and a list #x of names should be used with {}-braces: {#x}. For array-series, you may either state the name of the array-series itself (x), in which case all sub-series are checked, or you may state individual elements (like x[a,k]).

FILE=

(Optional). The name of the output file (default: compare_databanks.txt). Filenames may contain an absolute path like c:\projects\gekko\bank.gbk, or a relative path like \gekko\bank.gbk. Filenames containing blanks or special characters should be put inside quotes. When reading files, files in libraries can be referred to with a colon (for instance lib1:bank.gbk), and "zip paths" are allowed too (for instance c:\projects\data.zip\bank.gbk). See the manual section on filenames for more.

 

If no period is given inside the <...> angle brackets, the global period is used (cf. TIME).
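
The two relative-difference formulas described under TYPE= can be illustrated with a small Python sketch. This is not Gekko code and not Gekko's implementation; it only mirrors the arithmetic stated above. The function names rel_normal and rel_hist and the data values are hypothetical, and plain Python lists stand in for timeseries, so the position t must be at least 2 for the lags to exist.

```python
def rel_normal(x, ref, t):
    """type=normal: abs(x - @x) / @x at position t."""
    return abs(x[t] - ref[t]) / ref[t]


def rel_hist(x, ref, t):
    """type=hist: same numerator, scaled by the average absolute
    time-change of the reference series over the current and previous period."""
    scale = (abs(ref[t] - ref[t - 1]) + abs(ref[t - 1] - ref[t - 2])) / 2
    return abs(x[t] - ref[t]) / scale


# Hypothetical data: the reference series moves by 2 and then by 4.
ref = [100.0, 102.0, 106.0]
x = [100.0, 102.0, 109.0]

normal = rel_normal(x, ref, 2)  # 3 / 106
hist = rel_hist(x, ref, 2)      # 3 / ((4 + 2) / 2) = 3 / 3
print(round(normal, 4), round(hist, 4))
```

In this made-up case the same absolute difference of 3 is small relative to the level of the reference series, but as large as its typical period-to-period change, which is the distinction type=hist is meant to capture.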

 

 


 

Example

 

Compare all variables for the global period, or a given period:

 

compare;  //global period
compare <2010 2020>;  //for this given period

 

Do the same, with a user-chosen filename:

 

compare <2010 2020> file=dif.txt;

 

Sort the result by relative differences:

 

compare <sort=rel>;

 

Only compare series names from the list #x:

 

#x = x1, x2, x3;
compare <2010 2020> {#x};
compare <2010 2020> x1, x2, x3;  //same as above

 

Do not show relative differences smaller than 0.02 (that is, 2%):

 

compare <2010 2020 rel=0.02>;

 

You may 'dump' a list #dif containing the names of the timeseries that are different:

 

compare <dump>;
plot <q> {#dif};  //plots the percentage differences

 

Array-series are supported, consider this example:

 

reset;
time 2001 2002;
xx = series(2);
xx[a, x] = 100, 100;
xx[b, x] = 200, 200;
xx[a, y] = 300, 300;
xx[b, y] = 400, 400;
yy = series(1);
yy[i] = 1000, 1000;
#m1 = a, b;
#m2 = list('a'); //the easiest way to state a 1-element list
clone;
xx[b, y] = 400.4, 402;
yy[i] = 1000.2, 1004;
yy[j] = 2000;
compare <dump sort = rel>;
plot <q> {#dif};
prt #dif; //print out the names of the different timeseries as a flat list.
compare xx[b, y]; //comparing only this particular element.

 

The file compare_databanks.txt will contain the following output:

 

Comparing first-position and reference databanks
 
There are the following 5 series in both banks:
xx[a, x], xx[a, y], xx[b, x], xx[b, y], yy[i]
 
There are the following 1 series in the first-position databank, but not in Ref databank:
yy[j]
 
There are the following 0 series in the Ref databank, but not in the first-position databank:
[none]
 
Out of the 5 common series, there are differences regarding 2 of them:
 
xx[b,y]        WORK       REFERENCE             ABS DIFF      % DIFF         max =     0.50
-------------------------------------------------------------------
2001       400.4000       400.0000               0.4000        0.10
2002       402.0000       400.0000               2.0000        0.50
 
yy[i]          WORK       REFERENCE             ABS DIFF      % DIFF         max =     0.40
-------------------------------------------------------------------
2001      1000.2000      1000.0000               0.2000        0.02
2002      1004.0000      1000.0000               4.0000        0.40

 

To the right of each comparison header, the value used for sorting is shown (max); the series with the largest differences are shown first. In this case, max = 0.50 means that the maximal percentage difference is 0.50% (in 2002) for the array-series xx[b,y].
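
The percentages and the ordering in the output above can be checked by hand. The following Python sketch (not Gekko code) recomputes them from the values in the example, assuming that the % DIFF column is 100 * abs(work - reference) / reference and that max is the largest such value per series. The helper name pct_diffs is hypothetical.

```python
# Values copied from the example: first-position (work) and reference banks.
work = {"xx[b,y]": [400.4, 402.0], "yy[i]": [1000.2, 1004.0]}
ref = {"xx[b,y]": [400.0, 400.0], "yy[i]": [1000.0, 1000.0]}


def pct_diffs(name):
    """Percentage difference per period, rounded to two decimals as in the report."""
    return [round(100 * abs(w - r) / r, 2) for w, r in zip(work[name], ref[name])]


# 'max' is taken to be the largest percentage difference of each series.
max_pct = {name: max(pct_diffs(name)) for name in work}

# sort=rel: the series with the largest relative difference comes first.
order = sorted(max_pct, key=max_pct.get, reverse=True)

for name in order:
    print(name, pct_diffs(name), "max =", max_pct[name])
```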

 


 

Note

 

The local options <rel> and <pch> cannot be used at the same time.

 

If <abs> and <rel>/<pch> are used at the same time, only differences that exceed both the abs threshold and the rel/pch threshold are shown.
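
How the thresholds and the MISSING= setting interact can be sketched in Python as follows. This is only an illustration of the rules stated in this section, not Gekko's implementation. The function name is_reported and its parameters are hypothetical, and the code comments mark the points where the manual is silent and an assumption had to be made.

```python
import math


def is_reported(x, ref, abs_min=None, rel_min=None, pch_min=None, missing="m"):
    """Return True if a single observation would be listed as a difference."""
    if rel_min is not None and pch_min is not None:
        raise ValueError("rel and pch cannot be used at the same time")
    if pch_min is not None:
        rel_min = pch_min / 100  # pch = 1.0 corresponds to rel = 0.01

    if missing == "zero":
        # Treat missing values (NaN) as 0 before comparing.
        x = 0.0 if math.isnan(x) else x
        ref = 0.0 if math.isnan(ref) else ref

    if math.isnan(x) and math.isnan(ref):
        return False  # missing in both banks is never a difference
    if math.isnan(x) or math.isnan(ref):
        return True  # assumption: missing versus a number is always reported

    abs_diff = abs(x - ref)
    if abs_diff == 0:
        return False
    if abs_min is not None and abs_diff < abs_min:
        return False  # below the absolute threshold
    if rel_min is not None:
        # Assumptions: abs(ref) keeps the ratio non-negative, and a zero
        # reference value counts as an infinitely large relative difference.
        rel_diff = abs_diff / abs(ref) if ref != 0 else math.inf
        if rel_diff < rel_min:
            return False  # below the relative threshold
    return True


# 402 versus 400: absolute difference 2, relative difference 0.5%.
print(is_reported(402.0, 400.0, abs_min=1, rel_min=0.01))  # hidden by rel
print(is_reported(402.0, 400.0, abs_min=1, pch_min=0.4))   # passes both
```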

 


 

Related functions

 

compareFolders()

 


 

Related statements

 

MULPRT, PRT