Lists


A Gekko list is an ordered sequence of Gekko variables. Like the other Gekko collection types (matrix and map), a list name must begin with the # symbol. Lists are often used to store strings, particularly strings that represent variable names, but they are practical for many other things. For instance, you may store 2-dimensional Excel spreadsheet cells inside a nested list (a list of lists); nested lists are discussed towards the end of this section.

 

A simple list example with a list of strings:

 

#m1 = ('x1', 'x2', 'x3');  //strict syntax
#m2 = x1, x2, x3;          //equivalent 'naked list' simple syntax
prt #m1;                   //prints the strings 'x1', 'x2', 'x3'
prt {#m1};                 //prints the series x1, x2, x3

 

Two important things must be noted about the above example. First, the strict syntax for creating a three-element list of strings uses parentheses () and quotes '. Using ()-parentheses is the most general/strict way of defining lists, but in the particular case of lists of strings, a simpler 'naked list' syntax is legal, too (naked because both the parentheses and the quotes are omitted), cf. the second statement above. It should be remembered, however, that even though the second statement above may look like it inserts three timeseries objects x1, x2, and x3, this is not the case. The naked list definition #m2 is just a convenient syntactical short-hand for the strict list definition #m1 above, and inside the resulting list there are in both cases three strings 'x1', 'x2', and 'x3', not three timeseries objects x1, x2, and x3.

 

Second, regarding the print statements: with prt #m1;, you are printing out the list itself, that is, the raw strings. If you need to print out the variables that the strings refer to, you must enclose the list in {}-curlies, as shown in the last print statement. Lists are also often used in connection with loops, where the same calculations are performed for each of the list elements individually (cf. the loops user guide section).

 

You may pick out an individual list element with an []-index, for instance #m[2] to pick out the second element of the list #m. To see all defined lists, use the following statement:

 

list?;
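
As a minimal sketch of element indexing (the list #m and its contents are hypothetical, chosen only for illustration), note that the index starts at 1, so #m[2] is the second element:

#m = a, b, c;  //naked list: ('a', 'b', 'c')
prt #m[2];     //the second element, the string 'b'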

 

 

Naked list syntax

 

If you are combining strings and lists of strings, naked list syntax offers the following easy way of combining them:

 

//naked list combinations

#m1 = b, c;               // ('b', 'c')
#m2 = e,;                 // ('e',) --> note trailing comma!
#m3 = a, {#m1}, d, {#m2}; // ('a', 'b', 'c', 'd', 'e')

 

Note the {}-curlies, and note the trailing comma in the #m2 definition (a naked list of only one element, cf. the section on 'singletons' later on). The use of {}-curlies in naked lists is similar to printing lists of variables with prt a, {#m1}, d, {#m2};, which would be equivalent to prt a, b, c, d, e;. You may use the rep keyword to repeat elements, for instance:

 

#letters = a rep 2, b, c rep 3;  // ('a', 'a', 'b', 'c', 'c', 'c')

 

The comma , is useful for concatenation, and for lists of strings, the naked list syntax is often convenient. Naked lists can also be used to define lists of values or lists of 'codes', for instance:

 

#m1 = 2, 3, 4;       // (2, 3, 4)        values
#m2 = 1, -2.3, 3.4;  // (1, -2.3, 3.4)   values
#m3 = 1, a, 2;       // ('1', 'a', '2')  strings
#m4 = 2, 03, 4;      // ('2', '03', '4') strings

 

The first two lists become lists of values (value scalars), whereas the last two lists become lists of strings. The main rule is that if all elements are normal values, the elements of the resulting list will be values, too. In all other cases, all the elements will become string types. There are some special rules regarding such naked lists, see more on the naked list page. To convert the elements of a list to a particular type, you may use the values(), dates(), or strings() functions.

 

Note: When using naked lists, a lot of single quotes can be avoided for long lists of strings. But this convenience may introduce some confusion. On the right-hand side of LIST or FOR statements, you should be aware that there are three different kinds of possible expressions:

 

#m = something;                      //normal assignment
#m = something, something, ... ;     //naked list definition
#m = (something, something, ... );   //strict list definition

 

In the first, there is no comma and no soft parentheses. In the second, there are one or more commas, but no soft parentheses. In the third, there are both commas and parentheses. Now, consider this:

 

#m = a;               //normal assignment
#m = a, b;            //naked list definition
#m = (a, b);          //strict list definition

 

In the first, you are trying to assign the timeseries a to #m (this will fail with a type error). In the second, the list will contain the two strings 'a' and 'b'. And in the last, the list will contain the two series objects a and b.

 

Therefore, in LIST or FOR statements, always take a look at the right-hand side to see if there are commas or enclosing parentheses. On the right-hand side of a naked list definition, other lists must be enclosed in {}-curlies, for instance #m = {#m1}, {#m2}; to concatenate the two lists #m1 and #m2. So why is #m = #m1; legal, omitting the {}-curlies? That is because there is no comma, and the statement is therefore a normal assignment (the equivalent naked list definition would be #m = {#m1},; with trailing comma).
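
These rules can be put side by side in a short sketch. The lists #m1 and #m2 and their contents are hypothetical, and the comments show the results that would follow from the rules described above:

#m1 = a, b;
#m2 = c, d;
#m = {#m1}, {#m2};  //naked list definition: ('a', 'b', 'c', 'd')
#m = #m1;           //normal assignment, no comma: ('a', 'b')
#m = {#m1},;        //naked list definition with trailing comma: ('a', 'b')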

 

 

Other operators

 

You may also combine lists with the + or || operators:

 

#m1 = a, b, c, d;  // ('a', 'b', 'c', 'd')
#m2 = c, d, e, f;  // ('c', 'd', 'e', 'f')
#m3 = #m1 + #m2;   // ('a', 'b', 'c', 'd', 'c', 'd', 'e', 'f')
#m4 = #m1 || #m2;  // ('a', 'b', 'c', 'd', 'e', 'f')

 

The #m3 list contains duplicates, because the + operator for lists is a simple concatenation of the two lists. In #m4, however, duplicates are removed, because the || is the union operator. Some commonly used operators and functions:

 

Some useful operators and functions for two lists

#m1 || #m2         Union. The union of the two lists; no duplicates are introduced.

#m1 && #m2         Intersection. The elements found in both lists.

#m1 - #m2          Difference. The elements of #m1, with those also found in #m2 removed.

#m1 + #m2          Concatenation. This is like a 'naive' union where duplicates may be introduced if #m2 contains some of the same elements as #m1.

unique(), sort()   Functions. There are a large number of list functions available, see more under LIST or under functions. For instance, unique() will remove duplicates, and sort() will sort alphabetically. Such functions may be chained, for instance #m = #m1.unique().sort();.
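
As a small illustration of the intersection and difference operators, the following sketch reuses the #m1 and #m2 lists from the example above. The names #m5 and #m6 are chosen here for illustration only, and the results in the comments assume that the elements keep the order they have in #m1:

#m1 = a, b, c, d;  // ('a', 'b', 'c', 'd')
#m2 = c, d, e, f;  // ('c', 'd', 'e', 'f')
#m5 = #m1 && #m2;  // ('c', 'd')
#m6 = #m1 - #m2;   // ('a', 'b')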

 

The += and -= operators can also be practical when appending to or removing elements from naked lists. To append an element to a list, the append() function can also be used, and prepend() to prepend. To remove an element, remove() can be used.  Examples:

 

#m = a, b, c, d;                 //Result: ('a', 'b', 'c', 'd')
#m += e, f;                      //Append using a naked list and += operator
#m = #m.append('g').append('h'); //Append using append() function
#m -= g, h;                      //Remove using a naked list and -= operator
#m = #m.remove('e').remove('f'); //Result: ('a', 'b', 'c', 'd')

#m = #m.prepend('x');            //Result: ('x', 'a', 'b', 'c', 'd')

 

Beware that you cannot use for instance #m + 'a' or 'a' + #m to append/prepend the string 'a' to the list #m, whereas you can use #m - 'a' instead of #m.remove('a') to remove occurrences of 'a' from the list.

 

To add or remove a string %s from a list, the following can be convenient. Note the comma (singleton comma, cf. below). More elements can be put after the comma.

 

#m += {%s},;  //add string %s to the list
#m -= {%s},;  //remove string %s from the list

 

 

Strict list syntax

 

If you need fine-grained control over the types of list elements, you can use strict syntax, that is, lists defined with ()-parentheses. For instance:

 

#m = (1, 'a', 2020q1);   // (1, 'a', 2020q1)

 

In a strict-syntax list definition, ()-parentheses cannot be omitted, and all string elements must be quoted. Above, the #m list contains a value, a string, and a date, so the list elements are all of different type. Other examples where strict ()-syntax must be used:

 

#m1 = (x1, x2);                       //list of series objects
#m2 = (2001q1, 2002m12);              //list of dates
#m3 = (#m1, #m2);                     //list of lists (nested list)
#m4 = ((x1, x2), (2001q1, 2002m12));  //same as above

 

Instead of the first list (a list of series objects), it is often simpler and more manageable to store the names of the timeseries as strings (here the two strings ('x1', 'x2')), and then use {}-curlies afterwards when referring to the series objects.

 

 

Wildcards

 

With the INDEX statement, you can use wildcards and ranges to search/index variable names in databanks. For instance:

 

index fx* to #m1;       //wildcard
index fxa..fxn to #m2;  //range

 

The first statement looks in the first-position databank for variable names that match fx*, that is, variables beginning with the string 'fx' (this search is case-insensitive). The result is put into the list #m1. If the to #m1 part of the statement is omitted, Gekko will just print the matching names on the screen, providing an overview of how many variables in the databank start with 'fx'. Ranges are supported, too: fxa..fxn finds names in the alphabetical range fxa to fxn.

 

You may choose whether you want INDEX to return "full" variable names with bankname and frequency (like b1:x1!q) or not (like just x1). Gekko has a lot of functions for dealing with for instance banknames or frequency parts in lists of variable names, for instance removebank() or removefreq() to remove banknames or frequencies, but also functions for setting or adding those. See more under "Bank/name/frequency/index manipulations" on the functions page.

 

You may equivalently search/index in the following way:

 

#m1 = ['fx*'];       //wildcard
#m2 = ['fxa..fxn'];  //range
prt ['fx*'];         //prints the matching timeseries names (as strings)
prt {'fx*'};         //prints the matching timeseries themselves

 

As seen from the examples, a wildcard can either be 'naked' like fx*, used in for instance the INDEX or COPY statements, or enclosed inside quotes and brackets/curlies, like ['fx*'] or {'fx*'}. Of the two enclosed forms, ['fx*'] yields a list of the matching names, whereas {'fx*'} refers to the matching variables themselves and is used in for instance PRT, PLOT, etc. The user might ask herself the following question: if index x*y; is legal syntax for finding timeseries starting with 'x' and ending with 'y', why is it not possible to use prt x*y; to print those timeseries? The reason is that x*y can also be understood as x multiplied by y, so in statements like PRT or PLOT, where mathematical expressions are allowed, you cannot use the 'naked' wildcard x*y, but must instead use the more cumbersome prt {'x*y'};.

 

Regarding wildcard searching/indexing of databanks, there are some special rules regarding the special name-characters %, #, and !.

 

The INDEX statement has options to control whether the resulting list of names includes banknames and frequency indicators. It should be mentioned that indexing works differently when a list rather than a databank is indexed, so for instance #m['*y1'] will match all elements of the list #m ending with 'y1', following normal matching rules, with no special treatment of %, #, and !.
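
As a small sketch of wildcard matching on a list rather than on a databank (the list #m and its element names are hypothetical), the following picks out the elements ending with 'y1':

#m = ay1, by1, cy2;  // ('ay1', 'by1', 'cy2')
#m1 = #m['*y1'];     // ('ay1', 'by1')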

 

 

Listfiles

 

You may create and manipulate lists of strings via list files. List files are just text files, where each line contains a string (without quotes), for instance:

 

vars.lst
 

a
b
c
d

 

If you put the file vars.lst into the working folder, you may subsequently use this listfile as if it was a normal list, using the syntax #(listfile vars). The listfile elements may contain characters like %, # and !, but you may also prepend a -. The latter can be useful for certain kinds of operations where you for instance want to subtract a variable from a sum. If needed, each line of a list file may contain several elements, delimited with the ; symbol. In that case, the list will become a nested list, kind of like a mini-spreadsheet where all the cells are strings (such a list will resemble the csv file syntax).

 

Gekko databank files (.gbk files) can store any kind of (nested) list, but when operating on lists of strings, it is sometimes easier to operate directly on a list in the form of a text file. The following shows how to create and use a listfile:

 

#(listfile vars) = a, b, c, d;       //creates file vars.lst
prt #(listfile vars);                //'a', 'b', 'c', 'd'
#m = #(listfile vars) - 'c';         //remove 'c'
prt #m;                              //'a', 'b', 'd'

 

Instead of a list file, you may alternatively import a list or a nested list from a spreadsheet (xls(x) or csv file).

 
 

One-element lists and trailing commas

 

The last thing to note about lists is that one-element lists (also called singleton lists) have to be treated in a special way, adding a 'superfluous' trailing comma:

 

//singleton list
#m1 = x1,;      //naked list with comma: result = ('x1',)
#m2 = ('x1',);  //strict syntax with comma: result = ('x1',)

 

The problem with one-element lists is that in general mathematics, (x) = x, and circumventing this general rule would introduce all sorts of other problems in Gekko. Therefore, as is also done in e.g. Python, a trailing 'superfluous' comma is used to indicate that we are defining a one-element list. Without the extra comma, in the first statement above, Gekko would fail because it cannot convert a series x1 into a list, and in the second statement, Gekko would fail because it cannot convert a string 'x1' into a list.

 

If you do not like to use trailing commas, you can alternatively use the list() function:

 

#m1 = list('x1');   //one-element list, note: no comma

#m2 = list();       //empty list

 

In the first statement, quotes must be used for the string. The list() function without arguments is useful if you need to construct an empty list.
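
The list() function combines naturally with the += operator when a list is built up step by step. The following is only a sketch with hypothetical element names, and it assumes that += works on an empty list in the same way as on a non-empty one:

#m = list();  //empty list
#m += a,;     //append one element, note the singleton comma
#m += b, c;   //append two more elements
prt #m;       //'a', 'b', 'c'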

 

 

More

 

You can do many more things with lists and nested lists. Some of the more relevant capabilities are listed below. For an exhaustive explanation of all capabilities, see the LIST statement.

 

On the present page, we have mostly dealt with lists of strings. But lists of values or dates are also useful, as are lists of lists (nested lists). Nested lists are practical for representing table-like structures of "cells", for instance the contents of spreadsheets. In contrast to the matrix variable type, a nested list can store dates and strings, too (and not only values). Additionally, nested lists can be of any dimensionality.

Regarding nested lists, you may for instance use #m[2][3] to pick out the second element of #m, which is itself a list. From this element (that is, from #m[2]), the third element is selected. In such cases, you may alternatively use the syntax #m[2, 3], which would do the same. But beware that there is a difference regarding ranges, where for instance #m[2..4][3] is not the same as #m[2..4, 3]. See the LIST statement for more on this.
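
As a sketch of the two equivalent ways of indexing a nested list (the list #m and its values are hypothetical), both print statements below pick out the third element of the second sub-list:

#m = ((1, 2, 3), (4, 5, 6));  //nested list in strict syntax
prt #m[2][3];                 //6
prt #m[2, 3];                 //6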

Lists are used a lot for array-series. If you define a list of strings like #s = a, b;, and define array-series x[a] and x[b], you may use x[#s] to refer to these array-series, or sum(#s, x[#s]) to sum them up, etc. See more under SERIES.

Lists have a close cousin: maps. Maps allow referring to an element with a name (rather than picking out with an element number), and can be thought of as a kind of mini-databanks. So instead of picking out an element of a list with an index number (for instance #m[2] to pick out element number 2), maps assign a name to each element, so you can instead pick out an element with for instance #m['%dw']. This picks out a value corresponding to a Durbin-Watson test (the name %dw is easier to remember and understand, than the index number 2...). Instead of the strict #m['%dw'], you may also use the equivalent #m[%dw] or #m.%dw. Note how the last variant resembles m:%dw, picking out the variable %dw from the databank m.

The INDEX statement can also search inside array-series, cf. INDEX.