COLLAPSE

<< Click to Display Table of Contents >>

Navigation:  Gekko User Manual > Gekko statements >

COLLAPSE

Previous pageReturn to chapter overviewNext page

COLLAPSE aggregates a higher-frequency timeseries to a lower-frequency timeseries, for instance converting quarterly data to annual data (use INTERPOLATE to perform the inverse disaggregation from lower frequency to higher frequency).

 

There are two things to note about the COLLAPSE statement. Firstly, the statement ignores the global time period, and a local time period cannot be set. Therefore, the left-hand side becomes a full collapsed mirror image of the data in the right-hand side timeseries. Secondly, if there are missing values in the high-frequency series, this will be reflected as missing values in the lower-frequency series. So if you need COLLAPSE to create for instance an average of the first three quarters of a year, you must first fill in the fourth quarter with some guessed data (for daily frequency, there are options to ignore/skip missing days when collapsing).

 

Instead of the COLLAPSE statement, you may alternatively use the similar collapse() function, for instance x!q.collapse(); (see under functions). If you need to collapse a collection of Excel data "points" (any pairs of Excel dates + values), you may use IMPORT<collapse>.

 


 

Syntax

 

collapse < MISSING = ... > lowfreq = highfreq method;

 

lowfreq

Lower frequency timeseries. Frequency can be indicated with suffix !a, !q, !m or !w. Banknames may be used. Lists of names may be used, like for instance {#m}.

highfreq

Higher-frequencey timeseries. Frequency can be indicated with suffix !q, !m, !w or !d. Banknames may be used. Lists of names may be used, like for instance {#m}.

method

(Optional). Choose between:

 

total: The higher-freq observations are summed (default).

avg: The higher-freq observations are averaged.

first: The first higher-freq observation is used.

last: The last higher-freq observation is used.

 

Note: default is total. You can alter the default with option collapse method = ... ; (cf. OPTION).

MISSING=

(Optional). Only relevant when collapsing from daily (d) data. You can set collapse <missing=flex> to avoid too many frustrations with single missing days.

 

strict: missing days will entail missing data in the collapsed series (default).

flex: missing days are ignored when collapsing.

 

Note: default is strict. You can alter the default with option collapse missing d = ... ; (cf. OPTION).

 

If a variable on the right-hand side of = is stated without databank, Gekko may look for it in the list of open databanks (if databank search is active, cf. MODE).

Looping: with a list like for instance #= x1, x2;, you may use collapse {#m}!= {#m}!q; to collapse x1!q into x1!a, and x2!q into x2!a.

 


 

Example

 

Use this to convert frequency:

 

resettime 2020 2021;
x!= 1, 2, 3, 4, 5, 6, 7, 8;
collapse x!= x!q//10 and 26
prt x!a;  

 

Since the method is total as default, this will create the annual timeseries x!a where each annual observation is the sum of the corresponding quarters in x!q. Instead, you may use avg like this:

 

collapse x!= x!q avg; //will be 2.5 and 6.5 instead

 

With option avg, the quarters are averaged. If you have existing data in the x!a timeseries and only want to collapse a part of the x!q series into x!a (for instance from 2018-20), you may use a temporary variable like this:

 

reset; time 2017 2020;
x!= 11, m(), m(), m();  //only data for 2017
x!= 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6, 4, 5, 6, 7;
collapse temp!= x!q;
x!<2018 2020> = temp!a;
prt x!a;  //the 2017 value of 11 is preserved

 

In this case, it is easier to use the collapse() function, which is also convenient in general for printing/plotting etc. The collapse() and interpolate() functions (cf. INTERPOLATE) can be combined like the following example (calculating each quarter's share of all quarters):

 

reset; option freq q; time 2001 2002;
x1 = 1, 2, 3, 4, 5, 6, 7, 8;
x2 = x1/x1.collapse().interpolate();
prt x1, x2;  //x2 = 0.1, 0.2, 0.3, 0.4 and 0.19, 0.23, 0.27, 0.31

 

When collapsing daily timeseries, you may use <missing=flex> in order to ignore/skip missing days, cf. the following example.

 

reset; option freq d; time 2021m1d1 2021m1d10;
x!= 1;

collapse x1!= x!d;  //default is <missing=strict>
collapse <missing=flex> x2!= x!d;
prt <2020w53 2021w1 n> x1!w, x2!w, x!d;
 
//                   x1!w           x2!w            x!d 

//  2020w53              M         3.0000                

//  m12d28                                             M 

//  m12d29                                             M 

//  m12d30                                             M 

//  m12d31                                             M 

//  m1d1                                          1.0000 

//  m1d2                                          1.0000 

//  m1d3                                          1.0000 

//                                                       

//  2021w1          7.0000         7.0000                

//  m1d4                                          1.0000 

//  m1d5                                          1.0000 

//  m1d6                                          1.0000 

//  m1d7                                          1.0000 

//  m1d8                                          1.0000 

//  m1d9                                          1.0000 

//  m1d10                                         1.0000 

 

This produces data for x1!w and x2!w for the two weeks 2020w53 and 2021w1, where only the latter week is collapsed from a full 7-day week of data (note that 2020w53 has four days in 2020 and three days in 2021). Therefore, in 2002w53, x1!w has a missing value (because collapse is strict per default), whereas x2!w ignores the missing values and attains the value 3. (Weeks are defined and numbered following the ISO 8601 standard, where days around New Year may belong to week 52, week 53 or week 1. Active 'workdays' will be implemented, omitting for instance weekends and holidays. For now, see the getSpecialDay() function).

 

 


 

Note

 

If a frequency indicator is omitted, Gekko will use the current frequency.

 

You can also use PRT<collapse> to get similar transformations in prints (freqs q and m only), and for printing/plotting etc., the collapse() function is convenient.

 

See also IMPORT<collapse> regarding Excel spreadsheets with higher frequencies than daily, or data with irregular frequencies.

 


 

Related options

 

option collapse method = total;
option collapse missing d = strict;

 


 

Related statements

 

INTERPOLATE, IMPORT, PRT, SMOOTH