
The OLS command performs linear regression (ordinary least squares) on an equation, optionally with linear restrictions on the parameters.


Note: a constant term (intercept) is added automatically, unless suppressed with <constant = no>.


The OLS output contains clickable links that show, for instance, how well the equation fits the data, a decomposition with respect to the right-hand side variables, and parameter stability across different estimation periods.






OLS <period XTREND=...  XFLAT=... CONSTANT=... DUMP=... DUMPOPTIONS=...> name  leftside = var1, var2, ...  IMPOSE=... ;



period

(Optional). Local period, for instance 2010 2020, 2010q1 2020q4 or %per1 %per2+1.


XTREND=

(Optional). Trend polynomial of the stated degree (must be positive). When using XTREND, Gekko estimates the trend parameters on a linear timeseries with value -1 in the start period and 0 in the end period. [New in 3.1.2]
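
The linear trend variable described above can be sketched outside Gekko as follows. This is a Python illustration, not Gekko code: the function names are hypothetical, and the assumption that the higher-order terms are plain powers of the linear series is not stated in this section.

```python
def linear_trend(n_periods):
    """Linear series running from -1 in the first period to 0 in the last."""
    if n_periods < 2:
        raise ValueError("need at least two periods")
    step = 1.0 / (n_periods - 1)
    return [-1.0 + i * step for i in range(n_periods)]


def trend_regressors(n_periods, degree):
    """Columns t, t^2, ..., t^degree (assumed form of the trend regressors)."""
    if degree < 1:
        raise ValueError("degree must be positive")
    t = linear_trend(n_periods)
    return [[value ** power for value in t] for power in range(1, degree + 1)]


# The sample 2000-2010 has 11 annual periods.
t = linear_trend(11)
columns = trend_regressors(11, 3)
```

Each extra degree adds one column, and hence one more coefficient to count when building an IMPOSE matrix.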


XFLAT=

(Optional). Restrictions on the endpoints of the trend polynomial (cf. the XTREND option). This creates so-called 'Finnish' trends. The argument is a list containing s{i} or e{i} for start- or end-points, where {i} states the order of the derivative that must be zero. For instance, OLS <xtrend=5 xflat=s2, e2>... will use a polynomial of 5th degree, where the second-order derivatives are zero in the start- and end-points. This means that PLOT dif(ols_trend); will show a curve that is flat at both ends (ols_trend is the trend part of the right-hand side). If OLS <xtrend=5 xflat=s1, e1>... had been used instead, the curve PLOT ols_trend; would be flat at both ends. [New in 3.1.2]


CONSTANT=

(Optional). With <constant = no>, a constant term is not added automatically.


DUMP=

(Optional). Dumps the results as a FRML equation for use in models. You may use OLS<dump> to produce an ols.frm file. OLS<dump=eqs.frm> will use the filename eqs.frm instead. Note that there is no firm guarantee that a subsequent MODEL statement will load the file, but in most cases it will (FRML statements only support a limited subset of general Gekko expressions). If the equation loads, you may consider a SIM<res> to check its residuals. Gekko will put parentheses around all expressions that contain a + or -. This will introduce superfluous parentheses in expressions like * (+ c) or exp(- b) etc. [New in 3.0.6]


DUMPOPTIONS=

(Optional). If you use OLS<dump=eqs.frm dumpoptions='append'>, the results will be appended to an existing eqs.frm file. Further options (styling, FRML code, etc.) will be added later. [New in 3.0.6]


name

(Optional). A name for the equation, used to name the results. If no name is given, the name ols is used.


leftside

The left-hand side variable (may be an expression).

var1, ...

A list of variable names or expressions. A constant term is added automatically, unless you use option <constant = no>.


IMPOSE=

(Optional). You can impose linear restrictions on the parameters, via a suitable matrix. There is one restriction per row of the matrix, cf. the example below. Remember to count any coefficients corresponding to a trend polynomial (XTREND).





Note that if a name is given, ols is replaced with that name.



ols_predict

A timeseries with the predicted values

ols_residual

A timeseries with the residuals

#ols_param

A matrix with estimated parameters

#ols_se

A matrix with standard errors on parameters

#ols_t

A matrix with t-values on parameters

#ols_covar

A matrix with the variance-covariance matrix (of parameters)

#ols_corr

A matrix with the correlation matrix (of parameters)

#ols_stats

A matrix containing different measures (analogous to the .stats matrix in AREMOS):


1: Residual sum of squares

2: Standard error

3: Residual mean

4: Root mean square error (RMSE)

5: R squared

6: R bar squared

7: [empty]

8: Dependent variable mean

9: Durbin-Watson with lag 1


(At some point, a map will be used instead for these measures).
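
As a rough guide to what the measures listed above mean, the following Python sketch computes textbook versions of them from actual and predicted values. The definitions used here (for instance the degrees-of-freedom correction in the standard error) are assumptions and may differ from what Gekko computes internally. The function name and the sample numbers are made up for illustration.

```python
import math


def regression_measures(actual, predicted, n_params):
    """Textbook versions of some of the listed measures (assumed definitions)."""
    n = len(actual)
    resid = [a - p for a, p in zip(actual, predicted)]
    rss = sum(e * e for e in resid)
    mean_y = sum(actual) / n
    tss = sum((a - mean_y) ** 2 for a in actual)
    r2 = 1.0 - rss / tss
    # Durbin-Watson with lag 1: squared changes in residuals relative to RSS.
    dw = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / rss
    return {
        "rss": rss,
        "std_error": math.sqrt(rss / (n - n_params)),
        "resid_mean": sum(resid) / n,
        "rmse": math.sqrt(rss / n),
        "r2": r2,
        "r2_bar": 1.0 - (1.0 - r2) * (n - 1) / (n - n_params),
        "dep_mean": mean_y,
        "dw": dw,
    }


# Tiny made-up illustration (not Gekko output).
m = regression_measures([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
```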







This example estimates a linear model with five parameters. You may consult the MATRIX section to see the same parameters calculated with linear algebra, or the R_RUN section to see the same parameters calculated via the R interface.


CREATE lna1, pcp, bul1;
SERIES <1998 2010> lna1 = data(' 166.223000  173.221000  179.571000  187.343000  194.888000  202.959000 
  209.426000  215.134000  222.716000  230.520000  238.518000  246.654000  254.991000') ;
SERIES <1998 2010> pcp  = data(' 0.9502030   0.9699920   1.0000000   1.0235000   1.0401100   1.0605400   
  1.0754700   1.0977800   1.1121200   1.1314800   1.1513000   1.1717600   1.1871600')  ;
SERIES <1998 2010> bul1 = data(' 0.0684791   0.0591698   0.0560344   0.0535439   0.0535003   0.0631703   
  0.0649875   0.0578112   0.0473207   0.0404508   0.0467488   0.0472923   0.0475191')  ;
OLS <2000 2010> dlog(lna1) = dlog(pcp), dlog(pcp.1), bul1, bul1.1;


The commands produce the following screen output:


 OLS estimation 2000-2010 (n = 11)
       Variable          Estimate         Std error        T-stat 
  dlog(pcp)              0.144517          0.227011          0.64 
  dlog(pcp.1)            0.613875          0.236473          2.60 
  bul1                   0.186740          0.202534          0.92 
  bul1.1                -0.350908          0.203182          1.73 
  CONSTANT              0.0298039         0.0089418          3.33 
 R2: 0.625034    SEE: 0.00346154    DW: 1.8651


In addition to the screen output, the timeseries ols_predict and ols_residual are produced, together with the matrices #ols_param, #ols_se, #ols_t, #ols_covar, #ols_corr, and #ols_stats. The matrices can be printed out with the PRT command.
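
For readers who want to see the linear algebra behind the estimation, here is a minimal Python sketch (standard library only) that solves the normal equations for the example data. It is an illustration, not Gekko's implementation: it assumes that dlog(x) is the first difference of log(x), that x.1 is x lagged one period, and it skips the internal pre-scaling that Gekko performs. The helper names are hypothetical.

```python
import math

lna1 = [166.223, 173.221, 179.571, 187.343, 194.888, 202.959, 209.426,
        215.134, 222.716, 230.520, 238.518, 246.654, 254.991]
pcp = [0.9502030, 0.9699920, 1.0000000, 1.0235000, 1.0401100, 1.0605400,
       1.0754700, 1.0977800, 1.1121200, 1.1314800, 1.1513000, 1.1717600,
       1.1871600]
bul1 = [0.0684791, 0.0591698, 0.0560344, 0.0535439, 0.0535003, 0.0631703,
        0.0649875, 0.0578112, 0.0473207, 0.0404508, 0.0467488, 0.0472923,
        0.0475191]


def dlog(x, i):
    """Log difference of x between position i and position i - 1."""
    return math.log(x[i]) - math.log(x[i - 1])


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Position 0 is 1998, so the sample 2000-2010 is positions 2..12.
rows = range(2, 13)
y = [dlog(lna1, i) for i in rows]
X = [[dlog(pcp, i), dlog(pcp, i - 1), bul1[i], bul1[i - 1], 1.0] for i in rows]

# Normal equations: (X'X) beta = X'y, with the constant as the last column.
k = len(X[0])
xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
xty = [sum(r[p] * yi for r, yi in zip(X, y)) for p in range(k)]
beta = solve(xtx, xty)

predict = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
residual = [yi - pi for yi, pi in zip(y, predict)]
```

The entries of beta follow the order of the screen output, with the constant last; predict and residual play the role of ols_predict and ols_residual.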


In the example above, you may, for example, restrict the first two parameters to sum to 0.80, and the third and fourth to be equal like this (cf. the MATRIX command):


#r = [1, 1, 0, 0, 0, 0.80; 0, 0, 1, -1, 0, 0];
OLS <2000 2010> dlog(lna1) = dlog(pcp), dlog(pcp.1), bul1, bul1.1 IMPOSE = #r;


If the parameters are called b{i}, the first restriction is equivalent to 1*b1 + 1*b2 + 0*b3 + 0*b4 + 0*b5 = 0.80, or b1 + b2 = 0.80. The second restriction is equivalent to 0*b1 + 0*b2 + 1*b3 + (-1)*b4 + 0*b5 = 0, or b3 = b4. So the last column of the #r matrix contains the values that the linear restrictions should sum up to. The restrictions produce the following:


 OLS estimation 2000-2010 (n = 11)
       Variable          Estimate         Std error        T-stat 
  dlog(pcp)              0.167642          0.180625          0.93 
  dlog(pcp.1)            0.632358          0.180625          3.50 
  bul1                 -0.0863480         0.0794747          1.09 
  bul1.1               -0.0863480         0.0794747          1.09 
  CONSTANT              0.0291952         0.0085164          3.43 
 R2: 0.491156    SEE: 0.00349218    DW: 1.6847
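
The reading of the restriction matrix given above can be checked against the restricted estimates. The Python sketch below is only an illustration (the function name is hypothetical): it splits each row into weights and a target value, and evaluates the restriction using the estimates copied from the screen output.

```python
# Rows of the IMPOSE matrix: weights for b1..b5, then the right-hand side.
impose = [
    [1, 1, 0, 0, 0, 0.80],
    [0, 0, 1, -1, 0, 0],
]

# Restricted estimates copied from the screen output (CONSTANT last).
estimates = [0.167642, 0.632358, -0.0863480, -0.0863480, 0.0291952]


def restriction_gaps(matrix, params):
    """Return (weighted sum of params - target) for each restriction row."""
    gaps = []
    for row in matrix:
        weights, target = row[:-1], row[-1]
        if len(weights) != len(params):
            raise ValueError("each row needs one weight per parameter plus a target")
        gaps.append(sum(w * p for w, p in zip(weights, params)) - target)
    return gaps


gaps = restriction_gaps(impose, estimates)
```

Both gaps are zero up to the rounding of the printed estimates: the first two parameters sum to 0.80, and the third and fourth are equal.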






You may consider using R to perform econometrics. Gekko also has some pretty good interfaces to TSP (with its rock-solid LSQ estimator).


The variables do not need to have similar magnitude to obtain precise parameter estimates (pre-scaling is performed internally).


Instead of OLS<dump>, some people prefer to compose FRML equations for models by hand, using TELL and PIPE. In this way, the equations can be formatted exactly as the user prefers. To control the formatting of parameters, you may use the built-in format() function, for instance using TELL 'FRML y = {format(#ols_param[1], '0.000000')} * x + ({format(#ols_param[2], '0.000000')})';. The last parenthesis is to deal with #ols_param[2] being negative. See more on formatting of strings in the TELL section.


After an OLS, you may use the Copy-button in the main Gekko window to copy/paste (with full precision) the matrix of parameter values/errors to Excel or other spreadsheets.


OLS produces quite a lot of timeseries containing data for the clickable graphs. You may use INDEX ols_*; to obtain a list of these -- for instance, ols_trend will contain the trend component of the right-hand side (if a trend is stated).




Related commands