The OLS statement performs linear regression (ordinary least squares) on an equation, optionally with linear restrictions on the parameters.

Note: a constant term (intercept) is added automatically, unless suppressed with <constant = no>.

The OLS output contains clickable links that show, for instance, how the equation fits the data, a decomposition with respect to the right-hand side variables, and parameter stability across different estimation periods.

Syntax

ols <period XTREND=... XFLAT=... CONSTANT=... DUMP=... DUMPOPTIONS=...> name leftside = var1, var2, ... IMPOSE=... ;

period	(Optional). Local period, for instance 2010 2020, 2010q1 2020q4 or %per1 %per2+1.
XTREND=	(Optional). Trend polynomial of the stated degree (must be positive). When using XTREND, Gekko will estimate the trend parameters on a linear timeseries with value -1 in the start period and 0 in the end period. [New in 3.1.2]
XFLAT=	(Optional). Restrictions on the endpoints of the trend polynomial (cf. the XTREND option). This creates so-called 'Finnish' trends. The argument is a list containing s{i} or e{i} for start- or end-points, where {i} states the order of the derivative that must be zero. For instance, ols <xtrend=5 xflat=s2, e2>... will use a polynomial of 5th degree whose second-order derivatives are zero in the start- and end-points. This means that plot dif(ols_trend); will show a curve that is flat at both ends (ols_trend is the trend part of the right-hand side). If ols <xtrend=5 xflat=s1, e1>... had been used instead, the curve plot ols_trend; would be flat at both ends. [New in 3.1.2]
CONSTANT=	With <constant = no>, a constant term is not added automatically.
DUMP=	(Optional). Dumps the results as a FRML equation for use in models. You may use ols<dump> to produce an ols.frm file; ols<dump=eqs.frm> will use the filename eqs.frm instead. There is no firm guarantee that a subsequent MODEL statement will load the file, but in most cases it will (FRML statements only support a limited subset of general Gekko expressions). If the equation loads, you may consider a SIM<res> to check its residuals. Gekko puts parentheses around all expressions that contain a + or -, which introduces superfluous parentheses in expressions like a * (b + c) or exp(a - b). A limited set of functions on the left-hand side, for instance log(y) or dlog(y), will produce a valid FRML equation, whereas more complicated left-hand side expressions will need manual resolving afterwards. [New in 3.0.6]
DUMPOPTIONS=	(Optional). If you use ols<dump=eqs.frm dumpoptions='append'>, the results will be appended to an existing eqs.frm file. These options will later be extended with styling, FRML code, etc. [New in 3.0.6]
name	(Optional). A name for the equation, used to name the results. If no name is given, ols is used as the name.
leftside	The left-hand side variable (may be an expression).
var1, ...	A list of variable names or expressions. A constant term is added automatically, unless you use option <constant = no>.
IMPOSE=	(Optional). You can impose linear restrictions on the parameters via a suitable matrix, with one restriction per row of the matrix (cf. the example below). Remember to count any coefficients corresponding to a trend polynomial (XTREND).
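
The XTREND and XFLAT descriptions above can be made concrete with a small sketch. The Python code below is not Gekko code and does not reproduce Gekko's internals. It assumes that the trend terms are the powers t, t^2, ..., t^d of the base series running from -1 to 0, and it shows which linear condition on the polynomial coefficients a zero derivative at an endpoint amounts to. All function names are hypothetical.

```python
from math import factorial


def trend_base(n_periods):
    """Linear series with value -1.0 in the first period and 0.0 in the last."""
    if n_periods < 2:
        raise ValueError("need at least two periods")
    last = n_periods - 1
    return [(i - last) / last for i in range(n_periods)]


def trend_regressors(n_periods, degree):
    """Columns t, t**2, ..., t**degree (assumed form of the trend terms)."""
    if degree < 1:
        raise ValueError("degree must be positive")
    t = trend_base(n_periods)
    return [[x ** k for x in t] for k in range(1, degree + 1)]


def flat_condition(degree, order, at_start):
    """Weights w such that sum(w[k-1] * c[k]) equals the derivative of the
    given order of c[1]*t + ... + c[degree]*t**degree, evaluated at t = -1
    (start) or t = 0 (end). Requiring that sum to be zero corresponds to
    the condition s{order} or e{order}."""
    if not 1 <= order <= degree:
        raise ValueError("order must lie between 1 and degree")
    t = -1.0 if at_start else 0.0
    weights = []
    for k in range(1, degree + 1):
        if k < order:
            weights.append(0.0)  # lower powers vanish when differentiated
        else:
            falling = factorial(k) // factorial(k - order)
            weights.append(falling * t ** (k - order))
    return weights


cols = trend_regressors(11, 2)           # e.g. 2000-2010 with xtrend=2
start_row = flat_condition(5, 2, True)   # condition s2 for xtrend=5
end_row = flat_condition(5, 2, False)    # condition e2 for xtrend=5
```

Here start_row is [0, 2, -6, 12, -20] and end_row is [0, 2, 0, 0, 0], so under these assumptions e2 simply forces the coefficient on t^2 to zero. The derivatives are taken with respect to the base series t rather than calendar time, which does not matter for a zero condition because the two differ only by a constant scale factor.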

Results

Note that if a name is given, ols is replaced with that name.

ols_predict	A timeseries with the predicted values
ols_residual	A timeseries with the residuals
#ols_param	A matrix with estimated parameters
#ols_se	A matrix with standard errors on parameters
#ols_t	A matrix with t-values on parameters
#ols_covar	A matrix with the variance-covariance matrix (of parameters)
#ols_corr	A matrix with the correlation matrix (of parameters)
#ols_stats	A matrix containing different measures (analogous to the .stats matrix in AREMOS). At some point, a map will be used instead for these measures.
	1: Residual sum of squares
	2: Standard error
	3: Residual mean
	4: Root mean square error (RMSE)
	5: R squared
	6: R bar squared
	7: [empty]
	8: Dependent variable mean
	9: Durbin-Watson with lag 1
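
As a reading aid for the #ols_stats entries, the following Python sketch computes the usual textbook versions of these measures from actual and predicted values. It is an illustration only: the exact formulas Gekko uses (for instance the degrees-of-freedom correction in the standard error) are not stated here, so the definitions below are assumptions, and fit_stats is a hypothetical helper name.

```python
from math import sqrt


def fit_stats(actual, predicted, n_params):
    """Textbook regression measures; n_params includes the constant term."""
    n = len(actual)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need matching series and more observations than parameters")
    resid = [a - p for a, p in zip(actual, predicted)]
    rss = sum(e * e for e in resid)                       # residual sum of squares
    mean_y = sum(actual) / n
    tss = sum((a - mean_y) ** 2 for a in actual)          # total sum of squares
    r2 = 1.0 - rss / tss
    dw_num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n))
    return {
        "rss": rss,
        "se": sqrt(rss / (n - n_params)),                 # assumed df correction
        "resid_mean": sum(resid) / n,
        "rmse": sqrt(rss / n),
        "r2": r2,
        "r2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - n_params),
        "dep_mean": mean_y,
        "dw": dw_num / rss,                               # Durbin-Watson, lag 1
    }


# Made-up numbers, only to exercise the function.
stats = fit_stats([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
```

With these made-up numbers the residual sum of squares is 0.10, R squared is 0.98 and the Durbin-Watson measure is 2.9.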

Example

This example estimates a linear model with five parameters. You may consult the MATRIX section to see the same parameters calculated with linear algebra, or the R_RUN section to see the same parameters calculated via the R interface.

reset;
create lna1, pcp, bul1;
lna1 <1998 2010>  = data(' 166.223000 173.221000 179.571000 187.343000 194.888000 202.959000
  209.426000 215.134000 222.716000 230.520000 238.518000 246.654000 254.991000') ;
pcp <1998 2010>   = data(' 0.9502030 0.9699920 1.0000000 1.0235000 1.0401100 1.0605400
  1.0754700 1.0977800 1.1121200 1.1314800 1.1513000 1.1717600 1.1871600') ;
bul1 <1998 2010>  = data(' 0.0684791 0.0591698 0.0560344 0.0535439 0.0535003 0.0631703
  0.0649875 0.0578112 0.0473207 0.0404508 0.0467488 0.0472923 0.0475191') ;
ols <2000 2010> dlog(lna1) = dlog(pcp), dlog(pcp.1), bul1, bul1.1;

The statements produce the following screen output:

OLS estimation 2000-2010 (n = 11)
dlog(lna1)
-----------------------------------------------------------------
  Variable          Estimate    Std error   T-stat
-----------------------------------------------------------------
  dlog(pcp)         0.144517     0.227011     0.64
  dlog(pcp.1)       0.613875     0.236473     2.60
  bul1              0.186740     0.202534     0.92
  bul1.1           -0.350908     0.203182     1.73
  CONSTANT         0.0298039    0.0089418     3.33
-----------------------------------------------------------------
R2: 0.625034 SEE: 0.00346154 DW: 1.8651

In addition to the screen output, the timeseries ols_predict and ols_residual are produced, together with the matrices #ols_param, #ols_se, #ols_t, #ols_covar, #ols_corr, and #ols_stats. The matrices can be printed out with the PRT statement.
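
The T-stat column can be checked against the other two columns. The Python sketch below (not Gekko code) divides each estimate by its standard error, using the numbers from the screen output above. The printed values match the absolute value of that ratio: bul1.1 has a negative estimate but a printed T-stat of 1.73. Whether #ols_t keeps the sign is not stated here, so treat that as an open point.

```python
# Estimates and standard errors copied from the unrestricted screen output.
rows = [
    ("dlog(pcp)", 0.144517, 0.227011),
    ("dlog(pcp.1)", 0.613875, 0.236473),
    ("bul1", 0.186740, 0.202534),
    ("bul1.1", -0.350908, 0.203182),
    ("CONSTANT", 0.0298039, 0.0089418),
]

# Signed t-value: estimate divided by its standard error.
t_values = {name: est / se for name, est, se in rows}

# Two-decimal absolute values, to compare with the printed T-stat column.
printed = {name: f"{abs(t):.2f}" for name, t in t_values.items()}
```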

In the example above, you may, for example, restrict the first two parameters to sum to 0.80, and the third and fourth to be equal like this (cf. the MATRIX statement):

#r = [1, 1, 0, 0, 0, 0.80; 0, 0, 1, -1, 0, 0];
ols <2000 2010> dlog(lna1) = dlog(pcp), dlog(pcp.1), bul1, bul1.1 IMPOSE = #r;

If the parameters are called b{i}, the first restriction is equivalent to 1*b1 + 1*b2 + 0*b3 + 0*b4 + 0*b5 = 0.80, or b1 + b2 = 0.80. The second restriction is equivalent to 0*b1 + 0*b2 + 1*b3 + (-1)*b4 + 0*b5 = 0, or b3 = b4. So the last column of the #r matrix contains the values that the linear restrictions should sum up to. The restrictions produce the following:

OLS estimation 2000-2010 (n = 11)
dlog(lna1)
-----------------------------------------------------------------
  Variable          Estimate    Std error   T-stat
-----------------------------------------------------------------
  dlog(pcp)         0.167642     0.180625     0.93
  dlog(pcp.1)       0.632358     0.180625     3.50
  bul1            -0.0863480    0.0794747     1.09
  bul1.1          -0.0863480    0.0794747     1.09
  CONSTANT         0.0291952    0.0085164     3.43
-----------------------------------------------------------------
R2: 0.491156 SEE: 0.00349218 DW: 1.6847
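
To see how the rows of #r map to the restricted estimates, the following Python sketch (not Gekko code) applies each row of the restriction matrix to the parameters printed above and compares the result with the value in the last column. The helper name restriction_gaps is hypothetical.

```python
# Restriction matrix from the example: one row per restriction, and the
# last column holds the value the weighted parameters must sum to.
r_matrix = [
    [1, 1, 0, 0, 0, 0.80],
    [0, 0, 1, -1, 0, 0],
]

# Restricted estimates copied from the screen output, in printed order.
params = [0.167642, 0.632358, -0.0863480, -0.0863480, 0.0291952]


def restriction_gaps(matrix, b):
    """Return (weights . b) - target for every restriction row."""
    gaps = []
    for row in matrix:
        *weights, target = row
        if len(weights) != len(b):
            raise ValueError("each row needs one weight per parameter plus a target")
        gaps.append(sum(w * x for w, x in zip(weights, b)) - target)
    return gaps


gaps = restriction_gaps(r_matrix, params)
```

Both gaps are zero up to the printed precision: 0.167642 + 0.632358 = 0.80, and the bul1 and bul1.1 estimates are identical. The length check also reflects the rule that each row needs one column per estimated parameter, including the constant, plus one for the right-hand side value.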

Note

You may consider using R for econometrics, cf. this page. Gekko also has good interfaces to TSP (with its rock-solid LSQ estimator).

The variables do not need to have similar magnitude to obtain precise parameter estimates (pre-scaling is performed internally).

Instead of ols<dump>, some people prefer to compose FRML equations for models by hand, using TELL and PIPE. In this way, the equations can be formatted exactly as the user prefers. To control the formatting of parameters, you may use the built-in format() function, for instance using tell 'FRML y = {format(#ols_param[1], '0.000000')} * x + ({format(#ols_param[2], '0.000000')})';. The last pair of parentheses handles the case where #ols_param[2] is negative. See more on formatting of strings in the TELL section.

After an OLS, you may use the Copy button in the main Gekko window to copy/paste (with full precision) the matrix of parameter values/errors to Excel or other spreadsheets.

OLS produces quite a lot of timeseries containing data for the clickable graphs. You may use index ols_*; to obtain a list of these -- for instance, ols_trend will contain the trend component of the right-hand side (if a trend is stated).

If there are missing values at the start or end of the data, Gekko will report this and suggest a truncated (shorter) estimation period. You may use PRT to print out the expressions if missing values are reported.
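
The idea of a truncated estimation period can be sketched as follows. This Python code is an illustration under assumptions, not a description of Gekko's own logic: it only trims periods at the two ends where some series is missing, says nothing about gaps inside the sample, and trimmed_period is a hypothetical helper name.

```python
from math import isnan


def trimmed_period(periods, columns):
    """Return (first, last) period such that every column is observed in
    both the first and the last period, or None if no period is complete."""
    complete = [
        not any(isnan(col[i]) for col in columns)
        for i in range(len(periods))
    ]
    if True not in complete:
        return None
    first = complete.index(True)
    last = len(complete) - 1 - complete[::-1].index(True)
    return periods[first], periods[last]


# Made-up data: y lacks the first year, x lacks the last year.
nan = float("nan")
years = list(range(2000, 2006))
y = [nan, 1.0, 1.2, 1.1, 1.3, 1.4]
x = [0.5, 0.6, 0.7, 0.8, 0.9, nan]
suggestion = trimmed_period(years, [y, x])
```

For the made-up data the suggested period is 2001 to 2004. Remember that lagged terms such as pcp.1 need data one period before the estimation start, which is why the example above loads data from 1998 while estimating from 2000.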

Related options

OPTION fit ols rekur dfmin = 10;

Related statements

ANALYZE, MATRIX, MODEL, R_RUN