DOWNLOAD

At present, the statement interfaces with one particular Danish databank (statistikbanken.dk, accessed through api.statbank.dk), which contains timeseries data among other things. The downloaded file is in the .px (PC-Axis) format, which is widely used by statistical offices.

 

The intention is to extend the DOWNLOAD statement to other online databanks (jobindsats.dk, for instance). Note that you can import a .px file with IMPORT<px> or IMPORT<px array>.

 

The data is downloaded into the first-position databank. If possible, it is advised to put the time element/code last in the .json definition file. If this is not done, Gekko will proceed anyway, but loading the data will take more time.

 


 

Syntax

 

download < ARRAY > url  filename   DUMP=...;

 

ARRAY

(Optional). If this is set, and DUMP is not used, Gekko will put the data into array-timeseries rather than normal timeseries. If DUMP is used, you may use IMPORT <px array> afterwards.

url

Url (web address) to the databank. Note: the web address should be in quotes.

filename

Filename of the .json file defining what data to download. Remember to put the time dimension as the last dimension in your .json file.

DUMP=

(Optional). Name of the file in which to store the contents of the download (in this case, a px file). When importing the px file afterwards, consider using import <px variablecode> if you want the dimension codes to match those of a normal DOWNLOAD (that is, the shorter codes).

 

 


 

Examples

 

Example with monthly data:

 

reset;
option freq m;
time 2000 2016;
download  'https://api.statbank.dk/v1/data'  statbank.json;
plot {'*'}; 

 

This imports data from api.statbank.dk, with the file statbank.json describing what data to download.

 

----------------------- statbank.json ---------------------------------
{
   "table": "pris6",
   "format": "px",
   "valuePresentation": "Value",
   "variables": [
      {
         "code": "VAREGR",
         "values": ["011200", "011100"]
      },
      {
         "code": "enhed",
         "values": ["100"]
      },
      {
         "code": "tid",
         "values": ["*"]
      }
   ]
}
-------------------------------------------------------------------------

 

You may use ["*"] to get all values of the field. The resulting series are called pris6_VAREGR_011200_enhed_100 and pris6_VAREGR_011100_enhed_100.
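
The naming rule can be illustrated with a small Python sketch. It is not Gekko's own code: the function name series_names and the handling of the time dimension are assumptions made for illustration, and the sketch cannot expand ["*"], since the actual elements are only known once the databank has answered.

```python
from itertools import product

def series_names(query, time_codes=("tid", "time")):
    """Compose flat series names from a statbank-style query (illustrative only)."""
    # The time dimension does not become part of the series name.
    dims = [v for v in query["variables"] if v["code"].lower() not in time_codes]
    # Each name part is <dimension code>_<element>.
    parts_per_dim = [[f'{d["code"]}_{value}' for value in d["values"]] for d in dims]
    # One series per combination of elements, prefixed with the table name.
    return ["_".join((query["table"],) + combo) for combo in product(*parts_per_dim)]

query = {
    "table": "pris6",
    "variables": [
        {"code": "VAREGR", "values": ["011200", "011100"]},
        {"code": "enhed", "values": ["100"]},
        {"code": "tid", "values": ["*"]},
    ],
}

for name in series_names(query):
    print(name)
```

For the query above, the sketch prints the two series names mentioned in the text.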

 

After the DOWNLOAD statement, these two timeseries are available in the first-position (Work) databank. The above procedure can be split into two parts (first dumping the download as data.px, and then importing that file):

 

reset;
option freq m;
time 2000 2016;
download 'https://api.statbank.dk/v1/data'  statbank.json   dump = data;
import <px variablecode> data;  //variablecode option to get the same dimension names as normal DOWNLOAD
plot {'*'};

 

If you prefer to use array-series, you may use the <array> option:

 

reset;
option freq m;
time 2000 2016;
download <array> 'https://api.statbank.dk/v1/data'  statbank.json;
plot {'*'};

 

or in two steps:

 

reset;
option freq m;
time 2000 2016;
download 'https://api.statbank.dk/v1/data'  statbank.json   dump = data;
import <px array> data;
plot {'*'};

 

This produces array-series pris6[011200,100] and pris6[011100,100] (can also be referred to by pris6['011200','100'] and pris6['011100','100']).

 


 

Setting up the .json file

 

For users of statistikbanken.dk, you may construct the .json file interactively as follows:

 

Go to https://api.statbank.dk/console

 

Under (1) choose "Retrieve data".

Under (2), type the table name under "Tabel id" (for instance pris6 or folk2). You may find the table name by browsing statistikbanken.dk. Under "Format" choose "PX". Click "Variable- and value codes" and choose the fields you want to obtain as dimensions of the array-series (it is preferred to choose time as the last dimension). You may choose particular values ("Value-ids") for the fields, or simply * for all values. Click "Download the result as file".

Under (3), click "Execute".

 

This should download a data.px file that can be read into Gekko with import <px array> data;. If you downloaded the pris6 table, try disp pris6; to see the dimensions. Next, copy-paste the .json code under (3) into a new file statbank.json, and put the file into the Gekko working folder. After this, you should be able to use download 'https://api.statbank.dk/v1/data' statbank.json; to download the data directly into Gekko.
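
If the .json code copied from the console does not list the time field last, it can be reordered by hand or with a small script. The following Python sketch is one possible way to do it; the helper name put_time_last and the assumption that the time field is called tid (or time) are illustrative and not part of Gekko.

```python
import json

def put_time_last(query, time_codes=("tid", "time")):
    """Return a copy of the query with the time variable moved to the end."""
    variables = query["variables"]
    others = [v for v in variables if v["code"].lower() not in time_codes]
    time_vars = [v for v in variables if v["code"].lower() in time_codes]
    return {**query, "variables": others + time_vars}

# Example query where the time field happens to come first.
query = {
    "table": "pris6",
    "format": "px",
    "valuePresentation": "Value",
    "variables": [
        {"code": "tid", "values": ["*"]},
        {"code": "VAREGR", "values": ["011200", "011100"]},
        {"code": "enhed", "values": ["100"]},
    ],
}

reordered = put_time_last(query)
# Save this text as statbank.json in the Gekko working folder.
print(json.dumps(reordered, indent=3))
```

The reordered definition requests the same data; only the order of the dimensions changes, which should let Gekko load the data faster.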

 

 


 

Reading the px format

 

The PC-Axis px format is a flexible data format well suited for multidimensional data. The format is used by many statistical offices in different countries to let their users retrieve statistics. Gekko does not use all of the contents of a px file. The way Gekko reads it is the following:

 

For instance, the timeseries name PROD01_saesonkorrigering_EJSAESON_brancheDB07_BC may be composed from a .px file, and the timeseries may get the label (metadata) "Ikke sæsonkorrigeret, BC Råstofindvinding og industri". The timeseries names and data are extracted from the following fields:

 

MATRIX= . Gets the table name from this (used in the timeseries names), for instance PROD01.

CODES("tid"). Decodes the time periods used. The alternative CODES("time") is allowed.

CODES(...). Gets dimension names and dimension elements from these CODES-fields, for use in the timeseries names. For instance, the name part brancheDB07_BC, where the first part (brancheDB07) is the dimension name, and the last part (BC) is the dimension element. Read more about abbreviated codes in the following section.

VALUES(...). Only used for metadata in the timeseries (timeseries labels).

VARIABLECODE(...). These fields provide shorter names for the variables, for instance VAREGR instead of varegruppe. The shorter names are used automatically by the DOWNLOAD statement, but for IMPORT<px> you have to activate them with IMPORT<px variablecode>.

DATA= . The data is read from here. If, for instance, there is one dimension with 3 elements and another with 4 elements, Gekko expects 12 numbers in all. Gekko does not accept a number that is split between lines, and each number should preferably be followed by a blank, also at the end of a line (this is recommended in the px definition). Gekko counts the numbers and issues a warning if there are too few compared to the span of the dimensions. In that case, the data may be scrambled/misaligned in Gekko, so take care! If there are too many numbers, Gekko fails with an error.

STUB= . Is not used!
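
The counting rule for DATA= can be sketched in Python as follows. This is only an illustration of the rule described above, not Gekko's implementation; the function name check_data_count and the returned messages are assumptions.

```python
from math import prod

def check_data_count(dim_sizes, data_text):
    """Compare the number of values in a DATA= block with the span of the dimensions."""
    expected = prod(dim_sizes)
    # Values are separated by blanks or line breaks; the block ends with ';'.
    tokens = data_text.replace(";", " ").split()
    if len(tokens) > expected:
        # Too many numbers is treated as an error.
        raise ValueError(f"too many numbers: {len(tokens)} > {expected}")
    if len(tokens) < expected:
        # Too few numbers only gives a warning; the data may be misaligned.
        return f"warning: {len(tokens)} numbers, expected {expected}"
    return "ok"

data = """
1.0 2.0 3.0 4.0 
5.0 6.0 7.0 8.0 
9.0 10.0 11.0 12.0 
;
"""

print(check_data_count([3, 4], data))
print(check_data_count([3, 4], "1 2 3 ;"))
```

With one dimension of 3 elements and another of 4, the first call finds the expected 12 numbers, whereas the second call finds only 3 and reports a warning.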

 

Note that some sources of .px files provide very long single lines of data (thousands of characters). If such a file is opened in a text editor and saved afterwards, the editor may insert line breaks that may render the file unreadable in Gekko (because numbers become split between lines).

 


 

Abbreviated codes

 

The codes inside the px file are often very verbose, so if possible, the shorter dimension names and dimension elements from the .json file are used instead. For instance, the .json file in the above example queries the dimension VAREGR, asking for the elements 011200 and 011100 (this is done in .json with "code": "VAREGR", "values": ["011200", "011100"]). But the returned px file contains the following line: CODES("varegruppe")="011200","011100";. Here, the element names are unchanged, but the dimension name is changed from VAREGR to the more verbose varegruppe. The former name is used if you download normally with the DOWNLOAD statement, whereas the latter name is used if you, for instance, first download a px file (with the dump=... option) and then read the px file with import<px>. To get Gekko to use the shorter codes when importing, you may use the variablecode option: IMPORT<px variablecode>.
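
To illustrate how such lines relate the long and short names, here is a minimal Python sketch that picks a px keyword line apart. The regular expression and the function name parse_keyword_line are assumptions for illustration, and the exact layout of the VARIABLECODE line is assumed to mirror the CODES line; Gekko's own px reader may work differently.

```python
import re

# Matches lines like KEYWORD("dimension")="a","b"; (assumed layout).
KEYWORD_LINE = re.compile(r'^(?P<keyword>[A-Z]+)\("(?P<dim>[^"]+)"\)=(?P<rest>.*);\s*$')

def parse_keyword_line(line):
    """Split a px keyword line into (keyword, dimension name, quoted values)."""
    match = KEYWORD_LINE.match(line.strip())
    if match is None:
        return None
    values = re.findall(r'"([^"]*)"', match.group("rest"))
    return match.group("keyword"), match.group("dim"), values

codes = parse_keyword_line('CODES("varegruppe")="011200","011100";')
short = parse_keyword_line('VARIABLECODE("varegruppe")="VAREGR";')  # assumed layout

print(codes)
print(short)

# Map the verbose dimension name to its shorter variable code.
short_names = {short[1]: short[2][0]}
print(short_names[codes[1]], codes[2])
```

In this sketch, the verbose name varegruppe from the CODES line is looked up in the VARIABLECODE mapping to obtain VAREGR, which is what the variablecode option does conceptually.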

 

For non-array downloading, the dimension elements (CODES elements) may be adjusted compared to the .px file. Any characters æ, ø, å, Æ, Ø, Å are changed into ae, oe, aa, AE, OE, AA, and subsequently any character that is not a...z or 0...9 is removed (including hyphens - and underscores _). Normally, though, there are not many "illegal" characters in .px CODES.
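
The adjustment can be sketched in Python as below. The function name clean_element is hypothetical, and the sketch assumes that uppercase letters A...Z are kept as well (the replacements AE, OE, AA and elements such as BC suggest so); it is not Gekko's actual code.

```python
import re

# Danish letters and their two-letter replacements.
REPLACEMENTS = {"æ": "ae", "ø": "oe", "å": "aa", "Æ": "AE", "Ø": "OE", "Å": "AA"}

def clean_element(code):
    """Adjust a CODES element so it can be used in a flat timeseries name."""
    for char, replacement in REPLACEMENTS.items():
        code = code.replace(char, replacement)
    # Assumption: letters of either case and digits are kept, all else is removed.
    return re.sub(r"[^A-Za-z0-9]", "", code)

print(clean_element("Råstof-indvinding_1"))
print(clean_element("EJSAESON"))
```

Here the element Råstof-indvinding_1 (a made-up example) becomes Raastofindvinding1, whereas EJSAESON is left unchanged.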

 


 

Note

 

There is currently (March 2024) a limit of 1,000,000 observations per download, cf. the changelog. Users often download data covering a long period, even though the data typically changes only in a few periods at the end of the sample (users still download the whole period to guard against changes in old "historical" data). DST (Statistics Denmark) has said that it would be possible to implement a feature that indicates which periods of a given sample contain data changed since a given previous download date. With such a feature, a lot of redundant downloading could be avoided, and the newly downloaded data could simply be merged with the older data in the existing Gekko databank (Gekko has features to do such merging easily). This principle is known from incremental backups, and the full dataset could still be downloaded from time to time (for safety).

 

For more advanced px reading, you may take a look at the pxr package in R.

 


 

Related statements

 

IMPORT, OPEN, READ