IMPORT

<< Click to Display Table of Contents >>

Navigation:  Gekko commands >

IMPORT

Previous pageReturn to chapter overviewNext page

The IMPORT command merges data (typically series) from an external file into the first-position databank. The IMPORT statement is primarily for non-.gbk files, and it should be noted that IMPORT without options restricts data to the global time period, it only puts data into the first-position databank, and it merges data with any pre-existing data in the first-position databank.

 

Import supports collapsing (aggregating) data points of high frequency into annual, quartery, monthly or daily series, cf. the <collapse> option.

 

Compatibility note: If a time period is not indicated in the <>-option field, Gekko 3.0 will only import data inside the global time period. Before Gekko 3.0, all data would have been imported. To emulate previous behavior, you can use IMPORT<all>. Alternatively, you may set "OPTION bugfix import export = yes;". If the option is set, IMPORT and EXPORT will work as in pre-3.0 versions. The option will be removed at some point, so it is better to change occurrences of date-less IMPORT to IMPORT<all> in old command files.

 

Tabular formats note: When using IMPORT with xlsx, csv or prn files, it is advised to first set the global frequency (option freq = ...) to the frequency of the data file (temporary frequency change can be done with BLOCK freq ...; IMPORT ... ; END;). With DATEFORMAT and DATETYPE at their default values, dates like 1990a1, 1990y or 90 are treated as annual 1990. A date like 199003 is treated as 1990q3 or 1990m3, if global frequency is set to q or m. A date like 19900325 is always understood as daily 1990m3d25. To import data with undated (u) frequency, the global frequency should be set to u first.

 

IMPORT is intended for non-.gbk files, and can be thought of as a soft version of READ. In contrast to READ, IMPORT does not clear the first-position databank (instead it merges data), it only imports data corresponding to the global time period (unless a time period or <all> is used), and it does not alter the Ref databank. There are the following equivalences: IMPORT = READ<first merge respect>, and the inverse: READ = CLEAR<first> + IMPORT<all> + CLONE.

 

 


 

Syntax

 

IMPORT < period  format  ALL  ARRAY  COLS  REF  SHEET=... CELL=... NAMECELL=...  DATECELL=... COLLAPSE=...  METHOD=... DATEFORMAT=...   DATETYPE=...  >  filename TO bankname;

 

period

(Optional). Without a time period indicated, Gekko will import all the data for all observations. When a period is indicated, the databank is time-truncated.

format

(Optional). Choose between CSV, FLAT; GDX, PCIM, PRN, PX, TSD, TSP, XLS, XLSX.

 

CSV: Comma-separated file. Tabular format with rows/cols consisting of names and dates. The global frequency (OPTION freq) must correspond to the frequency in the file.

FLAT: A Gekko-specific text-based format with lines that resemble Gekko series statements. The format is: variable name + start date + end date + numbers. These items are separated by blanks, for instance "x  2020  2022  1.5  -2.5  3.5". This corresponds to <2020 2022> = 1.5, -2.5, 3.5;. If only one number is given, it will be used for the full time period. You can use 'm' or 'm()' to indicate a missing value. Blank lines and lines beginning with '//' are ignored. This format reads much faster than 'real' series statements (which have to be parsed and compiled before the values are extracted).

GDX: A binary GAMS-database. Note OPTION gams exe folder = ... where it is possible to point to the exact GAMS folder (otherwise the system will try to auto-locate GAMS). It seems necessary to use a 32-bit version of GAMS, since the current version of Gekko is 32-bit. Please note that the data is read as array-timeseries (see SERIES), and that Gekko only reads variables, parameters, sets (as Gekko lists) and domains. GAMS can be freely downloaded as a demo, and the demo will work fine regarding Gekko IMPORT. Default options are OPTION gams time set = 't'; OPTION gams time prefix = ''; OPTION gams time offset = 0; OPTION gams time detect auto = no;. This corresponds to time having the set name 't', with natural values, for instance 2020, 2021, etc. Default GAMS read is using a fast reader (low-level API). If this poses problems, try the more robust normal API by setting OPTION gams fast = no;. See more under OPTION.

PCIM: A binary PCIM databank.

PRN: The first item in the prn format must be either 'date' or 'name' to indicate the orientation. The global frequency (OPTION freq = ...) must correspond to the frequency in the file.

PX: Imports a PC-Axis file. See also the <array> option. See more info regarding the px format and how Gekko reads it under the DOWNLOAD command.

TSD: For interchange with AREMOS and others.

TSP: Imports TSP records.

XLS and XLSX: Tabular format with rows/cols consisting of names and dates. If you need to pick out Excel data from particular cells, see SHEET<import>. The global frequency (OPTION freq = ...) must correspond to the frequency in the file. The engine used for Excel reading can be changed with OPTION sheet engine = ...;.

 

filename

Filenames may be contain an absolute path like c:\projects\gekko\myfile, a relative path like \gekko\myfile.gbk, or be stated without a path. Filenames containing blanks and special characters should be put inside quotes. See more on filenames here.

If the filename is set to '*', you will be asked to choose the file in Windows Explorer.

The extension .gbk is automatically added, if it is missing.

ALL

(Optional). With this option, all observations are imported, regardless of the global time period. This corresponds to pre-3.0 Gekko behavior.

REF

(Optional). Reads the file into the reference databank (shown as REF on the F2 window list).  Note that the Ref/reference databanks does not show up in the F2 window if it is empty.

COLS

(Optional). For .csv or Excel files, this indicates whether the timeseries are running downwards in columns.  Note that for .prn files, you indicate this in the first 'cell' (date/name).

TO

(Optional). If "TO bankname" is indicated, Gekko will put the data into a seperate 'named' databank alongside the Work and Ref databanks. For instance, after IMPORT <xlsx> adambk TO a;, you may refer to the variables by means of colon, for instance PRT a:var1;. If you use IMPORT <xlsx> adambk TO *;, the bankname will be the same as the file name. It should be noted that the databank will be read-only (non-editable) when opened like this (this functionality is a subset of the OPEN command)

ARRAY

(Optional). Regarding the PX format, if this option is set, Gekko will put the data into array-timeseries rather than normal timeseries (for the GDX format, Gekko always puts into array-timeseries per default).

SHEET=

(Optional). The name of the sheet for your data, for instance 'Data1'

CELL=

(Optional). For Excel files: the first cell of the data section. Defaults to 'B2'. This allows for rows/columns between the dates, names and data cells (cf. DATECELL and NAMECELL).

DATECELL=

(Optional). For Excel files: the first cell of the dates labels. Calculated from CELL location if not provided. This allows for rows/columns between the dates, names and data cells (cf. CELL and NAMECELL).

NAMECELL=

(Optional). For Excel files: the first cell of the names labels. Calculated from CELL location if not provided. This allows for rows/columns between the dates, names and data cells (cf. CELL and DATECELL).

COLLAPSE=

(Optional). For Excel files with Excel-dates that are going to be collapsed, this option can be set to either m, q or a and indicates the frequency that the data points are being collapsed into. A data point is an Excel date and a corresponding value, for instance 24-Dec-2010 with the value 123.45. In your Excel version, this date might be shown as 24/12/2010 (British English) or 12/24/2010 (US English) or in other formats, but internally there is no confusion, since the Excel dates are stored as values (technically the number of days since January 1, 1900). See the collapse example below.

METHOD=

(Optional: default = total). For Excel files using COLLAPSE, the METHOD option sets how the collapse (aggregation) is performed. Choose between total, avg, first, last, count, cf. also the COLLAPSE command. Use "method=count" to check that data is being collapsed as desired, and note that "method=avg" amounts to "method=total" divided by "method=count". After collapsing into monthly or quarterly timeseries, X12A may be used for seasonal adjustment.

DATEFORMAT=

DATETYPE=

(Optional). These options control the date format for .xlsx and .csv files. DATEFORMAT can be either 'gekko' (default) or a format string like 'yyyy-mm-dd', and the latter may contain a first or last indicator, for instance 'yyyy-mm-dd last', which indicates for quarterly or monthly data that the last day of the quarter or month is used. DATETYPE can be either 'text' or 'excel'. In the former case, the dates are understood as text strings (for instance '2020q3' or '2020-09-30' for a quarterly date), and in the latter case (not relevant for .csv files), the date is understood as an Excel date, which basically counts the days since January 1, 1900. This number would correspond to 44104 for the date 2020-09-31, and can be shown in Excel in different ways depending upon date format settings, language settings, etc., but the internal number itself is unambiguous. [New in 3.0.5].

 

If no period is given inside the <...> angle brackets, the global period is used (cf. TIME).

 

 


 

Examples

 

Reading data from the file data.xlsx (spreadsheet) can be done with:

 

IMPORT <xlsx> data;

 

or by the following:

 

IMPORT <xlsx> *;

 

and then selecting the file. You can use paths etc.:

 

IMPORT <tsd> otherbanks\adam3;

 

This will look for adam3.tsd in the subfolder 'otherbanks', relative the the Gekko working folder.

 

Use the TO keyword like this:

 

IMPORT <xlsx> forecst2 TO f2;

 

 

This reads forecst2.xlsx into the named databank f2. After this, you may use for instance PRT f2:gdp; to print out the timeseries gdp from this databank. You may use IMPORT <xlsx> forecst2 TO *; if you wish to use the filename as databank name. It is possible to use for instance IMPORT <xlsx> * TO *;.

 

Using IMPORT for csv or Excel files is only implemented for 'well-formed' spreadsheets. That is, with data starting in the first column and first row, and with either timeseries running left-to-right (normal for .csv files) or downwards (less normal). You may use IMPORT<csv cols> or IMPORT<xlsx cols>, if the timeseries are running downwards. If you need to pick out data from Excel cells more arbitrarily, see the SHEET<import> command.

 

 


 

Example, collapse

 

Excel data may be collapsed from higher frequencies than months (for instance from weekly, daily or even hourly observations), if the dates are represented as 'Date' types in Excel. Example:

 

IMPORT <xlsx sheet='oil' collapse=m> highfreq.xlsx;

 

The Excel sheet might look like this:

 

 

19-1-2011

20-1-2011

21-1-2011

24-01-2011

25-01-2011

oil_crude

1

2

3

4

5

oil_refined

2

4

6

8

10

 

Using this, it is expected that the timeseries run row-wise, with data starting at cell B2. Hence, the first date should be found at B1, and the dates should continue at cells B2, B3, etc. The first name should be at cell A2, and the names should continue at cells A3, A4, etc. In the example, the timeseries will be collapsed into monthly frequency, in the following way. For each date in the dates row, the month of this particular date is found, and the data is put into that particular month for each timeseries. Since the default method is 'total', the data is being summed for each month. You may use "collapse = q" or "collapse = a" to collapse directly into quarterly or annual data (this is better and simpler than using the COLLAPSE command on the resulting monthly timeseries). For daily data, you may alternatively read these directly with normal IMPORT, and later use COLLAPSE to collapse them into monthly timeseries.

 

If needed, the X12A command can then be used for seasonal adjustments. If you only want to obtain parts of the timeseries, you may restrict with a time period, for instance:

 

IMPORT <2000m1 2018m5 xlsx sheet='oil' collapse = m> highfreq.xlsx;

 

If you prefer averages, use "method=avg":

 

IMPORT <xlsx sheet='oil' collapse=m method=avg> highfreq.xlsx;

 

To check that there is a similar number of data points for each month, you may use "method=count" and print/plot the resulting series to check this (particularly relevant regarding the start and end of the range of Excel dates). As noted above, "method=avg" amounts to "method=total" divided by "method=count".

 

The data does not need to start at cell B2. If, for instance, the first data cell is at G10, you may use:

 

IMPORT <xlsx sheet='oil' cell='g10' collapse=m> highfreq.xlsx;

 

Here, Gekko will expect the first date cell to be at G9, and the first name cell to be at F10. If there are rows/cols between the dates/names and the data cells, you may indicate the precise location of the dates/names:

 

IMPORT <xlsx sheet='oil' cell='g10' datecell='g1'  namecell='a10'  collapse=m> highfreq.xlsx;

 

In this particular case, the dates/names are located in the first row and column of the spreadsheet. If the timeseries run downwards in columns, you may use <cols> for transposed importing:

 

IMPORT <xlsx cols sheet='oil' collapse=m> highfreq.xlsx;

 

Note: when using this functionality, you may 'collapse' for instance monthly data into its own frequency. In that case, using <method=count> should produce timeseries with value 1, indicating that there is only 1 observation for each month (otherwise something is wrong regarding the Excel sheet). Note also that dates in Excel are represented as the number of days since January 1, 1900. These dates may contain fractions, so 1 hour is represented by 1/24, etc. Keep this in mind if you are using <datecell=...>. If this points to a sequence of numbers that are not dates, these numbers may be erroneously interpreted as dates!

 


 

Note

 

To convert a .tsd file or other formats into a .gbk file, just import it with IMPORT<tsd>, and WRITE it. Please note that a .tsd file operates with 8 significant digits (or less), so there will typically be a loss of precision compared to a .gbk file (which is in double-precision).

 

The option 'copylocal' below copies the targeted file to a temporary file on the user's local hard disk before reading. This copying is typically very fast, and afterwards reading the temporary file is faster and more reliable, if the targeted file is located on a network drive. In general, this is a recommended option that alleviates some potential network problems.

 

 


 

Related options

 

OPTION databank file copylocal = yes;

OPTION folder bank = [empty];

OPTION folder bank1 = [empty];

OPTION folder bank2 = [empty];

OPTION gams exe folder = [empty];

OPTION gams fast = yes;

OPTION gams time set = 't';

OPTION gams time prefix = '';

OPTION gams time offset = 0;

OPTION gams time detect auto = no;

OPTION sheet engine = internal; //use 'excel' for the older .xls format

 

 


 

Related commands

 

READ, WRITE, EXPORT, OPEN, CLONE, DOWNLOAD, COLLAPSE