|
<< Click to Display Table of Contents >> IMPORT |
![]() ![]()
|
The IMPORT statement merges variables from a data file (for instance .csv or .xlsx) into the first-position databank (use OPEN if you need to open a 'named' databank instead). The IMPORT statement is primarily for non-.gbk files, and it should be noted that IMPORT without options restricts data to the global time period. There is a DOWNLOAD statement for online databanks.
In contrast to READ, IMPORT does not clear the first-position databank (but merges data), it only imports data corresponding to the global time period (unless <all> is used), and it does not alter the Ref (reference) databank. Between the close cousins IMPORT and READ, there are the following equivalences: import ...; = read<first merge respect>...; and the inverse: read ...; = clear<first>; import<all>...; clone;. READ is quite often used for non-gbk files too, for instance read <gdx>, because READ clears the first-position databank and does not restrict the data period. When importing data containing missing values, beware that such values are only transferred to Gekko if they exist in-between 'real' values, so if you import a (part of a) series with values M, 1, M, 2, M, only the middle missing value is transferred.
Import supports directly collapsing (aggregating) data points of high unknown frequency into annual, quartery, monthly, weekly or daily series, cf. the <collapse> option.
Tabular formats note: When using IMPORT with .csv, .xlsx or other files, it is advised to first set the global frequency (option freq = ...) to the frequency of the data file (temporary frequency change can be done with block freq ...; import ... ; end;). With DATEFORMAT and DATETYPE at their default values, dates like 1990a1, 1990y or 90 are treated as annual 1990. A date like 199003 is treated as 1990q3 or 1990m3, if global frequency is set to q or m. A date like 19900325 is always understood as daily 1990m3d25. To import data with undated (u) frequency, the global frequency should be set to u first. |
Because Gekko uses a so-called cache for databank files, the same file will load faster the next time it is imported.
import < period format ALL COLS REF SHEET=... CELL=... NAMECELL=... DATECELL=... COLLAPSE=... METHOD=... DATEFORMAT=... DATETYPE=... ARRAY VARIABLECODE > filename TO bankname;
period |
(Optional). Without a time period indicated, Gekko will import all the data for all observations. When a period is indicated, the databank is time-truncated. |
format |
(Optional). Choose between aremos, csv, flat, gcm, gdx, parquet, pcim, prn, px, sdf, tsd, tsp, xls/xlsx.
•aremos: If you need more precision (significant digits) from AREMOS than the TSD format can provide, you may use AREMOS dump files. See example. •csv: Comma-separated file. Tabular format with rows/cols consisting of names and dates. The global frequency (option freq) must correspond to the frequency in the file. Array-series names like x[a,b] are allowed. •flat: A Gekko-specific text-based format with lines that resemble Gekko series statements. The format is: variable name + start date + end date + numbers. These items are separated by blanks, for instance x 2020 2022 1.5 -2.5 3.5. This corresponds to x <2020 2022> = 1.5, -2.5, 3.5;. If only one number is given, it will be used for the full time period. You can use m or m() to indicate a missing value. Blank lines and lines beginning with // are ignored. This format reads much faster than 'real' series statements (which have to be parsed and compiled before the values are extracted). Note that frequency indicators are allowed but not enforced, so you may use x!a instead of x (if no frequency indicator is provided, Gekko will guess the frequency from the time period). Also note that array-series like x[a,b] are allowed, but for array-series, blanks inside the [...] are not allowed, so use x[a,b], not x[a, b]. •gcm: Gekko can export timeseries as a .gcm file, using export<gcm>. To "import" such a .gcm file, just RUN it (or copy-paste the contents into your preferred .gcm file). Note that .flat files are similar, but load much faster. Array-series names like x[a,b] are allowed. •gdx: A binary GAMS-database. Note option gams exe folder = ... where it is possible to point to the exact GAMS folder (otherwise the system will try to auto-locate GAMS). Even without the option set, the auto-locator seems pretty good at locating a 32-bit GAMS for a 32-bit Gekko, and a 64-bit GAMS for a 64-bit Gekko version. Please note that the data is read as array-timeseries (see SERIES), and that Gekko only reads variables, parameters, sets (as Gekko lists) and domains. GAMS can be freely downloaded as a demo, and the demo will work fine regarding Gekko IMPORT. Default options are option gams time set = 't'; option gams time prefix = ''; option gams time offset = 0; option gams time detect auto = no;. This corresponds to time having the set name 't', with natural values, for instance 2020, 2021, etc. Default GAMS read is using a fast reader (low-level API). If this poses problems, try the more robust normal API by setting option gams fast = no;. See more under OPTION. •parquet: A binary dataframe representation using Apache Parquet, details in this appendix. Note that only (array-)timeseries (see SERIES) can be stored in a parquet file. Think of parquet as "better and faster" than .csv. The parquet format supports export, too, and a roundtrip (exporting and importing data via a parquet file) will preserve all data and metadata, except data-traces. The format is very convenient for using together with for example Python, R, SQL, etc. •pcim: A binary PCIM databank. •prn: The first item in the .prn format must be either date or name to indicate the orientation. The global frequency (option freq = ...) must correspond to the frequency in the file. Array-series names like x[a,b] are allowed. •px: Imports a PC-Axis file. See also the <array> option for array-series. See more info regarding the px format and how Gekko reads it under the DOWNLOAD statement. Note that the time element of the .px file (CODES("tid") or CODES("time")) should preferably be stated last (this is to keep the .px data file organised in the time dimension, which makes it load faster in Gekko). See also the array and variablecode options: specifically it is advised to use the variablecode option in order to get shorter timeseries names in the non-array case (these will correspond to what you get with direct DOWNLOAD). •sdf: Imports .sdf format. The format is a tab-separated 'table' without header row, with period in the second-last column, and value (with decimal comma) in the last column. Sdf is an array-series format, a bit like .px. •tsd: For interchange with AREMOS and others. Array-series are not supported. •tsp: Imports TSP records. Array-series are not supported. •xls/xlsx: Tabular format with rows/cols consisting of names and dates. See SHEET<import> to see how to load a spreadsheet as a nested list of 'cells' and scrape data from these. The global frequency (option freq = ...) must correspond to the frequency in the file. The engine used for Excel reading can be changed with option sheet engine = ...;. Array-series names like x[a,b] are allowed.
|
filename |
Filenames may contain an absolute path like c:\projects\gekko\bank.gbk, or a relative path \gekko\bank.gbk. Filenames containing blanks and special characters should be put inside quotes. Regarding reading of files, files in libraries can be referred to with colon (for instance lib1:bank.gbk), and "zip paths" are allowed too (for instance c:\projects\data.zip\bank.gbk). See more on filenames here. If the filename is set to '*', you will be asked to choose the file in Windows Explorer. The extension .gbk is automatically added, if it is missing. |
ALL |
(Optional). With this option, all observations are imported, regardless of the global time period. |
REF |
(Optional). Reads the file into the Ref (reference) databank (shown as REF on the F2 window list). |
COLS |
(Optional). For .csv or Excel files, this indicates whether the timeseries are running downwards in columns. Note that for .prn files, you indicate this in the first 'cell' (date/name). |
TO |
(Optional). If to bankname is indicated, Gekko will put the data into a seperate 'named' databank alongside the Work and Ref databanks. For instance, after import <xlsx> bank1 to a;, you may refer to the variables by means of colon, for instance prt a:var1;. If you use import <xlsx> mybank TO *;, the bankname will be the same as the file name (mybank). It should be noted that the databank will be read-only (non-editable) when opened like this (this functionality is a subset of the OPEN statement)
(Optional). If to bankname is indicated, Gekko will put the data into a seperate 'named' databank alongside the first-position and Ref databanks. For instance, after import <xlsx> bank1 to b;, you may refer to the variables by means of colon, for instance prt b:var1;. If you use import <xlsx> bank1 to *;, the bankname will be the same as the file name. It should be noted that the databank will be read-only (protected) when opened like this (import ... to ... is essentially the same as an OPEN statement) |
ARRAY |
(Optional). Regarding the .px format, if this option is set, Gekko will put the data into array-timeseries rather than normal timeseries (for the GAMS .gdx format, Gekko always puts into array-timeseries per default). The variablecode option will not matter if <array> option is used. |
VARIABLECODE |
(Optional). Regarding the .px format, if this option is set, Gekko will use shorter variable names codes, for instance VAREGR instead of varegruppe. Using this option is recommended, and the DOWNLOAD statement uses it implicitly. The variablecode option will not matter if <array> option is used. |
SHEET= |
(Optional). The name of the sheet for your data, for instance 'Data1'. |
CELL= |
(Optional). For Excel files: the first cell of the data section. Defaults to 'B2'. This allows for rows/columns between the dates, names and data cells (cf. DATECELL and NAMECELL). |
DATECELL= |
(Optional). For Excel files: the first cell of the dates labels. Calculated from CELL location if not provided. This allows for rows/columns between the dates, names and data cells (cf. CELL and NAMECELL). |
NAMECELL= |
(Optional). For Excel files: the first cell of the names labels. Calculated from CELL location if not provided. This allows for rows/columns between the dates, names and data cells (cf. CELL and DATECELL). |
COLLAPSE= |
(Optional). For Excel files with Excel-dates that are going to be collapsed, this option can be set to either a, q, m, w or d, and indicates the frequency that the data points are being collapsed into. A data point is an Excel date and a corresponding value, for instance 24-Dec-2010 with the value 123.45. In your Excel version, this date might be shown as 24/12/2010 (British English) or 12/24/2010 (US English) or in other formats, but internally there is no confusion, since the Excel dates are stored as values (technically the number of days since January 1, 1900). See the collapse example below. |
METHOD= |
(Optional: default = total). For Excel files using COLLAPSE, the METHOD option sets how the collapse (aggregation) is performed. Choose between total, avg, first, last, count, cf. also the COLLAPSE statement. Use "method=count" to check that data is being collapsed as desired, and note that "method=avg" amounts to "method=total" divided by "method=count". After collapsing into monthly or quarterly timeseries, X12A may be used for seasonal adjustment. |
DATEFORMAT= DATETYPE= |
(Optional). These options control the date format for .xlsx and .csv files, and for the use of SHEET in Gekcel. DATEFORMAT can be either 'gekko' (default, for instance 2020q3 or 2020m11) or a format string like 'yyyy-mm-dd', and the latter may contain a first or last indicator, for instance 'yyyy-mm-dd last', which indicates for quarterly, monthly or weekly data that the last day of the quarter/month/week is used. DATETYPE can be either 'text' or 'excel'. In the former case, the dates are understood as text strings (for instance '2020q3' or '2020-09-30' for a quarterly date), and in the latter case (not relevant for .csv files), the date is understood as an Excel date, which basically counts the days since January 1, 1900. This number would correspond to 44104 for the date 2020-09-31, and can be shown in Excel in different ways depending upon date format settings, language settings, etc., but the internal number itself is unambiguous. When using SHEET in Gekcel, using DATETYPE='excel' is advised for consistency. |
BANKNAME= |
(Optional). For the .parquet format, sets the name filtered for in the id and bank columns. An empty string will be ignored. |
•If no period is given inside the <...> angle brackets, the global period is used (cf. TIME).
Exporting and importing data from a file data.xlsx (spreadsheet) can be done with:
reset; time 2021 2023; |
The data.xlsx file looks like this:

The spreadsheet has an empty cell A1, names in column A, and dates in row 1. A .csv file would look similar. You may also use the following:
import <xlsx> *; |
and then select the file. You can use paths etc.:
import <xlsx> otherbanks\data; |
This will look for data.xlsx in the subfolder otherbanks, relative the the Gekko working folder.
You may OPEN a non-gbk file as a 'named' databank like this (or use import ... to ...)
open <xlsx> data1 as d; |
This reads data1.xlsx into the named databank d (cf. the F2 databank list). After this, you may use for instance prt d:x1; to print out the timeseries x1 from this databank. You use open <xlsx> * as d; to select the filename from a list.
Using IMPORT for .csv or .xlsx files is only implemented for 'well-formed' spreadsheets, like in the example above. That is, with with timeseries either running in rows or columns (columns is less normal, use the <cols> option in that case). If you need to pick out data from Excel cells more arbitrarily, for instance offsetting where that data starts, see the SHEET<import> statement.
Excel data may be collapsed from higher frequencies than daily (for instance hourly observations), if the dates are represented as 'Date' types in Excel. Example:
import <xlsx sheet='oil' collapse=m> highfreq.xlsx; |
The Excel sheet might look like this:
|
Using this, it is expected that the timeseries run row-wise, with data starting at cell B2. Hence, the first date should be found at B1, and the dates should continue at cells B2, B3, etc. The first name should be at cell A2, and the names should continue at cells A3, A4, etc. In the example, the timeseries will be collapsed into monthly frequency, in the following way. For each date in the dates row, the month of this particular date is found, and the data is put into that particular month for each timeseries. Since the default method is 'total', the data is being summed for each month. You may use collapse = q or collapse = a to collapse directly into quarterly or annual data (this is better and simpler than using the COLLAPSE statement on the resulting monthly timeseries). For daily data, you may alternatively read these directly with normal IMPORT, and later use COLLAPSE to collapse them into monthly timeseries.
If needed, the X12A statement can then be used for seasonal adjustments. If you only want to obtain parts of the timeseries, you may restrict with a time period, for instance:
import <2000m1 2018m5 xlsx sheet='oil' collapse = m> highfreq.xlsx; |
If you prefer averages, use method=avg:
import <xlsx sheet='oil' collapse=m method=avg> highfreq.xlsx; |
To check that there is a similar number of data points for each month, you may use "method=count" and print/plot the resulting series to check this (particularly relevant regarding the start and end of the range of Excel dates). As noted above, method=avg amounts to method=total divided by method=count.
The data does not need to start at cell B2. If, for instance, the first data cell is at G10, you may use:
import <xlsx sheet='oil' cell='g10' collapse=m> highfreq.xlsx; |
Here, Gekko will expect the first date cell to be at G9, and the first name cell to be at F10. If there are rows/cols between the dates/names and the data cells, you may indicate the precise location of the dates/names:
import <xlsx sheet='oil' cell='g10' datecell='g1' namecell='a10' collapse=m> highfreq.xlsx; |
In this particular case, the dates/names are located in the first row and column of the spreadsheet. If the timeseries run downwards in columns, you may use <cols> for transposed importing:
import <xlsx cols sheet='oil' collapse=m> highfreq.xlsx; |
Note: when using this functionality, you may 'collapse' for instance monthly data into its own frequency. In that case, using <method=count> should produce timeseries with value 1, indicating that there is only 1 observation for each month (otherwise something is wrong regarding the Excel sheet). Note also that dates in Excel are represented as the number of days since January 1, 1900. These dates may contain fractions, so 1 hour is represented by 1/24, etc. Keep this in mind if you are using <datecell=...>. If this points to a sequence of numbers that are not dates, these numbers may be erroneously interpreted as dates!
You may use a setup like the following to transfer timeseries with high precision from AREMOS (the AREMOS .tsd format has rather limited precision).
!In AREMOS dump1 #banker; |
This will produce the three files bank1.dmp, bank2.dmp, and bank3.dmp.
//In Gekko |
This will produce the three files bank1.gbk, bank2.gbk, and bank3.gbk. Only timeseries can be transferred like this.
To convert a supported non-gbk file into a .gbk file, just import it with IMPORT<tsd all> and WRITE it. Please note that a .tsd file operates with 8 significant digits (or less), so there will typically be a loss of precision compared to a .gbk file (which is in double-precision).
The option databank file copylocal is yes per default and copies the targeted file to a temporary file on the user's local hard disk before importing. This copying is typically very fast, and afterwards reading the temporary file is faster and more reliable, if the targeted file is for instance located on a network drive. In general, this is a recommended option that alleviates some potential network problems.
The option databank file cache = nongbk is recommended, too. It does the following: (a) it finds the data file, (b) it optionally copies it to the user's local hard disk, and (c) it calculates a MD5 hash code from the file. If this hash code is already known, Gekko uses a cache file representing the databank data, which in most cases is much faster to read (it is in protobuf format: the same format that is used for .gbk files). So there is a small speed penalty for calculating the hash, and a large speed gain for using the protobuf/cache file.
See the Gekko menu 'Options' --> 'Program dependency tracking' or use option global dependency tracking = ...; to activate dependency tracking, so that the use of external files (for instance program files, read/written databanks etc.) are shown as a list at the end of a Gekko session.
option databank file cache = all;
option databank file copylocal = yes;
option folder bank = '';
option folder bank1 = '';
option folder bank2 = '';
option gams exe folder = '';
option gams fast = yes;
option gams time detect auto = no;
option gams time freq = a;
option gams time offset = 0.0;
option gams time prefix = '';
option gams time set = t;
option gams trim = 0;
option interface csv decimalseparator = period;
option interface csv delimiter = semicolon;
option interface csv ignoremissing = no;
option interface csv ndec = 100;
option interface excel ignoremissing = no;
option interface excel language = danish;
option interface excel modernlook = yes;
option interface prn decimalseparator = period;
option interface prn delimiter = blank;
option interface prn ndec = 100;
option sheet engine = internal;
CLONE, COLLAPSE, COPY, DOWNLOAD, EXPORT, OPEN, READ, WRITE