Data Management User Guide

Before we begin the data management user guide, here are a few general words on what "data management" is. You may skip this part if you prefer and move on to the section on databanks and variables.

So what is "data management"? Essentially, we may think of data as arriving in more or less raw (or at least less refined) form. The data is then processed, checked, adjusted, aggregated, and so on, refining it in the process. This can be done in many ways: among the simplest is to open a spreadsheet and manage the data there, and among the most complicated is performing big data analytics on massive amounts of data.

Raw data will often be represented as "tables", where each row represents an "observation" and each column represents some characteristic (dimension) of the data. For population data, we might for instance have columns for sex, age, year and value, where one row could be sex = male, age = 40, year = 2020, value = 60,000, stating that in 2020 there were 60,000 40-year-old males.

This observation could be represented as a variable pop, with pop[male, 40, 2020] = 60,000. In data management, Gekko assumes that all (or at least most) variables have a time dimension, so instead of pop[male, 40, 2020], Gekko would talk about the timeseries pop[male, 40], defined over some period, for instance 1980-2020. Such a series could be represented as a so-called array-timeseries in Gekko. In practice, however, such timeseries are often represented via naming conventions that provide simple names, for instance popmale40 or pop_male_40.
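To make the step from a raw table to timeseries concrete, here is a minimal Python sketch (not Gekko code). It groups long-format rows (sex, age, year, value) into one series per (sex, age) combination, mirroring pop[male, 40] as a series over years, and then derives flat names under a naming convention such as pop_male_40 or popmale40. The helper names `to_timeseries` and `flat_name` are hypothetical, and apart from the 2020 male/40 observation from the text, the numbers are invented for illustration.

```python
from collections import defaultdict

# Long-format "raw" table: one row per observation (sex, age, year, value).
# Only the first row comes from the text; the others are invented for illustration.
rows = [
    ("male", 40, 2020, 60000),
    ("male", 40, 2019, 59000),
    ("female", 40, 2020, 61000),
]

def to_timeseries(rows):
    """Group observations into series keyed by the non-time dimensions.

    Returns {(sex, age): {year: value}}, i.e. pop[sex, age] as a series over years.
    """
    series = defaultdict(dict)
    for sex, age, year, value in rows:
        series[(sex, age)][year] = value
    return dict(series)

def flat_name(prefix, key, sep="_"):
    """Hypothetical naming convention: ('male', 40) -> 'pop_male_40' ('popmale40' with sep='')."""
    return sep.join([prefix, *map(str, key)])

pop = to_timeseries(rows)
named = {flat_name("pop", key): ts for key, ts in pop.items()}

print(pop[("male", 40)])                       # {2020: 60000, 2019: 59000}
print(named["pop_male_40"][2020])              # 60000
print(flat_name("pop", ("male", 40), sep=""))  # popmale40
```

The same data is thus available either as an "array" keyed by (sex, age) or as separately named series; which form is more convenient depends on how many dimensions the data has.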

Gekko data management is thus essentially about "wrangling" timeseries like popmale40 or pop_male_40 (or pop[male, 40]): adjusting them, filling holes, printing them, looking at graphs, etc. Gekko is timeseries-oriented, which among other things means that timeseries are "first-class citizens" of Gekko, and a lot of functionality is built around that concept. For instance, you can typically leave the time dimension implicit, and concepts like percentage growth rates, lags and leads, frequencies, etc., are an integral part of the Gekko language.
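Gekko treats lags, leads and growth rates as built-in language concepts. The following Python sketch, which is not Gekko syntax, only spells out what a lag and a percentage growth rate, 100 · (x[t] / x[t-1] − 1), compute on a year-indexed series. The helper names `lag` and `pct_growth` are hypothetical, and the series values are invented.

```python
def lag(series, k=1):
    """Shift a {period: value} series k periods: lag(x)[t] == x[t - k] (a lead if k < 0)."""
    return {t + k: v for t, v in series.items()}

def pct_growth(series):
    """Percentage growth rate 100 * (x[t] / x[t-1] - 1), for periods where x[t-1] exists."""
    prev = lag(series, 1)
    return {
        t: 100.0 * (v / prev[t] - 1.0)
        for t, v in series.items()
        if t in prev and prev[t] != 0
    }

# Illustrative annual series (invented values).
x = {2018: 100.0, 2019: 102.0, 2020: 105.06}
growth = pct_growth(x)

print(lag(x)[2019])  # 100.0, i.e. the 2018 value
print(growth)        # roughly 2.0 for 2019 and 3.0 for 2020 (up to floating-point rounding)
```

Note that the first period has no growth rate, since its lagged value falls outside the series.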

Gekko aims to make such timeseries wrangling relatively easy, without very complicated syntax or concepts. Granted, some parts of Gekko's data management capabilities are rather complicated, but much of it does not involve a steep learning curve.

The following sections attempt to describe these less complicated parts in a (hopefully) understandable manner.

Data management basics