Basic Usage

This tutorial covers the basics of GETTSIM’s interface to get you started with the package. GETTSIM simulates almost the complete German tax and transfer system, which makes it interesting for both students and researchers. Its extensive coverage of taxes and social policies makes it a valuable educational tool for learning about the current and past German policy environment. At the same time, GETTSIM’s ability to process household data and compute the corresponding taxes and transfers makes it a powerful tool for advanced microsimulations.

The interface consists of two central functions:

  1. set_up_policy_environment, which loads a policy environment for a specified date.

  2. compute_taxes_and_transfers, which computes taxes and transfers for household or individual observations given a specified policy environment.

The following sections give a brief introduction to these two functions using a minimal working example. The necessary packages and GETTSIM functions can be imported into your notebook as follows.

[1]:
import json

from gettsim import (
    compute_taxes_and_transfers,
    create_synthetic_data,
    set_up_policy_environment,
)

Loading Policies with set_up_policy_environment

The function set_up_policy_environment allows you to load the policy environment in Germany for a given date. The function returns two objects:

  • policy_params, which is a dictionary containing date-specific parameters of the policy environment.

  • policy_functions, which is a dictionary containing the functions needed to compute quantities of the tax and transfer system for the given date.

Below, we load the policy environment for the year 2020. Because only a year is given, the policy date is set to January 1st, 2020. A full date would be accepted as input, too.

[2]:
policy_params, policy_functions = set_up_policy_environment(2020)

The two objects can be passed on to compute_taxes_and_transfers with a number of further inputs to compute outputs for a set of data. Both objects are Python dictionaries that hold information required to set up the policy environment for the specified date.

Policy Parameters

policy_params is a nested dictionary of parameters grouped by different policy types they capture. The output below shows the keys of the main dictionary. The names indicate the policy group.

[3]:
print(*policy_params.keys(), sep="\n")
eink_st
eink_st_abzuege
soli_st
arbeitsl_geld
sozialv_beitr
unterhalt
unterhaltsvors
abgelt_st
wohngeld
kinderzuschl
kindergeld
elterngeld
ges_rente
arbeitsl_geld_2
grunds_im_alter
lohn_st

These keys can be used to extract the exact parametrization of a given policy group. The example below, for instance, shows the parameters that concern social insurance contributions, which are stored under the key sozialv_beitr.

[4]:
params_sozialv_beitr = policy_params["sozialv_beitr"]

# Print parameters in a nice way
print(json.dumps(params_sozialv_beitr, indent=4, default=str, ensure_ascii=False))
{
    "beitr_satz": {
        "ges_krankenv": {
            "allgemein": 0.146,
            "ermäßigt": 0.14,
            "mean_zusatzbeitrag": 0.011
        },
        "ges_pflegev": {
            "standard": 0.01525,
            "zusatz_kinderlos": 0.0025
        },
        "arbeitsl_v": 0.012,
        "ges_rentenv": 0.093
    },
    "beitr_bemess_grenze_m": {
        "ges_krankenv": {
            "west": 4687.5,
            "ost": 4687.5
        },
        "ges_rentenv": {
            "west": 6900,
            "ost": 6450
        }
    },
    "bezugsgröße_selbst_m": {
        "west": 3185,
        "ost": 3010
    },
    "mindestanteil_bezugsgröße_beitragspf_einnahme_selbst": 0.33333333,
    "geringfügige_eink_grenzen_m": {
        "minijob": {
            "west": 450,
            "ost": 450
        },
        "midijob": 1300
    },
    "ag_abgaben_geringf": {
        "ges_krankenv": 0.13,
        "ges_rentenv": 0.15,
        "st": 0.02
    },
    "ges_pflegev_zusatz_kinderlos_mindestalter": 23,
    "mindestlohn": 9.35,
    "datum": "2020-01-01",
    "rounding": {
        "midijob_faktor_f": {
            "base": 0.0001,
            "direction": "nearest"
        },
        "minijob_grenze": {
            "base": 1,
            "direction": "up"
        }
    }
}
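Because the parameters are stored in an ordinary nested dictionary, individual values can be reached by walking down the keys. The following sketch uses only the standard library and a hand-copied excerpt of the output above. The helper flatten_params is a hypothetical name chosen for illustration and is not part of GETTSIM.

```python
def flatten_params(params, prefix=""):
    """Turn a nested parameter dictionary into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into sub-dictionaries and keep the full key path.
            flat.update(flatten_params(value, prefix=path))
        else:
            flat[path] = value
    return flat


# Hand-copied excerpt of policy_params["sozialv_beitr"] for 2020.
excerpt = {
    "beitr_satz": {
        "ges_krankenv": {"allgemein": 0.146, "ermäßigt": 0.14},
        "arbeitsl_v": 0.012,
    },
    "geringfügige_eink_grenzen_m": {"midijob": 1300},
}

flat = flatten_params(excerpt)
for path, value in flat.items():
    print(f"{path}: {value}")
```

A flat view like this can make it easier to compare the parameters of two policy dates key by key.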

Policy Functions

The dictionary policy_functions contains the functions that implement the policy rules corresponding to the chosen date. The dictionary keys correspond to the variables these functions help compute from the input data.

[5]:
print(*policy_functions.keys(), sep="\n")
_ges_rentenv_beitr_midijob_arbeitg_m
_ges_rentenv_beitr_midijob_arbeitn_m
_arbeitsl_v_beitr_midijob_arbeitg_m
_arbeitsl_v_beitr_midijob_arbeitn_m
_ges_pflegev_beitr_midijob_arbeitg_m
_ges_pflegev_beitr_midijob_arbeitn_m
midijob_bemessungsentgelt_m
midijob_faktor_f
minijob_grenze_ost
minijob_grenze
minijob_grenze_west
_ges_krankenv_beitr_midijob_arbeitg_m
_ges_krankenv_beitr_midijob_arbeitn_m
_ges_krankenv_beitr_satz_arbeitg
ges_krankenv_beitr_satz
ges_krankenv_zusatzbeitr_satz
kindergeld_anspruch
grunds_im_alter_ges_rente_m
wohngeld_eink_freib_m
wohngeld_eink_vor_freib_m
wohngeld_miete_m_hh
_ges_rente_altersgrenze_abschlagsfrei
_ges_rente_besond_langj_altersgrenze
ges_rente_m
ges_rente_vorauss_besond_langj
arbeitsl_geld_2_vermög_freib_hh
arbeitsl_geld_2_eink_anr_frei_m
arbeitsl_geld_2_kindersatz_m_hh
arbeitsl_geld_2_regelsatz_m_hh
arbeitsl_geld_2_kost_unterk_m_hh
_kinderzuschl_vor_vermög_check_m_tu
kinderzuschl_eink_regel_m_tu
vorsorgepauschale
eink_st_tu
sum_eink
vorsorgeaufw_tu
vorsorgeaufw_alter_tu
alleinerz_freib_tu
eink_st_altersfreib
eink_st_sonderausgaben_tu

Both parameters and policy functions are mutable, meaning that GETTSIM not only provides the actual policy environments in Germany for a large range of dates, but also supports changing policies. An extended tutorial on parameters can be found here and a tutorial on policy functions is provided here.
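As a minimal sketch of what changing a parameter can look like, the snippet below copies a hand-written excerpt of the social insurance parameters shown earlier and alters one contribution rate. The new rate of 0.015 and the helper name with_changed_rate are assumptions made purely for illustration; they do not describe an actual reform or a GETTSIM function.

```python
import copy

# Hand-copied excerpt of policy_params["sozialv_beitr"] for 2020.
sozialv_beitr_2020 = {
    "beitr_satz": {
        "arbeitsl_v": 0.012,
        "ges_rentenv": 0.093,
    },
    "mindestlohn": 9.35,
}


def with_changed_rate(params, branch, new_rate):
    """Return a copy of params with one contribution rate replaced."""
    # deepcopy keeps the nested dictionaries of the original untouched.
    reformed = copy.deepcopy(params)
    reformed["beitr_satz"][branch] = new_rate
    return reformed


# Hypothetical reform: raise the unemployment insurance rate.
reformed = with_changed_rate(sozialv_beitr_2020, "arbeitsl_v", 0.015)

print(sozialv_beitr_2020["beitr_satz"]["arbeitsl_v"])  # status quo
print(reformed["beitr_satz"]["arbeitsl_v"])  # hypothetical reform
```

Working on a deep copy keeps the status quo environment available for comparison with the altered one.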

Specifying the Date

Dates can be specified in various ways. The function set_up_policy_environment accepts objects of type str, int, and datetime as inputs to specify a date. If only a year is specified, the policy date is set to the first day of the year, i.e., the inputs "2020" and 2020 both return the policy environment for January 1st, 2020. The input "2020/03", on the other hand, sets up the policy environment for March 1st, 2020, since a month and a year are specified. Lastly, it is also possible to use a specific day such as "2020/03/21", which returns the policy environment for March 21st, 2020.
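The rule described in this section can be illustrated with a few lines of standard-library Python. This is only a sketch of the stated behavior, not GETTSIM's own parser: the function name to_policy_date is hypothetical, and it handles only the "/"-separated strings used as examples in this tutorial.

```python
import datetime


def to_policy_date(date):
    """Map a year, "YYYY/MM", "YYYY/MM/DD", or date object to a date."""
    # datetime.datetime is a subclass of datetime.date, so check it first.
    if isinstance(date, datetime.datetime):
        return date.date()
    if isinstance(date, datetime.date):
        return date
    parts = [int(part) for part in str(date).split("/")]
    # Missing month and day default to 1, i.e. the first day of the period.
    parts += [1] * (3 - len(parts))
    year, month, day = parts
    return datetime.date(year, month, day)


for example in (2020, "2020", "2020/03", "2020/03/21"):
    print(repr(example), "->", to_policy_date(example).isoformat())
```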

Computing Outputs with compute_taxes_and_transfers

The policy environment specified by policy_params and policy_functions can then be applied to simulated or empirical data to compute taxes and transfers for individuals, tax units, and households. This is done via the function compute_taxes_and_transfers. The function requires input data and some further arguments to be specified.

Data Requirements, Input Columns, and Targets

The data has to fulfill certain requirements in order for GETTSIM to be able to process it properly. Specifically, GETTSIM requires data to be specified as a pandas.DataFrame with columns marking different input variables. GETTSIM parses the column names, which means that the data columns must be named in a specific way for GETTSIM to recognize them as input variables.

GETTSIM recognizes a total of 45 input names, since transfers in the German system in particular depend on many different variables. There is a detailed list of them here. The information in these inputs can be used to compute taxes and transfers for the selected household, tax unit, or individual data. The required inputs depend on the desired outputs, i.e., the data set does not necessarily have to contain all 45 input variables. A small example is illustrated below.

Exemplary Data Set

For illustration, we now create a data set consisting of one household with two parents and one child, using the create_synthetic_data helper function.

[6]:
data = create_synthetic_data(
    n_adults=2,
    n_children=1,
    specs_constant_over_households={"bruttolohn_m": [2000.0, 1000.0, 0.0]},
)
# Transpose data for better readability
data.T
[6]:
                      0                  1                  2
p_id                  0                  1                  2
hh_id                 0                  0                  0
tu_id                 0                  0                  0
hh_typ                couple_1_children  couple_1_children  couple_1_children
hat_kinder            True               True               False
...                   ...                ...                ...
m_durchg_alg1_bezug   0.0                0.0                0.0
sozialv_pflicht_5j    0.0                0.0                0.0
kind_unterh_anspr_m   0.0                0.0                0.0
kind_unterh_erhalt_m  0.0                0.0                0.0
steuerklasse          0                  0                  0

66 rows × 3 columns

This minimal example illustrates some of the naming conventions of input columns:

  • There are three identifiers: p_id identifies a person, tu_id a tax unit, and hh_id a household. In this case, the data consists of one household.

  • _m means that the variable labeled with this suffix is a monthly variable. Variables without this suffix are always on a yearly basis.

  • _tu and _hh mean that the variable is to be interpreted on the tax unit level or on the household level, respectively. Variables without these suffixes are always on the individual level.
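The suffix conventions can be made concrete with a small helper that reads a column name and reports what the suffixes imply. The function describe_column is a hypothetical illustration, not a GETTSIM function. It assumes that a level suffix (_tu or _hh) comes after the time suffix (_m), as in names such as wohngeld_miete_m_hh from the list of policy functions above.

```python
def describe_column(name):
    """Report the aggregation level and monthly flag implied by a name."""
    level = "individual"
    for suffix, label in (("_tu", "tax unit"), ("_hh", "household")):
        if name.endswith(suffix):
            level = label
            # Strip the level suffix so that a preceding _m becomes visible.
            name = name[: -len(suffix)]
            break
    return {"monthly": name.endswith("_m"), "level": level}


for column in ("bruttolohn_m", "wohngeld_miete_m_hh", "eink_st_tu"):
    print(column, describe_column(column))
```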

Defining Targets

We first have to select targets, i.e., the output variables that should be computed for our exemplary observations. In this case, we select the four types of social insurance contributions the individuals have to pay based on their specified information.

[7]:
# Create list of target variables.
targets = [
    "ges_krankenv_beitr_m",
    "ges_rentenv_beitr_m",
    "arbeitsl_v_beitr_m",
    "ges_pflegev_beitr_m",
]

Applying compute_taxes_and_transfers to Calculate Outputs

Given the information specified above, we now use compute_taxes_and_transfers to compute the variables of interest given by targets for our data in the selected policy environment given by policy_params and policy_functions. The function returns a pandas.DataFrame where the columns contain the target variables.

[8]:
result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    params=policy_params,
    targets=targets,
)
result.round(2)
[8]:
   arbeitsl_v_beitr_m  ges_krankenv_beitr_m  ges_pflegev_beitr_m  ges_rentenv_beitr_m
0               24.00                157.00                30.50               186.00
1               11.06                 72.38                14.06                85.75
2                0.00                  0.00                 0.00                 0.00
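The first row can be checked by hand against the 2020 contribution rates printed in the parameter section. The sketch below applies those rates to the gross monthly wage of 2000. It assumes that the employee pays half of the general health insurance rate plus half of the mean additional contribution; this reading is consistent with the 157.00 reported above but is not stated in this tutorial. The second person earns 1000, which lies between the minijob and midijob thresholds of 450 and 1300, so presumably different rules apply there and the simple product does not reproduce that row.

```python
gross_wage_m = 2000.0  # bruttolohn_m of the first adult

# 2020 employee rates copied from policy_params["sozialv_beitr"]["beitr_satz"].
allgemein = 0.146
mean_zusatzbeitrag = 0.011

expected = {
    "arbeitsl_v_beitr_m": gross_wage_m * 0.012,
    # Assumption: employee and employer split the health insurance rate evenly.
    "ges_krankenv_beitr_m": gross_wage_m * (allgemein + mean_zusatzbeitrag) / 2,
    # Person 0 has children, so the surcharge for the childless is left out.
    "ges_pflegev_beitr_m": gross_wage_m * 0.01525,
    "ges_rentenv_beitr_m": gross_wage_m * 0.093,
}

for name, value in expected.items():
    print(f"{name}: {value:.2f}")
```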

Lastly, we can join the results with the input data to save everything in a single pandas.DataFrame.

[9]:
data.join(result).T
[9]:
                      0                  1                  2
p_id                  0                  1                  2
hh_id                 0                  0                  0
tu_id                 0                  0                  0
hh_typ                couple_1_children  couple_1_children  couple_1_children
hat_kinder            True               True               False
...                   ...                ...                ...
steuerklasse          0                  0                  0
arbeitsl_v_beitr_m    24.0               11.064974          0.0
ges_krankenv_beitr_m  157.0              72.383372          0.0
ges_pflegev_beitr_m   30.5               14.061738          0.0
ges_rentenv_beitr_m   186.0              85.753549          0.0

70 rows × 3 columns