Basic Usage

This tutorial covers the basics of GETTSIM's interface to get you started with the package. GETTSIM enables an almost complete simulation of the German tax and transfer system, which makes it interesting for both students and researchers. Its extensive coverage of taxes and social policies in Germany makes it a valuable educational tool for learning about the current and past German policy environment. At the same time, GETTSIM can process household data and compute the corresponding taxes and transfers, which makes it a powerful tool for advanced microsimulations.

The interface consists of two central functions:

  1. set_up_policy_environment which loads a policy environment for a specified date.

  2. compute_taxes_and_transfers which allows you to compute taxes and transfers given a specified policy environment for household or individual observations.

The following sections give a brief introduction to these two functions using a minimal working example. The necessary packages and GETTSIM functions can be imported into your notebook as follows.

[1]:
import json

from gettsim import (
    compute_taxes_and_transfers,
    create_synthetic_data,
    set_up_policy_environment,
)

Loading Policies with set_up_policy_environment

The function set_up_policy_environment allows you to load the policy environment in Germany for a given date. The function returns two objects:

  • policy_params which is a dictionary containing date-specific parameters for the policy environment.

  • policy_functions which is a dictionary containing functions that are necessary to compute quantities in the taxes and transfers system on the provided date and data.

Below, we load the policy environment for the year 2020. Because only a year is given, the policy date is set to January 1st, 2020. An exact date would be accepted as an input, too.

[2]:
policy_params, policy_functions = set_up_policy_environment(2020)

The two objects can be passed on to compute_taxes_and_transfers with a number of further inputs to compute outputs for a set of data. Both objects are Python dictionaries that hold information required to set up the policy environment for the specified date.

Policy Parameters

policy_params is a nested dictionary of parameters, grouped by the policy area they belong to. The output below shows the keys of the main dictionary. Each name indicates a policy group.

[3]:
print(*policy_params.keys(), sep="\n")
eink_st
eink_st_abzuege
soli_st
arbeitsl_geld
sozialv_beitr
unterhalt
unterhaltsvors
abgelt_st
wohngeld
kinderzuschl
kinderzuschl_eink
kindergeld
elterngeld
ges_rente
erwerbsm_rente
arbeitsl_geld_2
grunds_im_alter
lohnst
erziehungsgeld

These keys can be used to extract the exact parametrization of a given policy group. The example below shows the parameters that concern social insurance contributions, which are stored under the key sozialv_beitr.

[4]:
params_sozialv_beitr = policy_params["sozialv_beitr"]

# Print parameters in a nice way
print(json.dumps(params_sozialv_beitr, indent=4, default=str, ensure_ascii=False))
{
    "beitr_satz": {
        "ges_krankenv": {
            "allgemein": 0.146,
            "ermäßigt": 0.14,
            "mean_zusatzbeitrag": 0.011
        },
        "ges_pflegev": {
            "standard": 0.01525,
            "zusatz_kinderlos": 0.0025
        },
        "arbeitsl_v": 0.012,
        "ges_rentenv": 0.093
    },
    "beitr_satz_jahresanfang": {
        "ges_krankenv": {
            "allgemein": 0.146,
            "ermäßigt": 0.14,
            "mean_zusatzbeitrag": 0.011
        },
        "ges_pflegev": {
            "standard": 0.01525,
            "zusatz_kinderlos": 0.0025
        },
        "arbeitsl_v": 0.012,
        "ges_rentenv": 0.093
    },
    "beitr_bemess_grenze_m": {
        "ges_krankenv": {
            "west": 4687.5,
            "ost": 4687.5
        },
        "ges_rentenv": {
            "west": 6900,
            "ost": 6450
        }
    },
    "bezugsgröße_selbst_m": {
        "west": 3185,
        "ost": 3010
    },
    "mindestanteil_bezugsgröße_beitragspf_einnahme_selbst": 0.33333333,
    "geringfügige_eink_grenzen_m": {
        "minijob": 450,
        "midijob": 1300
    },
    "ag_abgaben_geringf": {
        "ges_krankenv": 0.13,
        "ges_rentenv": 0.15,
        "st": 0.02
    },
    "ag_abgaben_geringf_jahresanfang": {
        "ges_krankenv": 0.13,
        "ges_rentenv": 0.15,
        "st": 0.02
    },
    "ges_pflegev_zusatz_kinderlos_mindestalter": 23,
    "mindestlohn": 9.35,
    "datum": "2020-01-01",
    "rounding": {
        "midijob_faktor_f": {
            "base": 0.0001,
            "direction": "nearest"
        },
        "minijob_grenze": {
            "base": 1,
            "direction": "up"
        }
    }
}
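Because the parameters are nested several levels deep, it can help to flatten them into dotted paths when searching for a specific value. The following is a minimal sketch using only the standard library. The helper flatten is hypothetical (it is not part of GETTSIM), and params_excerpt is a hand-copied excerpt of the output above rather than the full dictionary.

```python
# Hand-copied excerpt of the parameters printed above (not the full dictionary).
params_excerpt = {
    "beitr_satz": {
        "ges_krankenv": {
            "allgemein": 0.146,
            "ermäßigt": 0.14,
            "mean_zusatzbeitrag": 0.011,
        },
        "arbeitsl_v": 0.012,
        "ges_rentenv": 0.093,
    },
    "mindestlohn": 9.35,
}


def flatten(nested, prefix=""):
    """Hypothetical helper: map dotted key paths to the leaf values."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into sub-dictionaries and keep extending the path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


flat_params = flatten(params_excerpt)
for path, value in flat_params.items():
    print(f"{path} = {value}")
```

In a real session, the same helper could presumably be applied to policy_params["sozialv_beitr"] directly.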

Policy Functions

The dictionary policy_functions contains the policy functions that apply on the chosen date. Each dictionary key names the variable that the corresponding function helps compute from the input data.

[5]:
print(*policy_functions.keys(), sep="\n")
_ges_krankenv_beitr_bemess_grenze_m
_ges_krankenv_bezugsgröße_selbst_m
_ges_rentenv_beitr_bemess_grenze_m
_ges_pflegev_beitr_midijob_arbeitg_m
_ges_pflegev_beitr_midijob_arbeitn_m
_ges_pflegev_beitr_midijob_sum_arbeitn_arbeitg_m
_ges_pflegev_beitr_reg_beschäftigt_m
ges_pflegev_beitr_arbeitg_m
ges_pflegev_beitr_m
ges_pflegev_beitr_rente_m
ges_pflegev_beitr_satz
ges_pflegev_beitr_selbst_m
ges_pflegev_zusatz_kinderlos
_midijob_beitragspfl_einnahme_arbeitn_m
geringfügig_beschäftigt
in_gleitzone
midijob_bemessungsentgelt_m
midijob_faktor_f
minijob_grenze
regulär_beschäftigt
_ges_rentenv_beitr_bruttolohn_m
_ges_rentenv_beitr_midijob_arbeitg_m
_ges_rentenv_beitr_midijob_arbeitn_m
_ges_rentenv_beitr_midijob_sum_arbeitn_arbeitg_m
ges_rentenv_beitr_arbeitg_m
ges_rentenv_beitr_m
_ges_krankenv_beitr_midijob_arbeitg_m
_ges_krankenv_beitr_midijob_arbeitn_m
_ges_krankenv_beitr_midijob_sum_arbeitn_arbeitg_m
_ges_krankenv_beitr_reg_beschäftigt_m
_ges_krankenv_beitr_satz_arbeitg
_ges_krankenv_beitr_satz_arbeitg_jahresanfang
_ges_krankenv_bemessungsgrundlage_eink_selbst
_ges_krankenv_bemessungsgrundlage_rente_m
_ges_krankenv_bruttolohn_m
_ges_krankenv_bruttolohn_reg_beschäftigt_m
ges_krankenv_beitr_arbeitg_m
ges_krankenv_beitr_m
ges_krankenv_beitr_rente_m
ges_krankenv_beitr_satz
_ges_krankenv_beitr_satz_jahresanfang
ges_krankenv_beitr_selbst_m
ges_krankenv_zusatzbeitr_satz
_arbeitsl_v_beitr_midijob_arbeitg_m
_arbeitsl_v_beitr_midijob_arbeitn_m
_arbeitsl_v_beitr_midijob_sum_arbeitn_arbeitg_m
_sozialv_beitr_arbeitn_arbeitg_m
arbeitsl_v_beitr_arbeitg_m
arbeitsl_v_beitr_m
sozialv_beitr_arbeitg_m
sozialv_beitr_m
kinderbonus_m
kind_bis_10_mit_kindergeld
kindergeld_anspruch
kindergeld_m
arbeitsl_geld_berechtigt
arbeitsl_geld_eink_vorj_proxy_m
arbeitsl_geld_m
arbeitsl_geld_restl_anspruchsd
_erwerbsm_rente_langj_versicherte_wartezeit
entgeltp_ost_erwerbsm_rente
entgeltp_west_erwerbsm_rente
entgeltp_zurechnungszeit
erwerbsm_rente_m
erwerbsm_rente_zugangsfaktor
ges_rente_vorauss_erwerbsm
rentenartfaktor
_grunds_im_alter_kapitaleink_brutto_m
_grunds_im_alter_mehrbedarf_schwerbeh_g_m
grunds_im_alter_eink_m
grunds_im_alter_erwerbseink_m
grunds_im_alter_ges_rente_m
grunds_im_alter_m_eg
grunds_im_alter_priv_rente_m
grunds_im_alter_vermög_freib_eg
_kindergeld_erstes_kind_m
_unterhaltsvors_anspruch_kind_m
_unterhaltsvorschuss_eink_above_income_threshold
_unterhaltsvorschuss_empf_eink_above_income_threshold
parent_alleinerz
unterhaltsvors_m
unterhaltsvorschuss_eink_m
wohngeld_abzüge_st_sozialv_m
wohngeld_arbeitendes_kind
wohngeld_eink_freib_m
wohngeld_eink_m_hh
wohngeld_eink_vor_freib_m
wohngeld_m_hh
wohngeld_miete_m_hh
wohngeld_min_miete_m_hh
wohngeld_vor_vermög_check_m_hh
kind_unterh_zahlbetr_m
_ges_rente_altersgrenze_abschlagsfrei
_ges_rente_altersgrenze_vorzeitig
_ges_rente_arbeitsl_altersgrenze_ohne_vertrauensschutzprüfung
_ges_rente_arbeitsl_vorzeitig_ohne_vertrauenss
_ges_rente_besond_langj_altersgrenze
_ges_rente_langj_altersgrenze
referenzalter_abschlag
age_of_retirement
anteil_entgeltp_ost
durchschn_entgeltp
entgeltp_ost_update
entgeltp_update_lohn
entgeltp_west_update
ges_rente_anrechnungszeit
ges_rente_anrechnungszeit_45
ges_rente_frauen_altersgrenze
ges_rente_m
ges_rente_regelaltersgrenze
ges_rente_vor_grundr_m
ges_rente_vorauss_besond_langj
ges_rente_vorauss_langj
ges_rente_vorauss_regelrente
ges_rente_vorauss_vorzeitig
ges_rente_wartezeit_15
ges_rente_wartezeit_35
ges_rente_wartezeit_45
ges_rente_wartezeit_5
ges_rente_zugangsfaktor
rentenwert
sum_ges_rente_priv_rente_m
_grundr_zuschlag_eink_vor_freibetrag_m
grundr_berechtigt
grundr_bew_zeiten_avg_entgeltp
grundr_zuschlag_bonus_entgeltp
grundr_zuschlag_eink_m
grundr_zuschlag_höchstwert_m
grundr_zuschlag_m
grundr_zuschlag_vor_eink_anr_m
rente_vorj_vor_grundr_proxy_m
_elterngeld_anz_mehrlinge_anspruch
_elterngeld_proxy_eink_vorj_elterngeld_m
elterngeld_anr_m
elterngeld_anteil_eink_erlass
elterngeld_eink_erlass_m
elterngeld_eink_relev_m
elterngeld_geschw_bonus_anspruch
elterngeld_geschw_bonus_m
elterngeld_kind
elterngeld_m
elterngeld_mehrlinge_bonus_m
elterngeld_nettolohn_m
elterngeld_vorschulkind
elternzeit_anspruch
kinderzuschl_vorrang_bg
wohngeld_kinderzuschl_vorrang_hh
wohngeld_vorrang_hh
_arbeitsl_geld_2_grundfreib_vermög
_arbeitsl_geld_2_max_grundfreib_vermög
_kinderzuschl_nach_vermög_check_m_bg
arbeitsl_geld_2_vermög_freib_bg
kinderzuschl_vermög_freib_bg
wohngeld_nach_vermög_check_m_hh
_anteil_personen_in_haushalt_bg
_kinderzuschl_wohnbedarf_eltern_anteil_bg
kinderzuschl_kost_unterk_m_bg
kinderzuschl_bruttoeink_eltern_m
kinderzuschl_eink_anrechn_m_bg
kinderzuschl_eink_eltern_m
kinderzuschl_eink_min_m_bg
kinderzuschl_eink_regel_m_bg
kinderzuschl_eink_relev_m_bg
kinderzuschl_kindereink_abzug_m
_kinderzuschl_vor_vermög_check_m_bg
kinderzuschl_m_bg
_arbeitsl_geld_2_berechtigte_wohnfläche_bg
_arbeitsl_geld_2_warmmiete_pro_qm_m_bg
arbeitsl_geld_2_kost_unterk_m_bg
bruttokaltmiete_m_bg
heizkosten_m_bg
wohnfläche_bg
_diff_kindergeld_kindbedarf_m
_mean_kindergeld_per_child_m
kindergeld_zur_bedarfsdeckung_m
_arbeitsl_geld_2_eink_ohne_kindergeldübertrag_m
_arbeitsl_geld_2_nettoeink_ohne_transfers_m
arbeitsl_geld_2_bruttoeink_m
arbeitsl_geld_2_eink_anr_frei_m
arbeitsl_geld_2_eink_m
_arbeitsl_geld_2_alleinerz_mehrbedarf_m_bg
arbeitsl_geld_2_kindersatz_m_bg
arbeitsl_geld_2_m_bg
arbeitsl_geld_2_regelbedarf_m_bg
arbeitsl_geld_2_regelsatz_m_bg
arbeitsl_geld_2_vor_vorrang_m_bg
_soli_st_tarif
soli_st_lohnst_m
soli_st_y_sn
_eink_st_tarif
eink_st_mit_kinderfreib_y_sn
eink_st_ohne_kinderfreib_y_sn
eink_st_rel_kindergeld_m
eink_st_y_sn
kinderfreib_günstiger_sn
abgelt_st_y_sn
zu_verst_kapitaleink_y_sn
_lohnst_m
_lohnsteuer_klasse5_6_basis_y
kinderfreib_für_soli_st_lohnst_y
lohnst_eink_y
lohnst_m
lohnst_mit_kinderfreib_m
vorsorge_krankenv_option_a
vorsorge_krankenv_option_b
vorsorgepauschale_y
vorsorgeaufw_y_sn
vorsorgeaufw_alter_y_sn
vorsorgeaufw_y_sn_ab_2020
_eink_st_behinderungsgrad_pauschbetrag_y
_eink_st_kinderfreib_anz_ansprüche
eink_st_abz_betreuungskost_y
alleinerz_freib_y_sn
eink_st_altersfreib_y
eink_st_kinderfreib_y
eink_st_sonderausgaben_y_sn
p_id_kinderfreib_empfänger_1
p_id_kinderfreib_empfänger_2
sonderausgaben_betreuung_y_sn
_zu_verst_eink_mit_kinderfreib_y_sn
_zu_verst_eink_ohne_kinderfreib_y_sn
freibeträge_ind_y
freibeträge_y_sn
zu_verst_eink_y_sn
_zu_verst_eink_abhängig_beschäftigt_y
eink_abhängig_beschäftigt_y
eink_rente_zu_verst_m
eink_rente_zu_verst_y
eink_selbst_y
eink_vermietung_y
kapitaleink_brutto_y
kapitaleink_y
rente_ertragsanteil
sum_eink_y
_add_grouping_suffixes_to_keys
alter_monate
birthdate_decimal
erwachsen
erwachsene_alle_rentner_hh
geburtsdatum
jüngstes_kind_oder_mehrling
kind_ab_14_bis_17
kind_ab_14_bis_24
kind_ab_18_bis_24
kind_ab_6_bis_13
kind_bis_15
kind_bis_17
kind_bis_5
kind_bis_6

Both parameters and policy functions are mutable, meaning that GETTSIM not only provides the actual policy environments in Germany for a large range of dates, but also supports changing policies. An extended tutorial on parameters can be found here and a tutorial on policy functions is provided here.
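Since the parameters are ordinary nested dictionaries, one way to set up a reform scenario is to deep-copy the baseline and change individual values. The sketch below illustrates this on a small hand-copied excerpt. The reform value of 0.10 is purely hypothetical, and in a real session the dictionary returned by set_up_policy_environment would take the place of baseline_params.

```python
import copy

# Hand-copied excerpt of policy_params["sozialv_beitr"] for 2020.
baseline_params = {
    "beitr_satz": {"ges_rentenv": 0.093, "arbeitsl_v": 0.012},
    "mindestlohn": 9.35,
}

# A deep copy is needed: a shallow copy would share the inner dictionaries,
# so changing the reform would silently change the baseline as well.
reform_params = copy.deepcopy(baseline_params)
reform_params["beitr_satz"]["ges_rentenv"] = 0.10  # hypothetical reform value

print(baseline_params["beitr_satz"]["ges_rentenv"])
print(reform_params["beitr_satz"]["ges_rentenv"])
```

Keeping the baseline untouched makes it possible to run both scenarios on the same data and compare the outputs.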

Specifying the Date

Dates can be specified in various ways. The function set_up_policy_environment accepts objects of type str, int, and datetime as inputs to specify a date. If only a year is specified, the policy date is set to the first day of that year, i.e., the inputs "2020" and 2020 both return the policy environment for January 1st, 2020. The input "2020/03", on the other hand, sets up the policy environment for March 1st, 2020, since a year and a month are specified. Lastly, it is also possible to use a specific day such as "2020/03/21", which returns the policy environment for March 21st, 2020.
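The rule described above (missing month or day defaults to 1) can be sketched in a few lines. This is only an illustration of the documented behavior, not GETTSIM's actual parsing code: the helper to_policy_date is hypothetical, and accepting "-" as a separator in addition to "/" is an assumption.

```python
import datetime


def to_policy_date(date):
    """Hypothetical helper: normalize str, int, or datetime input to a date."""
    # datetime.datetime is a subclass of datetime.date, so check it first.
    if isinstance(date, datetime.datetime):
        return date.date()
    if isinstance(date, datetime.date):
        return date
    # Split "2020", "2020/03", or "2020/03/21" into integer components.
    parts = [int(part) for part in str(date).replace("-", "/").split("/")]
    # Fill a missing month and/or day with 1.
    parts += [1] * (3 - len(parts))
    year, month, day = parts
    return datetime.date(year, month, day)


for example in ("2020", 2020, "2020/03", "2020/03/21"):
    print(repr(example), "->", to_policy_date(example))
```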

Computing Outputs with compute_taxes_and_transfers

The policy environment specified by policy_params and policy_functions can then be applied to simulated or empirical data to compute taxes and transfers for individuals, households, and other groups defined in German tax and transfer law. This is done via the function compute_taxes_and_transfers. The function requires input data and some further arguments to be specified.

Data Requirements, Input Columns, and Targets

The data has to fulfill certain requirements in order for GETTSIM to be able to process it properly. Specifically, GETTSIM requires data to be specified as a pandas.DataFrame with columns marking different input variables. GETTSIM parses the column names, which means that the data columns must be named in a specific way for GETTSIM to recognize them as input variables.

GETTSIM recognizes more than 50 input names, because transfers in the German system in particular depend on many different variables. There is a detailed list of them here. The information specified in these inputs can be used to compute taxes and transfers for the selected household or individual data. The required inputs depend on the desired outputs, i.e., the data set does not necessarily have to contain all of these input variables. A small example is illustrated below.

Exemplary Data Set

As an example, we now create a data set that consists of one household with two parents and one child, using the create_synthetic_data helper function.

[6]:
data = create_synthetic_data(
    n_adults=2,
    n_children=1,
    specs_constant_over_households={"bruttolohn_m": [2000.0, 1000.0, 0.0]},
)
# Transpose data for better readability
data.T
[6]:
                                      0                  1                  2
p_id                                  0                  1                  2
hh_id                                 0                  0                  0
hh_typ                couple_1_children  couple_1_children  couple_1_children
hat_kinder                         True               True              False
alleinerz                         False              False              False
...                                 ...                ...                ...
kind_unterh_erhalt_m                0.0                0.0                0.0
steuerklasse                          0                  0                  0
budgetsatz_erzieh                 False              False              False
voll_erwerbsgemind                False              False              False
teilw_erwerbsgemind               False              False              False

84 rows × 3 columns

This minimal example illustrates some of the naming conventions of input columns:

  • The identifiers shown here are p_id, which identifies a person, and hh_id, which identifies a household. In this case, the data consists of one household.

  • _m means that the variable labeled with this suffix is a monthly variable. Variables without this suffix are always on a yearly basis.

  • _hh means that the variable is to be interpreted at the household level. Other suffixes used for grouping are sn, bg, eg, fg, and ehe. Variables without suffixes are always at the individual level.
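To make the naming conventions concrete, the sketch below splits a column name into its time unit and aggregation level. The helper describe_column is hypothetical and not part of GETTSIM. Treating a _y suffix as an explicit yearly marker is an assumption based on names such as eink_st_y_sn in the list of policy functions above, and the helper returns None for the time unit whenever no time suffix is present.

```python
# Grouping suffixes named in the list above.
GROUP_SUFFIXES = {"hh", "sn", "bg", "eg", "fg", "ehe"}
# "m" is documented above; "y" is an assumption based on the function names.
TIME_SUFFIXES = {"m": "monthly", "y": "yearly"}


def describe_column(name):
    """Hypothetical helper: return (time unit, aggregation level) of a name."""
    parts = name.split("_")
    level = "individual"
    # A trailing grouping suffix marks the aggregation level.
    if len(parts) > 1 and parts[-1] in GROUP_SUFFIXES:
        level = parts.pop()
    # After removing the grouping suffix, a trailing "m" or "y" is the time unit.
    period = TIME_SUFFIXES.get(parts[-1]) if len(parts) > 1 else None
    return period, level


for column in ("bruttolohn_m", "wohngeld_m_hh", "eink_st_y_sn", "hh_id"):
    print(column, "->", describe_column(column))
```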

Defining Targets

We first have to select targets, i.e., the output variables that should be computed for our exemplary observations. In this case, we select the four types of social insurance contributions that the individuals have to pay based on their specified information.

[7]:
# Create list of target variables.
targets = [
    "ges_krankenv_beitr_m",
    "ges_rentenv_beitr_m",
    "arbeitsl_v_beitr_m",
    "ges_pflegev_beitr_m",
]

Applying compute_taxes_and_transfers to Calculate Outputs

Given the information specified above, we now use compute_taxes_and_transfers to compute the variables of interest given by targets for our data in the selected policy environment given by policy_params and policy_functions. The function returns a pandas.DataFrame where the columns contain the target variables.

[8]:
result = compute_taxes_and_transfers(
    data=data,
    functions=policy_functions,
    params=policy_params,
    targets=targets,
)
result.round(2)
[8]:
   arbeitsl_v_beitr_m  ges_krankenv_beitr_m  ges_pflegev_beitr_m  ges_rentenv_beitr_m
0               24.00                157.00                30.50               186.00
1               11.06                 72.38                14.06                85.75
2                0.00                  0.00                 0.00                 0.00
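The values for the first person (gross monthly wage of 2000) can be reproduced by hand from the 2020 contribution rates printed earlier. The sketch below is a plausibility check, not GETTSIM's implementation. It assumes that the health insurance rate (general rate plus mean additional contribution) is split equally between employee and employer, that the other rates in beitr_satz are already the employee's share, and that no cap applies because 2000 is below the contribution ceilings in beitr_bemess_grenze_m.

```python
gross_wage_m = 2000.0  # bruttolohn_m of the first person

# Employee-side rates, hand-copied from policy_params["sozialv_beitr"]["beitr_satz"].
employee_rates = {
    # Assumption: general rate plus mean additional contribution, split in half.
    "ges_krankenv_beitr_m": (0.146 + 0.011) / 2,
    "ges_rentenv_beitr_m": 0.093,
    "arbeitsl_v_beitr_m": 0.012,
    # Standard rate only: the person has children, so no childless supplement.
    "ges_pflegev_beitr_m": 0.01525,
}

contributions = {
    name: round(rate * gross_wage_m, 2) for name, rate in employee_rates.items()
}

for name, amount in contributions.items():
    print(f"{name}: {amount:.2f}")
```

The same calculation does not reproduce the values for the second person: with a wage of 1000, the plain unemployment insurance rate would give 12.00 rather than 11.06. A wage of 1000 lies between the minijob and midijob thresholds (450 and 1300) in the parameters, so the reduced contributions are presumably due to the midijob rules.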

Lastly, we can join the results with the input data to save everything in a single pandas.DataFrame.

[9]:
data.join(result).T
[9]:
                                      0                  1                  2
p_id                                  0                  1                  2
hh_id                                 0                  0                  0
hh_typ                couple_1_children  couple_1_children  couple_1_children
hat_kinder                         True               True              False
alleinerz                         False              False              False
...                                 ...                ...                ...
teilw_erwerbsgemind               False              False              False
arbeitsl_v_beitr_m                 24.0          11.064974                0.0
ges_krankenv_beitr_m              157.0          72.383372                0.0
ges_pflegev_beitr_m                30.5          14.061738                0.0
ges_rentenv_beitr_m               186.0          85.753549                0.0

88 rows × 3 columns