Advanced Usage#

This tutorial showcases advanced functionalities and applications of GETTSIM’s interface. For an introduction, see the basic usage tutorial, which showcases GETTSIM’s two main functions using a minimal working example:

  1. set_up_policy_environment which loads a policy environment for a specified date.

  2. compute_taxes_and_transfers which allows you to compute taxes and transfers given a specified policy environment for household or individual observations.

This tutorial dives deeper into the GETTSIM interface to acquaint you with further useful functionalities. Specifically, it shows how to navigate the numerous input and target variables that the package supports, and how GETTSIM processes them internally, using the example of child benefits in the German taxes and transfers system.

[1]:
import numpy as np
import pandas as pd
import plotly.express as px
from gettsim import (
    compute_taxes_and_transfers,
    create_synthetic_data,
    plot_dag,
    set_up_policy_environment,
)

Example: Kindergeld (Child Benefits)#

For this tutorial, we will focus on Kindergeld, which is a child benefit that can be claimed by parents in Germany. Kindergeld can be claimed in different ways and eligibility for families to receive it depends on various variables. For instance, Kindergeld can be claimed as a monthly payment but also as a tax credit (Kinderfreibetrag) which is more advantageous for higher income groups. Additionally, eligibility depends on factors like the age and work status of children. These factors make it a more complex feature of the German taxes and transfers system than one might initially believe.

In the following, we will inspect in detail how the German Kindergeld is implemented in GETTSIM to showcase further functionalities of the package. To start off, we load a policy environment to work with.

[2]:
policy_params, policy_functions = set_up_policy_environment("2020")
[3]:
policy_params["wohngeld"]
[3]:
{'faktor_berechnungsformel': 1.15,
 'koeffizienten_berechnungsformel': {1: {'a': 0.04,
   'b': 0.00058,
   'c': 0.000118},
  2: {'a': 0.03, 'b': 0.000405, 'c': 8.8e-05},
  3: {'a': 0.02, 'b': 0.00035, 'c': 7.09e-05},
  4: {'a': 0.01, 'b': 0.000313, 'c': 3.68e-05},
  5: {'a': 0, 'b': 0.000276, 'c': 3.59e-05},
  6: {'a': -0.01, 'b': 0.000258, 'c': 3.08e-05},
  7: {'a': -0.02, 'b': 0.000239, 'c': 3.16e-05},
  8: {'a': -0.03, 'b': 0.000212, 'c': 3.16e-05},
  9: {'a': -0.04, 'b': 0.000184, 'c': 3.33e-05},
  10: {'a': -0.06, 'b': 0.000147, 'c': 3.85e-05},
  11: {'a': -0.1, 'b': 0.00011, 'c': 4.53e-05},
  12: {'a': -0.14, 'b': 0.000101, 'c': 5.13e-05}},
 'haushaltsgröße_hhn': {1: 5, 2: 12},
 'bonus_sehr_große_haushalte': {'max_anz_personen_normale_berechnung': 12,
  'bonus_jede_weitere_person': 51},
 'abzug_stufen': {0: 0.0, 1: 0.1, 2: 0.2, 3: 0.3},
 'min_miete': {1: 52,
  2: 64,
  3: 76,
  4: 88,
  5: 99,
  6: 99,
  7: 111,
  8: 123,
  9: 135,
  10: 146,
  11: 180,
  12: 286},
 'min_eink': {1: 275,
  2: 357,
  3: 414,
  4: 447,
  5: 532,
  6: 618,
  7: 702,
  8: 787,
  9: 872,
  10: 957,
  11: 1248,
  12: 1443},
 'freib_kinder_m': {'alleinerz': 110, 'arbeitendes_kind': 100},
 'freib_behinderung': 1800,
 'behinderungsgrad': {1: 0, 2: 80},
 'max_miete': {1: {1: 338, 2: 381, 3: 426, 4: 478, 5: 525, 6: 575, 7: 633},
  2: {1: 409, 2: 461, 3: 516, 4: 579, 5: 636, 6: 697, 7: 767},
  3: {1: 487, 2: 549, 3: 614, 4: 689, 5: 757, 6: 830, 7: 912},
  4: {1: 568, 2: 641, 3: 716, 4: 803, 5: 884, 6: 968, 7: 1065},
  5: {1: 649, 2: 732, 3: 818, 4: 918, 5: 1010, 6: 1106, 7: 1217},
  'jede_weitere_person': {1: 77,
   2: 88,
   3: 99,
   4: 111,
   5: 121,
   6: 139,
   7: 153}},
 'vermögensgrundfreibetrag': 60000,
 'vermögensfreibetrag_pers': 30000,
 'datum': numpy.datetime64('2020-01-01'),
 'rounding': {'wohngeld_vor_vermög_check_m_hh': {'base': 1,
   'direction': 'nearest'}}}

The corresponding policy parameters for Kindergeld are stored under the key kindergeld.

[4]:
policy_params["kindergeld"]
[4]:
{'altersgrenze': {'mit_bedingungen': 25, 'ohne_bedingungen': 18},
 'kindergeld': {1: 204, 2: 204, 3: 210, 4: 235},
 'einkommensgrenze': 8004,
 'stundengrenze': 20,
 'kinderbonus': 300,
 'datum': numpy.datetime64('2020-01-01')}
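One way to read the kindergeld entry of these parameters: the keys appear to index the children of a family in order, and the highest key plausibly applies to every further child (in 2020, the statutory monthly amounts were €204 for the first and second child, €210 for the third, and €235 for each additional child). The following is a minimal sketch under that assumption. The helper monthly_kindergeld_total is hypothetical and not part of GETTSIM, and it ignores eligibility conditions entirely.

```python
# Amounts copied from policy_params["kindergeld"]["kindergeld"] above.
kindergeld_by_child = {1: 204, 2: 204, 3: 210, 4: 235}


def monthly_kindergeld_total(n_children, amounts):
    """Sum the monthly benefit over children 1..n_children (hypothetical helper).

    Assumption: the highest key also applies to all further children.
    """
    highest_key = max(amounts)
    return sum(
        amounts[min(position, highest_key)]
        for position in range(1, n_children + 1)
    )


two_children = monthly_kindergeld_total(2, kindergeld_by_child)
five_children = monthly_kindergeld_total(5, kindergeld_by_child)
print(two_children, five_children)
```

For a family with two eligible children this gives €408 per month, or €4,896 per year.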

DAG Plots for Visualization of the Taxes and Transfers System#

To get a better picture of how Kindergeld is implemented in GETTSIM, and of the structure of the German taxes and transfers system along the way, we can use GETTSIM’s visualization capabilities, which are concentrated in the function plot_dag. This function creates a directed acyclic graph (DAG) of the taxes and transfers system and offers many different visualization options. The guide on visualizing the taxes and transfers system gives an in-depth explanation of the function.

To figure out which variables are relevant for the child benefit, we plot the corresponding slice of the taxes and transfers system implemented in GETTSIM using plot_dag. The function was already imported with all other relevant packages at the beginning of this tutorial. To select the relevant plot, we define selectors that we pass as arguments to the function. The list of possible output variables in the documentation helps to find the relevant variable name for our application.

[5]:
selectors = {"type": "ancestors", "node": "kindergeld_m"}

Since we are interested in the child benefits, we select the node kindergeld_m and plot its ancestors, which are all the nodes that kindergeld_m depends on directly or indirectly. As the plot below shows, the variable depends on many other nodes and generates a very large DAG. Clicking on a node links to the corresponding function or variable.

[6]:
plot_dag(functions=policy_functions, selectors=selectors).show()
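To make the notion of "ancestors" concrete, here is a minimal sketch that collects all direct and indirect dependencies of a node in a small toy graph. The graph is an assumption for illustration: the node kindergeld_anspruch_toy is made up, and the real GETTSIM DAG contains different and many more nodes. The helper ancestors is hypothetical and unrelated to how plot_dag is implemented.

```python
# Toy dependency map: each node lists the nodes it is computed from.
# "kindergeld_anspruch_toy" is a made-up intermediate node; the real DAG differs.
toy_dependencies = {
    "kindergeld_m": ["kindergeld_anspruch_toy", "tu_id"],
    "kindergeld_anspruch_toy": ["alter", "in_ausbildung", "arbeitsstunden_w"],
    "kindergeld_m_tu": ["kindergeld_m", "tu_id"],
}


def ancestors(node, dependencies):
    """Return every node that `node` depends on, directly or indirectly."""
    found = set()
    to_visit = list(dependencies.get(node, []))
    while to_visit:
        current = to_visit.pop()
        if current not in found:
            found.add(current)
            # Nodes without an entry are inputs (root nodes) and add nothing.
            to_visit.extend(dependencies.get(current, []))
    return found


kindergeld_ancestors = ancestors("kindergeld_m", toy_dependencies)
print(sorted(kindergeld_ancestors))
```

Nodes that only depend on kindergeld_m, such as kindergeld_m_tu, are descendants and therefore do not show up in the result.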

An alternative way to inspect the variable is by looking at its neighbors in the DAG. This depiction shows the related variables and functions up to two nodes away from kindergeld_m. It reveals descendants of kindergeld_m: kindergeld_m_tu and kindergeld_m_hh. These variables contain the child benefits on tax unit and household level respectively.

[7]:
selectors = {"type": "neighbors", "node": "kindergeld_m", "order": 2}
plot_dag(functions=policy_functions, selectors=selectors).show()
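The "neighbors up to order 2" selection can be pictured as a breadth-first search that ignores edge direction and stops after two steps. The sketch below does this on a toy edge list. The nodes ending in _toy are made up, the real DAG is different, and whether plot_dag selects neighbors in exactly this way is an assumption.

```python
from collections import deque

# Toy edges (parent, child). Nodes ending in "_toy" are made up.
toy_edges = [
    ("alter", "kindergeld_anspruch_toy"),
    ("in_ausbildung", "kindergeld_anspruch_toy"),
    ("arbeitsstunden_w", "kindergeld_anspruch_toy"),
    ("kindergeld_anspruch_toy", "kindergeld_m"),
    ("tu_id", "kindergeld_m"),
    ("kindergeld_m", "kindergeld_m_tu"),
    ("kindergeld_m", "kindergeld_m_hh"),
    ("kindergeld_m_tu", "downstream_toy"),
    ("downstream_toy", "far_downstream_toy"),
]


def neighbors_within(node, edges, order):
    """Map each node within `order` steps of `node` to its distance."""
    adjacency = {}
    for parent, child in edges:
        # Store both directions so ancestors and descendants are reachable.
        adjacency.setdefault(parent, set()).add(child)
        adjacency.setdefault(child, set()).add(parent)

    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if distance[current] == order:
            continue  # do not expand beyond the requested order
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)

    del distance[node]
    return distance


close_nodes = neighbors_within("kindergeld_m", toy_edges, order=2)
print(sorted(close_nodes.items(), key=lambda item: (item[1], item[0])))
```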

Computing Variables of Interest#

Having inspected the DAG, we have an impression of the various input variables and functions that influence our variable of interest. As a next step, we load a set of simulated household data and compute the Kindergeld using compute_taxes_and_transfers, relying on the function’s features and error messages to guide us.

Simulated Data#

We simulate a dataset using create_synthetic_data. We can easily specify a few variables while all other necessary input variables are filled with defaults.

The specification chosen here creates a set of households with two adults and two children. The households vary in the variable bruttolohn_m and are otherwise identical.

[8]:
data = create_synthetic_data(
    n_adults=2,
    n_children=2,
    specs_heterogeneous={
        "bruttolohn_m": [[i, 0, 0, 0] for i in np.linspace(1000, 8000, 701)]
    },
)
[9]:
data[["hh_id", "hh_typ", "alter", "kind", "bruttolohn_m"]]
[9]:
hh_id hh_typ alter kind bruttolohn_m
0 0 couple_2_children 35 False 1000.0
1 0 couple_2_children 35 False 0.0
2 0 couple_2_children 8 True 0.0
3 0 couple_2_children 3 True 0.0
4 1 couple_2_children 35 False 1010.0
... ... ... ... ... ...
2799 699 couple_2_children 3 True 0.0
2800 700 couple_2_children 35 False 8000.0
2801 700 couple_2_children 35 False 0.0
2802 700 couple_2_children 8 True 0.0
2803 700 couple_2_children 3 True 0.0

2804 rows × 5 columns

The first adult’s monthly gross earnings range from €1,000 to €8,000, while all other household members have zero earnings. Earnings are captured in the variable bruttolohn_m. We can use the pandas method pandas.DataFrame.describe to assess the variable in detail.

[10]:
data["bruttolohn_m"].describe()
[10]:
count    2804.000000
mean     1125.000000
std      2195.983791
min         0.000000
25%         0.000000
50%         0.000000
75%       250.000000
max      8000.000000
Name: bruttolohn_m, dtype: float64
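The summary statistics follow directly from how the data were specified: 701 households with four members each, of whom only the first adult earns a wage. The sketch below rebuilds the column in plain Python as a plausibility check. It uses a list comprehension as a stand-in for np.linspace(1000, 8000, 701) and does not call create_synthetic_data.

```python
import statistics

# Pure-Python stand-in for np.linspace(1000, 8000, 701): steps of 10 euros.
first_adult_wages = [1000 + 10 * i for i in range(701)]

# One list per household: [adult 1, adult 2, child 1, child 2].
household_wages = [[wage, 0, 0, 0] for wage in first_adult_wages]

# Flatten to one value per person, mirroring the bruttolohn_m column.
bruttolohn_m = [wage for household in household_wages for wage in household]

n_rows = len(bruttolohn_m)
mean_wage = statistics.mean(bruttolohn_m)
median_wage = statistics.median(bruttolohn_m)
std_wage = statistics.stdev(bruttolohn_m)  # sample standard deviation
print(n_rows, mean_wage, median_wage, round(std_wage, 2))
```

Because three out of four persons have zero earnings, the mean (1125) is a quarter of the first adults' average wage of €4,500, and the median is zero.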

The columns contain all the input variables needed to compute kindergeld_m.

[11]:
data.columns
[11]:
Index(['p_id', 'hh_id', 'tu_id', 'hh_typ', 'hat_kinder', 'alleinerz',
       'weiblich', 'alter', 'kind', 'in_ausbildung', 'bruttolohn_m',
       'bürgerg_bezug_vorj', 'vermögen_bedürft_hh', 'selbstständig',
       'wohnort_ost', 'eink_selbst_m', 'in_priv_krankenv',
       'priv_rentenv_beitr_m', 'bruttolohn_vorj_m', 'arbeitsstunden_w',
       'geburtsjahr', 'geburtstag', 'geburtsmonat', 'mietstufe', 'entgeltp',
       'rentner', 'betreuungskost_m', 'kapitaleink_brutto_m',
       'eink_vermietung_m', 'bruttokaltmiete_m_hh', 'heizkosten_m_hh',
       'jahr_renteneintr', 'behinderungsgrad', 'wohnfläche_hh', 'm_elterngeld',
       'm_elterngeld_vat_hh', 'm_elterngeld_mut_hh', 'bewohnt_eigentum_hh',
       'immobilie_baujahr_hh', 'sonstig_eink_m', 'grundr_entgeltp',
       'grundr_zeiten', 'grundr_bew_zeiten', 'priv_rente_m', 'schwerbeh_g',
       'm_pflichtbeitrag', 'm_freiw_beitrag', 'm_mutterschutz',
       'm_arbeitsunfähig', 'm_krank_ab_16_bis_24', 'm_arbeitslos',
       'm_ausbild_suche', 'm_schul_ausbild', 'm_geringf_beschäft',
       'm_alg1_übergang', 'm_ersatzzeit', 'm_kind_berücks_zeit',
       'm_pfleg_berücks_zeit', 'y_pflichtbeitr_ab_40', 'anwartschaftszeit',
       'arbeitssuchend', 'm_durchg_alg1_bezug', 'sozialv_pflicht_5j',
       'kind_unterh_anspr_m', 'kind_unterh_erhalt_m', 'steuerklasse'],
      dtype='object')

Using Errors and Warnings#

As the DAG and column list above show, a large number of inputs is required to compute child benefits for a family. While the DAG is very useful to understand the structure within GETTSIM behind a variable or function, it might be difficult to infer which inputs exactly are needed in the data to compute a desired output. The function compute_taxes_and_transfers thus directly provides multiple mechanisms that help you identify the required input variables to compute certain taxes and transfers.

As shown in the basic usage tutorial, the function requires data, one or multiple targets, and policy_params as well as policy_functions to compute taxes and transfers for a given policy environment.

Since our data set already includes all required input columns, the function runs without problems.

[12]:
result = compute_taxes_and_transfers(
    data=data, params=policy_params, targets="kindergeld_m", functions=policy_functions
)
result.head(3)
[12]:
kindergeld_m
0 0.0
1 0.0
2 204.0

Error Messages: Missing Inputs#

However, if we have failed to add a required column, the function throws an error with a message that specifies which columns are missing. For example, the variable arbeitsstunden_w holds information on weekly working hours and is required to compute child benefits. Dropping it from the data triggers the error.

[13]:
incomplete_data = data.drop("arbeitsstunden_w", axis=1)
result = compute_taxes_and_transfers(
    data=incomplete_data,
    params=policy_params,
    targets="kindergeld_m",
    functions=policy_functions,
)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[13], line 2
      1 incomplete_data = data.drop("arbeitsstunden_w", axis=1)
----> 2 result = compute_taxes_and_transfers(
      3     data=incomplete_data,
      4     params=policy_params,
      5     targets="kindergeld_m",
      6     functions=policy_functions,
      7 )

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:118, in compute_taxes_and_transfers(data, params, functions, aggregation_specs, targets, columns_overriding_functions, check_minimal_specification, rounding, debug)
    113 processed_functions = _round_and_partial_parameters_to_functions(
    114     necessary_functions, params, rounding
    115 )
    117 # Create input data.
--> 118 input_data = _create_input_data(
    119     data=data,
    120     processed_functions=processed_functions,
    121     targets=targets,
    122     columns_overriding_functions=columns_overriding_functions,
    123     check_minimal_specification=check_minimal_specification,
    124 )
    126 # Calculate results.
    127 tax_transfer_function = dags.concatenate_functions(
    128     processed_functions,
    129     targets,
   (...)
    132     enforce_signature=True,
    133 )

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:364, in _create_input_data(data, processed_functions, targets, columns_overriding_functions, check_minimal_specification)
    357 dag = set_up_dag(
    358     all_functions=processed_functions,
    359     targets=targets,
    360     columns_overriding_functions=columns_overriding_functions,
    361     check_minimal_specification=check_minimal_specification,
    362 )
    363 root_nodes = {n for n in dag.nodes if list(dag.predecessors(n)) == []}
--> 364 _fail_if_root_nodes_are_missing(root_nodes, data, processed_functions)
    365 data = _reduce_to_necessary_data(root_nodes, data, check_minimal_specification)
    367 # Convert series to numpy arrays

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:492, in _fail_if_root_nodes_are_missing(root_nodes, data, functions)
    490 if missing_nodes:
    491     formatted = format_list_linewise(missing_nodes)
--> 492     raise ValueError(f"The following data columns are missing.\n{formatted}")

ValueError: The following data columns are missing.

[
    "arbeitsstunden_w",
]
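The traceback shows that the check compares the root nodes of the DAG with the columns in the data. A stripped-down version of such a check might look like the sketch below. It is a stand-alone illustration with a hypothetical helper name and a hand-written list of required columns, not the code in GETTSIM's interface module.

```python
def fail_if_columns_are_missing(required_columns, data_columns):
    """Raise a ValueError listing required columns absent from the data."""
    missing = [name for name in required_columns if name not in data_columns]
    if missing:
        formatted = "\n".join(f'    "{name}",' for name in missing)
        raise ValueError(
            f"The following data columns are missing.\n\n[\n{formatted}\n]"
        )


# Required inputs for kindergeld_m and a hypothetical incomplete data set.
required = ["arbeitsstunden_w", "alter", "in_ausbildung", "tu_id"]
available = ["p_id", "tu_id", "alter", "in_ausbildung"]

try:
    fail_if_columns_are_missing(required, available)
    error_text = ""
except ValueError as error:
    error_text = str(error)

print(error_text)
```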

Similarly, we can pass an empty pandas.DataFrame to the function to get a list of all the necessary input columns to compute the desired target(s).

[14]:
result = compute_taxes_and_transfers(
    data=pd.DataFrame({"p_id": []}),
    params=policy_params,
    targets="kindergeld_m",
    functions=policy_functions,
)
/home/docs/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:100: UserWarning:

The data types of the following input variables have been converted:

 - p_id from float64 to int

Note that the automatic conversion of data types is unsafe and that its correctness cannot be guaranteed. The best solution is to convert all columns to the expected data types yourself.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 result = compute_taxes_and_transfers(
      2     data=pd.DataFrame({"p_id": []}),
      3     params=policy_params,
      4     targets="kindergeld_m",
      5     functions=policy_functions,
      6 )

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:118, in compute_taxes_and_transfers(data, params, functions, aggregation_specs, targets, columns_overriding_functions, check_minimal_specification, rounding, debug)
    113 processed_functions = _round_and_partial_parameters_to_functions(
    114     necessary_functions, params, rounding
    115 )
    117 # Create input data.
--> 118 input_data = _create_input_data(
    119     data=data,
    120     processed_functions=processed_functions,
    121     targets=targets,
    122     columns_overriding_functions=columns_overriding_functions,
    123     check_minimal_specification=check_minimal_specification,
    124 )
    126 # Calculate results.
    127 tax_transfer_function = dags.concatenate_functions(
    128     processed_functions,
    129     targets,
   (...)
    132     enforce_signature=True,
    133 )

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:364, in _create_input_data(data, processed_functions, targets, columns_overriding_functions, check_minimal_specification)
    357 dag = set_up_dag(
    358     all_functions=processed_functions,
    359     targets=targets,
    360     columns_overriding_functions=columns_overriding_functions,
    361     check_minimal_specification=check_minimal_specification,
    362 )
    363 root_nodes = {n for n in dag.nodes if list(dag.predecessors(n)) == []}
--> 364 _fail_if_root_nodes_are_missing(root_nodes, data, processed_functions)
    365 data = _reduce_to_necessary_data(root_nodes, data, check_minimal_specification)
    367 # Convert series to numpy arrays

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:492, in _fail_if_root_nodes_are_missing(root_nodes, data, functions)
    490 if missing_nodes:
    491     formatted = format_list_linewise(missing_nodes)
--> 492     raise ValueError(f"The following data columns are missing.\n{formatted}")

ValueError: The following data columns are missing.

[
    "arbeitsstunden_w",
    "alter",
    "in_ausbildung",
    "tu_id",
]
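If you want the list of required inputs as a Python object, for example to compare it with the columns of your own data, one pragmatic option is to extract the quoted names from the error text. The sketch below does this for the message shown above. Parsing error messages is fragile and depends on the message format, so treat it as a convenience rather than a stable interface; the column list my_columns is hypothetical.

```python
import re

# Error text copied from the output above.
error_message = """The following data columns are missing.

[
    "arbeitsstunden_w",
    "alter",
    "in_ausbildung",
    "tu_id",
]"""

# Every double-quoted token in the message is a column name.
required_columns = re.findall(r'"([^"]+)"', error_message)

# Hypothetical columns of your own data set.
my_columns = ["p_id", "tu_id", "alter"]
still_missing = [name for name in required_columns if name not in my_columns]
print(required_columns)
print(still_missing)
```

In practice, the message could be obtained by wrapping the call in a try/except ValueError block and converting the caught exception to a string.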

Error Messages and Warnings: Unused Inputs#

The function compute_taxes_and_transfers also has an option that allows you to check for unused inputs in your data. This functionality is controlled through the argument check_minimal_specification. By default, it is set to "ignore", meaning no check is conducted. It can also be set to "warn" to trigger a warning, or to "raise" to raise an error whose message lists the unused inputs.

[15]:
result = compute_taxes_and_transfers(
    data=data,
    params=policy_params,
    targets="kindergeld_m",
    functions=policy_functions,
    check_minimal_specification="raise",
)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 1
----> 1 result = compute_taxes_and_transfers(
      2     data=data,
      3     params=policy_params,
      4     targets="kindergeld_m",
      5     functions=policy_functions,
      6     check_minimal_specification="raise",
      7 )

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:118, in compute_taxes_and_transfers(data, params, functions, aggregation_specs, targets, columns_overriding_functions, check_minimal_specification, rounding, debug)
    113 processed_functions = _round_and_partial_parameters_to_functions(
    114     necessary_functions, params, rounding
    115 )
    117 # Create input data.
--> 118 input_data = _create_input_data(
    119     data=data,
    120     processed_functions=processed_functions,
    121     targets=targets,
    122     columns_overriding_functions=columns_overriding_functions,
    123     check_minimal_specification=check_minimal_specification,
    124 )
    126 # Calculate results.
    127 tax_transfer_function = dags.concatenate_functions(
    128     processed_functions,
    129     targets,
   (...)
    132     enforce_signature=True,
    133 )

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:365, in _create_input_data(data, processed_functions, targets, columns_overriding_functions, check_minimal_specification)
    363 root_nodes = {n for n in dag.nodes if list(dag.predecessors(n)) == []}
    364 _fail_if_root_nodes_are_missing(root_nodes, data, processed_functions)
--> 365 data = _reduce_to_necessary_data(root_nodes, data, check_minimal_specification)
    367 # Convert series to numpy arrays
    368 data = {key: series.values for key, series in data.items()}

File ~/checkouts/readthedocs.org/user_builds/gettsim/checkouts/stable/src/_gettsim/interface.py:503, in _reduce_to_necessary_data(root_nodes, data, check_minimal_specification)
    501     warnings.warn(message, stacklevel=2)
    502 elif unnecessary_data and check_minimal_specification == "raise":
--> 503     raise ValueError(message)
    505 return {k: v for k, v in data.items() if k not in unnecessary_data}

ValueError: The following columns in 'data' are unused.


[
    "wohnfläche_hh",
    "y_pflichtbeitr_ab_40",
    "m_pfleg_berücks_zeit",
    "kapitaleink_brutto_m",
    "steuerklasse",
    "m_elterngeld_mut_hh",
    "m_arbeitsunfähig",
    "vermögen_bedürft_hh",
    "m_alg1_übergang",
    "m_mutterschutz",
    "rentner",
    "weiblich",
    "geburtstag",
    "bewohnt_eigentum_hh",
    "eink_selbst_m",
    "heizkosten_m_hh",
    "sozialv_pflicht_5j",
    "hh_typ",
    "betreuungskost_m",
    "selbstständig",
    "kind_unterh_erhalt_m",
    "entgeltp",
    "priv_rentenv_beitr_m",
    "bürgerg_bezug_vorj",
    "m_durchg_alg1_bezug",
    "jahr_renteneintr",
    "m_schul_ausbild",
    "m_freiw_beitrag",
    "kind",
    "eink_vermietung_m",
    "geburtsjahr",
    "hat_kinder",
    "bruttolohn_vorj_m",
    "grundr_entgeltp",
    "immobilie_baujahr_hh",
    "m_pflichtbeitrag",
    "sonstig_eink_m",
    "hh_id",
    "m_kind_berücks_zeit",
    "geburtsmonat",
    "in_priv_krankenv",
    "m_geringf_beschäft",
    "schwerbeh_g",
    "wohnort_ost",
    "priv_rente_m",
    "behinderungsgrad",
    "anwartschaftszeit",
    "m_krank_ab_16_bis_24",
    "m_elterngeld",
    "m_ausbild_suche",
    "arbeitssuchend",
    "kind_unterh_anspr_m",
    "m_elterngeld_vat_hh",
    "grundr_zeiten",
    "grundr_bew_zeiten",
    "p_id",
    "alleinerz",
    "m_ersatzzeit",
    "m_arbeitslos",
    "bruttokaltmiete_m_hh",
    "bruttolohn_m",
    "mietstufe",
]
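The three settings of check_minimal_specification can be pictured as follows. The sketch is a simplified stand-alone version of the pattern visible in the traceback (warn, raise, or silently drop unused columns). It operates on a plain dictionary with hypothetical toy data and is not GETTSIM's implementation.

```python
import warnings


def reduce_to_necessary_data(required, data, check_minimal_specification="ignore"):
    """Drop unused columns; optionally warn or raise if any exist."""
    unused = [name for name in data if name not in required]
    message = f"The following columns in 'data' are unused: {unused}"
    if unused and check_minimal_specification == "warn":
        warnings.warn(message, stacklevel=2)
    elif unused and check_minimal_specification == "raise":
        raise ValueError(message)
    return {name: values for name, values in data.items() if name in required}


required = {"arbeitsstunden_w", "alter", "in_ausbildung", "tu_id"}
toy_data = {
    "tu_id": [0, 0],
    "alter": [35, 8],
    "in_ausbildung": [False, False],
    "arbeitsstunden_w": [40.0, 0.0],
    "hh_typ": ["couple_2_children", "couple_2_children"],  # not needed
}

# Default setting: the unused column is dropped without any message.
reduced = reduce_to_necessary_data(required, toy_data)
print(sorted(reduced))
```

Setting the argument to "raise" is useful while assembling a minimal data set, whereas "warn" keeps a pipeline running but still reports superfluous columns.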

Debug Mode#

In addition to errors and warnings, compute_taxes_and_transfers can also be used in debug mode by setting the argument debug=True. In this mode, the function returns all inputs and outputs that can be computed while issuing error messages for the parts where the code fails. It is thus a very useful tool to help you set up your code correctly and detect the sources of problems that might arise in the process. Check out the troubleshooting tutorial for more information.

Computing Child Benefits and Taxes#

In this section we will compute lump-sum child benefits (Kindergeld) for example households. Since households can also claim a tax credit (Kinderfreibetrag) instead of the child benefit, we will also compute the income taxes for each household. By default, GETTSIM chooses the financially more favorable option for each case. The results will thus let us inspect how the policy affects different income levels in our data.
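The "more favorable option" logic can be illustrated with a simple comparison: the tax credit is chosen only if the tax it saves exceeds the child benefit. The sketch below uses made-up tax amounts and a hypothetical helper. How GETTSIM performs this comparison internally is not shown in this tutorial, so treat the code as a conceptual illustration only.

```python
def choose_child_relief(kindergeld_y, tax_without_allowance_y, tax_with_allowance_y):
    """Pick the option with the larger yearly financial advantage."""
    tax_saving = tax_without_allowance_y - tax_with_allowance_y
    if tax_saving > kindergeld_y:
        return "kinderfreibetrag", tax_saving
    return "kindergeld", kindergeld_y


# Yearly Kindergeld for two children in 2020: 2 * 204 * 12 = 4896.
# All tax amounts below are hypothetical.
low_income = choose_child_relief(4896.0, 2000.0, 0.0)
high_income = choose_child_relief(4896.0, 30000.0, 24000.0)
print(low_income, high_income)
```

Since the tax saving grows with the marginal tax rate, the tax credit tends to win only at higher incomes.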

Income Taxes#

The income tax of a tax unit depends on the child benefit since the tax credit is only claimed if it is more beneficial than the child benefit. To compare the two, we additionally compute the income tax per tax unit, eink_st_tu. We also compute the variable bruttolohn_m_tu, which gives the monthly gross income per tax unit (in our case, this is the combined income of the two adults in the household).

[16]:
df = compute_taxes_and_transfers(
    data=data,
    params=policy_params,
    targets=["eink_st_tu", "bruttolohn_m_tu", "kindergeld_m_tu"],
    functions=policy_functions,
)

Since gross income and child benefit per tax unit are computed at the monthly level while income tax is computed per year, we multiply the former two by 12. We then drop unused variables and duplicate rows from the DataFrame. The final DataFrame contains the yearly gross income, income tax, and child benefit per tax unit.

[17]:
# Multiply variables by 12 to generate yearly values.
df[["bruttolohn_tu", "kindergeld_tu"]] = df[["bruttolohn_m_tu", "kindergeld_m_tu"]] * 12
# Select variables of interest for further steps.
df = df[["bruttolohn_tu", "eink_st_tu", "kindergeld_tu"]].drop_duplicates()
df.head().round(2)
[17]:
bruttolohn_tu eink_st_tu kindergeld_tu
0 12000.0 0.0 4896.0
4 12120.0 0.0 4896.0
8 12240.0 0.0 4896.0
12 12360.0 0.0 4896.0
16 12480.0 0.0 4896.0
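The index values 0, 4, 8, ... in the output arise because results are returned per person: all four members of a tax unit carry the same tax-unit-level values, and dropping duplicates keeps only the first of them. The sketch below reproduces the two steps in plain Python for two tax units, with monthly values back-calculated from the first two output rows. It mimics the pandas operations rather than calling them.

```python
# Per-person results for two tax units with four members each.
monthly_results = (
    [{"bruttolohn_m_tu": 1000.0, "eink_st_tu": 0.0, "kindergeld_m_tu": 408.0}] * 4
    + [{"bruttolohn_m_tu": 1010.0, "eink_st_tu": 0.0, "kindergeld_m_tu": 408.0}] * 4
)

yearly_rows = {}
for index, row in enumerate(monthly_results):
    # Monthly values are scaled to yearly ones; the income tax is yearly already.
    yearly = (
        row["bruttolohn_m_tu"] * 12,
        row["eink_st_tu"],
        row["kindergeld_m_tu"] * 12,
    )
    # Keep only the first occurrence of each distinct row, like drop_duplicates.
    if yearly not in yearly_rows.values():
        yearly_rows[index] = yearly

print(yearly_rows)
```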

At a certain income level (around €80,000-€90,000) the tax credit becomes more favorable and GETTSIM assigns the tax break. The next cells plot the resulting income tax and child benefits.

[18]:
def plot_kindergeld(df):
    """Plot the child benefit and income taxes by household type."""

    return px.line(
        data_frame=df,
        x="bruttolohn_tu",
        y=["eink_st_tu", "kindergeld_tu"],
    )
[19]:
plot_kindergeld(df).show()

Columns Overriding Functions#

Lastly, it is also possible to substitute internally computed variables using input columns in the data. To override an internal function, it is necessary to specify a column with the same name and pass it to compute_taxes_and_transfers using the argument columns_overriding_functions.

For instance, for this application we could override the internal function kindergeld_m and set the child benefit to 0.

[20]:
new_data = data.copy()
new_data["kindergeld_m"] = 0.0

Again, we compute the child benefit and income tax by tax unit. The argument columns_overriding_functions also accepts a list of columns to override multiple functions.

[21]:
outputs = compute_taxes_and_transfers(
    data=new_data,
    params=policy_params,
    targets=["kindergeld_m_tu", "eink_st_tu", "bruttolohn_m_tu"],
    functions=policy_functions,
    columns_overriding_functions=["kindergeld_m"],
)
[22]:
df_new = outputs.set_index(new_data.tu_id)
df_new[["bruttolohn_tu", "kindergeld_tu"]] = (
    df_new[["bruttolohn_m_tu", "kindergeld_m_tu"]] * 12
)
df_new = df_new[["bruttolohn_tu", "eink_st_tu", "kindergeld_tu"]].drop_duplicates()

Since the child benefits are set to zero, GETTSIM computes the tax credit for all households instead.

[23]:
plot_kindergeld(df_new).show()

Aside from overriding internal function outputs using data columns, it is also possible to substitute the functions entirely. Please refer to the policy functions tutorial for more information.

Use Case for Columns Overriding Functions: Retirement Earnings#

GETTSIM can calculate retirement earnings (ges_rente_m), but doing so requires several input variables such as entgeltp and grundr_zeiten.

However, in most data sets (e.g. the SOEP), retirement earnings are observed and those input variables are not. For some applications it is hence more straightforward to specify columns_overriding_functions=["ges_rente_m"] and use the measured retirement earnings directly. The pension-specific input variables like entgeltp or grundr_zeiten are then no longer needed.
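Why the pension-specific inputs become unnecessary can be seen in a toy resolver that determines each function's inputs from its argument names. Once a column overrides a function, the resolver never looks at that function's arguments. Everything in the sketch is an assumption for illustration: the pension formula is invented, ges_rente_y_toy is a made-up target, and the resolver evaluate is hypothetical and much simpler than GETTSIM's machinery.

```python
import inspect


def ges_rente_m(entgeltp, grundr_zeiten):
    """Toy pension formula, purely illustrative."""
    return 30.0 * entgeltp + 1.0 * grundr_zeiten


def ges_rente_y_toy(ges_rente_m):
    """Made-up yearly target that depends on the monthly pension."""
    return 12 * ges_rente_m


toy_functions = {"ges_rente_m": ges_rente_m, "ges_rente_y_toy": ges_rente_y_toy}


def evaluate(target, functions, data, columns_overriding_functions=()):
    """Resolve a target from data columns and functions (toy resolver)."""
    if target in columns_overriding_functions or target not in functions:
        return data[target]  # raises KeyError if the column is missing
    func = functions[target]
    # The argument names of a function identify the inputs it needs.
    inputs = {
        name: evaluate(name, functions, data, columns_overriding_functions)
        for name in inspect.signature(func).parameters
    }
    return func(**inputs)


# Computed from the pension-specific inputs.
computed = evaluate(
    "ges_rente_y_toy", toy_functions, {"entgeltp": 40.0, "grundr_zeiten": 0.0}
)

# Observed pension supplied directly: entgeltp and grundr_zeiten are not needed.
observed = evaluate(
    "ges_rente_y_toy",
    toy_functions,
    {"ges_rente_m": 1100.0},
    columns_overriding_functions=["ges_rente_m"],
)
print(computed, observed)
```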