Changing Functions of the Taxes and Transfers System#

This tutorial focuses on the policy functions of GETTSIM, one of the two objects returned by the function set_up_policy_environment. Alongside policy parameters, these functions help GETTSIM define a date-specific policy environment based on which it can compute taxes and transfers for individual and household data.

Just like parameters, policy functions can be replaced, added, or removed to change the existing policy environment. This way, you can design a new tax or transfer for a specific group of people (e.g. a new tax for people who earn income from renting out an apartment) or change the conditions for receiving existing transfers.

This tutorial showcases the policy functions using a concrete example. For a more comprehensive and abstract discussion of the feature, check out the how-to guide on Different Ways to Load Policy Functions.

[1]:
import copy
import warnings

import numpy as np
import plotly.express as px
from gettsim import (
    FunctionsAndColumnsOverlapWarning,
    compute_taxes_and_transfers,
    create_synthetic_data,
    set_up_policy_environment,
)

warnings.filterwarnings("ignore", category=FunctionsAndColumnsOverlapWarning)

Changing and Replacing Existing Function(s)#

Example: Receiving Multiple Transfers#

In the German system, some transfers for low-income families cannot be received in combination. By default, GETTSIM always chooses the most favorable transfers and sets the other transfers to zero. This assumption misrepresents the behavior of households and families that do not choose the optimal transfers from a monetary perspective. For example, certain transfers may carry a social stigma, or some people may simply not know about all available transfers.

To account for these frictions, we can turn off this aspect of GETTSIM so that we see all the transfers a family or household is entitled to, even if they cannot be received in combination. This can be useful for further analysis. For example, you could make assumptions about which transfers people actually take up and implement them in GETTSIM.

Find the Function#

First, we need to find the function that implements the aspect we want to change. We start by loading the policy environment for 2020:

[2]:
policy_params, policy_functions = set_up_policy_environment("2020")
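One way to narrow down the search is to filter the function names by a keyword and inspect the arguments of a match. The following is a minimal sketch, assuming that policy_functions behaves like a flat dictionary mapping function names to callables (the tutorial later assigns to it by key). To run without GETTSIM, it uses a small stand-in dictionary; the stand-in function bodies are placeholders, not GETTSIM code.

```python
import inspect


# Stand-ins for illustration only. In the tutorial, use the `policy_functions`
# object returned by `set_up_policy_environment` instead.
def arbeitsl_geld_2_m_bg(arbeitsl_geld_2_vor_vorrang_m_bg, erwachsene_alle_rentner_hh):
    return 0.0 if erwachsene_alle_rentner_hh else arbeitsl_geld_2_vor_vorrang_m_bg


def kindergeld_m(kindergeld_params):
    return 0.0


stand_in_functions = {
    "arbeitsl_geld_2_m_bg": arbeitsl_geld_2_m_bg,
    "kindergeld_m": kindergeld_m,
}

# Filter the function names by a keyword.
matches = sorted(name for name in stand_in_functions if "arbeitsl_geld" in name)

# List the arguments of the first match, i.e. the inputs it depends on.
arguments = list(inspect.signature(stand_in_functions[matches[0]]).parameters)
print(matches, arguments)
```

When the source file of a function is available, inspect.getsource can be used in the same way to print the full function body.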

Define Changes to the Function#

Once you have found the function you want to change, copy its source code from the website into your notebook and edit it as needed:

[3]:
def arbeitsl_geld_2_m_bg(
    arbeitsl_geld_2_vor_vorrang_m_bg,
    # wohngeld_vorrang_hh,
    # kinderzuschl_vorrang_bg,
    # wohngeld_kinderzuschl_vorrang_hh,
    erwachsene_alle_rentner_hh,
):
    if (
        # wohngeld_vorrang_hh
        # | kinderzuschl_vorrang_bg
        # | wohngeld_kinderzuschl_vorrang_hh
        erwachsene_alle_rentner_hh
    ):
        out = 0.0
    else:
        out = arbeitsl_geld_2_vor_vorrang_m_bg

    return out

The lines in the cell above that start with “#” perform the priority check described above in the original function. Commented out with the hash, they no longer have any effect on the result.
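To see the effect of the edit in isolation, both versions can be called directly with plain Python values. In the sketch below, the version with the check is reconstructed by uncommenting the lines of the edited function, so it may differ in detail from the function shipped with GETTSIM. The suffixes _with_check and _no_check and the input values are made up for this illustration.

```python
def arbeitsl_geld_2_m_bg_with_check(
    arbeitsl_geld_2_vor_vorrang_m_bg,
    wohngeld_vorrang_hh,
    kinderzuschl_vorrang_bg,
    wohngeld_kinderzuschl_vorrang_hh,
    erwachsene_alle_rentner_hh,
):
    # Reconstructed by uncommenting the lines of the edited function.
    if (
        wohngeld_vorrang_hh
        | kinderzuschl_vorrang_bg
        | wohngeld_kinderzuschl_vorrang_hh
        | erwachsene_alle_rentner_hh
    ):
        out = 0.0
    else:
        out = arbeitsl_geld_2_vor_vorrang_m_bg
    return out


def arbeitsl_geld_2_m_bg_no_check(
    arbeitsl_geld_2_vor_vorrang_m_bg,
    erwachsene_alle_rentner_hh,
):
    # Edited version from the tutorial: only the pensioner condition remains.
    if erwachsene_alle_rentner_hh:
        out = 0.0
    else:
        out = arbeitsl_geld_2_vor_vorrang_m_bg
    return out


# Made-up case: 800 € before the priority check, and Wohngeld takes priority.
with_check = arbeitsl_geld_2_m_bg_with_check(800.0, True, False, False, False)
no_check = arbeitsl_geld_2_m_bg_no_check(800.0, False)
print(with_check, no_check)  # 0.0 800.0
```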

Make GETTSIM Incorporate your Changes#

There are different ways to make GETTSIM incorporate your edited function.

Alternative 1:#

One way is to copy the policy_functions and replace the “old” function with the function we defined before.

[4]:
policy_functions_no_check = copy.deepcopy(policy_functions)
policy_functions_no_check["arbeitsl_geld_2_m_bg"] = arbeitsl_geld_2_m_bg

Computations with the new policy_functions_no_check now report all transfers a household is entitled to, without checking which ones cannot be received in combination and without choosing the most favorable combination.

Let's test if this works!

We create synthetic data for households with two parents and two children. These households differ only in their income:

[5]:
# Create synthetic households in which only the first adult's gross wage varies.
data = create_synthetic_data(
    n_adults=2,
    n_children=2,
    specs_heterogeneous={
        "bruttolohn_m": [[i, 0, 0, 0] for i in np.linspace(500, 5000, 250)]
    },
)

# Compute the sum of statutory and private pension income and add it to the data.
sum_ges_rente_priv_rente_m = compute_taxes_and_transfers(
    data=data,
    params=policy_params,
    targets="sum_ges_rente_priv_rente_m",
    functions=policy_functions,
)

data["sum_ges_rente_priv_rente_m"] = sum_ges_rente_priv_rente_m[
    "sum_ges_rente_priv_rente_m"
]
data.head(5)
[5]:
p_id hh_id hh_typ hat_kinder alleinerz anz_eig_kind_bis_24 weiblich alter kind in_ausbildung ... arbeitssuchend m_durchg_alg1_bezug sozialv_pflicht_5j kind_unterh_anspr_m kind_unterh_erhalt_m steuerklasse budgetsatz_erzieh voll_erwerbsgemind teilw_erwerbsgemind sum_ges_rente_priv_rente_m
0 0 0 couple_2_children True False 2 False 35 False False ... False 0.0 0.0 0.0 0.0 0 False False False 0.0
1 1 0 couple_2_children True False 2 True 35 False False ... False 0.0 0.0 0.0 0.0 0 False False False 0.0
2 2 0 couple_2_children False False 0 False 8 True True ... False 0.0 0.0 0.0 0.0 0 False False False 0.0
3 3 0 couple_2_children False False 0 True 5 True True ... False 0.0 0.0 0.0 0.0 0 False False False 0.0
4 4 1 couple_2_children True False 2 False 35 False False ... False 0.0 0.0 0.0 0.0 0 False False False 0.0

5 rows × 85 columns

For this data we can now compare the results of using GETTSIM with the policy_functions_no_check and the usual policy_functions.

We should expect to see positive values for wohngeld_m_hh, kinderzuschl_m_bg and arbeitsl_geld_2_m_bg at the same time if we do not check which combination of transfers is optimal (policy_functions_no_check).

On the other hand, if we use the default version of the policy_functions, wohngeld_m_hh and kinderzuschl_m_bg should be zero as long as arbeitsl_geld_2_m_bg is positive (and the other way around), as it is a characteristic of the German taxes and transfers system that Wohngeld and Kinderzuschlag cannot be received in combination with Arbeitslosengeld 2.

[6]:
targets = ["wohngeld_m_hh", "kinderzuschl_m_bg", "arbeitsl_geld_2_m_bg"]
[7]:
policies = {
    "Checked Favorability": policy_functions,
    "No Check of Favorability": policy_functions_no_check,
}
[8]:
# Loop through keys to plot both scenarios.
for k in policies:
    # Compute taxes and transfers.
    result = compute_taxes_and_transfers(
        data=data,
        functions=policies[k],
        params=policy_params,
        targets=targets,
    )
    # Add earnings and index to result DataFrame.
    result["bruttolohn_m"] = data["bruttolohn_m"]
    result.index = data["hh_id"]
    # Create DataFrame that contains the maximum value of the target variables
    # in the household and the household gross income.
    result = (
        result.groupby("hh_id")[targets]
        .max()
        .join(result.groupby("hh_id")["bruttolohn_m"].sum())
    )
    # Plot the results.
    fig = px.line(
        data_frame=result,
        x="bruttolohn_m",
        y=targets,
        title=k,
    )
    fig.update_layout(
        xaxis_title="Monthly gross income in € (per household)",
        yaxis_title="€ per month",
    )
    fig.show()

At first glance, both figures look quite confusing because of the complexity of the German taxes and transfers system. On closer inspection, however, the figures confirm our expectations. If we let GETTSIM check for the most favorable combination of transfers, wohngeld_m_hh and kinderzuschl_m_bg are zero as long as arbeitsl_geld_2_m_bg is positive (i.e. the best option for the household) and the other way around.

If we do not let GETTSIM do this check, this does not hold any longer and all transfers can be positive at the same time (which is what we were trying to achieve).
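The visual check can be complemented by a numerical one. The sketch below counts the rows in which Arbeitslosengeld 2 is positive together with Wohngeld or Kinderzuschlag. The helper name count_overlaps is hypothetical, and the lists hold made-up values for illustration, not GETTSIM output.

```python
def count_overlaps(alg2, wohngeld, kinderzuschlag):
    """Count rows where ALG 2 is positive together with another transfer."""
    return sum(
        1
        for a, w, k in zip(alg2, wohngeld, kinderzuschlag)
        if a > 0 and (w > 0 or k > 0)
    )


# Made-up values for four households (not computed by GETTSIM).
checked = {
    "arbeitsl_geld_2_m_bg": [900.0, 400.0, 0.0, 0.0],
    "wohngeld_m_hh": [0.0, 0.0, 350.0, 120.0],
    "kinderzuschl_m_bg": [0.0, 0.0, 200.0, 0.0],
}
not_checked = {
    "arbeitsl_geld_2_m_bg": [900.0, 400.0, 100.0, 0.0],
    "wohngeld_m_hh": [150.0, 300.0, 350.0, 120.0],
    "kinderzuschl_m_bg": [0.0, 180.0, 200.0, 0.0],
}

for label, res in {"checked": checked, "not checked": not_checked}.items():
    n = count_overlaps(
        res["arbeitsl_geld_2_m_bg"], res["wohngeld_m_hh"], res["kinderzuschl_m_bg"]
    )
    print(label, n)
```

Because the helper only iterates over its inputs, the columns of the result DataFrame from the loop above could be passed in the same way.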

Alternative 2:#

Another way is to pass the changed function directly to compute_taxes_and_transfers via the functions argument. This works as follows:

[9]:
result_no_check_p = compute_taxes_and_transfers(
    data=data,
    params=policy_params,
    functions=[policy_functions, arbeitsl_geld_2_m_bg],
    targets=[
        "wohngeld_m_hh",
        "kinderzuschl_m_bg",
        "arbeitsl_geld_2_m_bg",
    ],
)

Executing this cell will allow you to reproduce the same analysis we did above. We do not want to do it twice, so we skip it.

There are three important points:

  1. Note that arbeitsl_geld_2_m_bg has the same function name as a pre-defined function inside GETTSIM. Thus, the internal function will be replaced with this version.

  2. In general, if there are multiple functions with the same name, internal functions have the lowest precedence. After that, the elements in the list passed to the functions argument are evaluated element by element. The functions in the leftmost element have the lowest precedence and the functions in the rightmost element have the highest.

  3. If policy_functions were not needed for this example, you could also pass the arbeitsl_geld_2_m_bg function directly to the functions argument.
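The precedence rule from the second point can be modeled in a few lines of plain Python. This is a sketch of the rule as described above, not GETTSIM's internal code; the names resolve_functions and make_function are hypothetical helpers for this illustration.

```python
def resolve_functions(internal, user_sources):
    """Merge function sources; later sources override earlier ones."""
    resolved = dict(internal)  # internal functions have the lowest precedence
    for source in user_sources:  # evaluated from left to right
        if callable(source):
            resolved[source.__name__] = source
        else:
            resolved.update(source)
    return resolved


def make_function(name, label):
    """Create a placeholder function that reports where it came from."""

    def func():
        return label

    func.__name__ = name
    return func


internal = {
    "arbeitsl_geld_2_m_bg": make_function("arbeitsl_geld_2_m_bg", "internal"),
    "kindergeld_m": make_function("kindergeld_m", "internal"),
}
from_dict = {
    "arbeitsl_geld_2_m_bg": make_function("arbeitsl_geld_2_m_bg", "dict"),
}
edited = make_function("arbeitsl_geld_2_m_bg", "edited")

# The rightmost element wins; untouched functions stay internal.
resolved = resolve_functions(internal, [from_dict, edited])
print(resolved["arbeitsl_geld_2_m_bg"](), resolved["kindergeld_m"]())
```

Swapping the order to [edited, from_dict] would make the dictionary version win, which is why the edited function is listed last in the cell above.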

Multiple Functions#

You can use exactly the same approach if you want to change more than one function of GETTSIM. For our example, we first need to invent a change to another function. Imagine that, in addition to the previously implemented change, we want to double the amount of Kindergeld every household receives.

[10]:
def kindergeld_m(
    kindergeld_anz_ansprüche: int,
    kindergeld_params: dict,
) -> float:
    """Sum of Kindergeld for eligible children.

    Kindergeld claim for each child depends on the number of children Kindergeld is
    being claimed for.

    Parameters
    ----------
    kindergeld_anz_ansprüche
        See :func:`kindergeld_anz_ansprüche`.
    kindergeld_params
        See params documentation :ref:`kindergeld_params <kindergeld_params>`.

    Returns
    -------
    Twice the sum of Kindergeld under the existing rules.

    """

    if kindergeld_anz_ansprüche == 0:
        sum_kindergeld = 0.0
    else:
        sum_kindergeld = sum(
            kindergeld_params["kindergeld"][
                (
                    i
                    if i <= max(kindergeld_params["kindergeld"])
                    else max(kindergeld_params["kindergeld"])
                )
            ]
            for i in range(1, kindergeld_anz_ansprüche + 1)
        )

    return sum_kindergeld * 2
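The doubling can be checked by hand with a small parameter dictionary. The sketch below uses a condensed stand-in for the function above (named kindergeld_m_condensed, with an ASCII argument name, for this illustration only) and rates that were chosen for the example; the values GETTSIM actually uses come from policy_params.

```python
def kindergeld_m_condensed(anz_ansprueche, kindergeld_params):
    """Condensed stand-in for the doubled Kindergeld function above."""
    rates = kindergeld_params["kindergeld"]
    highest = max(rates)  # highest child number with its own rate
    total = sum(rates[min(i, highest)] for i in range(1, anz_ansprueche + 1))
    return total * 2


# Illustrative rates per child number, chosen for this sketch.
example_params = {"kindergeld": {1: 204.0, 2: 204.0, 3: 210.0, 4: 235.0}}

# Three claims: (204 + 204 + 210) * 2
print(kindergeld_m_condensed(3, example_params))  # 1236.0

# Five claims: the fifth child receives the rate of the fourth.
print(kindergeld_m_condensed(5, example_params))  # 2176.0
```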

If you edit arbeitsl_geld_2_m_bg and kindergeld_m, your two options to make GETTSIM incorporate your changes would be:

Alternative 1:

[11]:
policy_functions_reformed = copy.deepcopy(policy_functions)
policy_functions_reformed["arbeitsl_geld_2_m_bg"] = arbeitsl_geld_2_m_bg
policy_functions_reformed["kindergeld_m"] = kindergeld_m

Alternative 2:

[12]:
df = compute_taxes_and_transfers(
    data=data,
    params=policy_params,
    functions=[policy_functions, arbeitsl_geld_2_m_bg, kindergeld_m],
    targets=[
        "wohngeld_m_hh",
        "kinderzuschl_m_bg",
        "arbeitsl_geld_2_m_bg",
        "kindergeld_m",
    ],
)

Adding a New Function#

Instead of replacing existing functions, we can similarly define completely new functions and add them to the policy environment.
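As a sketch of what this could look like, the block below defines a made-up tax on rental income and adds it to a copy of the function dictionary, following the same pattern as Alternative 1 above. The function name mietsteuer_m, its argument eink_vermietung_m, and the 25% rate are all hypothetical and chosen for this illustration; an empty dictionary stands in for policy_functions so that the block runs without GETTSIM.

```python
import copy


def mietsteuer_m(eink_vermietung_m):
    """Hypothetical flat tax of 25% on positive monthly rental income."""
    return 0.25 * max(eink_vermietung_m, 0.0)


# Stand-in for the `policy_functions` object of the tutorial.
stand_in_policy_functions = {}

# Add the new function under its own name, leaving the original untouched.
policy_functions_with_tax = copy.deepcopy(stand_in_policy_functions)
policy_functions_with_tax["mietsteuer_m"] = mietsteuer_m

print(policy_functions_with_tax["mietsteuer_m"](1000.0))  # 250.0
print(policy_functions_with_tax["mietsteuer_m"](-200.0))  # 0.0
```

With GETTSIM, such a function would presumably be requested by listing its name in targets, as in the cells above, provided the data or another function supplies an input named like its argument.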

Aggregation Functions#

Functions that aggregate columns are treated differently in GETTSIM. Columns can be aggregated based on a group ID (e.g. at the household level) or based on the individual identifier p_id.

Aggregation Based on Group ID#

If we would like to add (or replace) such functions, we need to specify them in a dictionary which we provide to compute_taxes_and_transfers via the aggregate_by_group_specs argument. An example dictionary is as follows:

aggregate_by_group_specs = {
    "anz_erwachsene_sn": {"source_col": "erwachsen", "aggr": "sum"},
    "anz_personen_hh": {"aggr": "count"},
}
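To make the meaning of such a specification concrete, the sketch below applies "sum" and "count" to a few made-up rows in plain Python. It is not GETTSIM's implementation: the helper aggregate_by_group is hypothetical, the grouping column is passed explicitly, and writing the group result back to every row of the group is an assumption of this sketch.

```python
from collections import defaultdict


def aggregate_by_group(rows, group_col, spec):
    """Aggregate within groups and return one value per row."""
    totals = defaultdict(int)
    for row in rows:
        if spec["aggr"] == "count":
            totals[row[group_col]] += 1
        elif spec["aggr"] == "sum":
            totals[row[group_col]] += row[spec["source_col"]]
        else:
            raise NotImplementedError(spec["aggr"])
    return [totals[row[group_col]] for row in rows]


# Made-up data: household 0 has two adults and a child, household 1 one adult.
rows = [
    {"hh_id": 0, "erwachsen": True},
    {"hh_id": 0, "erwachsen": True},
    {"hh_id": 0, "erwachsen": False},
    {"hh_id": 1, "erwachsen": True},
]

n_persons = aggregate_by_group(rows, "hh_id", {"aggr": "count"})
n_adults = aggregate_by_group(
    rows, "hh_id", {"source_col": "erwachsen", "aggr": "sum"}
)
print(n_persons)  # [3, 3, 3, 1]
print(n_adults)  # [2, 2, 2, 1]
```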

Aggregation Based on p_id#

As above, aggregation functions based on p_id are specified in a dictionary, which can be provided to compute_taxes_and_transfers via the aggregate_by_p_id_specs argument. An example dictionary is as follows:

aggregate_by_p_id_kindergeld = {
    "kindergeld_anz_ansprüche": {
        "p_id_to_aggregate_by": "p_id_kindergeld_empf",
        "source_col": "kindergeld_anspruch",
        "aggr": "sum",
    },
}
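The idea behind this specification can again be sketched in plain Python: each row points to another person via p_id_to_aggregate_by, and the values of source_col are summed up for the person pointed to. The helper aggregate_by_p_id is hypothetical and only handles "sum"; the rows are made up, and using -1 for "no recipient" is an assumption of this sketch rather than a documented GETTSIM convention.

```python
from collections import defaultdict


def aggregate_by_p_id(rows, spec):
    """Sum `source_col` over all rows that point to a given p_id."""
    if spec["aggr"] != "sum":
        raise NotImplementedError(spec["aggr"])
    totals = defaultdict(int)
    for row in rows:
        totals[row[spec["p_id_to_aggregate_by"]]] += row[spec["source_col"]]
    # Return one value per row, looked up by the row's own p_id.
    return [totals[row["p_id"]] for row in rows]


spec = {
    "p_id_to_aggregate_by": "p_id_kindergeld_empf",
    "source_col": "kindergeld_anspruch",
    "aggr": "sum",
}

# Made-up household: person 0 receives Kindergeld for children 2 and 3.
rows = [
    {"p_id": 0, "p_id_kindergeld_empf": -1, "kindergeld_anspruch": False},
    {"p_id": 1, "p_id_kindergeld_empf": -1, "kindergeld_anspruch": False},
    {"p_id": 2, "p_id_kindergeld_empf": 0, "kindergeld_anspruch": True},
    {"p_id": 3, "p_id_kindergeld_empf": 0, "kindergeld_anspruch": True},
]

print(aggregate_by_p_id(rows, spec))  # [2, 0, 0, 0]
```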

See GEP 4 for more information on aggregation functions.