# Data Description

## Understanding the variables

Each variable name has two parts: the income measure and the summary statistic. All variables follow this naming convention:

`INCOME MEASURE _ SUMMARY STATISTIC`

For example, `inc1_gini` is the Gini coefficient of primary income (`inc1`).

We also add the prefix `hhaa` if the summary statistic is restricted to working-age households.

### Summary statistics

Variable name | Concept | Stata code |
---|---|---|
mean | Mean | `sum VARIABLE [w=hwgt*nhhmem]` |
gini | Gini coefficient | `sgini VARIABLE [aw=hwgt*nhhmem]` |
conc | Concentration index | `sgini VARIABLE [aw=hwgt*nhhmem], sortvar(SORTVARIABLE)` |
kakwani | Kakwani index | `VARIABLE_conc_SORTVARIABLE - SORTVARIABLE_gini` |

All statistics are calculated at the individual level. We first compute each measure at the household level, equivalised with the square root equivalence scale, and then weight by the number of household members so that each summary statistic describes individuals rather than households. This approach assumes that household resources are shared equally among household members.
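
The two steps described above can be sketched in Python. This is a minimal illustration, not the Stata code used to build the data: it assumes the square root scale means dividing household income by the square root of household size, and the dictionary key `income` is a hypothetical stand-in for any income measure (`hwgt` and `nhhmem` follow the names in the table above).

```python
import math


def to_person_level(households):
    """Return equivalised values and person-level weights.

    Each household is a dict with keys "income" (hypothetical name),
    "hwgt" (household weight) and "nhhmem" (number of members).
    """
    values, weights = [], []
    for hh in households:
        # Square root equivalence scale: divide by sqrt(household size).
        values.append(hh["income"] / math.sqrt(hh["nhhmem"]))
        # Weight by household weight times members to count individuals.
        weights.append(hh["hwgt"] * hh["nhhmem"])
    return values, weights


sample = [
    {"income": 40000, "hwgt": 100, "nhhmem": 4},
    {"income": 30000, "hwgt": 250, "nhhmem": 1},
]
values, weights = to_person_level(sample)
```

Every member of a household receives the same equivalised value, which is where the equal-sharing assumption enters.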

### Income measures

Variable name | Concept | Definition |
---|---|---|
inc1 | Primary income | Income from labor and capital |
inc2 | Market income | Primary income + pensions |
inc3 | Gross income | Market income + cash social transfers (other than pensions) |
inc4 | Disposable income | Gross income - income taxes and social security contributions (employer and employee) |
dhi | Disposable income | The survey measure available in the LIS database. |
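
To make the cascade from `inc1` to `inc4` concrete, here is a small Python sketch. The argument names (`primary`, `pensions`, `transfers`, `taxes`) and the amounts are hypothetical; the sketch only mirrors the definitions in the table and does not say which LIS variables feed each component.

```python
def income_measures(primary, pensions, transfers, taxes):
    """Build the four income measures for one household."""
    inc1 = primary            # primary income: labor and capital
    inc2 = inc1 + pensions    # market income
    inc3 = inc2 + transfers   # gross income (transfers exclude pensions)
    inc4 = inc3 - taxes       # disposable income
    return {"inc1": inc1, "inc2": inc2, "inc3": inc3, "inc4": inc4}


# Illustrative amounts only.
example = income_measures(primary=30000, pensions=5000, transfers=2000, taxes=8000)
```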

### Tax, transfer and pension measures

In addition to income, we also calculate summary statistics of the following concepts:

Variable name | Concept | LIS variables |
---|---|---|
tax | Income tax, employee and employer social security contributions | `hxit + hsscer` |
transfer | All monetary social transfers from government but excluding pensions | `hits - pubpension` |
allpension | Pensions | `pension - hitsap` |
pubpension | Public pensions | `hitsil + hitsup` |
pripension | Private pensions | `hicvip` |
hxits | Employee social security contributions (LIS and imputed) | `hxits=hsscee if hxits==.` |
hsscee | Employee social security contributions (imputed) | |
hsscer | Employer social security contributions (imputed) | |
hssc | Social security contributions (imputed) | |
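
The formulas in the table can be read as the following Python sketch. It assumes one household is held in a dictionary keyed by the LIS variable names above, and that a missing survey value (Stata's `.`) is represented as `None`; both are assumptions of this illustration, and the amounts are made up.

```python
def derived_components(hh):
    """Build the derived measures from a dict of household variables."""
    pubpension = hh["hitsil"] + hh["hitsup"]
    # Keep the survey value of hxits when present, else use imputed hsscee.
    hxits = hh["hsscee"] if hh["hxits"] is None else hh["hxits"]
    return {
        "tax": hh["hxit"] + hh["hsscer"],
        "transfer": hh["hits"] - pubpension,
        "allpension": hh["pension"] - hh["hitsap"],
        "pubpension": pubpension,
        "pripension": hh["hicvip"],
        "hxits": hxits,
    }


# Illustrative amounts only.
household = {
    "hxit": 6000, "hsscer": 2000, "hits": 9000, "pension": 7500,
    "hitsap": 500, "hitsil": 4000, "hitsup": 1000, "hicvip": 2000,
    "hxits": None, "hsscee": 1500,
}
components = derived_components(household)
```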

## Working-age subsample

We calculate our summary statistics on the full sample of respondents for each national survey, and we categorize pensions as part of income. Researchers may prefer to exclude pensioners and focus only on working-age households, so we have also calculated our summary statistics for the subsample of working-age households.

We define working-age households as those whose household head is between 25 and 60 years of age at the survey date.
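
As a sketch of this restriction in Python: the field names `hid` and `head_age` are hypothetical, and treating both bounds as inclusive is an assumption, since the text does not say whether ages 25 and 60 themselves are included.

```python
def is_working_age(head_age, min_age=25, max_age=60):
    """True if the household head's age is within the bounds.

    Inclusive bounds are an assumption of this sketch.
    """
    return min_age <= head_age <= max_age


households = [
    {"hid": 1, "head_age": 24},
    {"hid": 2, "head_age": 25},
    {"hid": 3, "head_age": 60},
    {"hid": 4, "head_age": 61},
]
working_age = [hh for hh in households if is_working_age(hh["head_age"])]
```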

## Details on the summary statistics

### Weighted mean

We estimate the population mean of a variable as the weighted sample mean, using the weights provided with each household survey. These weights are designed to make the sample representative of the population.
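
A weighted mean is the weighted sum of the values divided by the sum of the weights. The Python sketch below illustrates the arithmetic with made-up numbers; it is not the Stata command used to produce the data.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w * x divided by the sum of w."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# The second observation stands for three times as many people.
mean_income = weighted_mean([10, 20], [1, 3])
```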

### Gini coefficient

The Gini coefficient is a standardized measure of inequality that ranges from 0 to 1. Under perfect equality the Gini coefficient is 0; under the most extreme inequality (one person has everything and everyone else has nothing) it is 1.
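
One common way to compute a weighted Gini coefficient is from the mean absolute difference between all pairs of observations. The Python sketch below uses that formula for illustration; it is not the `sgini` implementation, which may follow different finite-sample conventions. With this formula, a sample of n people in which one person holds everything gives (n - 1)/n, which approaches 1 as n grows.

```python
def weighted_gini(values, weights):
    """Weighted Gini coefficient via the mean absolute difference."""
    total_weight = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    # Weighted sum of |x_i - x_j| over all ordered pairs (i, j).
    abs_diff = sum(
        wi * wj * abs(xi - xj)
        for xi, wi in zip(values, weights)
        for xj, wj in zip(values, weights)
    )
    return abs_diff / (2 * total_weight ** 2 * mean)


equal = weighted_gini([5, 5, 5, 5], [1, 1, 1, 1])
concentrated = weighted_gini([0, 0, 0, 10], [1, 1, 1, 1])
```

The double loop is quadratic in the number of observations, which is fine for a small example but slow for full survey files.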

### Concentration index

The concentration index summarizes the distribution of a variable over households, ranked by household income. The index ranges from -1 to 1. For example, if we were studying the distribution of taxes, the concentration index would equal 1 if the household with the largest income paid all the taxes, and -1 if the household with the smallest income paid all the taxes.
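
The concentration index can be computed like a Gini coefficient in which pairs are compared by their rank in the sorting variable rather than by the variable itself. The Python sketch below is one such pairwise formulation, given for illustration only; it is not the `sgini` implementation, and the tax and income figures are made up. In a finite sample of n households the extremes under this formula are (n - 1)/n and -(n - 1)/n.

```python
def concentration_index(values, sort_values, weights):
    """Weighted concentration index of `values`, ranked by `sort_values`."""
    total_weight = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_weight

    def sign(d):
        return (d > 0) - (d < 0)

    # Each pair contributes its difference in `values`, signed by which
    # of the two ranks higher in `sort_values` (ties contribute zero).
    pair_sum = sum(
        wi * wj * (yi - yj) * sign(si - sj)
        for yi, si, wi in zip(values, sort_values, weights)
        for yj, sj, wj in zip(values, sort_values, weights)
    )
    return pair_sum / (2 * total_weight ** 2 * mean)


incomes = [10, 20, 30, 40]
weights = [1, 1, 1, 1]
richest_pays = concentration_index([0, 0, 0, 8], incomes, weights)
poorest_pays = concentration_index([8, 0, 0, 0], incomes, weights)
```

When the variable is ranked by itself, this function returns its Gini coefficient.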

### Kakwani index

The Kakwani index is the difference between the concentration index of a variable and the Gini coefficient of the income measure used to rank households. The Kakwani index corrects the concentration index for the initial level of inequality. Intuitively, the Kakwani index measures the distance from proportionality: if it is equal to zero, the variable is distributed proportionally to income.

The index ranges from −1−Gini to 1−Gini. For transfers, the lower the Kakwani index, the faster transfers fall as income rises; the transfer system redistributes from rich to poor when the index is negative. For taxes, the higher the Kakwani index, the faster taxes rise as income rises; the tax system redistributes from rich to poor when the index is positive.
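
The sign conventions can be checked with a small worked example. The Python sketch below uses a pairwise formula for the concentration index (an illustrative choice, not the `sgini` implementation) and made-up tax and transfer schedules over four households.

```python
def concentration_index(values, sort_values, weights):
    """Weighted concentration index of `values`, ranked by `sort_values`."""
    total_weight = sum(weights)
    mean = sum(w * y for y, w in zip(values, weights)) / total_weight
    pair_sum = sum(
        wi * wj * (yi - yj) * ((si > sj) - (si < sj))
        for yi, si, wi in zip(values, sort_values, weights)
        for yj, sj, wj in zip(values, sort_values, weights)
    )
    return pair_sum / (2 * total_weight ** 2 * mean)


def kakwani_index(values, incomes, weights):
    """Concentration index of `values` minus the Gini of `incomes`."""
    conc = concentration_index(values, incomes, weights)
    # Ranking income by itself gives its Gini coefficient.
    gini = concentration_index(incomes, incomes, weights)
    return conc - gini


incomes = [10, 20, 30, 40]
weights = [1, 1, 1, 1]
proportional_tax = kakwani_index([2, 4, 6, 8], incomes, weights)
progressive_tax = kakwani_index([0, 1, 3, 8], incomes, weights)
pro_poor_transfer = kakwani_index([8, 3, 1, 0], incomes, weights)
```

A flat 20% tax gives an index of zero, the tax schedule that rises faster than income gives a positive index, and the transfer schedule that falls with income gives a negative one.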