Data Description

Understanding the variables

The data can be decomposed into two parts: the income measure and the summary statistic. All variables have the following naming convention:


For example, inc1_gini is the gini coeffcient of primary income (inc1).

We also add the prefix hhaa if the summary statistic is restricted to working age households.

Summary statistics

Variable name Concept Stata code
mean Mean sum VARIABLE [w=hwgt*nhhmem]
gini Gini coefficient sgini VARIABLE [aw=hwgt*nhhmem]
conc Concentration index sgini VARIABLE [aw=hwgt*nhhmem], sortvar(SORTVARIABLE)
kakwani Kakwani index VARIABLE_conc_SORTVARIABLE - SORTVARIABLE_gini

All statistics are calculated at the individual level. We first calculate the measures at the household level (using the square root equivalence scale) but then we weight the summary statistics by the number of household members to provide an individual level summary statistic. This approach assumes the household resources are shared equally among the household members.

Income measures

Variable name Concept Definition
inc1 Primary income Income from labor and capital
inc2 Market income Primary income + pensions
inc3 Gross income Market income + cash social transfers (other than pensions)
inc4 Disposable income Gross income - income taxation and social security contribution (employer and employee)
dhi Disposable income The survey measure available in the LIS database.

Tax, transfer and pension measures

In addition to income, we also calculate summary statistics of the following concepts:

Variable name Concept LIS variables
tax Income tax, employee and employer social security contributions hxit + hsscer
transfer All monetary social transfers from government but excluding pensions hits - pubpension
allpension Pensions pension - hitsap
pubpension Public pensions hitsil + hitsup
pripension Private pensions hicvip
hxits Employee social security contributions (LIS and imputed) hxits=hsscee if hxits==.
hsscee Employee social security contributions (imputed)  
hsscer Employer social security contributions (imputed)  
hssc Social security contributions (imputed)  

Working-age subsample

We calculate our summary statistics on the full sample of respondents for each national survey and we categorize pensions as part of income. Researchers may prefer to exclude pensioners and focus only on working-age households. We have also calculated our summary statistics for the subsample of working-age households.

We define working-age households as those whose household head is between 25 and 60 years of age at the survey date

Details on the summary statistics

Weighted mean

We estimate the population mean of a variable by weighting the sample mean with weights provided in each household survey. The weights are calculated to match the sample with the population.

Gini coefficient

The Gini coefficient is a standardized measure of inequality which ranges from 0 to 1. Perfect equality has Gini coefficient of 0 and the most extreme level of inequality (where one person has everything and everyone else has nothing) has a Gini coefficient of 1. You can read more details on the Gini coefficient here.

Concentration index

The concentration index summarizes the distribution of a variable over households, ranked by household income. The index ranges from -1 to 1. For example, if were studying the distribution of taxes, the concentration index is equal to one if the household with the largest income paid all the taxes. The concentration index is -1 if the household with the smallest income paid all the taxes.

Kakwani index

The Kakwani index is the difference between the concentration index and the Gini index. The Kakwani index corrects the concentration index for the initial level of inequality. Intuitively, the Kakwani index measures the distance from proportionality. If the Kakwani index is equal to zero then the variable is distributed proportionally to income.

The index ranges from −1−Gini to 1−Gini. For transfers, the lower the Kakwani index, the higher is the rate at which transfers fall as income rises. The transfer system redistributes from rich to poor when the index is negative. For taxes, the higher the Kakwani index, the higher is the rate at which tax rises as income rises. The tax system redistributes from rich to poor when this index is positive.