Today, while trying to calculate the Gini coefficient from grouped data, I got stuck because my formula was at the office. A quick search on the web was not very helpful. I came across some literature on its underestimation and, hence, the need for calculating lower and upper bounds. Finally, I had to work it out myself on the back of an envelope using a Lorenz curve. I thought of sharing it with others.

We have a population divided into $i = 1, \ldots, n$ groups, ordered from the poorest to the richest. Let the X-axis denote the cumulative share of population, $x_i$, and let the Y-axis denote the cumulative share of wealth (or something like that), $y_i$. The shares of population and wealth for each group are denoted $a_i$ and $b_i$ respectively, so that $x_i = a_1 + \cdots + a_i$ and $y_i = b_1 + \cdots + b_i$. If we compute the area below the Lorenz curve, then the Gini coefficient formula is:

$$1 - 2\left(\sum_i \frac{a_i b_i}{2} + \sum_i (1 - x_i)\, b_i\right)$$
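For anyone who prefers code to envelopes, here is a minimal Python sketch of the formula above. It rests on the fact that the area under a piecewise-linear Lorenz curve is a sum of trapezoids, $\sum_i a_i (y_{i-1} + y_i)/2$, which rearranges into the two sums in the formula. The function name `gini_trapezoid` and the quintile figures are made up for illustration, and the inputs are assumed to be shares that sum to one, already sorted from poorest to richest.

```python
from itertools import accumulate


def gini_trapezoid(pop_shares, wealth_shares):
    """Gini coefficient from grouped data via the area under the Lorenz curve.

    pop_shares are the a_i and wealth_shares the b_i; both are assumed to
    sum to 1 and to be ordered from the poorest group to the richest.
    """
    x = list(accumulate(pop_shares))  # cumulative population shares x_i
    half_products = sum(a * b for a, b in zip(pop_shares, wealth_shares)) / 2
    tail_weighted = sum((1 - xi) * b for xi, b in zip(x, wealth_shares))
    return 1 - 2 * (half_products + tail_weighted)


# Hypothetical data: five equal-sized groups with rising wealth shares.
a = [0.2, 0.2, 0.2, 0.2, 0.2]
b = [0.05, 0.10, 0.15, 0.25, 0.45]
print(round(gini_trapezoid(a, b), 4))  # 0.38
```
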

This does reasonably well, but it suffers from the lower/upper bound problem. Because it treats everyone within a group as equal, it understates inequality, and in particular it will not give the value of unity for complete inequality: if the richest group holds all the wealth, it returns $1 - a_n$. However, there is a very interesting formula in Approximation of Gini Index from Grouped Data by Badenes-Plá (the paper is marked not to be quoted, but it is available online). I tinkered with it and got a formula that gives zero for complete equality and unity for complete inequality:

$$1 - \frac{\sum_i (1 - x_i)\, a_i b_i}{\sum_i (1 - x_i)\, a_i^2}$$
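A rough Python sketch of this second formula follows. The name `gini_adjusted` and the sample shares are hypothetical, and the inputs are again assumed to be sorted shares that sum to one. The check for a zero denominator is my own addition: with a single group every weight $(1 - x_i)$ vanishes and the ratio is undefined.

```python
from itertools import accumulate


def gini_adjusted(pop_shares, wealth_shares):
    """Bounds-corrected Gini from grouped data, per the ratio formula above.

    pop_shares are the a_i and wealth_shares the b_i; both are assumed to
    sum to 1 and to be ordered from the poorest group to the richest.
    """
    x = list(accumulate(pop_shares))  # cumulative population shares x_i
    numerator = sum(
        (1 - xi) * a * b for xi, a, b in zip(x, pop_shares, wealth_shares)
    )
    denominator = sum((1 - xi) * a * a for xi, a in zip(x, pop_shares))
    if denominator == 0:
        raise ValueError("need at least two groups with positive population")
    return 1 - numerator / denominator


# Hypothetical data: five equal-sized groups with rising wealth shares.
a = [0.2, 0.2, 0.2, 0.2, 0.2]
b = [0.05, 0.10, 0.15, 0.25, 0.45]
print(round(gini_adjusted(a, b), 4))  # 0.475

# Bounds check: equal shares, then everything held by the richest group.
print(gini_adjusted(a, a))
print(gini_adjusted(a, [0, 0, 0, 0, 1]))
```

The last two calls should come out as zero and one respectively, up to floating-point rounding.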

If, instead of wealth, we are dealing with the share of the poor (or some other deprivation), then the X-axis denotes the proportion of poor and the Y-axis denotes the proportion of population. The above formula continues to hold. It can also be used for unit-level data, whether weighted or unweighted, but one should be careful in calculating the shares. Happy computing.
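On calculating the shares from unit-level data, one possible approach is sketched below in Python. It is only my assumption of what "careful" means here: sort the units from poorest to richest, take each unit's population share as its weight over the total weight, and take its wealth share as weight times value over the weighted total. The helper name `shares_from_units` is hypothetical, and unweighted data is treated as every unit having weight one.

```python
def shares_from_units(values, weights=None):
    """Turn unit-level records into population shares a_i and value shares b_i.

    values are per-unit amounts (e.g. wealth); weights are optional sampling
    or frequency weights. Units are sorted from poorest to richest first.
    """
    if weights is None:
        weights = [1.0] * len(values)  # unweighted: each unit counts once
    pairs = sorted(zip(values, weights))  # poorest first
    total_weight = sum(w for _, w in pairs)
    total_value = sum(v * w for v, w in pairs)
    pop_shares = [w / total_weight for _, w in pairs]
    value_shares = [v * w / total_value for v, w in pairs]
    return pop_shares, value_shares


# Hypothetical unweighted records, deliberately out of order.
a, b = shares_from_units([10, 40, 20, 30])
print(a)  # [0.25, 0.25, 0.25, 0.25]

# Hypothetical weighted records: three units at 10 and one unit at 30.
a_w, b_w = shares_from_units([10, 30], weights=[3, 1])
print(a_w, b_w)  # [0.75, 0.25] [0.5, 0.5]
```

The resulting lists can then be fed into either formula in place of the grouped shares.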
