Since yesterday, I have been trying to work out a method of calculating the Gini coefficient from grouped data. A quick web search led to some literature, the most interesting being Approximation of Gini Index from Grouped Data by Nuria Badenes-Plá. Our suggestions are independent of this paper. Now, let us give some preliminaries.
We have a population divided into i=1,...,n groups, ordered from the poorest to the richest. Let the X-axis denote the cumulative share of population, xi, and let the Y-axis denote the cumulative share of wealth (or something similar), yi. The shares of population and wealth for each group are denoted ai and bi respectively. If we compute the area below the Lorenz curve then we get:
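Written out under the definitions above, the trapezoid rule would give the area as a sum of one trapezoid per group. The symbol B for the area and the convention y_0 = 0 are assumptions introduced here for illustration; they are not stated in the post.

```latex
B \;=\; \sum_{i=1}^{n} a_i \,\frac{y_{i-1} + y_i}{2},
\qquad y_i = \sum_{j=1}^{i} b_j, \quad y_0 = 0 .
```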
In a conventional sense the Gini coefficient is:
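In the usual textbook form, the coefficient is one minus twice the area below the Lorenz curve. Writing B for that area (a label assumed here) and taking y_0 = 0, this would read:

```latex
G \;=\; 1 - 2B \;=\; 1 - \sum_{i=1}^{n} a_i \,\bigl(y_{i-1} + y_i\bigr).
```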
The problem is that under complete inequality, when n-1 groups have no wealth and the nth group has all of it, the area above the Lorenz curve is not one half; it is less than that by an/2. To address this, we correct for the anomaly and propose a method for calculating the Gini coefficient:
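A form consistent with this description divides the conventional coefficient by the largest value it can reach, 1 - a_n. This is offered as a reconstruction from the surrounding text rather than as the post's exact expression; the symbol G* and the convention y_0 = 0 are assumptions.

```latex
G^{*} \;=\; \frac{1 - \sum_{i=1}^{n} a_i \,\bigl(y_{i-1} + y_i\bigr)}{1 - a_n}.
```

Under complete inequality the sum reduces to a_n, so the numerator equals 1 - a_n and the ratio is one.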
Our suggested method satisfies three properties. (1) At complete equality, when the wealth share of every group equals its population share, its value is zero. (2) At complete inequality, when n-1 groups have no wealth and the nth group holds all of it, its value is unity. (3) The calculation is population invariant: we do not need to know the size of the total population. It follows that if a population is replicated m times, the computed inequality remains the same.
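The three properties can be spot-checked numerically. The sketch below assumes the corrected coefficient is the conventional trapezoid-rule Gini divided by (1 - a_n), which is a reading of the description in this post and not a confirmed formula; the function names and sample figures are hypothetical.

```python
from itertools import accumulate


def shares(values):
    """Turn raw group totals (head counts or wealth) into shares summing to 1."""
    total = sum(values)
    return [v / total for v in values]


def gini_grouped(pop_shares, wealth_shares):
    """Return (conventional, corrected) Gini for groups ordered poorest to richest.

    Assumption: corrected = conventional / (1 - a_n), where a_n is the
    population share of the richest group. Undefined when a_n == 1.
    """
    # Cumulative wealth shares y_0 = 0, y_1, ..., y_n.
    y = [0.0] + list(accumulate(wealth_shares))
    # Twice the trapezoid-rule area below the Lorenz curve.
    twice_area = sum(a * (y[i] + y[i + 1]) for i, a in enumerate(pop_shares))
    conventional = 1.0 - twice_area
    corrected = conventional / (1.0 - pop_shares[-1])
    return conventional, corrected


# (1) Complete equality: wealth shares equal population shares.
equal = gini_grouped([0.25, 0.25, 0.5], [0.25, 0.25, 0.5])

# (2) Complete inequality: only the richest group holds wealth.
unequal = gini_grouped([0.4, 0.4, 0.2], [0.0, 0.0, 1.0])

# (3) Replication: tripling every group leaves the shares, and so the result, unchanged.
counts, wealth = [40, 40, 20], [10, 30, 60]
once = gini_grouped(shares(counts), shares(wealth))
thrice = gini_grouped(shares([3 * c for c in counts]), shares([3 * w for w in wealth]))
```

Because the function only ever sees shares, the replication property holds by construction in this sketch.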
If instead of wealth we are dealing with the share of poor (or some other deprivation), then the X-axis denotes the proportion of poor and the Y-axis denotes the proportion of population. The above formula continues to apply. The formula can also be used for unit-level data, whether weighted or unweighted, but one should be careful in calculating the shares.
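For unit-level data, the care lies mainly in how the weights enter the two sets of shares. The sketch below assumes each record carries a per-unit wealth value and a weight giving the number of units it represents; that interpretation, the function name, and the sample figures are all hypothetical.

```python
def shares_from_units(values, weights=None):
    """Return (population shares, wealth shares), ordered poorest to richest.

    Assumption: values[k] is wealth per unit for record k and weights[k] is
    the number of units that record represents (1 each if unweighted).
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Order records by per-unit wealth so the Lorenz curve is convex.
    units = sorted(zip(values, weights))
    total_weight = sum(w for _, w in units)
    total_wealth = sum(v * w for v, w in units)
    # Population share uses the weight alone; wealth share uses value times weight.
    pop = [w / total_weight for _, w in units]
    wealth = [v * w / total_wealth for v, w in units]
    return pop, wealth


pop, wealth = shares_from_units([30, 10, 60], weights=[2, 2, 1])
```

Both lists sum to one, and they can be passed directly to a grouped-data Gini calculation.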
Let us get back to Badenes-Plá's paper. There too, the area falling short of 1/2 at complete inequality is discussed, and a correction is proposed in its equation (12). This makes use of the total aggregate wealth, W, as well as the wealth of each group, wi, the rank of each group, i, and the population shares, ai, and the formula is:
We tinker with the formula by replacing wealth with wealth shares, yi, and ranks with population shares, and what we have is:
I am not sure whether Badenes-Plá's method satisfies the third property. Our tinkered version of that method does satisfy the three properties discussed earlier, but these formulas are not the same as the area under the Lorenz curve normalized by the area under total possible inequality. Thus, I would strongly suggest our proposed method. To reiterate,
A final problem is that our method has been ignoring the population size, N. The point that comes to mind is that the richest group having all the wealth is not the same as one single individual having all the wealth. In that case, the area above the Lorenz curve will fall short of (1/2) by (1/N)/2, not an/2. This means we either deal with discrete data or consider an additional group, and both of these can be handled with our suggested method. A note of caution for discrete data is that the third property will not hold, because the correction factor becomes population dependent. This is fine, because when we treat one individual as an independent group it also means that the population cannot be replicated.
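The loss of population invariance in the discrete case can be seen on a toy example. The sketch below treats each individual as a group of share 1/N and divides the conventional trapezoid Gini by (1 - 1/N); that form is an assumption based on the description above, and the function name and figures are hypothetical.

```python
from itertools import accumulate


def gini_individuals(wealth):
    """Corrected Gini for individual-level data, each person a group of share 1/N.

    Assumption: the correction divides the conventional coefficient by
    (1 - 1/N). Requires at least two individuals and positive total wealth.
    """
    ordered = sorted(wealth)          # poorest to richest
    n = len(ordered)
    total = sum(ordered)
    # Cumulative wealth shares y_0 = 0, y_1, ..., y_N.
    y = [0.0] + list(accumulate(v / total for v in ordered))
    twice_area = sum((1.0 / n) * (y[i] + y[i + 1]) for i in range(n))
    return (1.0 - twice_area) / (1.0 - 1.0 / n)


# One person out of four owns everything.
single = gini_individuals([0, 0, 0, 5])

# Replicating the population gives two owners out of eight, so the value drops.
replicated = gini_individuals([0, 0, 0, 5] * 2)
```

The replicated population is no longer a case of one individual holding everything, which is why the two values differ.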
Srijit Mishra Sep 25, 2009 02:18 PM
Needless to say, this also satisfies two other properties. (1) If wealth share shifts from a lower (higher) group to a higher (lower) group then inequality increases (decreases). (2) If population share shifts from a lower (higher) group to a higher (lower) group then inequality decreases (increases).
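These two properties can be illustrated on a single example, which is a spot check and not a proof. The sketch assumes the corrected coefficient is the conventional trapezoid Gini divided by (1 - a_n), a reconstruction from the post; the function name and the shares used are hypothetical, and the shifts are kept small enough that the groups stay ordered from poorest to richest.

```python
from itertools import accumulate


def gini_corrected(pop_shares, wealth_shares):
    """Conventional trapezoid Gini divided by (1 - a_n); groups ordered poorest first."""
    y = [0.0] + list(accumulate(wealth_shares))
    twice_area = sum(a * (y[i] + y[i + 1]) for i, a in enumerate(pop_shares))
    return (1.0 - twice_area) / (1.0 - pop_shares[-1])


base = gini_corrected([0.40, 0.40, 0.20], [0.10, 0.30, 0.60])

# (1) Move 0.05 of wealth share from the poorest to the richest group.
wealth_shift = gini_corrected([0.40, 0.40, 0.20], [0.05, 0.30, 0.65])

# (2) Move 0.05 of population share from the poorest to the richest group.
pop_shift = gini_corrected([0.35, 0.40, 0.25], [0.10, 0.30, 0.60])
```

In this example the wealth shift raises the coefficient above the baseline and the population shift lowers it, matching the stated directions.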