Using the recently released 1-year data on incomes, I've calculated new income inequality numbers for states using the mean proportional difference. The most equal are:
1. Alaska
2. Utah
3. Wyoming
4. Idaho
5. Nevada
6. Iowa
7. New Hampshire
8. Vermont
9. Wisconsin
10. Maine
So the West (especially the Mountain West), the Midwest, and New England make up the entire list. What do these states have in common? Apart from Wisconsin, they are among the least populated states, and none contains a very large metro area. Milwaukee, Las Vegas, and Salt Lake City are the largest urban areas among these states.
The most unequal are:
1. New York
2. Louisiana
3. Massachusetts
4. Mississippi
5. Alabama
6. Connecticut
7. Georgia
8. Kentucky
9. Texas
10. Rhode Island
Northeastern and southern states make up the list here. New York and Massachusetts are dragged down by their largest urban areas, which have the highest income inequality among very large metros. Interestingly, many southern metros, such as Atlanta and Dallas, do much better than the states they are in. In these states, the inequality is likely the result of metro vs. rural income gaps.
Thursday, September 30, 2010
Gini Index: A practical example of its shortcomings
Within the last week, the Census Bureau released new data from the 2009 American Community Survey (ACS). The most publicized finding was the growing inequality of incomes in the US. The link below is a Census brief on income from the ACS.
www.census.gov/prod/2010pubs/acsbr09-2.pdf
In Figure 2, there are 3 states with a Gini index higher than the national average: Texas, Connecticut, and New York. There are so few largely because the margin of error is fairly large compared with the actual difference between the state and national Gini index numbers. Now, all three of these states are among those with the most unequal distribution of income. But are they really the worst three?
Texas vs. Louisiana
Texas has a slightly higher Gini index than Louisiana based on census data. The difference is well within the margin of error. For this exercise, I will treat the census numbers as actuals instead of estimates and presume there is no margin of error.
One method of using income data to estimate the Gini index is to look at the means of various income quintiles. The Census Bureau also provides the mean income of the top 5%. Since these figures are aggregated, a Gini index calculated from them will not be as precise as those produced by the Census Bureau. Indeed, more aggregated inputs result in lower measured inequality. Calculating the Gini index this way gives .457 for Louisiana and .458 for Texas. The Census calculations are .473 and .474: higher for both, but still separated by the same amount (.001).
Proportionally comparing each household's income with all others, however, shows inequality to be greater in Louisiana. The means of the quintiles are given below, along with the mean for the top 5% and a calculated mean for those in the top quintile but OUTSIDE the top 5%.
Lowest: LA-$9,315; TX-$11,021
2nd: LA-$24,623; TX-$28,627
3rd: LA-$42,879; TX-$48,384
4th: LA-$69,492; TX-$76,793
Highest: LA-$148,787; TX-$170,844
Top 5%: LA-$254,657; TX-$302,285
Top quintile, not top 5%: LA-$113,497; TX-$127,030
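The top-quintile-but-not-top-5% means, and the grouped Gini figures of .457 and .458 mentioned earlier, can be checked with a short Python sketch. It rests on one assumption about the method: every household in a cohort is treated as earning exactly that cohort's mean. This assumption does reproduce both figures. The function names are illustrative only.

```python
# Population shares of the six cohorts: four quintiles, the 80-95% band, and the top 5%.
SHARES = [0.20, 0.20, 0.20, 0.20, 0.15, 0.05]


def cohort_means(q1, q2, q3, q4, top_quintile, top5):
    """The top quintile mean is 3/4 of the 80-95% mean plus 1/4 of the top-5% mean; solve for the former."""
    rest = (4 * top_quintile - top5) / 3
    return [q1, q2, q3, q4, rest, top5]


def grouped_gini(means, shares=SHARES):
    """Gini index, assuming every household in a cohort earns exactly the cohort mean."""
    mu = sum(w * x for w, x in zip(shares, means))
    half_mad = 0.0  # sum over unordered pairs = half the mean absolute difference
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            half_mad += shares[i] * shares[j] * abs(means[j] - means[i])
    return half_mad / mu  # Gini = mean absolute difference / (2 * mean)


la = cohort_means(9315, 24623, 42879, 69492, 148787, 254657)
tx = cohort_means(11021, 28627, 48384, 76793, 170844, 302285)
print(round(la[4]), round(tx[4]))                              # prints 113497 127030
print(round(grouped_gini(la), 3), round(grouped_gini(tx), 3))  # prints 0.457 0.458
```

Because households inside a cohort are treated as identical, this grouped Gini ignores within-cohort spread. That fits the point above that aggregated inputs understate inequality.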
Raw numbers mean little, since income is higher across the board in Texas. Since we can divide the highest quintile into 2 subgroups, we have 6 distinct cohorts. Calculating the proportional gap between each pair of adjacent cohorts illustrates the relative gap at each step of the income ladder. For these 2 states, the numbers are listed below.
Lowest-2nd: LA: 2.64; TX: 2.60
2nd-3rd: LA: 1.74; TX: 1.69
3rd-4th: LA: 1.62; TX: 1.59
4th-Highest outside top 5: LA: 1.63; TX: 1.65
Highest outside top 5-top 5: LA: 2.24; TX: 2.38
So in 3 cases Louisiana has the higher proportional gap, and in 2 Texas has the larger gap. When each household is compared against every other, some of these gaps apply and some don't, depending on which cohorts the two households belong to. Given all possible combinations of households, these gaps apply as follows:
Lowest-2nd: in 32% of cases
2nd-3rd: 48%
3rd-4th: 48%
4th-Highest outside top 5: 32%
Highest outside top 5-top 5: 9.5%
For example, 20% of the population falls below the break between the lowest and 2nd quintiles, namely the lowest quintile. The other 80% is above it. If we select 2 households at random, we'd have the following odds for the combinations:
Lowest 20-Lowest 20: 20% x 20% = 4%
Lowest 20-Highest 80: 20% x 80% = 16%
Highest 80 - Lowest 20: 80% x 20% = 16%
Highest 80 - Highest 80: 80% x 80% = 64%
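All of the percentages above follow from one rule. Suppose a share F of households lies below a break. Then a randomly drawn pair lands on opposite sides of that break with probability 2F(1-F), which is the two middle cases added together. Applied at each of the five breaks, with r_k the proportional gap at break k, this gives the geometric mean gap:

```latex
\begin{aligned}
P(\text{pair straddles the break}) &= F(1-F) + (1-F)F = 2F(1-F) \\
F = 0.20 &: \; 2(0.20)(0.80) = 0.32 \\
F = 0.40 &: \; 2(0.40)(0.60) = 0.48 \\
F = 0.60 &: \; 2(0.60)(0.40) = 0.48 \\
F = 0.80 &: \; 2(0.80)(0.20) = 0.32 \\
F = 0.95 &: \; 2(0.95)(0.05) = 0.095 \\
\bar{r} &= \exp\Big(\textstyle\sum_k 2F_k(1-F_k)\,\ln r_k\Big)
\end{aligned}
```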
In the two middle cases, the selected households fall on opposite sides of the Lowest-2nd break, and thus that proportional gap applies. The mean gap (geometric, since this involves proportions) between any 2 households in Louisiana is 2.84. In Texas it is 2.78. The actual mean difference in relative terms between household incomes is higher in Louisiana.
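Here is a minimal Python sketch of that calculation. It assumes, as above, that everyone in a cohort earns the cohort mean, so a pair drawn from the same cohort has a ratio of 1. Each break's log gap is weighted by the share of pairs that straddle it; the weighted logs are summed and then exponentiated. The helper names are illustrative.

```python
import math

# Population shares of the six cohorts: four quintiles, the 80-95% band, and the top 5%.
SHARES = [0.20, 0.20, 0.20, 0.20, 0.15, 0.05]


def cohorts(q1, q2, q3, q4, top_quintile, top5):
    """Cohort means in ascending order; the 80-95% mean is backed out of the top-quintile mean."""
    rest = (4 * top_quintile - top5) / 3
    return [q1, q2, q3, q4, rest, top5]


def geometric_mean_gap(means, shares=SHARES):
    """exp of the expected |log ratio| between two households drawn at random with replacement."""
    log_total, below = 0.0, 0.0
    for k in range(len(means) - 1):
        below += shares[k]                   # share of households at or below cohort k
        straddle = 2 * below * (1 - below)   # chance a random pair lands on opposite sides
        log_total += straddle * math.log(means[k + 1] / means[k])
    return math.exp(log_total)


LA = cohorts(9315, 24623, 42879, 69492, 148787, 254657)
TX = cohorts(11021, 28627, 48384, 76793, 170844, 302285)
print(round(geometric_mean_gap(LA), 2), round(geometric_mean_gap(TX), 2))  # prints 2.84 2.78
```

Replacing `LA[0]` with half its value gives the 3.54 figure discussed below. Doubling `LA[-1]` (the top-5% mean) instead gives 3.03.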
The Gini index does much the same thing, though it deals with ABSOLUTE values. Of course, absolute differences increase as all incomes increase proportionally, so the Gini value is indexed to the mean. The consequence is much greater emphasis where absolute differences are largest: at the top of the income ladder. And that's exactly where we find larger gaps in Texas than in Louisiana.
Cutting the Louisiana lowest quintile income in half would increase the mean relative difference significantly - to 3.54. The calculated Gini index would increase to .469, that is, it barely moved. What about doubling the top 5% incomes? Mean relative difference moves up to 3.03 since only 5% was affected instead of 20% as in the first case. The Gini jumps to .662!
What are the ramifications? A geography can look fairly equal even with a very poor underclass whose incomes are far below the median, since this is less of a factor in calculating the Gini index. Overall, the Gini tracks income disparity fairly closely. But there are better measures, no more difficult to calculate, that could be used instead.
Saturday, September 25, 2010
Metro Inequality
Here are the most equal of all metro areas. I've used a slightly different Census data set for this, which has a small effect on the rankings.
1. St. George, UT
2. Ogden, UT
3. Wausau, WI
4. Sheboygan, WI
5. Monroe, MI
6. Holland, MI
7. Lebanon, PA
8. Appleton, WI
9. York, PA
10. Hinesville, GA
So this list is dominated by small metros that originally grew up in agricultural areas in the North. Utah is, of course, a special case, and its other 3 metros would all be in the top 25 if this list were extended.
The opposite side of the spectrum is below.
1. College Station, TX
2. Gainesville, FL
3. Tuscaloosa, AL
4. Bridgeport, CT
5. Athens, GA
6. Monroe, LA
7. New York, NY
8. Auburn, AL
9. Bloomington, IN
10. McAllen, TX
I omitted the metro areas OMB defines for Puerto Rico; otherwise they would crowd out almost everyone else. Well, there's an obvious pattern here, and while I didn't foresee the result, it makes a lot of sense. An unequal population is one with inordinate numbers of elites and inordinate numbers of people just struggling to get by. What better represents that reality than a college campus? Of course, to some extent this is probably a case of statistics being deceptive, as many nominally low-income college students are doing just fine on their parents' money, which doesn't count as income.
Other than the college towns, there's Bridgeport with its excess overclass, Monroe and McAllen with big portions of underclass, and New York with quite a bit of both.
Friday, September 24, 2010
Income Inequality in States
Using mean log difference, I've ranked the 50 states from most equal to most unequal.
1. Utah
2. Alaska
3. Idaho
4. Wyoming
5. Nevada
Western states dominate the top of the list.
6. Wisconsin
7. New Hampshire
8. Iowa
9. Nebraska
10. Vermont
11. South Dakota
12. Indiana
13. Minnesota
14. Kansas
15. Delaware
16. Montana
17. Maine
18. North Dakota
19. Arizona
20. Washington
21. Maryland
22. Hawaii
23. Oregon
24. Missouri
25. Florida
Halfway down the list comes the first southern state, and one that is not very southern culturally. It's also the first heavily populated state on the list.
26. Ohio
27. Colorado
28. Michigan
29. Oklahoma
30. West Virginia
31. Pennsylvania
32. Arkansas
33. North Carolina
34. Virginia
35. South Carolina
36. Illinois
37. New Mexico
38. Tennessee
39. California
40. Connecticut
41. Georgia
42. New Jersey
43. Texas
44. Rhode Island
45. Kentucky
46. Alabama
47. Mississippi
48. Massachusetts
49. Louisiana
50. New York
So the most unequal regions are the liberal bastions of the East Coast and the conservative states of the South: a combination of states dominated by large urban areas and states with large rural populations.
Sunday, September 19, 2010
Correlations in Metro Area Income Inequality
I've looked at social statistics to try to find data that correlate with income inequality. After reviewing metro areas with low vs. high inequality, I suspected that marriage and family might be a correlating factor. For one, all of Utah's metro areas are more equal than similar areas of the same size. Second, other places with low inequality are located in the Northern Plains, Rockies, and Great Lakes, where traditional families are more common. Once I looked at the marriage and family data by itself, namely the % of people living in married-couple households, I noticed a few obvious exceptions. Heavily Latino areas such as the Rio Grande Valley and low-income white areas such as Appalachia have a higher % of people in married-family households, yet these areas have high inequality. So I screened for education level.
Regression shows a clear correlation between both variables and income inequality in a metro area. The Excel P-values are 1.76E-67 for the married-family stat and 1.53E-35 for the % of those over 25 who finished high school but do not have graduate degrees. Note that I tried other education parameters, but this one was the most significant. High proportions of people without a high school diploma or with graduate degrees apparently reduce an area's income equality.
The coefficient based on marriage is slightly higher and more significant than the education item.
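Below is a minimal pure-Python sketch of the kind of two-variable regression described above: ordinary least squares with an intercept. It is an illustration, not a reproduction of Excel's tool. The p-values use a normal approximation to the t distribution, so the tiniest values will not match Excel's output exactly. The variable names in the usage line (`inequality`, `married_share`, `hs_no_grad_share`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def invert(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse of a small square matrix (here 3x3: intercept plus two variables)."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("variables are collinear; X'X is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(y: Sequence[float], *xs: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Least squares with an intercept; returns coefficients, t-statistics and two-sided p-values."""
    rows = [[1.0, *vals] for vals in zip(*xs)]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    se = [math.sqrt(max(sigma2 * inv[i][i], 0.0)) for i in range(k)]
    t = [b / s if s > 0 else math.copysign(math.inf, b) for b, s in zip(beta, se)]
    # Normal approximation to the t distribution (an assumption; Excel uses the t distribution).
    p = [math.erfc(abs(ti) / math.sqrt(2)) for ti in t]
    return beta, t, p
```

Usage would look like `beta, t, p = ols(inequality, married_share, hs_no_grad_share)`. The result lists the intercept first, then one coefficient per variable.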
Saturday, September 18, 2010
Points List to Represent US population
Using US Census tracts, along with 2000 data on population and the latitude and longitude of the tract centroids, I have created a list designed to represent the US population. The list was compiled in the following steps:
1. Start with one point and allocate it to the full set of tracts
2. Find the geographic mean of all tracts in a set using Census methods
3. Divide the set along an east-west or a north-south axis, depending on which axis shows the greater deviation within the set
4. Allocate any points from the old set to the 2 new sets based on their proportion of population, rounding to the nearest integer
5. With the new sets, repeat steps 2-4.
6. Once all points are assigned to individual tracts, stop.
7. Repeat full process from step 1 starting with 2 points, 3 points, etc.
For each iteration, the tracts which had points allocated from the previous round will remain and one new tract is added. This can go on for over 10,000 iterations before individual tracts will wind up with more than one point.
For example, suppose we start with 8 points. The full country is split east-west, and 5 points are allocated to the east and 3 to the west based on population. In the east, 3 of the 5 points are allocated to the north half and 2 to the south. Of the 2 in the south, 1 goes to the more central south and 1 to the farther south. From this point on, points are awarded 1-0 at each split, with the point always going to the more populated subset. For the mid-south, this point finally gets allocated to a tract in Atlanta; for the farther south, the point ends at a tract in Miami.
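Steps 1-7 amount to a recursive, population-weighted bisection. The Python sketch below is minimal and rests on several assumptions:

- The `Tract` record, its field names, and the helper names are hypothetical.
- The mean center weights longitude by population times cos(latitude), intended to approximate the Census mean-center method.
- "Deviation" is read as the population-weighted variance along each axis, with longitude scaled by cos(latitude).
- Exact halves are rounded up.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tract:
    tract_id: str  # hypothetical identifier, e.g. a tract code
    lat: float     # centroid latitude, degrees
    lon: float     # centroid longitude, degrees
    pop: float     # population


def mean_center(tracts: List[Tract]) -> Tuple[float, float]:
    """Population-weighted center; longitude weighted by pop * cos(lat) (assumed Census-style)."""
    total = sum(t.pop for t in tracts)
    lat0 = sum(t.pop * t.lat for t in tracts) / total
    w = [t.pop * math.cos(math.radians(t.lat)) for t in tracts]
    lon0 = sum(wi * t.lon for wi, t in zip(w, tracts)) / sum(w)
    return lat0, lon0


def round_half_up(x: float) -> int:
    return int(math.floor(x + 0.5))


def allocate(tracts: List[Tract], n: int, out: Dict[str, int]) -> None:
    """Split the set at its mean center and hand its n points to the halves by population share."""
    if n == 0:
        return
    if len(tracts) == 1:  # step 6: point(s) reached an individual tract
        tid = tracts[0].tract_id
        out[tid] = out.get(tid, 0) + n
        return
    lat0, lon0 = mean_center(tracts)
    total = sum(t.pop for t in tracts)
    k = math.cos(math.radians(lat0))  # make longitude degrees comparable to latitude degrees
    var_ns = sum(t.pop * (t.lat - lat0) ** 2 for t in tracts) / total
    var_ew = sum(t.pop * ((t.lon - lon0) * k) ** 2 for t in tracts) / total
    split_ew = var_ew >= var_ns  # more east-west spread: cut into west and east halves
    coord = (lambda t: t.lon) if split_ew else (lambda t: t.lat)
    center = lon0 if split_ew else lat0
    a = [t for t in tracts if coord(t) < center]
    b = [t for t in tracts if coord(t) >= center]
    if not a or not b:
        # Degenerate case (e.g. identical coordinates): halve the sorted list instead (assumption).
        s = sorted(tracts, key=lambda t: (coord(t), t.tract_id))
        a, b = s[: len(s) // 2], s[len(s) // 2 :]
    n_a = round_half_up(n * sum(t.pop for t in a) / total)  # step 4
    allocate(a, n_a, out)
    allocate(b, n - n_a, out)


def ranked_list(tracts: List[Tract], n_max: int) -> List[str]:
    """Step 7: rerun with 1, 2, ..., n_max points and record each newly selected tract."""
    tracts = [t for t in tracts if t.pop > 0]
    order: List[str] = []
    seen = set()
    for n in range(1, n_max + 1):
        out: Dict[str, int] = {}
        allocate(tracts, n, out)
        for tid in sorted(out):
            if tid not in seen:
                seen.add(tid)
                order.append(tid)
    return order
```

Given tract centroids and populations, `ranked_list(tracts, 50)` returns tract identifiers in the order they first receive a point. That is the kind of ordering shown below, though exact results depend on the assumptions listed here.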
The list that is produced by this method up to 50 follows. Rather than list the tract, which won't mean anything to most, I'll list the zip code place name it is in.
1. New York, NY
2. Los Angeles, CA
3. Atlanta, GA
4. Dallas, TX
5. Chicago, IL
6. Takoma Park, MD
7. San Francisco, CA
8. Miami, FL
9. Detroit, MI
10. Kansas City, MO
11. Somerville, MA
12. Anaheim, CA
13. Charlotte, NC
14. Indianapolis, IN
15. Albuquerque, NM
16. Baton Rouge, LA
17. Buffalo, NY
18. Seattle, WA
19. Columbus, OH
20. Wheat Ridge, CO
21. Birmingham, AL
22. South Richmond Hill, NY
23. North Las Vegas, NV
24. Milwaukee, WI
25. Houston, TX
26. Raleigh, NC
27. Philadelphia, PA
28. Berkeley, CA
29. Cleveland, OH
30. Orlando, FL
31. Minneapolis, MN
32. Lowell, MA
33. Phoenix, AZ
34. Brooklyn, NY
35. Knoxville, TN
36. Austin, TX
37. Naperville, IL
38. Pinellas Park, FL
39. Downey, CA
40. Oakton, VA
41. Tulsa, OK
42. Pittsburgh, PA
43. Greensboro, NC
44. Portland, OR
45. Bronx, NY
46. Shreveport, LA
47. St. Louis, MO
48. Nashville, TN
49. San Fernando Valley, CA
50. Bethlehem, PA
NFL and Geography
Several years ago, I analyzed how well NFL teams are spread across the country. Does the geographic dispersion of teams accurately reflect that of the population in general?
To start, I compared a simple east-west split at the geographic mean, since there is greater deviation along that axis than north-south. Based on population, 20 of the 32 teams should be east of that point, but in actuality there are 22. Further breakdowns (always at the mean, along the axis of highest deviation) produced these results.
East vs. West: 22 to 10 (population is closest to 20:12)
In the east, North vs. South: 15 vs. 7 (taking 20 for the east, should be 13:7)
In the west, center west vs. far west: 5 to 5 (from 12, should be 6:6)
East-North-East-East: 3 teams (Pats, Giants, Jets) population would call for 4
East-North-East-West: 4 teams (Bills, Eagles, Skins, Ravens) should be 3
East-North-West-East: 4 teams (Steelers, Browns, Lions, Bengals) should be 3
East-North-West-West: 4 teams (Bears, Packers, Colts, Rams) should be 3
East-South-North: 3 teams (Falcons, Titans, Panthers) should be 4
East-South-South: 4 teams (Bucs, Jags, Dolphins, Saints) should be 3
West-East-North: 3 teams (Broncos, Vikings, Chiefs) as it should be
West-East-South: 2 teams (Cowboys, Texans) should be 3
West-West-East: 2 teams (Cardinals, Chargers) should be 3
West-West-West: 3 teams (Seahawks, 49ers, Raiders) as it should be
So how to make the league better represent where people live? Move the Ravens to New England, the Jaguars to Raleigh or Richmond, the Rams back to Los Angeles, and the Steelers to San Antonio, Albuquerque or Okla. City.
Inequality in Metro Divisions
Looking more closely at large metro areas, at their component divisions, shows an unsurprising characteristic. In every metro with divisions (11 metros have them), the division containing the center of the metro has the greatest inequality. The most unequal divisions:
1. New York 3.433
2. Miami 3.273
3. Boston 3.242
4. San Francisco 3.219
5. Philadelphia 3.203
This list consists of the central divisions of the five most unequal large metro areas. In each case, the central division is a bit more unequal than the metro area as a whole. New York's number is in a league of its own.
Conversely, the most equal divisions contain no centers of metro areas.
1. Rockingham/Strafford counties, NH 2.724
2. Tacoma, WA 2.725
3. Nassau/Suffolk, NY 2.749
4. Bethesda/Frederick, MD 2.760
5. Lake County, IL/Kenosha, WI 2.791
Friday, September 17, 2010
Income Inequality in US Large Metros
Using 2006-08 American Community Survey data, the following are the large metros (at least 1 million households) with the highest income inequality, as measured by the mean log difference.
1. New York 3.271
2. Boston 3.099
3. San Francisco 3.088
4. Philadelphia 3.066
5. Miami 3.064
The numbers represent the geometric mean of the income ratio of 2 households drawn at random. The large Northeast cities obviously do not fare well. San Francisco has the lowest percentage of households between $25,000 and $100,000. Miami has the highest percentage of households below $25,000.
The most equal metros among this group are below:
1. Riverside 2.729
2. Minneapolis 2.737
3. Phoenix 2.768
4. Washington 2.802
5. Tampa 2.803
The 4 large metros where over 80% of the population has incomes between $15,000 and $150,000 are all on this list: every entry except Washington. Riverside obviously benefits from being a somewhat suburban area of LA. Washington is the exception to the inequality of the East Coast, as a result of the unique influence of the federal government.
Alternate Method of Measuring Inequality
Given the problems with the Gini index's ability to measure income inequality, I've looked at some alternative methods. These include the average relative deviation from the mean or median, in both absolute and logarithmic terms, and numerous measures of total difference within a population. One that performed well was the mean log difference. Unlike the mean difference (the numerator of the Gini), the mean log difference does not need normalization. And it is fairly easily explained: it is the geometric mean of the ratios between the incomes of any 2 people in the population.
One difficulty is calculation. In matrix form, the calculation is straightforward. Using a matrix is more challenging, however, when dealing with a list of numerous geographies.
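One way around the matrix is sketched below in Python. It assumes incomes are available as a list of positive values per geography; a zero income has no defined log ratio. Sort the log incomes once, and the sum of all pairwise differences falls out of a single weighted sum. Each geography then costs O(n log n) instead of an n-by-n matrix. The `by_geography` helper and its input mapping are hypothetical. This "mean log difference" (as defined above) is not the same thing as the mean log deviation, also known as Theil's L index.

```python
import math
from typing import Dict, Sequence


def mean_log_difference(incomes: Sequence[float]) -> float:
    """Geometric mean of the ratio (larger over smaller) between two incomes drawn at random
    with replacement, computed from sorted logs instead of an n-by-n matrix."""
    logs = sorted(math.log(x) for x in incomes)  # incomes must be positive
    n = len(logs)
    # After sorting, the k-th value is the larger member of k pairs and the smaller of n-1-k,
    # so the sum over pairs i<j of (logs[j] - logs[i]) is the sum of v * (2k - n + 1).
    pair_sum = sum(v * (2 * k - n + 1) for k, v in enumerate(logs))
    # Ordered draws with replacement: each unordered pair appears twice among n*n draws.
    return math.exp(2 * pair_sum / (n * n))


def by_geography(data: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Apply the measure to each geography in turn (the mapping's contents are hypothetical)."""
    return {name: mean_log_difference(vals) for name, vals in data.items()}
```

As a check, two incomes of 1 and e² give e. The four equally likely ordered draws have ratios 1, e², e², and 1, whose geometric mean is e.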
Gini Index
Some tests on the Gini index reveal serious flaws in its ability to describe income inequality. Let's set up five income groups with incomes of 10, 25, 50, 100, and 200. When equal portions are in each group, the Gini index for this population is .473. When the population is moved to intentionally produce greater inequality, say by taking everyone from the middle group (50) and distributing them equally to the extremes (10 and 200), the Gini index doesn't move much; it's now .488. And when the population is moved entirely to the 2 extremes, 50% each to the 10 and 200 cohorts, the index drops to .452. This is odd, given that this is the most extreme inequality possible with this cohort setup: half in the richest group, half in the poorest.
Moving people from the 200 cohort to the 10 cohort raises the Gini index, but doing the reverse actually lowers it. At 30% in the 10 cohort and 70% in the 200 cohort, the Gini is .279; vice versa, it's .596. But the total income difference is identical in both cases. What differs is the mean, which is the denominator. And when the mean is highly unrepresentative of a large portion of the population, the index produces absurd results. This is likely a problem with any inequality measure that is normalized by a mean.
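These five scenarios are easy to check with a small Python sketch. It assumes, as the setup implies, that everyone in a cohort has exactly that cohort's income. For comparison it also computes the mean log difference, which is not normalized by the mean. The function and scenario names are illustrative.

```python
import math

INCOMES = [10, 25, 50, 100, 200]


def gini(shares):
    """Grouped Gini: sum over unordered pairs of w_i * w_j * |x_i - x_j|, divided by the mean."""
    mu = sum(w * x for w, x in zip(shares, INCOMES))
    diff = sum(shares[i] * shares[j] * abs(INCOMES[j] - INCOMES[i])
               for i in range(len(INCOMES)) for j in range(i + 1, len(INCOMES)))
    return diff / mu


def mean_log_difference(shares):
    """Geometric mean ratio between two people drawn at random (with replacement)."""
    total = sum(2 * shares[i] * shares[j] * abs(math.log(INCOMES[j] / INCOMES[i]))
                for i in range(len(INCOMES)) for j in range(i + 1, len(INCOMES)))
    return math.exp(total)


SCENARIOS = {
    "equal fifths": [0.2, 0.2, 0.2, 0.2, 0.2],
    "middle moved to extremes": [0.3, 0.2, 0.0, 0.2, 0.3],
    "half poorest, half richest": [0.5, 0, 0, 0, 0.5],
    "30% poorest, 70% richest": [0.3, 0, 0, 0, 0.7],
    "70% poorest, 30% richest": [0.7, 0, 0, 0, 0.3],
}
for name, shares in SCENARIOS.items():
    print(f"{name}: Gini {gini(shares):.3f}, mean log difference {mean_log_difference(shares):.2f}")
```

The Gini column matches the five figures above to rounding. The mean log difference comes out identical for the 30/70 and 70/30 splits, because it depends only on the ratios between incomes and not on where the mean falls.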