# Math quandary



## Datura (Nov 30, 2012)

Hello all!

I have a (probably incredibly obvious) math question. Pretty basic math, and I'm probably making it more complicated than it needs to be. It involves an xls file for a research project of mine. Sorry about the messiness of the file—I didn't anticipate anybody else would end up seeing it. The red cells are the only ones I'm concerned with at the moment.

What I'm trying to find is the number of people who are 25 and older who do _not_ have a high school degree. I want to find this number for each of the seven counties, and I want to find one for the entire seven-county area. The data in the spreadsheet is from the US Census Bureau.

There are two rows where I get this information: Less than 9th grade (Row 5) and 9th to 12th grade, no diploma (Row 6). I don't particularly care about which grade level people didn't complete; I only want to know _how many people do not have a high school diploma_. Thus, I want to find the sum of both rows, which gives me a total % of people who have not finished high school. I then want to take that percentage from Row 4, which is the total population 25 years and over.

So, using the first county (Kenosha) as an example, I found the number I want by using this formula:

f(x)=D4*(D5+D6)

or, with the actual numbers:

f(x)=107,283*(.035+.078)

which gives me the number I want for Kenosha County: 12,123. Easy.

Row 16 has the formulas for each county, and L16 is the sum of the numbers for each individual county. It's 144,629. This is the number I want for the entire seven-county area.

I wanted to find this number a different way and make sure everything checks out. Take a look at column L. L4 is the total population of all seven counties; L5 is the average of all %s in Row 5; L6 is the average of all %s in Row 6. If I did things correctly, I could use this formula:

L17=L4*(L5+L6)

... and get the same number. As evidenced by the number currently in L17, that's not the case. There's a difference of about 20,000, which is significant when dealing with a relatively small population like this one. Am I going about this the wrong way? Which of the two numbers should I rely more on? If anybody could provide some insight, I'd be very grateful.


----------



## 1. Luftballon (Nov 30, 2012)

"educational attainment xls.xls" all you need is a . before the first xls!

in any case, having different numbers is correct in this case, because, while L16 weights each county in proportion to its population, L17 calculates values for each county, and then combines them, giving each county equal weight in calculating averages, but multiplies the average by the population figure, which _is_ weighted (well, sort of).

L16 is correct.

your error is L17 is ... well, I'll try an analogy, though it might not work well. we'll see.

say we have a hypothetical region with ... a hundred counties, let's say; and ninety-nine of those counties contains one person, while the last county contains a hundred one. (clearly, something is wrong with the districting here, but bear with me.) now each of those ninety-nine counties has ... a 0% has-family-in-the-same-county rate, let's say, whereas that highly-populated county has, eh, 100%, everyone has family in the same county.

by the methodology in L17, we have an average rate of 1%, and a total population is 200, so we find that there are 2 people in all the counties with family in the same county. by L16, we have ninety-nine counties with zero, and one with a hundred one, and we arrive at the (correct) figure that 101 of persons in this absurd scenario have family in the same county.

(not absurd at all, no.)


----------

