## More Standard Deviation

### Transcript

More on standard deviation. This lesson will cover three more advanced topics concerning standard deviation. So, right away I'll say, if you're just worried about getting the basics of standard deviation, everything you learned in the last video, that will really help you with standard deviation. These are relatively specialized, rare questions, and you would only be seeing these if you were really doing very well in math and doing the hardest questions on the test.

So, I would not worry about this if you're just worried about the basics of standard deviation; you could just skip this video. These are very advanced topics. Topic number one concerns how the standard deviation would change when we include new members in a list, making the list longer. This is a tricky issue for a few reasons.

Let's say we have a set with 20 members, and the set has a mean of 50 and a standard deviation of 5. So what we have here are known as summary statistics: we know the overall mean and the overall standard deviation, but we don't have the list of individual data. Suppose we are gonna include two more numbers to bring the total number of members to 22.

Suppose we include 80 and 80 as the 21st and 22nd members of the list. Of course, this would change the mean, and if we wanted to, we could calculate the new mean. The new mean would be slightly more than 50. But all the deviations change, so there is no quick way to read off the new standard deviation.

So first of all, this is a subtle distinction. If we have only the summary data, just the old mean and the old standard deviation, and then we add these two new points, there is no simple shortcut that gives us the new standard deviation. Now, if we had the list, the original list of 20 values, and then we added the 21st and 22nd values, if we had all the values, then we could calculate the standard deviation directly.
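As a rough sketch of what that full-list calculation looks like, suppose (purely as an assumption, since the lesson never shows the individual values) that the 20 members are ten 45s and ten 55s, which gives a mean of 50 and a standard deviation of 5. Python's `statistics.pstdev` computes the population standard deviation, the version that averages the squared deviations.

```python
from statistics import mean, pstdev

# Hypothetical list (an assumption for illustration): ten 45s and ten 55s
# have a mean of 50 and a population standard deviation of 5.
original = [45] * 10 + [55] * 10

# Add 80 and 80 as the 21st and 22nd members.
extended = original + [80, 80]

print(mean(original), pstdev(original))  # mean and SD of the 20-member list
print(mean(extended), pstdev(extended))  # both come out larger for the 22-member list
```

The new mean lands a little above 50 and the new standard deviation well above 5, which is the qualitative point the test cares about.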

But even then, that's a calculation the test is not gonna ask you to do. So either way, you're not gonna have to worry about this calculation. All we can say is that if we include new members that are far away from the mean of the set, the standard deviation of the new set will be larger. So, that's very clear. The two values of 80 are way far away from all the other numbers. They're really big outliers, so adding really big outliers, that's gonna increase the standard deviation. That you need to know.

We can say a little bit more if we include a pair of numbers that doesn't change the mean. If we include numbers that are equally spaced around the mean, so one is K units above the mean, and one is K units below the mean, then we will not change the mean.

So the deviations from the mean for all the numbers on the list will stay the same, and we can draw some more detailed conclusions about what the standard deviation will do. Again, let's start out with that same set: 20 numbers, mean of 50, standard deviation of 5. Suppose we include 40 and 60 as our new members. The first thing to note is that one is 10 below the mean and one is 10 above the mean, so they're equally spaced around the mean.

So that means they're not gonna change the mean. So the mean of the new set is also 50. Well, right away that's good; that means that none of the other deviations change. Now, let's think about this: 40 and 60 each have a distance of 10 from the mean, while the standard deviation is only 5. So these are much further from the mean. In fact, each one is two standard deviations away from the mean, so each one is further from the mean than the standard deviation.

So adding bigger deviations than the standard deviation will increase the standard deviation of the list. Now, we're not gonna have to actually calculate the new value of the standard deviation. It's just enough to realize that if we add 40 and 60 to this set, we're going to increase the standard deviation, because we've added numbers that are further from the mean than the standard deviation.
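The reasoning here can be written out as a short derivation, assuming the standard deviation is the population version (the square root of the average squared deviation). The symbols are introduced only for this sketch: $n$ is the original number of members, $\sigma$ is the original standard deviation, and $k$ is the distance of each new member from the mean. Because the mean does not change, the original $n$ squared deviations still add up to $n\sigma^2$, and the two new members each add $k^2$:

$$
\sigma_{\text{new}}^2 = \frac{n\sigma^2 + 2k^2}{n + 2},
\qquad
\sigma_{\text{new}}^2 - \sigma^2 = \frac{2\,(k^2 - \sigma^2)}{n + 2}
$$

So the standard deviation goes up when $k > \sigma$, stays the same when $k = \sigma$, and goes down when $k < \sigma$. With $n = 20$, $\sigma = 5$, and $k = 10$, the new variance is $700/22 \approx 31.8$, so the new standard deviation is about $5.64$.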

All right. Reset. We start out with our set of 20. Now we're gonna include 45 and 55. Once again, we're adding two numbers that are symmetrically spaced, five above, five below, so this is not gonna change the mean. We're gonna stick with the same mean, the mean of 50, and now we're adding two numbers that are exactly at the standard deviation on either side.

One is one standard deviation above the mean and the other is one standard deviation below the mean. Each one has a deviation from the mean exactly equal in size to the standard deviation, so that means they're not gonna change the standard deviation at all. This new set will have exactly the same standard deviation as the old set, so the old standard deviation and the new standard deviation both equal five.

So this is the only time you'd need to know the numerical value of the new standard deviation, because it stays the same as the old numerical value. All right, reset back to the set of 20. Now suppose we include two numbers that are closer to the mean, say 47 and 53. Well, now we add two numbers, again symmetrically spaced: one is three below the mean, one is three above the mean, so this will not change the mean. The mean will stay the same.

And so now, notice that these have a separation from the mean that is less than the standard deviation. They are closer to the mean than the value of the standard deviation. And so that means they would decrease the standard deviation. Now we don't need to be able to calculate the new standard deviation, but we need to recognize that the standard deviation would decrease if we added these two numbers.

Now, we might get curious: what pair of numbers could we include that would most decrease the standard deviation? Well, we would have to include a pair with the smallest possible distance from the mean. The smallest possible distance from the mean, of course, is 0. You can't have a distance smaller than 0.

If the two new members of the set we include are 50 and 50, these two members each have a distance of 0 from the mean, so they decrease the standard deviation the most. Of all possible pairs of new members we can include in a set, two new members equal to the mean of the set is the pair that would decrease the standard deviation the most.

The test likes to ask about that idea. So here we go, you can see, we've added those two points right at the mean. They're gonna have deviations of 0, and so if you think of the list of deviations, putting two more 0s on that list, that decreases the standard deviation. And once again, we don't need to be able to calculate this new standard deviation, but we just need to recognize that it has decreased the most.
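The four symmetric pairs can be checked numerically. This is only a sketch: the 20-member list below is a hypothetical one (ten 45s and ten 55s), chosen because it has a mean of 50 and a population standard deviation of 5, and `new_sds` is just an illustrative name.

```python
from statistics import pstdev

# Hypothetical 20-member list (an assumption): mean 50, population SD 5.
original = [45] * 10 + [55] * 10

# The four symmetric pairs discussed in the lesson.
pairs = [(40, 60), (45, 55), (47, 53), (50, 50)]

# Standard deviation of the 22-member list for each choice of pair.
new_sds = {pair: pstdev(original + list(pair)) for pair in pairs}

for pair, sd in new_sds.items():
    print(pair, round(sd, 3))
```

The pair at 40 and 60 gives a result above 5, the pair at 45 and 55 gives exactly 5, and the pairs at 47 and 53 and at 50 and 50 give results below 5, with 50 and 50 the lowest.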

The second topic concerns the standard deviation as a unit of measurement. What do I mean by this? In very large sets, for example, populations of countries or everyone who takes the SAT, some kind of gigantic set like that, we may want to specify the position of an individual with respect to the population.

If we're told that the mean of a certain set is 50, and somebody's score is 60, what exactly does that mean? Yes, that score of 60 is above the mean, but is this just something kind of mildly above the mean, or is this a really, really impressive score far above the mean? Well, think about it this way: if the mean is 50 and the standard deviation is 20, then 60 is above the mean, but it's only half a standard deviation above the mean.

But, presumably, many numbers are gonna be higher than that. It would not be unusual in a set to have numbers that are as far as a standard deviation away from the mean or even further. So we certainly expect that there would be some numbers that are one standard deviation above the mean, maybe even one and a half standard deviations above the mean. So we'd expect scores of 70 and 80. So 60, yeah, it's above the mean, but it's not among the highest scores. It's kind of on the good side of average rather than being a super impressive score in this particular set.

Notice what happens if we change the standard deviation, though. By contrast, if the mean is 50 and the standard deviation is 2, then 60 is very, very far above the average. Because it's the standard deviation that tells us how meaningful a certain separation from the mean is, mathematicians generally use the standard deviation of any set as a unit of measurement within that set.
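Measuring in units of standard deviation is a one-line calculation: subtract the mean, then divide by the standard deviation. Here is a minimal sketch using the two cases above; the function name `sds_from_mean` is made up for this example.

```python
def sds_from_mean(value, mean, sd):
    """Return how many standard deviations `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / sd

# Same score, same mean, two different standard deviations.
print(sds_from_mean(60, mean=50, sd=20))  # 0.5
print(sds_from_mean(60, mean=50, sd=2))   # 5.0
```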

If this individual is 10 units above the mean and the standard deviation is 2, then this individual is 5 standard deviations above the mean. Now it's hard to overemphasize how extraordinary that would be, 5 standard deviations above the mean. In terms of musical abilities, that would be the best musician in the world. In terms of athletic abilities, that would be the best athlete in the world.

That would be just someone off the charts, like, once-every-hundred-years kind of talented. That's how impressive this score would be if it were five standard deviations above the mean. Here's a practice problem; pause the video and then we'll talk about this.

On a certain test, the scores had a mean of 300 and a standard deviation of 25. If John scored three standard deviations above the mean, what was John's score? Well, the mean is 300, and he scored three 25s above 300. Well, that would be 300 plus 75, which is 375. So, it turns out, a question like this actually involves a very, very simple calculation.

You just have to not be intimidated by the question itself. The final topic, very advanced, concerns the actual calculation of standard deviation. The test will not, repeat, not ask you to calculate the standard deviation of a list from scratch, but on a very advanced question, it could ask about some detail, some concept related to the details of this calculation.

Here are the steps in the calculation. So, first of all, we start with the list of numbers, and we have to find the mean. As we said, we subtract the mean from each number, and this creates a second list, the list of deviations. Then we're gonna take that list of deviations, some are positive, some are negative, and we're gonna square every number on that list.

So that will be a list of squared deviations, a third list. That list will have no negative numbers because we've squared them. Now we're gonna take the average of this third list, the average squared deviation, and that number's actually called the variance. Once we have the variance, we're gonna take the square root, and this is the standard deviation.
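The steps above can be sketched as a short Python function. The name `population_sd` and the sample list are illustrative choices, not something from the lesson; the function divides by the number of items, which matches the "average squared deviation" described here.

```python
from math import sqrt

def population_sd(numbers):
    """Standard deviation following the steps above (population version)."""
    mean = sum(numbers) / len(numbers)        # step 1: find the mean
    deviations = [x - mean for x in numbers]  # step 2: list of deviations
    squared = [d ** 2 for d in deviations]    # step 3: list of squared deviations
    variance = sum(squared) / len(squared)    # step 4: average of that list, the variance
    return sqrt(variance)                     # step 5: square root of the variance

# Illustrative list (not from the lesson): mean 5, variance 4.
print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Some software defaults to a slightly different "sample" version that divides by one less than the number of items; the procedure in this lesson divides by the full count.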

Okay, so now we'll do a sample calculation. And we'll start with a very simple list, just the integers from 1 to 9. So this is a nice, symmetrical, evenly spaced list, and of course the number in the middle, 5, would be both the mean and the median of this list. So we have the mean, so it's easy to figure out the deviations. To get the list of deviations, I'm just gonna take list number one and subtract five from every number on the list.

So, 5 of course has a deviation of zero. The numbers less than five have negative deviations. The numbers bigger than five have positive deviations. That's the second list. Now to get the third list, we just square everything. So those are the squares, notice 0 squared is 0, everything else is positive, so now we have the list of the squared deviations.

Now, we average that third list. The average of that third list is something called the variance. So the variance, the average of that list, happens to be 20 over 3. That is the variance of the first list: the first list has a variance of 20 over 3. To find the standard deviation, we take the square root of the variance.

So we could write it in radical form, and we could simplify that radical if we wanted to, but typically the standard deviation is just written as a decimal. So, we'll write it in decimal form. So notice that what we found here, 2.582, is the standard deviation of the list from 1 to 9, and also, because of what we learned in the last video, this should be the standard deviation of any nine consecutive integers.
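These numbers can be double-checked with Python's `statistics` module, shown here as a quick sketch. The squared deviations of 1 through 9 are 16, 9, 4, 1, 0, 1, 4, 9, 16, which add up to 60, and 60 divided by 9 reduces to 20 over 3. The second list, 101 through 109, is just an arbitrary example of nine other consecutive integers.

```python
from fractions import Fraction
from statistics import pstdev, pvariance

one_to_nine = list(range(1, 10))

# Using Fractions keeps the variance exact: 60/9 reduces to 20/3.
exact_variance = pvariance([Fraction(x) for x in one_to_nine])
print(exact_variance)                 # 20/3
print(round(pstdev(one_to_nine), 3))  # 2.582

# Shifting every number by the same amount leaves the deviations unchanged,
# so another run of nine consecutive integers gives the same result.
shifted = list(range(101, 110))
print(round(pstdev(shifted), 3))      # 2.582
```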

So, any nine consecutive integers at all would have a standard deviation of 2.582. That is the entire calculation for standard deviation in all its gory detail. The test will not ask you to repeat that entire procedure. Conceivably, on the very hardest quant problems, it could present some part of that procedure, or it could ask about some detail.

One thing to notice, incidentally: because we're squaring, numbers that have larger deviations make a larger contribution to the standard deviation. A much larger contribution. The effect of squaring amplifies the input of the numbers that have larger deviations.
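To see the amplification concretely, here is a small sketch that reuses the list from 1 to 9 and prints each number's squared deviation along with its share of the total. The variable names are illustrative only.

```python
numbers = list(range(1, 10))
mean = sum(numbers) / len(numbers)  # 5.0

squared_devs = [(x - mean) ** 2 for x in numbers]
total = sum(squared_devs)           # 60.0

# Each number, its squared deviation, and its share of the total.
for x, sq in zip(numbers, squared_devs):
    print(x, sq, f"{sq / total:.0%}")
```

The two end values, 1 and 9, account for 32 of the 60, more than half of the total, while 4 and 6 together account for only 2.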

That's an important thing to notice. You may wonder why the standard deviation is defined in this particular way. This is related to its principal use. We could find the standard deviation of any list or set, but in some ways the standard deviation is designed to accompany the normal distribution, which we will discuss in the next video.

Here's a practice problem. Pause the video and then we'll talk about this. Okay. So let's talk about this problem. A camp has 30 girls whose heights have an average of 130 centimeters and a standard deviation of four centimeters. Suppose four more girls join the camp, so there'll be 34 altogether.

Which set of heights for these four additional girls would most increase the standard deviation of all the girls at the camp? Well, certainly, if we wanted to decrease the standard deviation the most, we'd be adding girls who all had heights of 130. They'd all be equal to the mean.

Well, we don't have any like that, but notice with C, two of them are equal to the mean and the other two are very close to the mean, so all four of them are closer to the mean than the standard deviation. So that's gonna decrease the standard deviation. So that's certainly not gonna increase it.

If we add the ones in B, notice all four of those have deviations of negative three, negative one, one, and three. So those are all deviations that again are smaller than the standard deviation, so that's also gonna decrease the standard deviation. So that's not gonna be right. Very interesting if we look at A: there we have deviations of negative four, negative four, four, and four.

All four of those numbers are exactly one standard deviation away from the mean. So, in fact, adding those four girls will keep the standard deviation at exactly four, because we're just adding more deviations of the same size as the standard deviation. So, that's gonna stay the same, and so that's not an increase. So the only one that's left is D, and if we look at D, what we're adding there are four outliers.

Each one of them is two and a half standard deviations from the mean and that's far away from the mean, and so that's gonna change the mean of the whole camp, it's gonna upset all the deviations. And the net result is that you're gonna have much larger deviations and a much larger standard deviation. So D is the answer.
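The answer choices appear on screen rather than in the transcript, so the following is only a sketch under stated assumptions. The heights of the original 30 girls are hypothetical (fifteen at 126 cm and fifteen at 134 cm, which gives a mean of 130 and a standard deviation of 4). Choices A and B follow the deviations described above; the specific values for C and D are assumed to match the description (two at the mean and two close to it for C, four values 10 cm from the mean for D).

```python
from statistics import pstdev

# Hypothetical heights for the original 30 girls (an assumption):
# fifteen at 126 cm and fifteen at 134 cm give mean 130 and SD 4.
camp = [126] * 15 + [134] * 15

choices = {
    "A": [126, 126, 134, 134],  # deviations -4, -4, 4, 4 (from the transcript)
    "B": [127, 129, 131, 133],  # deviations -3, -1, 1, 3 (from the transcript)
    "C": [129, 130, 130, 131],  # assumed values: two at the mean, two close to it
    "D": [140, 140, 140, 140],  # assumed values: each 2.5 SDs (10 cm) from the mean
}

# Standard deviation of all 34 girls for each choice.
new_sds = {label: pstdev(camp + heights) for label, heights in choices.items()}

for label, sd in new_sds.items():
    print(label, round(sd, 3))

print(max(new_sds, key=new_sds.get))  # D
```

Under these assumptions, A leaves the standard deviation at 4, B and C pull it below 4, and D is the only choice that raises it.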

In summary, we discussed the effect on the standard deviation of including a new pair of numbers in the set. And notice that we can talk about that most sensibly for adding two numbers that don't change the mean. We discussed using the standard deviation as a unit to indicate the position of an individual in a large population, talking about how many standard deviations above or below the mean that individual is.

And we discussed the technical details of the exact calculation for the standard deviation.
