Cauchy’s proof of Geometric Mean-Arithmetic Mean Inequality
I am trying to teach myself mathematical analysis. As with everyone who gets introduced to analysis, the Geometric Mean-Arithmetic Mean Inequality (GM-AM) is something which has to be studied. I looked at the GM-AM as an example of proof by induction. Cauchy’s proof served as the template. The proof is straightforward until the last step of the reverse induction when the proof demonstrates that P(N+1) implies P(N). P(N+1) is assumed and then the final term on the LHS and the RHS is chosen so that P(N+1) reduces to P(N). The choice made is the arithmetic mean up to the nth term. And the fact that choice is made to fix the (n+1)th term as arithmetic mean up to the nth term really bothers me. Why would that particular choice result in a general solution?
Not the usual thing that gets talked about here but it does say that this is a forum about mathematics too. And I know that there are some people with really high IQs around here so maybe you guys can help me out.
Not the usual thing that gets talked about here but it does say that this is a forum about mathematics too. And I know that there are some people with really high IQs around here so maybe you guys can help me out.
Have a look at this. Two proofs are given.
http://en.wikipedia.org/wiki/Inequality ... tric_means
ruveyn
Wikipedia strilkes again...
Note that the way you worded your question... you do realise that the full proof first unquestioningly shows that P(2^K) is true, and only then starts descending from P(N+1) to P(N).
In order to prove P(N), when we already KNOW that P(N+1) holds for ALL possible lists - hence it is certainly true for the special case where the final element just happens to be the average of the first N elements.
Actually, the proof given above does not merely add one term, and then rely on going back one step... it adds as many terms (all equal to the average of the first N) as you like, to go straight to any P(2^K) with 2^K > N.
The choice of the arithmetic mean of the terms so far is nothing but a convenient value to choose. I suspect a similar proof could use the geometric mean of the terms so far. It just gives a nice simplification of the workings needed.
...
And now... what I suspect is your real question...
How did Cauchy "guess" that it was a good way to proceed towards a proof?
That's the problem with mathematics - and any attempt to "automate" it... where do the flashes of insight come from, which turn a problem from "unsolved" into "solved". When the proof disappears off down an avenue that no one had noticed before.
_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer
Note that the way you worded your question... you do realise that the full proof first unquestioningly shows that P(2^K) is true, and only then starts descending from P(N+1) to P(N).
In order to prove P(N), when we already KNOW that P(N+1) holds for ALL possible lists - hence it is certainly true for the special case where the final element just happens to be the average of the first N elements.
Actually, the proof given above does not merely add one term, and then rely on going back one step... it adds as many terms (all equal to the average of the first N) as you like, to go straight to any P(2^K) with 2^K > N.
The choice of the arithmetic mean of the terms so far is nothing but a convenient value to choose. I suspect a similar proof could use the geometric mean of the terms so far. It just gives a nice simplification of the workings needed.
...
And now... what I suspect is your real question...
How did Cauchy "guess" that it was a good way to proceed towards a proof?
That's the problem with mathematics - and any attempt to "automate" it... where do the flashes of insight come from, which turn a problem from "unsolved" into "solved". When the proof disappears off down an avenue that no one had noticed before.
There is no way to automate mathematical creativity. To do that we would have to manufacture a mathematical "mind" of high caliber. That is ten pay grades above current technology.
ruveyn
Note that the way you worded your question... you do realise that the full proof first unquestioningly shows that P(2^K) is true, and only then starts descending from P(N+1) to P(N).
In order to prove P(N), when we already KNOW that P(N+1) holds for ALL possible lists - hence it is certainly true for the special case where the final element just happens to be the average of the first N elements.
Actually, the proof given above does not merely add one term, and then rely on going back one step... it adds as many terms (all equal to the average of the first N) as you like, to go straight to any P(2^K) with 2^K > N.
The choice of the arithmetic mean of the terms so far is nothing but a convenient value to choose. I suspect a similar proof could use the geometric mean of the terms so far. It just gives a nice simplification of the workings needed.
...
And now... what I suspect is your real question...
How did Cauchy "guess" that it was a good way to proceed towards a proof?
That's the problem with mathematics - and any attempt to "automate" it... where do the flashes of insight come from, which turn a problem from "unsolved" into "solved". When the proof disappears off down an avenue that no one had noticed before.
Thank you. I understand the argument, but for some reason I am not accepting it because I can easily imagine specific sequences of numbers where the arithmetic mean up to the nth term does not equal the numbers following it. In other words, I still cannot see it generalizing. It is one of those things that will take time and I know myself to be slow on the uptake.
On a less depressing note, the bidirectional induction is a brilliant and beautiful method of proof. It is certainly not something that I would have thought of. My study of mathematics thus far has certainly made me very happy. Yesterday I learned how to define limits in terms of all epsilon and some delta. I went to very good schools in Singapore and my teachers did prove things like if f(x) and g(x) have a limit at (a), then the composition of the functions would also have a limit at (a). And yet they made no effort to introduce formal mathematical notations like quantifiers. The classes on limits were expressed using baby talk of small (x) and small (y). And that is because most of my classmates, many of whom ended up going to medical school with me (I dropped out), were simply uninterested. Basic aspects like set notation should be taught immediately after grade school and repeated every year until graduation from high school so that supple young minds can become fluent in the language of mathematics. Instead, it was all about computation. In that world, it was not truth but performance that mattered.
Thank you. I understand the argument, but for some reason I am not accepting it because I can easily imagine specific sequences of numbers where the arithmetic mean up to the nth term does not equal the numbers following it. In other words, I still cannot see it generalizing. It is one of those things that will take time and I know myself to be slow on the uptake.
...
Imagine away... you are quite correct... there are infinitely many sequences that do not follow that pattern. We do not need them. We only need those that do fit the pattern in order to provide the proof of the inequality for the shorter sequences.
P(1) ia pretty self-evident.
We can prove P(2).
Then we prove that P(N) implies P(2*N).
Hence we have proved P(2^K) for all K.
Then, to prove P(N), where N ia not a power of two, we just extend the sequence to a longer sequence, which happens to be a power of two. We know the inequality holds for this sequence. The proof on the Wikipedia page just shows that given the inequality for the longer sequence, should we have chosen to extend the sequence with a particular, convenient set of values (which we are at complete liberty to choose), then P(N) can easily be proven to be true.
Just thinking of the way P(3) is derived...
If a, b and c are all equal, GM(a,b,c) = a = AM(a,b,c). Otherwise...
We know P(4) is true. I.e. GM(a,b,c,d) < AM(a,b,c,d) for all a, b, c and d.
In particular, GM(a,b,c,d) < AM(a,b,c,d) when d=(a+b+c)/3.
Note that d = AM(a,b,c) = AM(a,b,c,d).
So GM(a,b,c,d) < d.
I.e. (a*b*c*d)^(1/4) < d
Or a*b*c*d < d^4
Or a*b*c < d^3
Or (a*b*c)^(1/3) < d
Or GM(a,b,c) < d = AM(a,b,c)
_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer
P(1) ia pretty self-evident.
We can prove P(2).
Then we prove that P(N) implies P(2*N).
Hence we have proved P(2^K) for all K.
Then, to prove P(N), where N ia not a power of two, we just extend the sequence to a longer sequence, which happens to be a power of two. We know the inequality holds for this sequence. The proof on the Wikipedia page just shows that given the inequality for the longer sequence, should we have chosen to extend the sequence with a particular, convenient set of values (which we are at complete liberty to choose), then P(N) can easily be proven to be true.
Just thinking of the way P(3) is derived...
If a, b and c are all equal, GM(a,b,c) = a = AM(a,b,c). Otherwise...
We know P(4) is true. I.e. GM(a,b,c,d) < AM(a,b,c,d) for all a, b, c and d.
In particular, GM(a,b,c,d) < AM(a,b,c,d) when d=(a+b+c)/3.
Note that d = AM(a,b,c) = AM(a,b,c,d).
So GM(a,b,c,d) < d.
I.e. (a*b*c*d)^(1/4) < d
Or a*b*c*d < d^4
Or a*b*c < d^3
Or (a*b*c)^(1/3) < d
Or GM(a,b,c) < d = AM(a,b,c)
Slowly the mist begins to lift, but the dandelions are not yet evident. I spent last week reading an introductory text on mathematics written by Bob Dumas and John E. McCarthy called “Transition to Higher Mathematics”. The book is meant to be an intermediate step towards Walter Rudin’s “Functional Analysis”, which I have and intend to read. I covered in 4 days what the authors said ought to take 10 weeks. 2 days were spent reading an introductory text on probability theory.
It is fun but exhausting. So tiring in fact that by the end of each day, my brain literally rebels. Yesterday was spent on my rekindled love with the game Go. I actually managed to play for more than 12 hours straight.
I say all of this as something of an apology because it has taken me this long to reply. I am truly grateful for your help. I see it now. It does not matter how we choose the terms after the kth term. If for all 2^kth series the inequality is true, then we can select the particular 2^kth series that proves the kth series.
THANK YOU.
THANK YOU.
That's quite OK.
This was all reminiscent of my own problems, understanding the Hashlife algorithm.
_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer