Mathematician wrote two long, much appreciated comments to Uniformly and/or Randomly Driving Towards
One Half. I'll reproduce them
here, with only very slight formatting changes (I am not in favour of gaps
between the final word in a question and the question mark). My response follows below.
Mathematician Comment 1:
I'm glad to see that you are still trying to solve mathematical
problems, but I am a little bit disappointed to see that you are still not
using words as they should be used and that you still fall into common
misconceptions about probabilities.
Your whole point seems to be that the answer 1/2 is THE answer to
Bertrand's paradox, and that other methods of choosing a chord are skewed. I'm
not sure that I could explain why this is not a valid point directly, so I'm
really going to use your own words to make you see where your errors are.
First, the wording of the problem. Apparently, you seem to think
that the sentence:
> If you pick, at random, a line which passes through the circle
is equivalent to the initial formulation:
> If you pick, at random, a chord of the circle
You even say: "I deliberately used different and simpler
language, but I don't think that my wording introduces or omits anything of
consequence." and this is a major flaw in your reasoning. It has many
consequences. I would go even further: the Bertrand "paradox" (which
is no paradox at all, but that's another question) is ultimately an example to
understand that some sets do not come with a natural parametrization. If you
change the wording of the problem by identifying the given set with another new
set, then it is entirely possible that the new set has a natural parametrization.
What you are doing with your rephrasing of the problem is that you
inadvertently choose a particular parametrization of the set of chords. You are
actually identifying the set of chords with the set of lines which pass
through the circle. Now, this parametrization is useful and natural, but it is
in no way unique. There are many other ways to identify the set of chords with
some other set. For example, I can identify the set of chords with the set of
pairs of points on the circle. I can also identify the set of chords with the set
of points inside the circle. Both identifications are perfectly natural and
very useful.
Imagine that I ask you the problem in the following way:
> If you pick, at random, two points on the circle, what is the
probability that the segment between these two points will be longer than the sides
of the equilateral triangle?
I deliberately used different and simpler language, but I don't
think that my wording introduces or omits anything of consequence. (Does this
sentence remind you of something?) Is this wording of the problem less or more
natural than your own wording? I argue that both are natural, but neither is
equivalent to the original wording ...
It might seem obvious to YOU that your wording is more natural. It
might seem more obvious to ME that my wording is more natural. But in the end,
there is absolutely no reason to think that one is better than the other.
It seems that I need to cut my comment in two halves because it is
too long ... so I will get back after a short break
Mathematician Comment 2:
Now after this wording of the problem, you argue that the answer
should be 1/2, referring to Jaynes' treatment. I agree with you, but I don't
think that you understand the words you are using:
> Jaynes appears to be suggesting one, appealing to rotational
and scale invariance.
Let me ask you a question. I give you a precise circle. The set of
chords of this precise circle is a well-defined set. What does it mean for a
measure on this set to be "scale invariant"?
The "scale and rotational invariance" is meant to be
applied to a measure on the set of all lines in the plane. This is a completely
different set from the set of chords of a specific circle. Now, what Jaynes
meant is that if we choose a parametrization of chords as lines that pass
through the circle, then we should put a measure on the set of chords that
comes from a measure on the set of lines. Moreover, the measure chosen on the
set of lines should not depend on the placement of the circle
(translationally/rotationally invariant) or on the size of the circle (scale
invariant). And fortunately, there is a unique measure on the set of all lines
in the plane that satisfies these properties.
But I can choose a different parametrization of the set of chords,
for example by the two endpoints on the circle. Then there is absolutely no
reason to think that the measure on the set of chords should come from a
measure on the set of lines. And the expression "scale invariant" is
meaningless here, because the circle is fixed, and the circle is NOT scale
invariant. The only thing that would make sense is "rotationally
invariant". It turns out that there is a natural measure on the set of
pairs of points on the circle, and it gives you a DIFFERENT measure on the set
of chords.
> However, my argument here is that the set of chords selected
by the 1/3 method is skewed.
Skewness is a relative notion. In this problem, there is no point
of reference which would be the "unskewed" result.
The fact that you can get back the 1/2 answer by pulling one of the
endpoints of the chord towards infinity is rather neat. But it is in no way an
indication that 1/2 is a better result ...
> This is precisely the problem (in my humble opinion) with the
1/4 method, because Cartesian co-ordinates are used within a circle.
What??? Again your misunderstanding of basic mathematics is
surfacing. There is absolutely no reason to use Cartesian coordinates to define
a uniform probability distribution on the disc. And moreover, you can use
Cartesian coordinates without "skewing" the result.
But more importantly your method of "randomly choosing a point
in the circle":
> we would select, at random, an angle 0 < θ < 2π from the
x axis and a distance from the locus of the circle, r, where 0 < r < R
does not produce the result that you probably think it produces.
The probability measure that is induced by this process is not the uniform
probability measure on the disc ...
If you were familiar with polar coordinates, you would know that
the usual area is given by "r dr dθ", but what you did was to take the
measure given by "dr dθ" (I'm simplifying the argument because it seems
unnecessary to give a full course on measure theory here). Of course, your
mistake is "lucky" because it produced the result that you wanted:
1/2. But the mistake is real, and hence the result is relatively meaningless.
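To see the difference concretely, here is a minimal Python sketch (the names are illustrative, not anyone's actual code) comparing the two measures. The disc of radius R/2 holds a quarter of the area, so a genuinely uniform distribution should put about 25% of the points inside it; the "dr dθ" recipe puts about 50% there:

```python
import math
import random

R = 1.0
N = 100_000

# Naive polar sampling: r uniform in [0, R] -- this is the "dr dtheta" measure.
# It piles points up near the centre of the disc.
naive_inner = sum(random.uniform(0, R) < R / 2 for _ in range(N)) / N

# Area-uniform sampling: r = R * sqrt(u) -- this is the "r dr dtheta" measure.
area_inner = sum(R * math.sqrt(random.random()) < R / 2 for _ in range(N)) / N

print(f"naive (dr dtheta):          {naive_inner:.3f}")  # ~0.500
print(f"area-uniform (r dr dtheta): {area_inner:.3f}")  # ~0.250
```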
In conclusion, your main mistake is to think that there is only one
natural parametrization of the set of chords. But even if there were a
"best" parametrization, it is irrelevant in the context. You should
understand that the Bertrand paradox is not about chords at all ...
Perhaps I should give you another example of a similar problem:
* Pick, at random, a right triangle inscribed inside a circle of
diameter 1. What is the probability that one angle of the triangle is less than
pi/6?
How would you answer the question?
neopolitan's response:
First off, I have to repeat that I am not a professional
mathematician and have no intention of spending another six years or more at
university to become a Doctor of Mathematics.
The last time I checked, the vast majority of the world's population,
maybe even the population of the universe, are not maths docs. Therefore, I think it is not unreasonable
that I don't always use the correct terminology agreed to within the
mathematical cabal.
It is possible that you are intentionally sending a message
along the lines of "get a maths doctorate or shut the fuck up" but I
don't think you are. Unintentionally,
however, this is precisely the message you seem to be sending with some of your
comments (some have been far worse, perhaps with intent). I hope you take this in the spirit in
which it is intended - I am curious, other people are curious, and we curious
people don't need to be frapped down by supercilious experts for the most minor
of infractions. If at all possible, it’s
better to get to the meat of our misunderstandings. And I was not being sarcastic when I wrote
that your comments are much appreciated.
You make a comment about lines and chords, as if I don't
know the difference. Please see A Farewell to the Bertrand Paradox
in which I think I make it pretty clear that I understand the difference (I
repeatedly wrote "You now have two points on the circle, between which is
a chord") - although in retrospect it seems like I don't understand the
word "farewell". My
implication, although not clearly expressed, is that a line passing through a
circle defines a chord and a chord (of length greater than zero) uniquely
defines a line. If this is fundamentally
wrong (as opposed to just oddly phrased), please advise.
You then get into parameterisation (or perhaps
parametrization, if there is a meaningful difference beyond our spelling
preferences). To the extent that I
understand you, I think I agree. The
method you use to define your chords, to select your chords, makes a
difference. What we differ on is whether
there is a "right" and "wrong" way to select chords to
satisfy the Bertrand Paradox, as stated.
My position, which I accept may be fundamentally wrong, is that if your
method doesn’t arrive at a uniform distribution of chords (by some reasonable
measure), then your method isn't "at random".
The question that arises immediately is by what
"reasonable measure" can I claim that the 1/2 method produces a
uniform distribution of chords and the other two methods don't. Below I may go some way to explain what I had
in my head, but first … invariance.
I suspect that we would agree that the distribution of
chords in a circle is invariant in terms of rotation, translation and
scale. To clarify, imagine a circle with
a locus at (0,0), defined by radius of 1 and an orientation such that θ=0
aligns with the positive y-axis (let's set Point A to (0,1)=(1,0), if you get
my little joke [and yes, I know it’s more conventional to take θ from the
x-axis, but it's just a convention and my convention has the vertex of the
triangle, and thus Point A, at the top]).
The distribution of chords in that circle (given the same
parameterisation) is not affected if: we move the circle (to a locus at
(random(x),random(y))); rotate the circle (to an orientation at random(θ) from
the y-axis); or increase the size of the circle (to a radius of
random(R)). I think we would
agree on that, but I may be missing something.
So long as I am not missing something crucial, therefore, a
circle of the type I suggested (locus at (0,0) and (r=1,θ=0) coinciding with
(x=0,y=1)) can stand in for circles of any size and location and Point A at
(0,1)=(1,0) can represent all possible positions on the circumference, because
using any other position on the circumference is equivalent to this circle
being rotated. If I am wrong about this,
then the following probably falls apart.
Borrowing from myself (at reddit), my "reasonable
measure" of a uniform distribution is such that if you drew a
representative sample of the chords with an arbitrarily small width (I am
aware that chords don't have widths), then the resultant density would be
smooth throughout the circle.
Wikipedia has something close to
what I am talking about, just before they get into the "classical
solution". However, I limit my visualisation of it to one orientation of
the circle, so
- 1) put Point 1 at the apex of a [notional] equilateral triangle, draw a set of chords with an arbitrarily small angular separation (say 360 of them 1 degree apart)
- 2) select a radius and extend it out to a diameter, draw the set of chords perpendicular to the diameter with an arbitrarily small separation, say 360 of them R/180 apart
- 3) take an arbitrarily large number of points equally separated within the circle, say 360 of them, draw the set of chords for which the points are their midpoint
I am pretty sure that only 2) will smoothly fill the circle
(for example if all chords are drawn with a width of R/360). I am also pretty
sure that there will be some arcane argument as to why this either doesn't
matter, is not (sufficiently) stringent or is completely wrong-headed - but as
I said, it is what I had in mind.
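For anyone who wants to check the three classic figures numerically, here is a minimal Monte Carlo sketch in Python (the method names are mine, chosen to match the three descriptions above; R is taken as 1):

```python
import math
import random

SQRT3 = math.sqrt(3)  # side of the inscribed equilateral triangle, for R = 1

def endpoints():  # 1/3 method: two uniformly random points on the circle
    a, b = random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((a - b) / 2))          # chord length

def radial():     # 1/2 method: uniform distance from the centre along a radius
    d = random.uniform(0, 1)
    return 2 * math.sqrt(1 - d * d)

def midpoint():   # 1/4 method: midpoint uniform over the area of the disc
    d = math.sqrt(random.random())                 # distance of midpoint from centre
    return 2 * math.sqrt(1 - d * d)

N = 200_000
for method in (endpoints, radial, midpoint):
    p = sum(method() > SQRT3 for _ in range(N)) / N
    print(f"{method.__name__:9s}: P(chord > sqrt(3)) = {p:.3f}")
# expected output: roughly 0.333, 0.500 and 0.250 respectively
```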
I don't want to spend a ridiculous amount of time on this,
so for the purposes of showing this, I will use 16 rather than 360 chords and I
am not going to fuss about making the images pretty:
1) [image: method 1 - chords fanned out from Point 1 at equal angular separations]
2) [image: method 2 - parallel chords drawn perpendicular to a diameter]
3) [image: method 3 - chords drawn through equally spaced midpoints]
To my mind, the distribution created by method 2 is uniform
and smooth in a way that the others simply aren't. Someone did complain that my
argument here is apparently based on aesthetics - the method 2 result looks
nicer. That's not really my point, my
point is that the density of chords is smooth, no matter where you look in the
circle. In the other two, you have either
a clumping of chords (in method 1, there are more chords surrounding a
point directly below the top of the circle [it's actually worse than I've represented]) or chords crossing (in method 3). I do understand that as you keep rotating the
circle to obtain a new set of chords in method 1, as your number of unique sets
approaches infinity, the gaps will disappear, but the clumping will remain at
the rim of the circle (some seem to want to call it a disc, hopefully I don't
confuse anyone by calling it a circle).
Similarly, I understand that the gaps will disappear if we consider more
and more chord midpoints in method 3, but this introduces a similar clumping
effect at the rim of the circle - to a greater extent, by which I mean the density
of chords at the rim will be even greater than that produced by method 1.
You gave the following challenge: "Pick, at random, a
right triangle inscribed inside a circle of diameter 1. What is the probability
that one angle of the triangle is less than pi/6?"
I understand that this is the same question, because 2·cos(π/6) = √3, so to be
consistent, my answer would have to be 1/2.
This is, however, on the understanding that I am picking from an
existent set of all right triangles, not a set of triangles created by
bisecting the circle and then picking a point at random on the circumference
then drawing chords between that point and the two ends of the diameter
previously established.
I think you've helped me here though. If we think about the absolute maximum
proportion of unique right triangles with one angle less than π/6 that we could draw in the circle,
using any method, the answer comes to 1/2.
We could, for example, draw the set of right triangles using the 1/2
method for chord selection, then draw a diameter from one end of the chord,
then complete the triangle. This, I
think you would agree, results in 1/2.
Knowing this, it seems odd to me that you can walk back from this figure
to 1/3.
I note that, when unconstrained by a circle, you can draw your triangle by first drawing the
hypotenuse (say at length 2R), then drawing a line from one end at an angle to
the hypotenuse chosen at random such that 0 < θ < π/2 and then completing the triangle
by drawing the final side as required.
The criterion will be satisfied in the ranges 0 < θ < π/6 and π/3 < θ < π/2, so 2/3 of the time.
Also, as you do
this, the right angle vertex describes a circle - meaning that there is a
problem with the 1/3 answer. I've drawn
that below as well, hopefully it's sufficiently clear (note that I have tried
to show pi/3 around the created circle, I am not trying to imply that pi/6 (the
angle on the left in the triangle at this point) is pi/3).
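The 2/3 figure is easy to check numerically under the construction just described: the non-right angles of the triangle are θ and π/2−θ, and one of them is less than π/6 exactly when θ < π/6 or θ > π/3. A quick sketch (mine, for illustration):

```python
import math
import random

N = 1_000_000
hits = 0
for _ in range(N):
    theta = random.uniform(0, math.pi / 2)  # angle drawn at one end of the hypotenuse
    # the non-right angles of the triangle are theta and pi/2 - theta
    if min(theta, math.pi / 2 - theta) < math.pi / 6:
        hits += 1
print(hits / N)  # ~0.667, i.e. 2/3
```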
Thinking about this did lead me to stumble over another way
to create a set of chords: draw a diameter (just line segment of length 2R will
do), then repeatedly draw circles around one end of the diameter (or line
segment) at arbitrarily small increments until you reach a circle of radius
2R. With each circle, draw the tangent
that intersects with the other end of the diameter (or line
segment). The points at which these
tangents touch each of this series of circles describe a semicircle.
The result obtained by this method is 1/2, despite not
appearing to be a reformulation of the classic 1/2 method. Perhaps it is but I’ve simply not worked out
how yet, but in a rough modelling (2000 data points) the distribution does not
appear to be the same when viewed in histogram form.
---
On parameterisation, I did some thinking about this along
the lines of saying that if you have a 1/3 answer, then it seems (to me) that
your selection method must simply have missed some of the chords. In my way of thinking (standard caveat about
the possibility of being wrong), if we are asked to select a chord "at
random" then it follows that we would be selecting from a set of ALL
chords, rather than from a specific subset, unless advised otherwise. Viewed from this perspective, our first
concern is making sure that we have ALL chords available to select from. The question then is how to express this
properly. I'm probably going to mess this
up in some obscure way, but if you can at least try to understand what I am
saying (and criticise the best formulation of my argument, rather than the
worst), it would be appreciated.
I suggest that an expression for ALL chords in a circle
defined by x² + y² = 1
(in units of R where R is the radius of the circle) goes something like
this:
The infinite set S of all unique sets Si of points that
fulfil the following criteria:
S:
-1 < c < 1 (defining the y axis
intercept of the chord)
0 ≤ θ < 2π (defining the gradient of the chord)
Si:
-√((-cos θ)² + (c - sin θ)²) <
r < √((cos θ)² + (c + sin θ)²)
(x,y) = (r·cos θ, r·sin θ + c)
Note: the combined effect of these two conditions is (or is intended) to
include all and only points between the intercepts of the line defined by (x,y) = (r·cos θ, r·sin θ + c)
and the circle defined by x² + y² = 1, thus defining a
chord. In other words a unique set Si
is intended to define a unique chord.
When corrected in
terms of mathematical terminology, etc, is this a parameterisation and, if so,
does it establish or define a structure (per u/Vietoris) for which there is a
defined probability measure (per u/Vietoris) or probability distribution
(per u/overconvergent)? And, if so, what Bertrand Paradox related answer
would be expected from this parameterisation and associated probability measure/distribution?
---
With luck, I have already addressed your other points either
here, or in comments at reddit. If I
have missed something key, please let me know.
> I think it is not unreasonable that I don't always use the correct terminology agreed to within the mathematical cabal.
It's not unreasonable. You could have your own vocabulary and write perfectly coherent answers. The problem is that you are using your own definition for certain words (like "scale invariant" for example) and then you justify your reasoning with other sources (like Jaynes' article) that don't use the same definition for the word. That's not reasonable.
> we curious people don't need to be frapped down by supercilious experts for the most minor of infractions.
You are writing a blog, which is probably intended to be read by as many people as possible. If the minor infractions lead to huge reasoning mistakes, then it seems important to say it. If not for you, then for the curious people without mathematical background that will read your blog. If you want people to learn things when reading your blog, you should perhaps learn them as well ...
Mathematics is powerful because it is precise and rigorous. What you might consider as "nitpicking" is the very reason why mathematics is so successful ...
> a line passing through a circle defines a chord and a chord uniquely defines a line.
Yes, this is a correct statement.
> My position, which I accept may be fundamentally wrong, is that if your method doesn’t arrive at a uniform distribution of chords (by some reasonable measure), then your method isn't "at random".
My position is that "uniform distribution" is one of these words that have a precise meaning in mathematics. It turns out that it doesn't mean anything when you talk about the set of chords. If you want to use your own definition of uniform, then by all means, do it. But remember that it would be an arbitrary choice of what "uniform" means.
> my "reasonable measure" of a uniform distribution is such that if you drew a representative sample of the chords with an arbitrarily small width , then the resultant density would be smooth throughout the circle.
First of all, what does "smooth" and "density" mean here ? Again, "smooth" and "density" are two of these words that have a precise definition in mathematics. And "smooth density" makes perfect sense with these definitions. But I'm quite certain that you are not using the colloquial definition of smooth and density here. Probably, the idea that you want to convey is the following :
if I take a small disc inside the circle, the proportion of chord that passes through this small disc, is proportional to the area of the disc. Does that sound good ?
Second of all. Even if I found this property reasonable, what would it prove ? As you say, this is YOUR reasonable measure. The point of the paradox is that different measures give rise to different answer. So yes, if you pick one measure that satisfy some nice property, you will get one nice answer. But there are many different nice properties that a measure could satisfy ... Asking for the "density of chord to be smooth" is relatively arbitrary. Why not the density of pairs of end points ? Why not the density of midpoints ? Why not some other thing ?
Second part (because of the 4096 character limit)
> if you have a 1/3 answer, then it seems (to me) that your selection method must simply have missed some of the chords
This is completely wrong.
> if we are asked to select a chord "at random" then it follows that we would be selecting from a set of ALL chords, rather than from a specific subset, unless advised otherwise
Yes, and both methods 1 and 3 are selecting from the set of ALL chords, so I don't really understand where you are going ...
> is this a parameterisation
Yes
> does it establish or define a structure for which there is a defined probability measure?
You are identifying the set of chords with a rectangle [-1,1]x[0,pi]. The rectangle has a nice measure (the uniform one). So I guess the answer is yes.
> And, if so, what Bertrand Paradox related answer would be expected from this parameterisation and associated probability measure/distribution?
Well, it requires some computations, but I would expect a result much higher than 1/2. If I had more time I could do it ...
> If the minor infractions lead to huge reasoning mistakes, then it seems important to say it. If not for you, then for the curious people without mathematical background that will read your blog. If you want people to learn things when reading your blog, you should perhaps learn them as well
I guess I agree with regard to "minor infractions" that lead to huge mistakes (but I would consider it to be a major infraction in disguise in that case). While it would be great to teach, I suppose, I don't have as great aspirations. I do tend to learn things myself while going through the processes that lead to articles, even if I don't necessarily seem to learn these things immediately or well. If someone learns something as a consequence of my writing process, even via my mistakes, then that's a happy by-product.
> Well, it requires some computations, but I would expect a result much higher than 1/2.
It shouldn't, unless something strange has happened as a consequence of my using -1 < c < 1 instead of 0 < c < 1 and keeping 2π when in context I probably should have reduced it to π. What I've suggested here is the same as what leads to figure 2) - the 1/2 method extended out to a diameter.
> if I take a small disc inside the circle, the proportion of chord that passes through this small disc, is proportional to the area of the disc. Does that sound good?
Yes, I would add that the location of the disc should not change the proportion of chords passing through this "small disc". This is achieved with the 1/2 method, but not the 1/3 method or the 1/4 method.
I'm obviously focusing on other things, but it just struck me that you must be Vietoris. Certain things should have given it away earlier, but I just remembered that when quoting Vietoris, I also had to redact a space before a question mark. I thought at the time that it might be a weird mathematics convention, but then I see you talking about a disc rather than a circle and things fall into place. Or I may be wrong again and you are Vietoris' slightly less evil twin. It's hard to tell.
I've looked up the precise meaning of disc in mathematical terms and can sort of see why you use it (although it seems to me that "the area of the disc" in the context that you used it is the disc). It does seem to avoid the problem of zero length chords in the 1/3 method, so long as the disc remains open :)
> Asking for the "density of chord to be smooth" is relatively arbitrary. Why not the density of pairs of end points? Why not the density of midpoints? Why not some other thing?
The question as posed talks about choosing chords "at random", it does not talk about choosing end points or midpoints "at random". If it did I would expect (absent any qualifying comment) a smooth density of end points or midpoints. The same applies with any other thing you'd care to mention.
> Well, it requires some computations, but I would expect a result much higher than 1/2. If I had more time I could do it ...
Ok, actually I'm not sure that I am computing the correct probability here. Tell me if this is your idea:
First you pick a number between -1 and 1, uniformly on the interval [-1,1]. And then you pick an angle between 0 and pi, uniformly on the interval [0,pi]. The chord corresponding to the couple (r,θ) is the unique chord that has slope θ and that cuts the horizontal axis at r. Is that okay?
So this defines a probability on the set of chords. And with this, the probability that a random chord is longer than sqrt(3) is given by the following formula:
P = 1/3 + ln(7 + 4√3)/(2π) = 0.7525...
I might be wrong here, but it seems reasonable.
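For what it's worth, a short Monte Carlo run (a sketch for illustration, not part of either correspondent's working) agrees with that formula for the method as stated - slope angle uniform in [0,π], horizontal-axis cut uniform in [-1,1]. It uses the fact that such a chord is longer than √3 exactly when the line's distance from the centre, |r·sin θ|, is below 1/2:

```python
import math
import random

N = 1_000_000
hits = 0
for _ in range(N):
    theta = random.uniform(0, math.pi)  # slope angle of the chord
    r = random.uniform(-1, 1)           # where the chord cuts the horizontal axis
    # the distance from the centre to this line is |r*sin(theta)|; the chord
    # is longer than sqrt(3) exactly when that distance is below 1/2
    if abs(r * math.sin(theta)) < 0.5:
        hits += 1

print(f"Monte Carlo: {hits / N:.4f}")
print(f"Formula:     {1/3 + math.log(7 + 4 * math.sqrt(3)) / (2 * math.pi):.4f}")
```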
> It shouldn't, unless something strange has happened as a consequence of my using -1 < c < 1 instead of 0 < c < 1 and keeping 2π when in context I probably should have reduced it to π. What I've suggested here is the same as what leads to figure 2) - the 1/2 method extended out to a diameter.
Yes, sorry, I must be completely wrong here. I didn't understand quite well what you were saying with your definition and I interpreted it in a completely stupid way. So the previous comment (with the computation), does not make any sense.
Anyway, I still don't understand what you are saying. What is this parameter c? You say it's the "y axis intercept of the chord", do you mean the intersection between the chord and the vertical axis?
> Yes, I would add that the location of the disc should not change the proportion of chords passing through this "small disc". This is achieved with the 1/2 method, but not the 1/3 method or the 1/4 method.
You are completely right. But this doesn't change the fact that this requirement is relatively arbitrary. It probably looks very natural to you because it corresponds to the visual notion of "uniform", but there are other very natural requirements that would lead to other notions of "uniform".
For example, wouldn't you expect that when you choose chords "uniformly randomly" then the position of endpoints is also uniform? I mean, I would expect that from a uniform probability (honestly, I'm not lying).
So if we look at the two endpoints of the chord with their angle coordinates in [0,2pi]x[0,2pi] (one endpoint of a chord is a point on the circle and it corresponds to an angle). Then it turns out that method 2 does not satisfy this very natural property of placement of endpoints. The positions of points in the square [0,2pi]x[0,2pi] are "clustered" along a diagonal. But method 1 does give a nice uniform picture for the position of endpoints.
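The clustering is easy to see in a picture. A sketch (assuming matplotlib is available; the helper names are illustrative) that plots the endpoint-angle pairs for a sample of chords from each method:

```python
import math
import random
import matplotlib.pyplot as plt

TWO_PI = 2 * math.pi

def endpoints_method1():
    # method 1: both endpoints chosen uniformly and independently on the circle
    return random.uniform(0, TWO_PI), random.uniform(0, TWO_PI)

def endpoints_method2():
    # method 2: uniform direction theta, uniform signed distance c from the
    # centre; the endpoints then sit at angles theta +/- arccos(c)
    theta = random.uniform(0, TWO_PI)
    alpha = math.acos(random.uniform(-1, 1))
    return (theta + alpha) % TWO_PI, (theta - alpha) % TWO_PI

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, sample, title in ((axes[0], endpoints_method1, "method 1"),
                          (axes[1], endpoints_method2, "method 2")):
    pts = [sample() for _ in range(5000)]
    ax.scatter([p[0] for p in pts], [p[1] for p in pts], s=1)
    ax.set_title(title)
plt.show()
# method 1 fills the square evenly; method 2 concentrates along the bands
# where the two endpoint angles differ by about pi
```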
But again, I'm getting away from the initial problem. The problem is not to define one natural probability measure on the set of chords and then answer the question. The problem is that the question is ambiguous and requires a probability measure to be defined.
> it just struck me that you must be Vietoris
Well, I guess there is no point in hiding it anymore. My French habit (of putting a space before question marks) betrayed me
> I've looked up the precise meaning of disc in mathematical terms and can sort of see why you use it.
A circle is a 1-dimensional curve in the plane. A disc is the 2-dimensional surface that is enclosed by the circle. So it's important to make the distinction, whether you talk about the endpoints (which are on the circle) or the midpoint (which is in the disc) of a chord (which is the intersection of a line and a disc). I know that this is not a really important distinction but if we start using one word for another, we will never stop.
> The question as posed talks about choosing chords "at random", it does not talk about choosing end points or midpoints "at random"
You got me there. But again, I'm not saying that it's wrong to ask for this property. What I'm saying is that this is a rather arbitrary choice for a property. I mean, let's phrase it this way: why should a "uniform" probability measure on the set of chords satisfy this property and not some other very nice property?
I'm thinking about another problem right now to illustrate what I mean. I will try to describe it to you in another comment. I hope it will be understandable.
> Yes, sorry, I must be completely wrong here. I didn't understand quite well what you were saying with your definition and I interpreted it in a completely stupid way. So the previous comment (with the computation), does not make any sense.
I am disproportionately happy you say this, even if I think you might have been far less stupid and wrong than you imply. I've been beating myself up harshly since you wrote your comment with the P = 1/3 + ln(7 + 4√3)/(2π) answer, because this answer triggered a recognition that my attempt to nail down an appropriate set failed - I don't have an equation to back that assertion up, but I can do a rough modelling of it (which I will put in a post to follow this).
> For example, wouldn't you expect that when you choose chords "uniformly randomly" then the position of endpoints is also uniform ? I mean, I would expect that from a uniform probability (honestly, I'm not lying).
If I understand some of what has been said here and over at reddit, "uniform" can be moderated by the probability density - so if we wanted a "uniformly random" selection of adult height from a human population, we should expect to get a bell curve and thus, to even it out, we'd have to notionally divide our sample set by the standard bell curve. That is, if at 175cm we'd expect 1000 times as many random samples as at 120cm and we set the 175cm samples to unity, then we would multiply the 120cm samples by 1000. I'm not saying we'd do this necessarily, it's just that if we did this, a representative sample thus moderated would be flat. It'd be like saying "we got 100% of the expected 175cm samples, and 100% of the expected 120cm samples". Can you see what I mean?
In the same way, a uniform distribution of end point positions could be thus moderated - where would we expect the end points to be to achieve what we are after - a uniform distribution of chords?
> a chord ... is the intersection of a line and a disc
I know that what I do and don't like is inconsequential, but I really like this definition of a chord. It's so elegant. I realise that it may appear that I have a vested interest after having talked about lines passing through a circle - if only I had this terminology available to me at the time, I could have saved myself some heartache.
> I'm thinking about another problem right now to illustrate what I mean. I will try to describe it to you in another comment. I hope it will be understandable.
If you want to or need to use diagrams or otherwise do something that is awkward in these comments, please feel free to email me with what you want to say. I can then post it as a separate article and give a considered response in a later one. You can find my email on this site, but to make it slightly easier without incurring a further storm of robot-initiated spam - my email is in the form of x @ elitemail.org where x=neopolitan. This address has some rather heavy spam filters after years of abuse, so let me know here if you send anything, just in case it gets smashed. (Don't forget to strip your personal data off anything you send, but if you forget, I will do so before posting anything.)
Me quoting me:
> but I can do a rough modelling of it (which I will put in a post to follow this)
Done, here.
Okay, I will provide my example. Its purpose is to illustrate the fact that the notion of "uniformity" that you use is arbitrary and in no way "obvious". The notion seems to give a sensible answer in the context of chords but not in other very similar contexts, which makes it relatively meaningless.
First, I think I understand what you mean by "a smooth density of chords" or "uniform distribution of chords" even if that's clearly not the correct terminology. I believe that it could be defined rigorously and we both agreed about a reasonable definition of it that I rephrase here:
«If the area of the large disc is A, we want the proportion of chords that intersect a smaller disc of area b to be exactly b/A.» That's more or less the formulation of Jaynes' argument from the wikipedia page. And it's a perfectly well-defined property.
Now the other problem:
«Say you have a compact interval of the real line (in other words, a line segment [a,b]). If you pick, at random, a subinterval of that interval, what is the probability that the length of the subinterval is more than half the length of the initial interval?»
I hope the problem is clear enough. It seems to be of the same type as the Bertrand problem: the answer will depend on the way I choose a random subinterval inside my interval. The answer that I would find the most natural is that you should pick the two endpoints uniformly. This gives a probability measure on the set of subintervals, and with this probability measure (which is in no way the only possible one), the answer is easily computed to be 1/4.
Obviously, you should disagree with this answer. Because we are not talking about the endpoints of the subintervals, but about the subintervals themselves. If you draw a random sample of subintervals with this method, you will find that there are many more subintervals around the middle of the interval than around the boundary of the interval. So, if I'm not mistaken, you should interpret this behavior of subintervals as something "not uniform". So let's try to do this with your point of view.
We use the same definition of "uniform" that I stated before, adapted to the situation: if the length of the large interval is L, we want the proportion of subintervals that pass through a smaller interval of length k to be exactly k/L. Does that seem like a correct way to phrase your intuition of uniformity in this case?
FACT: With a probability distribution that satisfies this property, the answer is 0! (I'm actually very surprised because I wasn't expecting that when I started the computations)
Proof: (sorry for the mathematical jargon but it's necessary here) Let P be such a probability, and X a random subinterval of [0,1] with respect to P. The desired property is P(X meets [a,b]) = b-a.
The variable X gives rise to two random variables H and T that take value in [0,1], corresponding to the Head and Tail of the subinterval. We have
P(X meets [a,b]) = P(H in [0,a) and T in [a,1]) + P(H in [a,b]) = b-a
So P(H in [0,b]) = b. This implies that P(H in [a,b]) = b-a.
But then, for any a, P(H in [0,a) and T in [a,1]) = 0.
So in conclusion P(H=T) = 1.
In other words, the probability that a random subinterval has length 0 is equal to 1. End of proof.
So, do you accept that as a reasonable answer? If not, what is different from the previous case with chords?
My actual point of view, is that both answers are based on the arbitrary choice of a probability measure on the set of subintervals. None is better than the other, because the original question is not well-defined. I hope this was clear enough ...
My instinct would be to keep dividing the L up like this:
1/1
1/2 (2)
1/3 (3) + 2/3 (2)
1/4 (4), 1/2 (1), 3/4 (2) (other subdivisions already accounted for)
1/5 (5), 2/5 (4), 3/5 (3), 4/5 (2)
1/6 (6), 1/3 (2), 2/3 (1), 5/6 (2) (other subdivisions already accounted for)
...
1/N (subdivisions already accounted for, if even, otherwise unknown)
There's probably a series of some sort that can work this out up to an arbitrarily large value of N. Of those I would take the sub-divisions that were greater than L/2 and divide that by the number of unique subdivisions in total.
Then I'd do it again with another even larger arbitrarily large number, say 2N.
From this I could tell whether the value had stabilised or was still tending towards another final value (perhaps zero, which could be possible as N->infinity). Zero does make sense given that for any k>L/2, you could divide the remaining length into an infinite number of infinitesimally short segments. I am aware that this isn't a particularly elegant way to think of it, I'm just pointing out that we could reason our way towards that conclusion quite easily (plus for every even value of N, some of the longer segments have already been considered, so as N->infinity, the proportion of longer values of k falls behind).
If I make N=5 (not really large, I know, but it's laborious by hand to say the least, not to mention fraught with danger that I might make a mistake):
P(N=5,k>L/2)=0.3448
P(2N=10,k>L/2)=0.2819
I'd agree that this is almost certainly heading towards zero (albeit in a bit of a drunken walk).
I have no idea about what you are doing ... No idea at all ...
But you seem to think that the 0 answer is not unreasonable. That's a bit of a shock. Let me rephrase the conclusion:
"If I pick, at random, a subinterval of the interval [0,1], there is a 100% chance that it contains only one single point. "
Read that sentence again. You really, honestly believe that this could be the "one true answer" ?
What I was doing was thinking about the set of all possible subintervals in the interval (note that I see this as being quite a different sort of task from finding all the possible chords on a disc). Doing that I can imagine a regime in which the probability of selecting a subinterval k of length more than L/2 (with a granularity based on an arbitrarily large number N) approaches 0 as N->infinity.
But, it seems wrong to me too. When I run a Monte Carlo on it, I get very close to 1/2 (after 40000 samples) - this is based on two random numbers in the interval [0,1]. So, I'd reject my instinct, and suggest that 1/2 might be a more reasonable figure than 0.
My apologies, I failed to drag down one of my columns.
The answer is closer to 1/4.
(By which I mean, after a million iterations, the proportion I get is 0.249656.)
> The answer is closer to 1/4
Yes, the exact probability is 1/4 (the computation is an undergraduate exercise), if you assume that you pick a random interval with respect to the probability distribution given by the following method:
"Pick two points uniformly randomly in [0,1], take the subinterval between these two points"
However, if we require that the probability distribution is "translation and scale invariant" (as in the Jaynes argument in the chord problem), then we arrive at the answer 0.
This problem is not that different from the chords problem. In both cases you are considering an infinite set of straight line segments. Yet, there seem to be at least two different probability measures on such a set of straight line segments. One given by choosing endpoints uniformly, and the other by choosing the chord distribution to be "smooth" or "without granularity". One method is seemingly more natural in one case, and the other method is more natural in the other case.
Now, if I give you a third similar problem (let's say for example, "pick a straight line segment inside a disc, not necessarily a chord"), how would you choose A PRIORI what "at random" means? What selection method is the "natural" one? Should you use the Jaynes argument about invariance? Should you use a uniform distribution of the endpoints? Should you use something else?
If you cannot answer these questions A PRIORI (without doing any prior experiment or any reference to another problem) it means that your choice of a "natural" probability measure is A POSTERIORI, and hence it means that it's not "natural" at all.
That was my point here ...
I think there's a difference between the line segment on disc question and the chord question, a difference that revolves around what I call granularity. In both interval cases (the 1/4 and 0 answers), you'd have a granularity of 1/N, where N is an arbitrarily large number that approaches infinity. As you bump up the granularity, you'll see the approximation draw closer and closer to either 1/4 or 0, depending on your method (I suspect that it might make a significant difference if you include the k=0.5L subintervals).
DeleteBut with the chord question, if you use the 1/2 method, or the variations of the other two methods that I suggest, then you have methods which are invariant to granularity (for any arbitrarily large value of N - with N close to one, there isn't any real granularity).
I do agree the subinterval question is curious - 0 seems to be profoundly wrong and 1/4 seems (to me - which you can emphasise, if you like) to be based less on selecting intervals and more on selecting two points.
I'd be very interested to see what the result is when "the probability distribution is 'translation and scale invariant'" and subintervals of length k=L/2 are included. Does it make a difference as I suspect it might?
Ok, I thought I understood what granularity meant, but now I'm extremely confused ... so I had to read the post where you defined granularity again, and it seems to be a rather void definition.
If I understand correctly, you are justifying that the 1/2 method does not have granularity (and the other two methods have granularity) by the following argument (stop me if I'm mistaken):
The 1/2 method consists of choosing a radius (or an angle θ) at random (uniformly) and then a perpendicular (or a number -1
> I'd be very interested to see what the result is when "the probability distribution is 'translation and scale invariant'" and subintervals of length k=L/2 are included. Does it make a difference as I suspect it might?
I thought I said that if we assume that the probability distribution is "translation and scale invariant" as in the Jaynes argument, then the only possible answer is 0. But I don't understand what you mean by "when subintervals of length k=L/2 are included", so ...
Hmm, there was apparently a problem with my post. The third paragraph starts with a sentence and suddenly jumps to the end of my answer ...
I will say it again more quickly:
The 1/2 method consists of choosing an angle θ in [0,pi] and a number c in [-1,1], both uniformly. Your argument about granularity is that if you pick an angle θ, and then choose a finite sample (of size N) of evenly spaced c, and draw chords with a certain width 1/N, then the picture is "nice". (The chords do not intersect, and cover the disc entirely.)
Your argument is that the other methods do not produce such a picture. But here is a problem: the order in which you choose θ and c is not important. It gives exactly the same chord, whether you choose c first or θ first.
So what happens if you try to draw the similar picture by first choosing c, and then, with c fixed, taking N angles θ that are evenly spaced? Then the picture would be extremely different (the chords would intersect and would not cover the entire disc).
This is the same method. It gives the exact same probability distribution on the set of chords. And yet, just depending on how I split the method into two steps, I produce different pictures. One is nice (no granularity), the other is not nice (lots of granularity).
So this means you cannot say that a probability distribution does or does not have granularity, because it depends on the way you choose to describe your probability distribution. Hence, that makes your notion of granularity useless ...
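The two pictures described here can be drawn directly from the (c,θ) parameterisation - signed distance c from the centre, direction θ. A sketch, assuming matplotlib; the helper function is illustrative:

```python
import math
import matplotlib.pyplot as plt

def draw_chord(ax, c, theta):
    # chord at signed distance c from the centre, with unit normal at angle theta
    nx, ny = math.cos(theta), math.sin(theta)
    half = math.sqrt(max(0.0, 1 - c * c))      # half-length of the chord
    dx, dy = -ny, nx                           # direction along the chord
    ax.plot([c * nx - half * dx, c * nx + half * dx],
            [c * ny - half * dy, c * ny + half * dy], lw=0.5)

N = 40
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

# Left: theta fixed, N evenly spaced values of c -- parallel chords, no gaps.
for i in range(N):
    draw_chord(axes[0], c=-1 + (2 * i + 1) / N, theta=0.0)

# Right: c fixed, N evenly spaced values of theta -- the chords all sit tangent
# to an inner circle of radius |c|, leaving a hole and crossing near the rim.
for i in range(N):
    draw_chord(axes[1], c=0.5, theta=math.pi * (i + 0.5) / N)

for ax in axes:
    ax.add_patch(plt.Circle((0, 0), 1, fill=False))
    ax.set_aspect("equal")
plt.show()
```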
(I started this response before your second comment in this sequence, the one with the correction, but was called away to a canine emergency.)
All methods can be thought of as having granularity (it's not an inherent thing in the method). It stems from the "arbitrarily large value of N, where N approaches infinity". I am sure that this is clumsy wording, but what I mean is that if we use, say, N=100 as the number of chords for the purposes of looking at the distribution, we will have noticeable gaps and noticeable clumping for the 1/3 and 1/4 methods. I understand that if you make N actually equal to infinity, then there will be an infinity of chords wherever on the disc you look. So, I do this approaching infinity thing, and look to see what the distribution looks like when N isn't infinity. At N=100, the 1/2 method does not have gaps or clumping.
This may have no mathematical meaning whatsoever. But for the purposes of making a colloquially comprehensible "at random" selection, it works and I can arrive at 1/2 using the three "classic" methods using reasonable adjustments to two of those methods. And I understand that this may not address the point of the Bertrand Paradox at all, and it might actually even bypass the point being made by the Bertrand Paradox (which is a bad thing to you and a good thing for me - which is not a mathematical statement at all, but a question of preference).
With the idea of including subintervals of length k=L/2, that just stuck out to me when I was doing the handraulic distributions up to 1/10ths. If I include all the 5/10ths at that time (and the 4/8ths and the 3/6ths) then the proportion of subintervals such that k>=1/2 seems to approach a higher figure than zero. I am aware that it's a very short run, and it could just approach zero at a much slower rate, which is why I'd like to see what would happen. If I had the time and motivation, I could continue up to 1/20ths and see if the distribution continued towards an as yet unknown value, or whether it still approaches zero in a more shallow hyperbolic-like manner.
(I'll respond to your corrected comment a bit later.)
> So what happens if you try to draw the similar picture by first choosing c, and then, with c fixed, taking N angles θ that are evenly spaced? Then the picture would be extremely different (the chords would intersect and would not cover the entire disc).
It occurs to me that what you have done with this method is to blend the 1/2 method with the 1/3 method. You could do something similar with the 1/4 method and the 1/3 method (pick a point on the disc "at random", then flick a spinner to obtain a gradient "at random", and draw the chord with that gradient that passes through the point). By doing so, all you end up doing is introducing the problems with the 1/3 method into the other methods.
Perhaps it was not clear to you, but the "corrected" 1/3 method (in which Point 1 was moved to infinity and lines that pass through the disc from this point are equispaced and the intersection of the lines and the disc are your chords), ends up being the 1/2 method, with the lines being parallel through the disc. We can more easily understand that process by thinking of a diameter that is perpendicular to the separation between Point 1 and the disc, and thinking of the chords that are perpendicular to the diameter (and thus parallel to the separation between Point 1 and the disc).
I use the same reasoning by which I reject the classic 1/3 method to reject your suggested method. And, no, I don't agree that it's the same thing - because I am not focussed on how we select chords, I am focussed on ensuring that we have ALL chords (and where N is less than infinity, a representative sample of ALL chords).
However ... just thinking about it, it seems that using your suggested method there will be a preponderance of chords greater than sqrt(3). Between -R/2 and R/2, all the chords will meet that criterion. Between -R and -R/2 and R/2 and R, there will be a decrease (as the distance from the locus increases) in the proportion of chords greater than sqrt(3) - until that proportion is 1/3 at the circumference. At a rough guess, I'd suggest that the answer, using this method, might be in the order of about 5/6 - but the exact figure may be more complicated.
> At N=100, the 1/2 method does not have gaps or clumping.
The whole point of my "subintervals in intervals" example was to show you that the problem of "gaps or clumping" is only a problem if you think it is. In the interval example, if you require that there are no "gaps or clumping" when N is finite, then the only possible answer is 0.
I think we agree that this is not reasonable at all. And that the most obvious way to select a subinterval will give gaps and clumping.
So how do you choose, A PRIORI, in which contexts "gaps or clumping" are problematic, and in which contexts they are not?
> if distribution continued towards an as yet unknown value, or whether it still approaches zero
As far as I understand what you are trying to do, I'm pretty sure that it will approach 0.
I'm not sure that what you are doing proves anything at all, but that's another problem ...
> Perhaps it was not clear to you, but the "corrected" 1/3 method, ends up being the 1/2 method
No, it was clear.
> And, no, I don't agree that it's the same thing
Let me repeat something for the sake of clarity:
For any (c,θ) in [-1,1]x[0,pi], there exists a unique chord that is at distance c from the center, in direction θ.
The "1/2 method of selecting a chord", amounts to pick a couple (c,θ) uniformly in the rectangle [-1,1]x[0,pi]. Do we agree on that ?
When you draw your picture to show "granularity", what you are doing is that you choose a θ, once and for all, and then you take 100 values of c that are evenly spaced in [-1,1].
What I'm suggesting is that you do the opposite : Choose a c, once and for all, and then take 100 values of θ that are evenly spaced in [0,pi].
In the end, this is exactly the same method, but you're not drawing the same picture. (In mathematical terms, you are just doing a projection on one of the coordinates)
> because I am not focussed on how we select chords, I am focussed on ensuring that we have ALL chords (and where N is less than infinity, a representative sample of ALL chords).
Can you provide a single example of a chord that you can get with the 1/2 method, but that you cannot get with the 1/3 method?
> Between -R and -R/2 and R/2 and R, there will be a decrease in the proportion of chords greater than sqrt(3)
You are apparently thinking that "c" should be taken in a predefined direction, and that another direction θ should then be chosen. That's not what I said. Just fix some c, once and for all, and then choose a bunch of θ, and then draw the chords corresponding to (c,θ).
So, when c is between R/2 and R (and between -R and -R/2), the proportion of chords greater than sqrt(3) is 0. So the final answer is 1/2. (Which is absolutely not surprising, because it's exactly the same method.)
> And I understand that this may not address the point of the Bertrand Paradox at all, and it might actually even bypass the point being made by the Bertrand Paradox
If we both agree that what you are trying to do is pointless, then great!
I'm kidding, but this is a fundamental problem in your thought process. Doing computations and experimenting is an excellent thing, and you should continue. But you need to have a deeper understanding of what is really going on, to be able to draw interesting conclusions from these computations and experiments. And by that I mean a mathematical understanding.
You did the same with Monty Hall. You performed some thought experiments or simulations (with urns and balls if I recall correctly), and you thought that you understood something new about the Monty Hall problem, when in fact you didn't understand what you were doing ...
The problem here is similar. You make some computations (with histograms), you draw some pictures (with granularity), and you try to draw interesting conclusions about Bertrand paradox. But again, it seems that you don't really understand what you are doing ...
To address some of your main points, I will generate a new article (it's a bit safer, with less risk of a long response being lost due to the vagaries of this commenting mechanism).
For the moment:
> You did the same with Monty Hall. You performed some thought experiments or simulations (with urns and balls if I recall correctly), and you thought that you understood something new about the Monty Hall problem, when in fact you didn't understand what you were doing ...
> The problem here is similar. You make some computations (with histograms), you draw some pictures (with granularity), and you try to draw interesting conclusions about Bertrand paradox. But again, it seems that you don't really understand what you are doing ...
I think you are quite right to be cautious, I'd advise everyone to be cautious about everything anyone says (including myself). You say that it seems that I don't understand what I am doing. Is it possible (even vaguely) that we are talking past each other to some extent?
I've tried to explain the basis of my confusion with Monty Hall in (My) Ignorance Behind Marilyn Gets My Goat. I made another attempt to explain what I originally had in mind (before I got distracted by my confusion) in Monty Chooses. If you think that I still have it wrong, then I guess I have to throw my hands up and walk away from it (or from you, I suppose).
Response is here.