Proposed chi-squared statistical test on the chronology of a class of writings from antiquity sometimes called the New Testament Apocrypha, also often referred to as the non-canonical NT literature. It includes the "Gnostic literature" and the "Other Gospels". It does NOT include the Bible, but rather all the other texts that have at one time been associated with Christian origins.
The null hypothesis in this instance is the paradigm that this class of literature was authored in the 2nd, 3rd and 4th centuries. The alternative hypothesis here is that this class of literature appeared only after 325 CE.
If the [physical] primary evidence of Christian non-canonical manuscripts in antiquity is summarised and analysed according to its chronology, a post-Nicene provenance is strongly suggested. What is being tested is the secondary evidence furnished by the church in antiquity, namely that such manuscripts were known to the church prior to 325 CE.
Discrete data arise from a counting process, while continuous data arise from a measuring process. I am counting things: pages of evidence discovered to date. There are two "buckets": one before 325 CE and one after 325 CE.
The following link is to a page that contains a chi squared statistical test on this issue. Is there anyone in this group who is able to provide any feedback or assessment on the validity of this proposed test?
Thanks in advance.
http://www.mountainman.com.au/essenes/W ... ostics.htm
There are a number of ways to count instances of the primary evidence, each of which is then placed into one of two "buckets": a) before 325 CE, and b) after 325 CE.
1) Counting the total number of pages (leaves) discovered.
2) Counting the total number of codices (books) discovered.
3) Counting the distinct number of texts (stories) discovered.
4) Counting the number of distinct manuscript discoveries (e.g. the 12 Nag Hammadi codices would be counted as one discovery).
I have been advised that, of all the above, option 4 may give the most independent observations. Any comments?
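One thing worth keeping in mind when choosing among these options: for a fixed pre/post split, the chi-squared statistic grows in direct proportion to the number of items counted, so counting pages rather than discoveries can inflate it enormously if pages from the same find are not independent observations. A minimal Python sketch, using made-up counts (the 10%/90% split, the three totals and the 50/50 null are all illustrative assumptions, not figures from the linked page):

```python
def chi2_gof(observed, expected_shares):
    """Pearson chi-squared statistic for observed counts against expected shares."""
    total = sum(observed)
    return sum(
        (obs - total * share) ** 2 / (total * share)
        for obs, share in zip(observed, expected_shares)
    )

# Hypothetical null: evidence equally likely to fall in either bucket.
NULL_SHARES = [0.5, 0.5]

# Hypothetical [pre-325, post-325] counts: the same 10%/90% split,
# counted in three different units.
COUNTS_BY_UNIT = {
    "discoveries": [1, 9],
    "codices": [10, 90],
    "pages": [100, 900],
}

for unit, counts in COUNTS_BY_UNIT.items():
    print(f"{unit:12s} n={sum(counts):5d}  chi2={chi2_gof(counts, NULL_SHARES):7.1f}")
```

With these invented numbers the statistic comes out at 6.4, 64 and 640 respectively: the same split looks ten or a hundred times more "significant" purely because of the counting unit.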
Suppose one flipped a coin and recorded heads or tails. How many consecutive heads (or tails) would one have to flip before the chi-squared test gave a significant value, thus indicating a significant bias in either the coin or the testing process?
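For what it's worth, the coin question can be checked directly. The sketch below is only an illustration, assuming a fair-coin null, a 0.05 significance level and one degree of freedom; the function names are my own. For n heads in n flips the expected counts are n/2 and n/2, so the chi-squared statistic works out to exactly n.

```python
import math

def chi2_gof(observed, expected):
    """Pearson chi-squared statistic for observed vs expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_pvalue_df1(stat):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def run_of_heads(n):
    """Return (chi2 statistic, chi2 p-value, exact two-sided binomial p-value)
    for n heads in n flips of a supposedly fair coin."""
    stat = chi2_gof([n, 0], [n / 2, n / 2])
    exact = min(1.0, 2 * 0.5 ** n)  # all heads or all tails
    return stat, chi2_pvalue_df1(stat), exact

for n in range(1, 11):
    stat, p_chi2, p_exact = run_of_heads(n)
    print(f"n={n:2d}  chi2={stat:5.2f}  p(chi2)={p_chi2:.4f}  p(exact)={p_exact:.4f}")
```

On these assumptions the chi-squared statistic first exceeds the 3.84 critical value at 4 flips, whereas the exact two-sided binomial probability only drops below 0.05 at 6 flips. The usual rule of thumb that each expected count should be at least 5 would require 10 flips before the chi-squared approximation is trusted at all, so for very small counts the exact calculation is the safer guide.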
A couple of things. Just because you can throw statistics at something doesn't mean that the statistics are meaningful. Unfortunately, people generate all sorts of meaningless statistics all the time.
So the primary question appears to be "what is the likelihood that the sources are actually pre-325 CE but most of these didn't survive, and the post-325 CE texts survived instead?" Perhaps they are trying to imply that the post-325 CE texts are older than they really are, as if they were some sort of duplicate? That results in a null hypothesis that post-325 CE texts are more likely to survive than pre-325 CE ones. Except there is an alternative null hypothesis not considered: that the distribution of non-canonical texts pre- and post-325 CE is not equal, meaning that the non-canonical gospels are effectively all from the 4th century or later.
The statistical analysis can't distinguish between these two null hypotheses. All it can do is evaluate the data coming in, and the data being input are heavily skewed towards the post-325 CE texts because only one text has a pre-325 CE date (and it doesn't actually have a pre-325 CE date; it has a C14 date range that extends into the post-325 CE category too). This is compounded by the author's assumption of equal probability of finding pre- and post-325 CE evidence (by saying that the total number of pages theoretically available for the pre- and post-325 CE categories is equal at 800). This assumption of equal probability artificially skews the results, producing a "significant" p-value that indicates low survivorship of pre-325 CE texts. Except this conclusion does not appear to be substantiated by any corroborating evidence. The only piece of evidence lending credibility to it would be a tentative C14 date that is within error of the post-325 CE category, and (as previously mentioned) that date alone doesn't substantiate the conclusion.
That means that it would actually be more likely that the non-canonical gospels are all 4th century or later.
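To illustrate how much the equal-probability assumption matters, here is a rough Python sketch that uses an exact binomial tail rather than the chi-squared approximation. The counts (1 pre-325 CE item out of 10) and the candidate pre-325 CE shares are hypothetical, chosen only for illustration, and `prob_at_most` is a helper name I made up.

```python
from math import comb

def prob_at_most(k, n, share):
    """P(X <= k) for X ~ Binomial(n, share): the chance of finding at most
    k pre-325 CE items among n, if each item independently falls in the
    pre-325 CE bucket with probability `share`."""
    return sum(
        comb(n, i) * share ** i * (1 - share) ** (n - i)
        for i in range(k + 1)
    )

# Hypothetical data: 10 dated items, only 1 of them pre-325 CE.
N_ITEMS = 10
N_PRE_325 = 1

# Candidate values for the assumed pre-325 CE share under the null.
for share in (0.5, 0.3, 0.2, 0.1):
    p = prob_at_most(N_PRE_325, N_ITEMS, share)
    print(f"assumed pre-325 share={share:.1f}  P(at most {N_PRE_325} pre-325)={p:.4f}")
```

With these invented numbers, an assumed 50% share makes the observation look very surprising (about 0.011), while an assumed 10% share makes it entirely unremarkable (about 0.74). The data are identical in both cases; only the assumed expected split changes.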
A couple of things. Just because you can throw statistics at something doesn't mean that the statistics are meaningful. Unfortunately, people generate all sorts of meaningless statistics all the time.
The page referred to in the OP was one that I hastily put together, and it probably leaves a lot to be desired in terms of setting out the problem. I studied maths (pure and applied) and stats in the early '70s, and much of it is not as sharp to me as it was then.
So the primary question appears to be "what is the likelihood that the sources are actually pre-325 CE but most of these didn't survive, and the post-325 CE texts survived instead?" Perhaps they are trying to imply that the post-325 CE texts are older than they really are, as if they were some sort of duplicate?
The background to the OP relates to the field of ancient history. The subject matter here is the question of the chronology of authorship of the literature classed as the NT apocrypha.
https://en.wikipedia.org/wiki/New_Testament_apocrypha
These books are currently presumed to have been authored over a period of about 300 years, something like 125-425 CE. The only evidence concerning these books prior to recent times was that recorded and preserved by the (4th century) church. Very few of these books were known or available; they had been destroyed. The paradigm of authorship in the 2nd, 3rd and 4th centuries follows the secondary sources of church fathers such as Irenaeus, who for example mentions the Gospel of Judas c.200 CE.
In recent centuries numerous manuscript discoveries, such as the Nag Hammadi Codices, have made many of these books available to historians and academics. These represent the primary evidence.
That results in a null hypothesis that post-325 CE texts are more likely to survive than pre-325 CE ones. Except there is an alternative null hypothesis not considered: that the distribution of non-canonical texts pre- and post-325 CE is not equal, meaning that the non-canonical gospels are effectively all from the 4th century or later.
The statistical analysis can't distinguish between these two null hypotheses. All it can do is evaluate the data coming in, and the data being input are heavily skewed towards the post-325 CE texts because only one text has a pre-325 CE date (and it doesn't actually have a pre-325 CE date; it has a C14 date range that extends into the post-325 CE category too). This is compounded by the author's assumption of equal probability of finding pre- and post-325 CE evidence (by saying that the total number of pages theoretically available for the pre- and post-325 CE categories is equal at 800). This assumption of equal probability artificially skews the results, producing a "significant" p-value that indicates low survivorship of pre-325 CE texts. Except this conclusion does not appear to be substantiated by any corroborating evidence. The only piece of evidence lending credibility to it would be a tentative C14 date that is within error of the post-325 CE category, and (as previously mentioned) that date alone doesn't substantiate the conclusion.
That means that it would actually be more likely that the non-canonical gospels are all 4th century or later.
This is actually the position that I am defending. Again I must apologise for not setting out the problem clearly in the first place. It is a long time since I used stats or set out a hypothesis test in a formal statistical manner.
Allow me to set out the null and alternate hypotheses.
NULL HYPOTHESIS:
Mainstream paradigm (following the secondary evidence of the church) that the authorship of books in the NT Apocrypha occurred reasonably continuously over the 300-year period 125-425 CE.
ALTERNATE HYPOTHESIS:
Following the recently available primary evidence, dated almost entirely to the mid 4th century, the authorship of the books of the NTA occurred after 325 CE. They represent a literary reaction to the appearance and authority of the Bible, which was at that time widely published by Constantine and used as a political instrument. Authorship would fall between 325 and 335 CE.
TEST SCENARIO
I broke the period 125-425 CE into two periods:
125-325 CE: 200 years (bucket 1)
325-425 CE: 100 years (bucket 2)
The unequal bucket ranges were chosen only to be over-generous to the null hypothesis.
Arguably ALL the primary evidence available falls into bucket 2 anyway.
CHI SQUARED TEST
I was under the impression that this test may have been the best one to use in order to test the null and alternate hypotheses.
My assumption is that if, as the null hypothesis holds, books were written in the 2nd and 3rd (as well as 4th) centuries, then in theory some remnants of these books could be discovered and dated to the 2nd or 3rd century. As far as I can ascertain, none really have been to date.
The alternative hypothesis says we will only find these books in the 2nd bucket.
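As a rough sketch of what this setup implies, the Python below treats the two buckets as a goodness-of-fit test in which the expected shares are proportional to bucket length (2/3 and 1/3). That proportionality is itself a strong assumption, since it requires authorship, survival and discovery to be equally likely per year across the whole period. The function name and the choice to evaluate 1 to 6 independent finds are mine, for illustration only.

```python
import math

# Bucket lengths in years, as set out in the test scenario.
BUCKET_YEARS = {"125-325 CE": 200, "325-425 CE": 100}

def expected_shares(bucket_years):
    """Expected share of finds per bucket if finds are uniform over time."""
    total = sum(bucket_years.values())
    return [years / total for years in bucket_years.values()]

def all_finds_in_late_bucket(n, shares):
    """Return (chi2 statistic, chi2 p-value with 1 df, exact probability)
    for the case where all n independent finds land in the second bucket."""
    observed = [0, n]
    expected = [n * share for share in shares]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    p_exact = shares[1] ** n  # every find independently falls in bucket 2
    return stat, p_chi2, p_exact

shares = expected_shares(BUCKET_YEARS)
for n in range(1, 7):
    stat, p_chi2, p_exact = all_finds_in_late_bucket(n, shares)
    print(f"n={n}  chi2={stat:5.2f}  p(chi2)={p_chi2:.4f}  p(exact)={p_exact:.4f}")
```

Under these assumptions the statistic is simply 2n when all n finds fall in bucket 2, and the exact probability (1/3)^n drops below 0.05 at n = 3. With so few independent finds the expected count in each bucket is far below 5, so the exact figure is more trustworthy than the chi-squared p-value, and the whole result is only as good as the assumed 2/3 to 1/3 split.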
Thus I am attempting to set out this statistical testing exercise and write a report after being away from statistical mathematics for 40 years.
Hence the "feedback sought" in the OP.
Thanks once again for your feedback. I apologise for being quite unclear in setting up the background and context and framework of this exercise. If you know of some sort of blueprint I could use in this exercise, I'd appreciate a pointer or a link to it.
The linked page above has been updated.
http://www.mountainman.com.au/essenes/W ... ostics.htm
Thanks for any feedback.