Tuesday, January 29, 2008

The hammer, or the sledgehammer? A small study in simulation

RKN over at Epidemiology blog had a small problem he solved using simulation:

I have been interested in the following question: if there are, let's say, 5 genes involved in the liability of fx risk, and each gene has two alleles with one of them conferring a greater risk, what is the chance that an individual will have 0, 1, 2, 3, 4, or 5 risk alleles?

Obviously, the answer depends on the allelic frequency. I am too lazy to work out by algebraic probability, so I have conducted a simple simulation (using SAS) as follows:

There are 5 genes with allelic frequencies being 1%, 5%, 10%, 15% and 20%
Assuming that they are independent (zero linkage disequilibrium)

He then programmed a short simulation in SAS using 100,000 replicates. However, remembering the advice at this site, I wondered if SAS's random number generator was up to the task. The period of this generator (a linear congruential generator) is 232-1, and the quality decreases after the production of the square root of this number of pseudorandom numbers. (Around 216, or 65536.) At 500,000 numbers, the quality of the random numbers should start to decrease.

I didn't run the battery of tests on the numbers, but I did replicate the experiment using R, which uses the Mersenne twister as its default generator. I got the following as a result:


0 1 2 3 4
0.57481 0.34678 0.07177 0.00633 0.00031

This isn't too far off of what RKN got. So maybe the sledgehammer wasn't necessary here.