Repeated Sampling

I have a question about repeated sampling. Let's say I am interested in the distribution of sample means. So what I would do is generate 10000 times a sample of size 1000 and look at the mean of each sample. Can I instead just take one sample of size 10000*1000 and then look at the mean of the first 1000 elements than from 1001 to 2000 and so on?

asked Nov 11 '18 at 23:41

Johannes Heß

Yes, but it's simpler to do it the first way: X <- replicate(10000, rnorm(1000)); colMeans(X). Instead of rnorm use the distribution of your choice. And you should set.seed(<something>) before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38

add a comment |

asked Nov 11 '18 at 23:41

Johannes Heß

Yes, but it's simpler to do it the first way: X <- replicate(10000, rnorm(1000)); colMeans(X). Instead of rnorm use the distribution of your choice. And you should set.seed(<something>) before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38

add a comment |

asked Nov 11 '18 at 23:41

Johannes Heß

r random statistics sampling

asked Nov 11 '18 at 23:41

Johannes Heß

asked Nov 11 '18 at 23:41

Johannes Heß

asked Nov 11 '18 at 23:41

Johannes Heß

asked Nov 11 '18 at 23:41

Johannes Heß

asked Nov 11 '18 at 23:41

Johannes Heß

Yes, but it's simpler to do it the first way: X <- replicate(10000, rnorm(1000)); colMeans(X). Instead of rnorm use the distribution of your choice. And you should set.seed(<something>) before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38

add a comment |

Yes, but it's simpler to do it the first way: X <- replicate(10000, rnorm(1000)); colMeans(X). Instead of rnorm use the distribution of your choice. And you should set.seed(<something>) before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38

Yes, but it's simpler to do it the first way: X <- replicate(10000, rnorm(1000)); colMeans(X). Instead of rnorm use the distribution of your choice. And you should set.seed(<something>) before generating pseudo-random numbers.
– Rui Barradas
Nov 12 '18 at 11:38

add a comment |

4 Answers
4

active

oldest

votes

If you're controlling for the seed, both approaches should yield identical outcomes:

set.seed(1)

mean(sample(1:9, 3))

#[1] 5.666667

mean(sample(1:9, 3))

#[1] 4

mean(sample(1:9, 3))

# [1] 5.333333



set.seed(1)

x <- sample(1:9)

mean(x[1:3])

#[1] 5.666667

mean(x[4:6])

#[1] 4

mean(x[7:9])

# [1] 5.333333

answered Nov 11 '18 at 23:59

12b345b6b78

767115

I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56

By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21

add a comment |

Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be normally distributed with a mean of 0.5.

# set seed to make reproducible 

set.seed(95014)

# generate 10,000 means of 1,000 items pulled from a uniform distribution 

mean_x <- NULL

for (i in 1:10000){

     mean_x <- c(mean_x,mean(runif(1000)))  

}

hist(mean_x)

...and the output:

enter image description here

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

add a comment |

@ Len Greski
I can also do it that way right?

a <- runif(10000000)

j <- 1

x <- NULL

while (j <= 10000000){

  x <- c(x,mean(a[j:(j+999)]))

  j <- j + 1000

}

x

hist(x)

answered Nov 12 '18 at 1:32

Johannes Heß

Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11

add a comment |

I would say yes. In taking 10,000,000 samples you've randomly sampled most of the experimental space. If you set.seed the same for both the approaches you mention you get the exact same answer. If you change the seed and run a t-test, the results are not significantly different.

#First Method

seed <- 5554

set.seed(seed)

group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))

set.seed(seed)

mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))



#Method you propose

set.seed(5554)

big_sample <- data.frame(

                group=rep(1:10000, each=1000),

                samples=rnorm(10000 * 1000, 0, 1)

)



group_means_2 <- aggregate(samples ~ group,

                           FUN = mean, 

                           data=big_sample)



mean_of_means_2 <- mean(group_means_2$samples)



#comparison

mean_of_means_1 == mean_of_means_2



t.test(group_of_means_1, group_means_2$samples)

answered Nov 12 '18 at 5:47

Kgrey

1613

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53254368%2frepeated-sampling%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

If you're controlling for the seed, both approaches should yield identical outcomes:

set.seed(1)

mean(sample(1:9, 3))

#[1] 5.666667

mean(sample(1:9, 3))

#[1] 4

mean(sample(1:9, 3))

# [1] 5.333333



set.seed(1)

x <- sample(1:9)

mean(x[1:3])

#[1] 5.666667

mean(x[4:6])

#[1] 4

mean(x[7:9])

# [1] 5.333333

answered Nov 11 '18 at 23:59

12b345b6b78

767115

I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56

By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21

add a comment |

If you're controlling for the seed, both approaches should yield identical outcomes:

set.seed(1)

mean(sample(1:9, 3))

#[1] 5.666667

mean(sample(1:9, 3))

#[1] 4

mean(sample(1:9, 3))

# [1] 5.333333



set.seed(1)

x <- sample(1:9)

mean(x[1:3])

#[1] 5.666667

mean(x[4:6])

#[1] 4

mean(x[7:9])

# [1] 5.333333

answered Nov 11 '18 at 23:59

12b345b6b78

767115

I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56

By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21

add a comment |

If you're controlling for the seed, both approaches should yield identical outcomes:

set.seed(1)

mean(sample(1:9, 3))

#[1] 5.666667

mean(sample(1:9, 3))

#[1] 4

mean(sample(1:9, 3))

# [1] 5.333333



set.seed(1)

x <- sample(1:9)

mean(x[1:3])

#[1] 5.666667

mean(x[4:6])

#[1] 4

mean(x[7:9])

# [1] 5.333333

answered Nov 11 '18 at 23:59

12b345b6b78

767115

If you're controlling for the seed, both approaches should yield identical outcomes:

set.seed(1)

mean(sample(1:9, 3))

#[1] 5.666667

mean(sample(1:9, 3))

#[1] 4

mean(sample(1:9, 3))

# [1] 5.333333



set.seed(1)

x <- sample(1:9)

mean(x[1:3])

#[1] 5.666667

mean(x[4:6])

#[1] 4

mean(x[7:9])

# [1] 5.333333

answered Nov 11 '18 at 23:59

12b345b6b78

767115

answered Nov 11 '18 at 23:59

12b345b6b78

767115

answered Nov 11 '18 at 23:59

12b345b6b78

767115

answered Nov 11 '18 at 23:59

12b345b6b78

767115

I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56

By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21

add a comment |

I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56

By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21

I don't get the same results using your code.
– Johannes Heß
Nov 12 '18 at 0:56

By default, sampling is without replacement, so the example is wrong.
– user2554330
Nov 12 '18 at 1:21

add a comment |

# set seed to make reproducible 

set.seed(95014)

# generate 10,000 means of 1,000 items pulled from a uniform distribution 

mean_x <- NULL

for (i in 1:10000){

     mean_x <- c(mean_x,mean(runif(1000)))  

}

hist(mean_x)

...and the output:

enter image description here

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

add a comment |

# set seed to make reproducible 

set.seed(95014)

# generate 10,000 means of 1,000 items pulled from a uniform distribution 

mean_x <- NULL

for (i in 1:10000){

     mean_x <- c(mean_x,mean(runif(1000)))  

}

hist(mean_x)

...and the output:

enter image description here

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

add a comment |

# set seed to make reproducible 

set.seed(95014)

# generate 10,000 means of 1,000 items pulled from a uniform distribution 

mean_x <- NULL

for (i in 1:10000){

     mean_x <- c(mean_x,mean(runif(1000)))  

}

hist(mean_x)

...and the output:

enter image description here

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

# set seed to make reproducible 

set.seed(95014)

# generate 10,000 means of 1,000 items pulled from a uniform distribution 

mean_x <- NULL

for (i in 1:10000){

     mean_x <- c(mean_x,mean(runif(1000)))  

}

hist(mean_x)

...and the output:

enter image description here

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

answered Nov 12 '18 at 0:15

Len Greski

3,1201421

add a comment |

@ Len Greski
I can also do it that way right?

a <- runif(10000000)

j <- 1

x <- NULL

while (j <= 10000000){

  x <- c(x,mean(a[j:(j+999)]))

  j <- j + 1000

}

x

hist(x)

answered Nov 12 '18 at 1:32

Johannes Heß

Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11

add a comment |

@ Len Greski
I can also do it that way right?

a <- runif(10000000)

j <- 1

x <- NULL

while (j <= 10000000){

  x <- c(x,mean(a[j:(j+999)]))

  j <- j + 1000

}

x

hist(x)

answered Nov 12 '18 at 1:32

Johannes Heß

Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11

add a comment |

@ Len Greski
I can also do it that way right?

a <- runif(10000000)

j <- 1

x <- NULL

while (j <= 10000000){

  x <- c(x,mean(a[j:(j+999)]))

  j <- j + 1000

}

x

hist(x)

answered Nov 12 '18 at 1:32

Johannes Heß

@ Len Greski
I can also do it that way right?

a <- runif(10000000)

j <- 1

x <- NULL

while (j <= 10000000){

  x <- c(x,mean(a[j:(j+999)]))

  j <- j + 1000

}

x

hist(x)

answered Nov 12 '18 at 1:32

Johannes Heß

answered Nov 12 '18 at 1:32

Johannes Heß

answered Nov 12 '18 at 1:32

Johannes Heß

answered Nov 12 '18 at 1:32

Johannes Heß

Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11

add a comment |

Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11

Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
– Len Greski
Nov 22 '18 at 15:11

add a comment |

#First Method

seed <- 5554

set.seed(seed)

group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))

set.seed(seed)

mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))



#Method you propose

set.seed(5554)

big_sample <- data.frame(

                group=rep(1:10000, each=1000),

                samples=rnorm(10000 * 1000, 0, 1)

)



group_means_2 <- aggregate(samples ~ group,

                           FUN = mean, 

                           data=big_sample)



mean_of_means_2 <- mean(group_means_2$samples)



#comparison

mean_of_means_1 == mean_of_means_2



t.test(group_of_means_1, group_means_2$samples)

answered Nov 12 '18 at 5:47

Kgrey

1613

add a comment |

#First Method

seed <- 5554

set.seed(seed)

group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))

set.seed(seed)

mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))



#Method you propose

set.seed(5554)

big_sample <- data.frame(

                group=rep(1:10000, each=1000),

                samples=rnorm(10000 * 1000, 0, 1)

)



group_means_2 <- aggregate(samples ~ group,

                           FUN = mean, 

                           data=big_sample)



mean_of_means_2 <- mean(group_means_2$samples)



#comparison

mean_of_means_1 == mean_of_means_2



t.test(group_of_means_1, group_means_2$samples)

answered Nov 12 '18 at 5:47

Kgrey

1613

add a comment |

#First Method

seed <- 5554

set.seed(seed)

group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))

set.seed(seed)

mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))



#Method you propose

set.seed(5554)

big_sample <- data.frame(

                group=rep(1:10000, each=1000),

                samples=rnorm(10000 * 1000, 0, 1)

)



group_means_2 <- aggregate(samples ~ group,

                           FUN = mean, 

                           data=big_sample)



mean_of_means_2 <- mean(group_means_2$samples)



#comparison

mean_of_means_1 == mean_of_means_2



t.test(group_of_means_1, group_means_2$samples)

answered Nov 12 '18 at 5:47

Kgrey

1613

#First Method

seed <- 5554

set.seed(seed)

group_of_means_1 <- replicate(n=10000, expr = mean(rnorm(1000)))

set.seed(seed)

mean_of_means_1 <- mean(replicate(n=10000, expr = mean(rnorm(1000))))



#Method you propose

set.seed(5554)

big_sample <- data.frame(

                group=rep(1:10000, each=1000),

                samples=rnorm(10000 * 1000, 0, 1)

)



group_means_2 <- aggregate(samples ~ group,

                           FUN = mean, 

                           data=big_sample)



mean_of_means_2 <- mean(group_means_2$samples)



#comparison

mean_of_means_1 == mean_of_means_2



t.test(group_of_means_1, group_means_2$samples)

answered Nov 12 '18 at 5:47

Kgrey

1613

answered Nov 12 '18 at 5:47

Kgrey

1613

answered Nov 12 '18 at 5:47

Kgrey

1613

answered Nov 12 '18 at 5:47

Kgrey

1613

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Nrthugu