Repeated Sampling












I have a question about repeated sampling. Let's say I am interested in the distribution of sample means. What I would do is draw 10000 samples, each of size 1000, and look at the mean of each sample. Can I instead just take one sample of size 10000*1000 and then look at the mean of the first 1000 elements, then of elements 1001 to 2000, and so on?







r random statistics sampling






asked Nov 11 '18 at 23:41
Johannes Heß

  • Yes, but it's simpler to do it the first way: X <- replicate(10000, rnorm(1000)); colMeans(X). Instead of rnorm use the distribution of your choice. And you should set.seed(<something>) before generating pseudo-random numbers.
    – Rui Barradas
    Nov 12 '18 at 11:38
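
A minimal sketch of the single-large-sample route, assuming the draws are independent and identically distributed (rnorm here, as in the comment). The names n_rep, n_obs, big, block_means and loop_means are illustrative, and the sizes are smaller than in the question so that it runs quickly. Because matrix() fills column by column, each consecutive block of n_obs draws lands in its own column and colMeans() returns the block means:

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
n_rep <- 200                      # number of samples (10000 in the question)
n_obs <- 50                       # size of each sample (1000 in the question)

big <- rnorm(n_rep * n_obs)       # one long vector of draws

# matrix() fills by column, so column k holds draws ((k - 1) * n_obs + 1):(k * n_obs)
block_means <- colMeans(matrix(big, nrow = n_obs))

# the same means computed block by block, for comparison
loop_means <- vapply(
  seq_len(n_rep),
  function(k) mean(big[((k - 1) * n_obs + 1):(k * n_obs)]),
  numeric(1)
)

all.equal(block_means, loop_means)
```

The matrix form avoids an explicit loop; the loop version is only there to show that both index the same blocks.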




















4 Answers
If you're controlling for the seed, both approaches should yield identical outcomes:

set.seed(1)
mean(sample(1:9, 3))
# [1] 5.666667
mean(sample(1:9, 3))
# [1] 4
mean(sample(1:9, 3))
# [1] 5.333333

set.seed(1)
x <- sample(1:9)
mean(x[1:3])
# [1] 5.666667
mean(x[4:6])
# [1] 4
mean(x[7:9])
# [1] 5.333333






answered Nov 11 '18 at 23:59
12b345b6b78












  • I don't get the same results using your code.
    – Johannes Heß
    Nov 12 '18 at 0:56

  • By default, sampling is without replacement, so the example is wrong.
    – user2554330
    Nov 12 '18 at 1:21
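
Following up on the comments: sample() draws without replacement by default, so three calls of sample(1:9, 3) are not the same experiment as one permutation sample(1:9). A hedged sketch with replace = TRUE, where every draw is independent. It assumes R's default random number generator settings, under which one call for nine draws should consume the random stream in the same order as three calls for three draws each; the names three_calls and one_call are illustrative:

```r
# three separate samples of size 3, drawn with replacement
set.seed(1)
three_calls <- c(
  mean(sample(1:9, 3, replace = TRUE)),
  mean(sample(1:9, 3, replace = TRUE)),
  mean(sample(1:9, 3, replace = TRUE))
)

# one sample of size 9, drawn with replacement and split into blocks of 3
set.seed(1)
x <- sample(1:9, 9, replace = TRUE)
one_call <- c(mean(x[1:3]), mean(x[4:6]), mean(x[7:9]))

identical(three_calls, one_call)
```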


















Here is an example that generates 10,000 sample means of 1,000 items drawn randomly from a uniform distribution. Based on the Central Limit Theorem, we expect these means to be approximately normally distributed with a mean of 0.5.

# set seed to make the results reproducible
set.seed(95014)
# generate 10,000 means of 1,000 items pulled from a uniform distribution
mean_x <- numeric(10000)  # preallocate instead of growing the vector with c()
for (i in seq_along(mean_x)) {
  mean_x[i] <- mean(runif(1000))
}
hist(mean_x)

...and the output:

[Image: histogram of mean_x]
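
For a Uniform(0, 1) variable the mean is 0.5 and the variance is 1/12, so the Central Limit Theorem suggests the sample means should have a standard deviation close to sqrt(1/12)/sqrt(1000), roughly 0.0091. A rough numerical check, as a sketch: it regenerates the means with replicate() rather than the loop, reuses the answer's seed, and the name theoretical_sd is illustrative.

```r
set.seed(95014)
mean_x <- replicate(10000, mean(runif(1000)))

# standard deviation of the mean of 1,000 Uniform(0, 1) draws
theoretical_sd <- sqrt(1 / 12) / sqrt(1000)

c(empirical_mean = mean(mean_x), theoretical_mean = 0.5)
c(empirical_sd = sd(mean_x), theoretical_sd = theoretical_sd)
```

The empirical values should be close to the theoretical ones but will not match exactly, since they are estimates from 10,000 simulated means.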







answered Nov 12 '18 at 0:15
Len Greski























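@Len Greski
I can also do it that way, right?

a <- runif(10000000)   # one sample of size 10000 * 1000
j <- 1                 # start index of the current block
x <- NULL
while (j <= 10000000) {
  x <- c(x, mean(a[j:(j + 999)]))   # mean of elements j to j + 999
  j <- j + 1000
}
x
hist(x)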






answered Nov 12 '18 at 1:32
Johannes Heß












  • Yes, you can also accomplish the same result with the code you posted above if you add set.seed(95014) before the a <- runif(...) line. You can confirm this by calculating the difference between x in your code and mean_x in mine. The result will be a vector of zeroes.
    – Len Greski
    Nov 22 '18 at 15:11
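
The check described in the comment can be sketched as follows, with smaller illustrative sizes so that it runs quickly (n_rep, n_obs and starts are assumptions, not names from the thread). It relies on runif() consuming the random stream sequentially, so one call for n_rep * n_obs numbers should return the same values as n_rep calls for n_obs numbers each:

```r
n_rep <- 100    # number of samples (10000 in the thread)
n_obs <- 1000   # size of each sample

# repeated sampling: one runif() call per sample
set.seed(95014)
mean_x <- replicate(n_rep, mean(runif(n_obs)))

# single large sample, split into consecutive blocks of n_obs
set.seed(95014)
a <- runif(n_rep * n_obs)
starts <- seq(1, n_rep * n_obs, by = n_obs)
x <- vapply(starts, function(j) mean(a[j:(j + n_obs - 1)]), numeric(1))

# TRUE when every difference is exactly zero
all(x - mean_x == 0)
```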


















I would say yes. The 10,000,000 draws are independent, so each consecutive block of 1,000 is itself a random sample of size 1,000. If you set the same seed (set.seed) for both of the approaches you mention, you get exactly the same answer. If you change the seed and run a t-test, the results should not be significantly different.

# First method: 10,000 separate samples of size 1,000
seed <- 5554
set.seed(seed)
group_of_means_1 <- replicate(n = 10000, expr = mean(rnorm(1000)))
mean_of_means_1 <- mean(group_of_means_1)

# Method you propose: one sample of size 10,000 * 1,000, split into groups
set.seed(seed)
big_sample <- data.frame(
  group = rep(1:10000, each = 1000),
  samples = rnorm(10000 * 1000, 0, 1)
)

group_means_2 <- aggregate(samples ~ group,
                           FUN = mean,
                           data = big_sample)

mean_of_means_2 <- mean(group_means_2$samples)

# Comparison: with the same seed the two results coincide
all.equal(mean_of_means_1, mean_of_means_2)

# With the same seed both vectors are identical, so this test is trivial;
# use a different seed for the second method to compare independent runs
t.test(group_of_means_1, group_means_2$samples)
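
To compare runs that are actually independent, the two methods need different seeds. A sketch under that assumption: the seeds 1 and 2, the reduced sizes, and the names means_repeated, means_blocked, p_t and p_ks are arbitrary illustrations. A two-sample t-test compares the centres, and a Kolmogorov-Smirnov test compares the whole distributions:

```r
n_rep <- 2000   # number of samples (10000 in the thread)
n_obs <- 100    # size of each sample (1000 in the thread)

# repeated sampling
set.seed(1)
means_repeated <- replicate(n_rep, mean(rnorm(n_obs)))

# one large sample, split into consecutive blocks via a column-filled matrix
set.seed(2)
means_blocked <- colMeans(matrix(rnorm(n_rep * n_obs), nrow = n_obs))

p_t  <- t.test(means_repeated, means_blocked)$p.value
p_ks <- ks.test(means_repeated, means_blocked)$p.value
c(t_test = p_t, ks_test = p_ks)
```

Large p-values are the expected outcome, but since both tests have a 5% false-positive rate at the usual threshold, an occasional small p-value for some pair of seeds would not by itself indicate a problem.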






answered Nov 12 '18 at 5:47
Kgrey





























