random_state and shuffle together























I am confused about using random_state and shuffle together. I want to split the data without shuffling it. When I set shuffle to False, the value I choose for random_state does not seem to matter: I get the same output (the splits are identical for random_state 42, 2, 7, 17, etc.). Why?



X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)


But if shuffle is True, I get different splits for different random_state values, which makes sense.



X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)









      python scikit-learn shuffle














      edited Nov 11 at 16:08









      TimH











      asked Nov 11 at 14:16









      matin

























          1 Answer



          accepted










          If you set shuffle to False, train_test_split keeps your data in its original order: the first samples go to the training set and the remaining ones to the test set. No random numbers are drawn, so the random_state parameter has no effect.



          Example:



          from sklearn.model_selection import train_test_split

          X = list(range(50))  # list with the numbers 0 to 49
          y = X  # just for testing
          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)

          print(X_train)  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
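
To make the "original order" behaviour concrete, here is a minimal pure-Python sketch of what an unshuffled split amounts to. The helper `split_in_order` is hypothetical (it is not part of scikit-learn), and rounding the test size up is an assumption chosen because it matches the 37 training items printed above.

```python
import math

def split_in_order(data, test_size=0.25):
    """Hypothetical helper: split without shuffling; the test set is the tail."""
    # Assumption: the test count is rounded up (50 * 0.25 = 12.5 -> 13).
    n_test = math.ceil(len(data) * test_size)
    n_train = len(data) - n_test
    # Plain slicing: there is no randomness here, so a seed has nothing to influence.
    return data[:n_train], data[n_train:]

X = list(range(50))
X_train, X_test = split_in_order(X)
print(len(X_train), len(X_test))  # prints 37 13
print(X_train[-1], X_test[0])     # prints 36 37
```

Since nothing in this path draws a random number, passing a different seed could not change the result, which is the behaviour described in the question.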


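          As soon as you set shuffle to True (the default), random_state is used as the seed for the random number generator. The data is shuffled before it is split into train and test sets, so different seeds give different splits, while the same seed reproduces the same split on every run.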



          Example with random_state=42:



          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)

          print(X_train)  # prints [8, 3, 6, 41, 46, 47, 15, 9, 16, 24, 34, 31, 0, 44, 27, 33, 5, 29, 11, 36, 1, 21, 2, 43, 35, 23, 40, 10, 22, 18, 49, 20, 7, 42, 14, 28, 38]


          Example with random_state=44:



          X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=44, shuffle=True)

          print(X_train)  # prints [13, 11, 2, 12, 34, 41, 30, 16, 39, 28, 24, 8, 18, 9, 4, 10, 0, 19, 21, 29, 14, 1, 48, 38, 7, 43, 25, 22, 23, 42, 46, 49, 32, 3, 45, 35, 20]
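
The role of the seed can also be illustrated without scikit-learn. The sketch below uses Python's standard `random` module, and `shuffled_indices` is a hypothetical helper. It does not use the same generator as train_test_split, so it will not reproduce the lists above. It only shows the principle that the seed alone determines the shuffled order.

```python
import random

def shuffled_indices(n, seed):
    """Hypothetical helper: a permutation of range(n) fixed entirely by the seed."""
    idx = list(range(n))
    # A private generator seeded here, so the result does not depend on global state.
    random.Random(seed).shuffle(idx)
    return idx

a = shuffled_indices(50, seed=42)
b = shuffled_indices(50, seed=42)
c = shuffled_indices(50, seed=44)

print(a == b)                        # same seed, same order
print(a == c)                        # different seed, different order
print(sorted(a) == list(range(50)))  # still the same 50 items, only reordered
```

Taking the first 37 entries of such a permutation as the training indices and the rest as the test indices gives a split that is random but repeatable for a fixed seed.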





                answered Nov 11 at 16:21









                TimH
