Count of unique values per group as new column with pandas

I would like to count the unique observations by group in a pandas DataFrame and create a new column holding that count. Importantly, I do not want to reduce the rows of the DataFrame; effectively, I want something similar to a window function in SQL.



import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

df.groupby('mID')['uID'].nunique()
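
For the example data, this returns a reduced Series with one row per group:

mID
A    5
B    2
C    1
Name: uID, dtype: int64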


That gives the unique count per group, but it summarises (reduces the rows). I would effectively like to do something along the lines of:



df['ncount'] = df.groupby('mID')['uID'].transform('nunique')


(this obviously does not work)



It is possible to accomplish the desired outcome by building the summarised unique-count dataframe and joining it back to the original dataframe, but I am wondering if there is a more minimal solution. A sketch of the join approach is shown below.
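
For reference, a minimal sketch of that join-based workaround (the column name ncount comes from the attempt above; the intermediate variable counts is illustrative):

counts = df.groupby('mID')['uID'].nunique().rename('ncount').reset_index()
df = df.merge(counts, on='mID', how='left')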



Thanks










python pandas dataframe group-by pandas-groupby






asked Nov 12 '18 at 23:35 by ZeroStack, edited Dec 30 '18 at 16:10 by coldspeed
2 Answers






GroupBy.transform('nunique')

On v0.23.4, your solution works for me.

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

GroupBy.nunique + pd.Series.map

Additionally, with your existing solution, you could map the series back to mID:

df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

– answered Nov 13 '18 at 0:57 by coldspeed
You are very close!

df['ncount'] = df.groupby('mID')['uID'].transform(pd.Series.nunique)
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

– answered Nov 12 '18 at 23:43 by Peter Leimbigler
• Thanks Peter, on my original data I get a ValueError: Length mismatch: Expected axis has 29101 elements, new values have 29457 elements. I'm not even creating a new column, just assigning to a new variable. Your solution does answer the question; any ideas on this error? EDIT: NA values were the culprit here.
  – ZeroStack, Nov 12 '18 at 23:47
• @ZeroStack, that might be this bug: github.com/pandas-dev/pandas/issues/17093 I would try df.fillna(0).groupby(...), and if that works, investigate further how to fill any missing values in the columns mID and/or uID.
  – Peter Leimbigler, Nov 12 '18 at 23:53










