Count of unique values per group as new column with pandas

I would like to count the unique observations by group in a pandas DataFrame and create a new column holding that count. Importantly, I do not want to reduce the rows of the DataFrame; effectively, I want something similar to a window function in SQL.



import pandas as pd

df = pd.DataFrame({
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})

df.groupby('mID')['uID'].nunique()
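
For the example data, this returns a reduced Series with one row per group:

mID
A    5
B    2
C    1
Name: uID, dtype: int64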


That gives the unique count per group, but it summarises (reduces the rows). I would effectively like to do something along the lines of:



df['ncount'] = df.groupby('mID')['uID'].transform('nunique')


(this obviously does not work)



It is possible to accomplish the desired outcome by building the summarised unique-count dataframe and joining it back to the original dataframe, but I am wondering if there is a more minimal solution. A sketch of the join approach is shown below.
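
For reference, a minimal sketch of that join-based workaround (the column name ncount comes from the attempt above; the intermediate variable counts is illustrative):

counts = df.groupby('mID')['uID'].nunique().rename('ncount').reset_index()
df = df.merge(counts, on='mID', how='left')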



Thanks










python pandas dataframe group-by pandas-groupby






asked Nov 12 '18 at 23:35 by ZeroStack, edited Dec 30 '18 at 16:10 by coldspeed
2 Answers






GroupBy.transform('nunique')

On v0.23.4, your solution works for me.

df['ncount'] = df.groupby('mID')['uID'].transform('nunique')
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

GroupBy.nunique + pd.Series.map

Additionally, with your existing solution, you could map the series back to mID:

df['ncount'] = df.mID.map(df.groupby('mID')['uID'].nunique())
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

– answered Nov 13 '18 at 0:57 by coldspeed
You are very close!

df['ncount'] = df.groupby('mID')['uID'].transform(pd.Series.nunique)
df

      uID mID  ncount
0   James   A       5
1   Henry   B       2
2     Abe   A       5
3   James   B       2
4   Henry   A       5
5   Brian   A       5
6  Claude   A       5
7   James   C       1

– answered Nov 12 '18 at 23:43 by Peter Leimbigler
• Thanks Peter, on my original data I get a ValueError: Length mismatch: Expected axis has 29101 elements, new values have 29457 elements. I'm not even creating a new column, just assigning to a new variable. Your solution does answer the question; any ideas on this error? EDIT: NA values were the culprit here.
  – ZeroStack, Nov 12 '18 at 23:47
• @ZeroStack, that might be this bug: github.com/pandas-dev/pandas/issues/17093 I would try df.fillna(0).groupby(...), and if that works, investigate further how to fill any missing values in the columns mID and/or uID.
  – Peter Leimbigler, Nov 12 '18 at 23:53










