Rename result columns from Pandas aggregation (“FutureWarning: using a dict with renaming is deprecated”)












31















I'm trying to do some aggregations on a pandas data frame. Here is a sample code:



import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]:
Amount
Sum Count
User
user1 18.0 2
user2 20.5 3
user3 10.5 1


Which generates the following warning:




FutureWarning: using a dict with renaming is deprecated and will be
removed in a future version return super(DataFrameGroupBy,
self).aggregate(arg, *args, **kwargs)




How can I avoid this?










share|improve this question




















  • 4





    I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it?

    – Stephen McAteer
    Nov 24 '17 at 5:31











  • To focus on the keywords of the solution rather than just the existing warning, I retitled "rename result columns from aggregation" and tagged. Now people might even find this question :) ahead of (say) the not-so-canonicalNaming returned columns in Pandas aggregate function?

    – smci
    May 8 '18 at 23:32








  • 2





    Hopefully this will be addressed in github.com/pandas-dev/pandas/issues/18366

    – Nickolay
    May 28 '18 at 8:06











  • How would this work if I don't do a "groupby" but I'm doing "pivot" instead?

    – avloss
    Jun 11 '18 at 12:57
















31















I'm trying to do some aggregations on a pandas data frame. Here is a sample code:



import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]:
Amount
Sum Count
User
user1 18.0 2
user2 20.5 3
user3 10.5 1


Which generates the following warning:




FutureWarning: using a dict with renaming is deprecated and will be
removed in a future version return super(DataFrameGroupBy,
self).aggregate(arg, *args, **kwargs)




How can I avoid this?










share|improve this question




















  • 4





    I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it?

    – Stephen McAteer
    Nov 24 '17 at 5:31











  • To focus on the keywords of the solution rather than just the existing warning, I retitled "rename result columns from aggregation" and tagged. Now people might even find this question :) ahead of (say) the not-so-canonicalNaming returned columns in Pandas aggregate function?

    – smci
    May 8 '18 at 23:32








  • 2





    Hopefully this will be addressed in github.com/pandas-dev/pandas/issues/18366

    – Nickolay
    May 28 '18 at 8:06











  • How would this work if I don't do a "groupby" but I'm doing "pivot" instead?

    – avloss
    Jun 11 '18 at 12:57














31












31








31


13






I'm trying to do some aggregations on a pandas data frame. Here is a sample code:



import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]:
Amount
Sum Count
User
user1 18.0 2
user2 20.5 3
user3 10.5 1


Which generates the following warning:




FutureWarning: using a dict with renaming is deprecated and will be
removed in a future version return super(DataFrameGroupBy,
self).aggregate(arg, *args, **kwargs)




How can I avoid this?










share|improve this question
















I'm trying to do some aggregations on a pandas data frame. Here is a sample code:



import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]:
Amount
Sum Count
User
user1 18.0 2
user2 20.5 3
user3 10.5 1


Which generates the following warning:




FutureWarning: using a dict with renaming is deprecated and will be
removed in a future version return super(DataFrameGroupBy,
self).aggregate(arg, *args, **kwargs)




How can I avoid this?







python pandas aggregate rename






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited May 8 '18 at 23:31









smci

14.7k672104




14.7k672104










asked Jun 19 '17 at 16:28









Victor MayrinkVictor Mayrink

3331314




3331314








  • 4





    I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it?

    – Stephen McAteer
    Nov 24 '17 at 5:31











  • To focus on the keywords of the solution rather than just the existing warning, I retitled "rename result columns from aggregation" and tagged. Now people might even find this question :) ahead of (say) the not-so-canonicalNaming returned columns in Pandas aggregate function?

    – smci
    May 8 '18 at 23:32








  • 2





    Hopefully this will be addressed in github.com/pandas-dev/pandas/issues/18366

    – Nickolay
    May 28 '18 at 8:06











  • How would this work if I don't do a "groupby" but I'm doing "pivot" instead?

    – avloss
    Jun 11 '18 at 12:57














  • 4





    I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it?

    – Stephen McAteer
    Nov 24 '17 at 5:31











  • To focus on the keywords of the solution rather than just the existing warning, I retitled "rename result columns from aggregation" and tagged. Now people might even find this question :) ahead of (say) the not-so-canonicalNaming returned columns in Pandas aggregate function?

    – smci
    May 8 '18 at 23:32








  • 2





    Hopefully this will be addressed in github.com/pandas-dev/pandas/issues/18366

    – Nickolay
    May 28 '18 at 8:06











  • How would this work if I don't do a "groupby" but I'm doing "pivot" instead?

    – avloss
    Jun 11 '18 at 12:57








4




4





I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it?

– Stephen McAteer
Nov 24 '17 at 5:31





I'd love to know why this is being depreciated (I'm sure there is a good reason). Does anyone have a link to a discussion on it?

– Stephen McAteer
Nov 24 '17 at 5:31













To focus on the keywords of the solution rather than just the existing warning, I retitled "rename result columns from aggregation" and tagged. Now people might even find this question :) ahead of (say) the not-so-canonicalNaming returned columns in Pandas aggregate function?

– smci
May 8 '18 at 23:32







To focus on the keywords of the solution rather than just the existing warning, I retitled "rename result columns from aggregation" and tagged. Now people might even find this question :) ahead of (say) the not-so-canonicalNaming returned columns in Pandas aggregate function?

– smci
May 8 '18 at 23:32






2




2





Hopefully this will be addressed in github.com/pandas-dev/pandas/issues/18366

– Nickolay
May 28 '18 at 8:06





Hopefully this will be addressed in github.com/pandas-dev/pandas/issues/18366

– Nickolay
May 28 '18 at 8:06













How would this work if I don't do a "groupby" but I'm doing "pivot" instead?

– avloss
Jun 11 '18 at 12:57





How would this work if I don't do a "groupby" but I'm doing "pivot" instead?

– avloss
Jun 11 '18 at 12:57












3 Answers
3






active

oldest

votes


















52














Use groupby apply and return a Series to rename columns



Use the groupby apply method to perform an aggregation that




  • Renames the columns

  • Allows for spaces in the names

  • Allows you to order the returned columns in any way you choose

  • Allows for interactions between columns

  • Returns a single level index and NOT a MultiIndex


To do this:




  • create a custom function that you pass to apply

  • This custom function is passed each group as a DataFrame

  • Return a Series

  • The index of the Series will be the new columns


Create fake data



df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
'Score': [9, 1, 8, 7, 7, 6, 9]})


enter image description here



create custom function that returns a Series

The variable x inside of my_agg is a DataFrame



def my_agg(x):
names = {
'Amount mean': x['Amount'].mean(),
'Amount std': x['Amount'].std(),
'Amount range': x['Amount'].max() - x['Amount'].min(),
'Score Max': x['Score'].max(),
'Score Sum': x['Score'].sum(),
'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
'Score Sum', 'Score Max', 'Amount Score Sum'])


Pass this custom function to the groupby apply method



df.groupby('User').apply(my_agg)


enter image description here



The big downside is that this function will be much slower than agg for the cythonized aggregations



Using a dictionary with groupby agg method



Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.



df.groupby('User')['Amount'].agg(['sum', 'count'])


Output



       sum  count
User
user1 18.0 2
user2 20.5 3
user3 10.5 1


It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.



df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
'Other': [1,2,3,4,5,6]})

df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})


Output



      Amount       Other          
sum count max std
User
user1 18.0 2 6 3.535534
user2 20.5 3 5 1.527525
user3 10.5 1 4 NaN





share|improve this answer





















  • 13





    But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

    – ErnestScribbler
    Sep 25 '17 at 14:13






  • 2





    Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

    – mgmonteleone
    Oct 9 '17 at 22:40













  • If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

    – Ted Petrou
    Oct 9 '17 at 22:44











  • @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

    – Ted Petrou
    Dec 7 '17 at 14:45






  • 2





    How would the apply method work with 'first' and 'last'?

    – Gregory Saxton
    Feb 16 '18 at 19:14



















7














If you replace the internal dictionary with a list of tuples it gets rid of the warning message



import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})





share|improve this answer
























  • Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

    – Peanut
    Jun 6 '18 at 12:42













  • @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

    – Jacob Stevenson
    Jun 6 '18 at 16:30






  • 3





    This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

    – Ted Petrou
    Jun 13 '18 at 12:27











  • Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

    – Jacob Stevenson
    Jun 13 '18 at 16:35



















0














This is what I did:



Create a fake dataset:



import pandas as pd
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
"Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
'Score': [9, 1, 8, 7, 7, 6, 9]})
df


O/P:



    Amount  Score   User
0 10.0 9 user1
1 5.0 1 user2
2 8.0 8 user2
3 10.5 7 user3
4 7.5 7 user2
5 8.0 6 user1
6 9.0 9 user3


I first made the User the index, and then a groupby:



ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
ans


Solution:



    Sum Count
User
user1 18.0 2
user2 20.5 3
user3 19.5 2





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f44635626%2frename-result-columns-from-pandas-aggregation-futurewarning-using-a-dict-with%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    52














    Use groupby apply and return a Series to rename columns



    Use the groupby apply method to perform an aggregation that




    • Renames the columns

    • Allows for spaces in the names

    • Allows you to order the returned columns in any way you choose

    • Allows for interactions between columns

    • Returns a single level index and NOT a MultiIndex


    To do this:




    • create a custom function that you pass to apply

    • This custom function is passed each group as a DataFrame

    • Return a Series

    • The index of the Series will be the new columns


    Create fake data



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    'Score': [9, 1, 8, 7, 7, 6, 9]})


    enter image description here



    create custom function that returns a Series

    The variable x inside of my_agg is a DataFrame



    def my_agg(x):
    names = {
    'Amount mean': x['Amount'].mean(),
    'Amount std': x['Amount'].std(),
    'Amount range': x['Amount'].max() - x['Amount'].min(),
    'Score Max': x['Score'].max(),
    'Score Sum': x['Score'].sum(),
    'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
    'Score Sum', 'Score Max', 'Amount Score Sum'])


    Pass this custom function to the groupby apply method



    df.groupby('User').apply(my_agg)


    enter image description here



    The big downside is that this function will be much slower than agg for the cythonized aggregations



    Using a dictionary with groupby agg method



    Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.



    df.groupby('User')['Amount'].agg(['sum', 'count'])


    Output



           sum  count
    User
    user1 18.0 2
    user2 20.5 3
    user3 10.5 1


    It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
    'Other': [1,2,3,4,5,6]})

    df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})


    Output



          Amount       Other          
    sum count max std
    User
    user1 18.0 2 6 3.535534
    user2 20.5 3 5 1.527525
    user3 10.5 1 4 NaN





    share|improve this answer





















    • 13





      But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

      – ErnestScribbler
      Sep 25 '17 at 14:13






    • 2





      Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

      – mgmonteleone
      Oct 9 '17 at 22:40













    • If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

      – Ted Petrou
      Oct 9 '17 at 22:44











    • @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

      – Ted Petrou
      Dec 7 '17 at 14:45






    • 2





      How would the apply method work with 'first' and 'last'?

      – Gregory Saxton
      Feb 16 '18 at 19:14
















    52














    Use groupby apply and return a Series to rename columns



    Use the groupby apply method to perform an aggregation that




    • Renames the columns

    • Allows for spaces in the names

    • Allows you to order the returned columns in any way you choose

    • Allows for interactions between columns

    • Returns a single level index and NOT a MultiIndex


    To do this:




    • create a custom function that you pass to apply

    • This custom function is passed each group as a DataFrame

    • Return a Series

    • The index of the Series will be the new columns


    Create fake data



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    'Score': [9, 1, 8, 7, 7, 6, 9]})


    enter image description here



    create custom function that returns a Series

    The variable x inside of my_agg is a DataFrame



    def my_agg(x):
    names = {
    'Amount mean': x['Amount'].mean(),
    'Amount std': x['Amount'].std(),
    'Amount range': x['Amount'].max() - x['Amount'].min(),
    'Score Max': x['Score'].max(),
    'Score Sum': x['Score'].sum(),
    'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
    'Score Sum', 'Score Max', 'Amount Score Sum'])


    Pass this custom function to the groupby apply method



    df.groupby('User').apply(my_agg)


    enter image description here



    The big downside is that this function will be much slower than agg for the cythonized aggregations



    Using a dictionary with groupby agg method



    Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.



    df.groupby('User')['Amount'].agg(['sum', 'count'])


    Output



           sum  count
    User
    user1 18.0 2
    user2 20.5 3
    user3 10.5 1


    It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
    'Other': [1,2,3,4,5,6]})

    df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})


    Output



          Amount       Other          
    sum count max std
    User
    user1 18.0 2 6 3.535534
    user2 20.5 3 5 1.527525
    user3 10.5 1 4 NaN





    share|improve this answer





















    • 13





      But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

      – ErnestScribbler
      Sep 25 '17 at 14:13






    • 2





      Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

      – mgmonteleone
      Oct 9 '17 at 22:40













    • If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

      – Ted Petrou
      Oct 9 '17 at 22:44











    • @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

      – Ted Petrou
      Dec 7 '17 at 14:45






    • 2





      How would the apply method work with 'first' and 'last'?

      – Gregory Saxton
      Feb 16 '18 at 19:14














    52












    52








    52







    Use groupby apply and return a Series to rename columns



    Use the groupby apply method to perform an aggregation that




    • Renames the columns

    • Allows for spaces in the names

    • Allows you to order the returned columns in any way you choose

    • Allows for interactions between columns

    • Returns a single level index and NOT a MultiIndex


    To do this:




    • create a custom function that you pass to apply

    • This custom function is passed each group as a DataFrame

    • Return a Series

    • The index of the Series will be the new columns


    Create fake data



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    'Score': [9, 1, 8, 7, 7, 6, 9]})


    enter image description here



    create custom function that returns a Series

    The variable x inside of my_agg is a DataFrame



    def my_agg(x):
    names = {
    'Amount mean': x['Amount'].mean(),
    'Amount std': x['Amount'].std(),
    'Amount range': x['Amount'].max() - x['Amount'].min(),
    'Score Max': x['Score'].max(),
    'Score Sum': x['Score'].sum(),
    'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
    'Score Sum', 'Score Max', 'Amount Score Sum'])


    Pass this custom function to the groupby apply method



    df.groupby('User').apply(my_agg)


    enter image description here



    The big downside is that this function will be much slower than agg for the cythonized aggregations



    Using a dictionary with groupby agg method



    Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.



    df.groupby('User')['Amount'].agg(['sum', 'count'])


    Output



           sum  count
    User
    user1 18.0 2
    user2 20.5 3
    user3 10.5 1


    It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
    'Other': [1,2,3,4,5,6]})

    df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})


    Output



          Amount       Other          
    sum count max std
    User
    user1 18.0 2 6 3.535534
    user2 20.5 3 5 1.527525
    user3 10.5 1 4 NaN





    share|improve this answer















    Use groupby apply and return a Series to rename columns



    Use the groupby apply method to perform an aggregation that




    • Renames the columns

    • Allows for spaces in the names

    • Allows you to order the returned columns in any way you choose

    • Allows for interactions between columns

    • Returns a single level index and NOT a MultiIndex


    To do this:




    • create a custom function that you pass to apply

    • This custom function is passed each group as a DataFrame

    • Return a Series

    • The index of the Series will be the new columns


    Create fake data



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    'Score': [9, 1, 8, 7, 7, 6, 9]})


    enter image description here



    create custom function that returns a Series

    The variable x inside of my_agg is a DataFrame



    def my_agg(x):
    names = {
    'Amount mean': x['Amount'].mean(),
    'Amount std': x['Amount'].std(),
    'Amount range': x['Amount'].max() - x['Amount'].min(),
    'Score Max': x['Score'].max(),
    'Score Sum': x['Score'].sum(),
    'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
    'Score Sum', 'Score Max', 'Amount Score Sum'])


    Pass this custom function to the groupby apply method



    df.groupby('User').apply(my_agg)


    enter image description here



    The big downside is that this function will be much slower than agg for the cythonized aggregations



    Using a dictionary with groupby agg method



    Using a dictionary of dictionaries was removed because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on how to improve this functionality in the future on github Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.



    df.groupby('User')['Amount'].agg(['sum', 'count'])


    Output



           sum  count
    User
    user1 18.0 2
    user2 20.5 3
    user3 10.5 1


    It is still possible to use a dictionary to explicitly denote different aggregations for different columns, like here if there was another numeric column named Other.



    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
    'Other': [1,2,3,4,5,6]})

    df.groupby('User').agg({'Amount' : ['sum', 'count'], 'Other':['max', 'std']})


    Output



          Amount       Other          
    sum count max std
    User
    user1 18.0 2 6 3.535534
    user2 20.5 3 5 1.527525
    user3 10.5 1 4 NaN






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 7 '17 at 14:41

























    answered Jun 19 '17 at 16:33









    Ted PetrouTed Petrou

    21.7k76864




    21.7k76864








    • 13





      But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

      – ErnestScribbler
      Sep 25 '17 at 14:13






    • 2





      Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

      – mgmonteleone
      Oct 9 '17 at 22:40













    • If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

      – Ted Petrou
      Oct 9 '17 at 22:44











    • @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

      – Ted Petrou
      Dec 7 '17 at 14:45






    • 2





      How would the apply method work with 'first' and 'last'?

      – Gregory Saxton
      Feb 16 '18 at 19:14














    • 13





      But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

      – ErnestScribbler
      Sep 25 '17 at 14:13






    • 2





      Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

      – mgmonteleone
      Oct 9 '17 at 22:40













    • If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

      – Ted Petrou
      Oct 9 '17 at 22:44











    • @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

      – Ted Petrou
      Dec 7 '17 at 14:45






    • 2





      How would the apply method work with 'first' and 'last'?

      – Gregory Saxton
      Feb 16 '18 at 19:14








    13




    13





    But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

    – ErnestScribbler
    Sep 25 '17 at 14:13





    But suppose you do want the renaming aspect as well, to have different column names than the default after aggregation. Is there some syntax to still get this functionality?

    – ErnestScribbler
    Sep 25 '17 at 14:13




    2




    2





    Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

    – mgmonteleone
    Oct 9 '17 at 22:40







    Had same question about naming, since I am using the same col twice (one min and one max) and need a way to refer to them uniquely when I put results back in objects.

    – mgmonteleone
    Oct 9 '17 at 22:40















    If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

    – Ted Petrou
    Oct 9 '17 at 22:44





    If you want to rename the columns you will have to do it manually. Either use a list to replace all of them df.columns = ['your', 'new', 'col', 'names'] or use the rename method which will be a bit difficult since you have this results in a MultiIndex.

    – Ted Petrou
    Oct 9 '17 at 22:44













    @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

    – Ted Petrou
    Dec 7 '17 at 14:45





    @ErnestScribbler I have updated this answer to show a fairly simple way of performing an aggregation, renaming and returning a single-level index.

    – Ted Petrou
    Dec 7 '17 at 14:45




    2




    2





    How would the apply method work with 'first' and 'last'?

    – Gregory Saxton
    Feb 16 '18 at 19:14





    How would the apply method work with 'first' and 'last'?

    – Gregory Saxton
    Feb 16 '18 at 19:14













    7














    If you replace the internal dictionary with a list of tuples it gets rid of the warning message



    import pandas as pd

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

    df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})





    share|improve this answer
























    • Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

      – Peanut
      Jun 6 '18 at 12:42













    • @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

      – Jacob Stevenson
      Jun 6 '18 at 16:30






    • 3





      This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

      – Ted Petrou
      Jun 13 '18 at 12:27











    • Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

      – Jacob Stevenson
      Jun 13 '18 at 16:35
















    7














    If you replace the internal dictionary with a list of tuples it gets rid of the warning message



    import pandas as pd

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

    df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})





    share|improve this answer
























    • Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

      – Peanut
      Jun 6 '18 at 12:42













    • @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

      – Jacob Stevenson
      Jun 6 '18 at 16:30






    • 3





      This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

      – Ted Petrou
      Jun 13 '18 at 12:27











    • Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

      – Jacob Stevenson
      Jun 13 '18 at 16:35














    7












    7








    7







    If you replace the internal dictionary with a list of tuples it gets rid of the warning message



    import pandas as pd

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

    df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})





    share|improve this answer













    If you replace the internal dictionary with a list of tuples it gets rid of the warning message



    import pandas as pd

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

    df.groupby(["User"]).agg({"Amount": [("Sum", "sum"), ("Count", "count")]})






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jun 5 '18 at 9:44









    Jacob StevensonJacob Stevenson

    1,753178




    1,753178













    • Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

      – Peanut
      Jun 6 '18 at 12:42













    • @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

      – Jacob Stevenson
      Jun 6 '18 at 16:30






    • 3





      This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

      – Ted Petrou
      Jun 13 '18 at 12:27











    • Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

      – Jacob Stevenson
      Jun 13 '18 at 16:35



















    • Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

      – Peanut
      Jun 6 '18 at 12:42













    • @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

      – Jacob Stevenson
      Jun 6 '18 at 16:30






    • 3





      This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

      – Ted Petrou
      Jun 13 '18 at 12:27











    • Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

      – Jacob Stevenson
      Jun 13 '18 at 16:35

















    Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

    – Peanut
    Jun 6 '18 at 12:42







    Do you know if this will also (presumably) work in the future release or is also just to workaround the warning message?

    – Peanut
    Jun 6 '18 at 12:42















    @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

    – Jacob Stevenson
    Jun 6 '18 at 16:30





    @Peanut, I do not know. But if the warning message is not there, then presumably (as you say) it will continue to be supported.

    – Jacob Stevenson
    Jun 6 '18 at 16:30




    3




    3





    This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

    – Ted Petrou
    Jun 13 '18 at 12:27





    This is an undocumented and accidental feature and I would highly suggest no one using this syntax as it may not work in the future.

    – Ted Petrou
    Jun 13 '18 at 12:27













    Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

    – Jacob Stevenson
    Jun 13 '18 at 16:35





    Thanks for the information @TedPetrou. And thanks for the link to the discussion in your answer. It sounds like a difficult question to find the right syntax.

    – Jacob Stevenson
    Jun 13 '18 at 16:35











    0














    This is what I did:



    Create a fake dataset:



    import pandas as pd
    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
    "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
    'Score': [9, 1, 8, 7, 7, 6, 9]})
    df


    O/P:



        Amount  Score   User
    0 10.0 9 user1
    1 5.0 1 user2
    2 8.0 8 user2
    3 10.5 7 user3
    4 7.5 7 user2
    5 8.0 6 user1
    6 9.0 9 user3


    I first made the User the index, and then a groupby:



    ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
    ans


    Solution:



        Sum Count
    User
    user1 18.0 2
    user2 20.5 3
    user3 19.5 2





    share|improve this answer




























      0














      This is what I did:



      Create a fake dataset:



      import pandas as pd
      df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
      "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
      'Score': [9, 1, 8, 7, 7, 6, 9]})
      df


      O/P:



          Amount  Score   User
      0 10.0 9 user1
      1 5.0 1 user2
      2 8.0 8 user2
      3 10.5 7 user3
      4 7.5 7 user2
      5 8.0 6 user1
      6 9.0 9 user3


      I first made the User the index, and then a groupby:



      ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
      ans


      Solution:



          Sum Count
      User
      user1 18.0 2
      user2 20.5 3
      user3 19.5 2





      share|improve this answer


























        0












        0








        0







        This is what I did:



        Create a fake dataset:



        import pandas as pd
        df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
        "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
        'Score': [9, 1, 8, 7, 7, 6, 9]})
        df


        O/P:



            Amount  Score   User
        0 10.0 9 user1
        1 5.0 1 user2
        2 8.0 8 user2
        3 10.5 7 user3
        4 7.5 7 user2
        5 8.0 6 user1
        6 9.0 9 user3


        I first made the User the index, and then a groupby:



        ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
        ans


        Solution:



            Sum Count
        User
        user1 18.0 2
        user2 20.5 3
        user3 19.5 2





        share|improve this answer













        This is what I did:



        Create a fake dataset:



        import pandas as pd
        df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
        "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
        'Score': [9, 1, 8, 7, 7, 6, 9]})
        df


        O/P:



            Amount  Score   User
        0 10.0 9 user1
        1 5.0 1 user2
        2 8.0 8 user2
        3 10.5 7 user3
        4 7.5 7 user2
        5 8.0 6 user1
        6 9.0 9 user3


        I first made the User the index, and then a groupby:



        ans = df.set_index('User').groupby(level=0)['Amount'].agg([('Sum','sum'),('Count','count')])
        ans


        Solution:



            Sum Count
        User
        user1 18.0 2
        user2 20.5 3
        user3 19.5 2






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 12 '18 at 14:46









        Jyothsna HarithsaJyothsna Harithsa

        445




        445






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f44635626%2frename-result-columns-from-pandas-aggregation-futurewarning-using-a-dict-with%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown