Add string column to float matrix NumPy



























I'm looking for a method to add a column of float values to a matrix of string values.



Mymatrix = 
[["a","b"],
["c","d"]]


I need to end up with a matrix like this:



[["a","b",0.4],
["c","d",0.6]]





























  • 4





    You cannot have that in NumPy (unless you use an array of objects, which is not usually very useful). Consider using plain Python (nested) lists or a more advanced structure like a pandas DataFrame.

    – jdehesa
    Nov 13 '18 at 10:59











  • You're right! Thank you so much

    – Vin B.
    Nov 13 '18 at 13:03
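
The two options mentioned in the comment above can be sketched as follows. This is only an illustration under stated assumptions: the names `strings`, `floats`, `rows`, and `obj` are hypothetical and do not come from the question.

```python
import numpy as np

# Hypothetical names for the question's data
strings = [["a", "b"], ["c", "d"]]
floats = [0.4, 0.6]

# Plain Python: build new rows, each with its float appended
rows = [row + [value] for row, value in zip(strings, floats)]
print(rows)

# NumPy object array: every cell holds a reference to a Python object,
# so strings and floats can sit side by side, but most of NumPy's
# vectorised speed is lost
obj = np.empty((len(strings), 3), dtype=object)
obj[:, :2] = strings
obj[:, 2] = floats
print(obj)
```

The nested list is usually the simplest choice when no numerical work is done on the whole table at once.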






















python string numpy matrix floating-point














edited Nov 13 '18 at 11:02 by Mehrdad Pedramfar
asked Nov 13 '18 at 10:54 by Vin B.




















3 Answers























































































































As noted, you can't mix data types in a plain ndarray, but you can in a structured or record array. The two are similar in that both hold mixed datatypes as defined by the dtype= argument (it defines the datatypes and field names). Record arrays additionally allow access to the fields of structured arrays by attribute instead of only by index. You don't need for loops when you want to copy the entire contents between arrays. See my example below (using your data):

    import numpy as np

    Mymatrix = np.array([["a", "b"], ["c", "d"]])
    Mycol = np.array([0.4, 0.6])

    # Two 1-character Unicode fields plus one float field
    dt = np.dtype([('col0', 'U1'), ('col1', 'U1'), ('col2', float)])
    new_recarr = np.empty((2,), dtype=dt)
    # Copy whole columns at once; no loops needed
    new_recarr['col0'] = Mymatrix[:, 0]
    new_recarr['col1'] = Mymatrix[:, 1]
    new_recarr['col2'] = Mycol
    print(new_recarr)

Resulting output looks like this:

    [('a', 'b',  0.4) ('c', 'd',  0.6)]

From there, use formatted strings to print.

You can also copy from a recarray to an ndarray if you reverse the assignment order in my example.

Note: I discovered there can be a significant performance penalty when using recarrays. See the answer in this thread: "is ndarray faster than recarray access?"
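
The remarks above about attribute access, formatted printing, and copying back to a plain array could look roughly like this. It is a minimal sketch, not part of the original answer: the variable names `structured`, `rec`, `lines`, and `strings` are hypothetical, and the print format is just one possible choice.

```python
import numpy as np

dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])
structured = np.array([("a", "b", 0.4), ("c", "d", 0.6)], dtype=dt)

# View the same data as a record array to get attribute access
rec = structured.view(np.recarray)
print(rec.col2)  # same data as structured["col2"]

# Formatted printing, one row at a time
lines = [f"{r['col0']} {r['col1']} {r['col2']:.1f}" for r in structured]
print("\n".join(lines))

# Reverse direction: copy the string fields back into a plain 2-D array
strings = np.empty((len(structured), 2), dtype="U1")
strings[:, 0] = structured["col0"]
strings[:, 1] = structured["col1"]
print(strings)
```

Using `view(np.recarray)` does not copy the data, so the structured array and the record array share the same memory.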

















answered Nov 13 '18 at 16:19 by kcw78






































I would suggest using a pandas DataFrame instead:

    import pandas as pd

    df = pd.DataFrame([["a", "b", 0.4],
                       ["c", "d", 0.6]])

    print(df)

       0  1    2
    0  a  b  0.4
    1  c  d  0.6

You can also specify column (Series) names:

    df = pd.DataFrame([["a", "b", 0.4],
                       ["c", "d", 0.6]], columns=['A', 'B', 'C'])
    print(df)

       A  B    C
    0  a  b  0.4
    1  c  d  0.6
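
Since the question starts from an existing string matrix and a separate float column, a possible variant is to build the DataFrame from the strings first and then attach the floats. This is a sketch only; the names `strings` and `floats` and the column labels are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical names for the question's data
strings = [["a", "b"], ["c", "d"]]
floats = [0.4, 0.6]

df = pd.DataFrame(strings, columns=["A", "B"])
df["C"] = floats  # each column keeps its own dtype

print(df)
print(df.dtypes)
```

Because a DataFrame stores each column separately, the float column stays numeric and can be used in calculations directly.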
















answered Nov 13 '18 at 11:12 by Alex




































You need to understand why you want to do that. NumPy is efficient because data are aligned in memory, so mixing types is generally a source of bad performance. In your case, however, you can preserve alignment, since all your strings have the same length. Since the types are not homogeneous, you can use a structured array:

    import numpy as np

    raw = [["a", "b", 0.4],
           ["c", "d", 0.6]]

    # Two 1-character Unicode fields and one float field
    dt = np.dtype([('col0', 'U1'), ('col1', 'U1'), ('col2', float)])

    aligned = np.empty(len(raw), dtype=dt)

    # Fill the array field by field
    for i in range(len(raw)):
        for j in range(len(dt)):
            aligned[i][j] = raw[i][j]

You can also use pandas, but you often lose some performance.
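
As an alternative to the nested loops above, NumPy can build a structured array directly from a list of tuples. The following is a minimal sketch of that variant, not the answer's own code, reusing the same `raw` data and dtype.

```python
import numpy as np

raw = [["a", "b", 0.4],
       ["c", "d", 0.6]]
dt = np.dtype([("col0", "U1"), ("col1", "U1"), ("col2", float)])

# Each row must be a tuple (not a list) to be read as one record
aligned = np.array([tuple(row) for row in raw], dtype=dt)

print(aligned["col2"].mean())  # numeric work on the float field
print(aligned.itemsize)        # bytes per record: 4 + 4 + 8 = 16
```

Every record has the same fixed size here, which is the regular memory layout the answer refers to.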

















answered Nov 13 '18 at 12:49 by B. M.





























