Loading pandas table with column names and dtypes












0














I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.



Here's an example of what the data looks like:



#    Header text
# Header text
# id col1 col2 col3 col4
0 0.44:66 0 1600 45.6e-3
1 0.25:7f 0 1600 52.1e-3
2 0.31:5e 0 1600 33.7e-3
...
2500 0.42.6f 0 1400 42.1e-3
# END
# Footer text


I am reading it in as follows:



import pandas as pd

with open(filename, 'rt') as f:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')


Then when I print(df.dtypes) I get the following:



# id        int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object


This is fine, except for the # in the name of the first column. So I tried specifying the names:



df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
names=["id", "col1", "col2", "col3", "col4"])


but then I get print(df.dtypes)



id          object
col1 object
col2 object
col3 object
col4 object
dtype: object


So I tried specifying both names and dtypes:



df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
names=["id", "col1", "col2", "col3", "col4"],
dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})


but this gives an error:



ValueError: Unable to convert column id to type <class 'int'>


What's wrong? How can I load the table with the column names I want and the appropriate dtypes?










share|improve this question





























    0














    I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.



    Here's an example of what the data looks like:



    #    Header text
    # Header text
    # id col1 col2 col3 col4
    0 0.44:66 0 1600 45.6e-3
    1 0.25:7f 0 1600 52.1e-3
    2 0.31:5e 0 1600 33.7e-3
    ...
    2500 0.42.6f 0 1400 42.1e-3
    # END
    # Footer text


    I am reading it in as follows:



    import pandas as pd

    with open(filename, 'rt') as f:
    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')


    Then when I print(df.dtypes) I get the following:



    # id        int64
    col1 object
    col2 int64
    col3 int64
    col4 float64
    dtype: object


    This is fine, except for the # in the name of the first column. So I tried specifying the names:



    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
    names=["id", "col1", "col2", "col3", "col4"])


    but then I get print(df.dtypes)



    id          object
    col1 object
    col2 object
    col3 object
    col4 object
    dtype: object


    So I tried specifying both names and dtypes:



    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
    names=["id", "col1", "col2", "col3", "col4"],
    dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})


    but this gives an error:



    ValueError: Unable to convert column id to type <class 'int'>


    What's wrong? How can I load the table with the column names I want and the appropriate dtypes?










    share|improve this question



























      0












      0








      0


      0





      I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.



      Here's an example of what the data looks like:



      #    Header text
      # Header text
      # id col1 col2 col3 col4
      0 0.44:66 0 1600 45.6e-3
      1 0.25:7f 0 1600 52.1e-3
      2 0.31:5e 0 1600 33.7e-3
      ...
      2500 0.42.6f 0 1400 42.1e-3
      # END
      # Footer text


      I am reading it in as follows:



      import pandas as pd

      with open(filename, 'rt') as f:
      df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')


      Then when I print(df.dtypes) I get the following:



      # id        int64
      col1 object
      col2 int64
      col3 int64
      col4 float64
      dtype: object


      This is fine, except for the # in the name of the first column. So I tried specifying the names:



      df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
      names=["id", "col1", "col2", "col3", "col4"])


      but then I get print(df.dtypes)



      id          object
      col1 object
      col2 object
      col3 object
      col4 object
      dtype: object


      So I tried specifying both names and dtypes:



      df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
      names=["id", "col1", "col2", "col3", "col4"],
      dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})


      but this gives an error:



      ValueError: Unable to convert column id to type <class 'int'>


      What's wrong? How can I load the table with the column names I want and the appropriate dtypes?










      share|improve this question















      I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.



      Here's an example of what the data looks like:



      #    Header text
      # Header text
      # id col1 col2 col3 col4
      0 0.44:66 0 1600 45.6e-3
      1 0.25:7f 0 1600 52.1e-3
      2 0.31:5e 0 1600 33.7e-3
      ...
      2500 0.42.6f 0 1400 42.1e-3
      # END
      # Footer text


      I am reading it in as follows:



      import pandas as pd

      with open(filename, 'rt') as f:
      df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')


      Then when I print(df.dtypes) I get the following:



      # id        int64
      col1 object
      col2 int64
      col3 int64
      col4 float64
      dtype: object


      This is fine, except for the # in the name of the first column. So I tried specifying the names:



      df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
      names=["id", "col1", "col2", "col3", "col4"])


      but then I get print(df.dtypes)



      id          object
      col1 object
      col2 object
      col3 object
      col4 object
      dtype: object


      So I tried specifying both names and dtypes:



      df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 
      names=["id", "col1", "col2", "col3", "col4"],
      dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})


      but this gives an error:



      ValueError: Unable to convert column id to type <class 'int'>


      What's wrong? How can I load the table with the column names I want and the appropriate dtypes?







      python pandas python-3.6






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 12 '18 at 12:44









      Sandeep Kadapa

      6,092429




      6,092429










      asked Oct 26 '18 at 10:25









      berkelemberkelem

      8702720




      8702720
























          3 Answers
          3






          active

          oldest

          votes


















          1














          A few comments.



          Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.



          Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)



          But to answer your question:



          If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.






          share|improve this answer





























            1














            I have found a workaround solution but I am open to better solutions if they are out there.



            I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:



            df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
            df.rename(columns={'# id':'id'}, inplace=True)


            Then I used print(df.dtypes) to get the desired output:



            id          int64
            col1 object
            col2 int64
            col3 int64
            col4 float64
            dtype: object





            share|improve this answer































              0














              Use astype



              df['id'] = df['id'].astype(int)





              share|improve this answer





















                Your Answer






                StackExchange.ifUsing("editor", function () {
                StackExchange.using("externalEditor", function () {
                StackExchange.using("snippets", function () {
                StackExchange.snippets.init();
                });
                });
                }, "code-snippets");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "1"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53006661%2floading-pandas-table-with-column-names-and-dtypes%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                3 Answers
                3






                active

                oldest

                votes








                3 Answers
                3






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                1














                A few comments.



                Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.



                Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)



                But to answer your question:



                If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.






                share|improve this answer


























                  1














                  A few comments.



                  Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.



                  Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)



                  But to answer your question:



                  If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.






                  share|improve this answer
























                    1












                    1








                    1






                    A few comments.



                    Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.



                    Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)



                    But to answer your question:



                    If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.






                    share|improve this answer












                    A few comments.



                    Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.



                    Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)



                    But to answer your question:



                    If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Oct 26 '18 at 11:32









                    jhansenjhansen

                    590313




                    590313

























                        1














                        I have found a workaround solution but I am open to better solutions if they are out there.



                        I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:



                        df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
                        df.rename(columns={'# id':'id'}, inplace=True)


                        Then I used print(df.dtypes) to get the desired output:



                        id          int64
                        col1 object
                        col2 int64
                        col3 int64
                        col4 float64
                        dtype: object





                        share|improve this answer




























                          1














                          I have found a workaround solution but I am open to better solutions if they are out there.



                          I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:



                          df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
                          df.rename(columns={'# id':'id'}, inplace=True)


                          Then I used print(df.dtypes) to get the desired output:



                          id          int64
                          col1 object
                          col2 int64
                          col3 int64
                          col4 float64
                          dtype: object





                          share|improve this answer


























                            1












                            1








                            1






                            I have found a workaround solution but I am open to better solutions if they are out there.



                            I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:



                            df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
                            df.rename(columns={'# id':'id'}, inplace=True)


                            Then I used print(df.dtypes) to get the desired output:



                            id          int64
                            col1 object
                            col2 int64
                            col3 int64
                            col4 float64
                            dtype: object





                            share|improve this answer














                            I have found a workaround solution but I am open to better solutions if they are out there.



                            I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:



                            df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
                            df.rename(columns={'# id':'id'}, inplace=True)


                            Then I used print(df.dtypes) to get the desired output:



                            id          int64
                            col1 object
                            col2 int64
                            col3 int64
                            col4 float64
                            dtype: object






                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Nov 1 '18 at 12:13









                            Sandeep Kadapa

                            6,092429




                            6,092429










                            answered Oct 26 '18 at 10:25









                            berkelemberkelem

                            8702720




                            8702720























                                0














                                Use astype



                                df['id'] = df['id'].astype(int)





                                share|improve this answer


























                                  0














                                  Use astype



                                  df['id'] = df['id'].astype(int)





                                  share|improve this answer
























                                    0












                                    0








                                    0






                                    Use astype



                                    df['id'] = df['id'].astype(int)





                                    share|improve this answer












                                    Use astype



                                    df['id'] = df['id'].astype(int)






                                    share|improve this answer












                                    share|improve this answer



                                    share|improve this answer










                                    answered Oct 26 '18 at 11:06









                                    VishnudevVishnudev

                                    1,131517




                                    1,131517






























                                        draft saved

                                        draft discarded




















































                                        Thanks for contributing an answer to Stack Overflow!


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.




                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53006661%2floading-pandas-table-with-column-names-and-dtypes%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        Coverage of Google Street View

                                        Full-time equivalent

                                        Surfing