Loading pandas table with column names and dtypes

I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.

Here's an example of what the data looks like:

#    Header text

#    Header text

# id col1 col2 col3 col4

0 0.44:66 0 1600 45.6e-3

1 0.25:7f 0 1600 52.1e-3

2 0.31:5e 0 1600 33.7e-3

...

2500 0.42.6f 0 1400 42.1e-3

# END

# Footer text

I am reading it in as follows:

import pandas as pd



with open(filename, 'rt') as f:

    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

Then when I print(df.dtypes) I get the following:

# id        int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

This is fine, except for the # in the name of the first column. So I tried specifying the names:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"])

but then I get print(df.dtypes)

id          object

col1        object

col2        object

col3        object

col4        object

dtype: object

So I tried specifying both names and dtypes:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"], 

                   dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})

but this gives an error:

ValueError: Unable to convert column id to type <class 'int'>

What's wrong? How can I load the table with the column names I want and the appropriate dtypes?

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

asked Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.

Here's an example of what the data looks like:

#    Header text

#    Header text

# id col1 col2 col3 col4

0 0.44:66 0 1600 45.6e-3

1 0.25:7f 0 1600 52.1e-3

2 0.31:5e 0 1600 33.7e-3

...

2500 0.42.6f 0 1400 42.1e-3

# END

# Footer text

I am reading it in as follows:

import pandas as pd



with open(filename, 'rt') as f:

    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

Then when I print(df.dtypes) I get the following:

# id        int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

This is fine, except for the # in the name of the first column. So I tried specifying the names:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"])

but then I get print(df.dtypes)

id          object

col1        object

col2        object

col3        object

col4        object

dtype: object

So I tried specifying both names and dtypes:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"], 

                   dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})

but this gives an error:

ValueError: Unable to convert column id to type <class 'int'>

What's wrong? How can I load the table with the column names I want and the appropriate dtypes?

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

asked Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.

Here's an example of what the data looks like:

#    Header text

#    Header text

# id col1 col2 col3 col4

0 0.44:66 0 1600 45.6e-3

1 0.25:7f 0 1600 52.1e-3

2 0.31:5e 0 1600 33.7e-3

...

2500 0.42.6f 0 1400 42.1e-3

# END

# Footer text

I am reading it in as follows:

import pandas as pd



with open(filename, 'rt') as f:

    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

Then when I print(df.dtypes) I get the following:

# id        int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

This is fine, except for the # in the name of the first column. So I tried specifying the names:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"])

but then I get print(df.dtypes)

id          object

col1        object

col2        object

col3        object

col4        object

dtype: object

So I tried specifying both names and dtypes:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"], 

                   dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})

but this gives an error:

ValueError: Unable to convert column id to type <class 'int'>

What's wrong? How can I load the table with the column names I want and the appropriate dtypes?

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

asked Oct 26 '18 at 10:25

berkelem

8702720

I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.

Here's an example of what the data looks like:

#    Header text

#    Header text

# id col1 col2 col3 col4

0 0.44:66 0 1600 45.6e-3

1 0.25:7f 0 1600 52.1e-3

2 0.31:5e 0 1600 33.7e-3

...

2500 0.42.6f 0 1400 42.1e-3

# END

# Footer text

I am reading it in as follows:

import pandas as pd



with open(filename, 'rt') as f:

    df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

Then when I print(df.dtypes) I get the following:

# id        int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

This is fine, except for the # in the name of the first column. So I tried specifying the names:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"])

but then I get print(df.dtypes)

id          object

col1        object

col2        object

col3        object

col4        object

dtype: object

So I tried specifying both names and dtypes:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python', 

                   names=["id", "col1", "col2", "col3", "col4"], 

                   dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})

but this gives an error:

ValueError: Unable to convert column id to type <class 'int'>

What's wrong? How can I load the table with the column names I want and the appropriate dtypes?

python pandas python-3.6

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

asked Oct 26 '18 at 10:25

berkelem

8702720

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

asked Oct 26 '18 at 10:25

berkelem

8702720

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

edited Nov 12 '18 at 12:44

Sandeep Kadapa

6,092429

asked Oct 26 '18 at 10:25

berkelem

8702720

asked Oct 26 '18 at 10:25

berkelem

8702720

asked Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

3 Answers
3

active

oldest

votes

A few comments.

Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.

Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)

But to answer your question:

If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.

answered Oct 26 '18 at 11:32

jhansen

590313

add a comment |

I have found a workaround solution but I am open to better solutions if they are out there.

I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

df.rename(columns={'# id':'id'}, inplace=True)

Then I used print(df.dtypes) to get the desired output:

id          int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

answered Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

Use astype

df['id'] = df['id'].astype(int)

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53006661%2floading-pandas-table-with-column-names-and-dtypes%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

A few comments.

Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)

But to answer your question:

answered Oct 26 '18 at 11:32

jhansen

590313

add a comment |

A few comments.

Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)

But to answer your question:

answered Oct 26 '18 at 11:32

jhansen

590313

add a comment |

A few comments.

Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)

But to answer your question:

answered Oct 26 '18 at 11:32

jhansen

590313

A few comments.

Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)

But to answer your question:

answered Oct 26 '18 at 11:32

jhansen

590313

answered Oct 26 '18 at 11:32

jhansen

590313

answered Oct 26 '18 at 11:32

jhansen

590313

answered Oct 26 '18 at 11:32

jhansen

590313

add a comment |

I have found a workaround solution but I am open to better solutions if they are out there.

I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

df.rename(columns={'# id':'id'}, inplace=True)

Then I used print(df.dtypes) to get the desired output:

id          int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

answered Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

I have found a workaround solution but I am open to better solutions if they are out there.

I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

df.rename(columns={'# id':'id'}, inplace=True)

Then I used print(df.dtypes) to get the desired output:

id          int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

answered Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

I have found a workaround solution but I am open to better solutions if they are out there.

I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

df.rename(columns={'# id':'id'}, inplace=True)

Then I used print(df.dtypes) to get the desired output:

id          int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

answered Oct 26 '18 at 10:25

berkelem

8702720

I have found a workaround solution but I am open to better solutions if they are out there.

I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:

df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')

df.rename(columns={'# id':'id'}, inplace=True)

Then I used print(df.dtypes) to get the desired output:

id          int64

col1        object

col2        int64

col3        int64

col4        float64

dtype: object

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

answered Oct 26 '18 at 10:25

berkelem

8702720

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

edited Nov 1 '18 at 12:13

Sandeep Kadapa

6,092429

answered Oct 26 '18 at 10:25

berkelem

8702720

answered Oct 26 '18 at 10:25

berkelem

8702720

answered Oct 26 '18 at 10:25

berkelem

8702720

add a comment |

Use astype

df['id'] = df['id'].astype(int)

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

add a comment |

Use astype

df['id'] = df['id'].astype(int)

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

add a comment |

Use astype

df['id'] = df['id'].astype(int)

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

Use astype

df['id'] = df['id'].astype(int)

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

answered Oct 26 '18 at 11:06

Vishnudev

1,131517

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Nrthugu