Loading pandas table with column names and dtypes
I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.
Here's an example of what the data looks like:
# Header text
# Header text
# id col1 col2 col3 col4
0 0.44:66 0 1600 45.6e-3
1 0.25:7f 0 1600 52.1e-3
2 0.31:5e 0 1600 33.7e-3
...
2500 0.42.6f 0 1400 42.1e-3
# END
# Footer text
I am reading it in as follows:
import pandas as pd
with open(filename, 'rt') as f:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
Then when I print(df.dtypes) I get the following:
# id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
This is fine, except for the # in the name of the first column. So I tried specifying the names:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"])
but then I get print(df.dtypes)
id object
col1 object
col2 object
col3 object
col4 object
dtype: object
So I tried specifying both names and dtypes:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"],
dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})
but this gives an error:
ValueError: Unable to convert column id to type <class 'int'>
What's wrong? How can I load the table with the column names I want and the appropriate dtypes?
python pandas python-3.6
add a comment |
I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.
Here's an example of what the data looks like:
# Header text
# Header text
# id col1 col2 col3 col4
0 0.44:66 0 1600 45.6e-3
1 0.25:7f 0 1600 52.1e-3
2 0.31:5e 0 1600 33.7e-3
...
2500 0.42.6f 0 1400 42.1e-3
# END
# Footer text
I am reading it in as follows:
import pandas as pd
with open(filename, 'rt') as f:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
Then when I print(df.dtypes) I get the following:
# id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
This is fine, except for the # in the name of the first column. So I tried specifying the names:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"])
but then I get print(df.dtypes)
id object
col1 object
col2 object
col3 object
col4 object
dtype: object
So I tried specifying both names and dtypes:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"],
dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})
but this gives an error:
ValueError: Unable to convert column id to type <class 'int'>
What's wrong? How can I load the table with the column names I want and the appropriate dtypes?
python pandas python-3.6
add a comment |
I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.
Here's an example of what the data looks like:
# Header text
# Header text
# id col1 col2 col3 col4
0 0.44:66 0 1600 45.6e-3
1 0.25:7f 0 1600 52.1e-3
2 0.31:5e 0 1600 33.7e-3
...
2500 0.42.6f 0 1400 42.1e-3
# END
# Footer text
I am reading it in as follows:
import pandas as pd
with open(filename, 'rt') as f:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
Then when I print(df.dtypes) I get the following:
# id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
This is fine, except for the # in the name of the first column. So I tried specifying the names:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"])
but then I get print(df.dtypes)
id object
col1 object
col2 object
col3 object
col4 object
dtype: object
So I tried specifying both names and dtypes:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"],
dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})
but this gives an error:
ValueError: Unable to convert column id to type <class 'int'>
What's wrong? How can I load the table with the column names I want and the appropriate dtypes?
python pandas python-3.6
I'm fairly new to using Pandas and I seem to be having some trouble loading a table from a textfile.
Here's an example of what the data looks like:
# Header text
# Header text
# id col1 col2 col3 col4
0 0.44:66 0 1600 45.6e-3
1 0.25:7f 0 1600 52.1e-3
2 0.31:5e 0 1600 33.7e-3
...
2500 0.42.6f 0 1400 42.1e-3
# END
# Footer text
I am reading it in as follows:
import pandas as pd
with open(filename, 'rt') as f:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
Then when I print(df.dtypes) I get the following:
# id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
This is fine, except for the # in the name of the first column. So I tried specifying the names:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"])
but then I get print(df.dtypes)
id object
col1 object
col2 object
col3 object
col4 object
dtype: object
So I tried specifying both names and dtypes:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python',
names=["id", "col1", "col2", "col3", "col4"],
dtypes={"id":int,"col1":str,"col2":int, "col3":int,"col4":float})
but this gives an error:
ValueError: Unable to convert column id to type <class 'int'>
What's wrong? How can I load the table with the column names I want and the appropriate dtypes?
python pandas python-3.6
python pandas python-3.6
edited Nov 12 '18 at 12:44
Sandeep Kadapa
6,092429
6,092429
asked Oct 26 '18 at 10:25
berkelemberkelem
8702720
8702720
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
A few comments.
Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.
Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)
But to answer your question:
If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.
add a comment |
I have found a workaround solution but I am open to better solutions if they are out there.
I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
df.rename(columns={'# id':'id'}, inplace=True)
Then I used print(df.dtypes) to get the desired output:
id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
add a comment |
Use astype
df['id'] = df['id'].astype(int)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53006661%2floading-pandas-table-with-column-names-and-dtypes%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
A few comments.
Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.
Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)
But to answer your question:
If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.
add a comment |
A few comments.
Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.
Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)
But to answer your question:
If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.
add a comment |
A few comments.
Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.
Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)
But to answer your question:
If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.
A few comments.
Firstly, I don't understand why your code works at all, given that your columns appear to be separated by whitespace (?). You'd usually require an extra sep=' ' in the call to read_table or read_csv.
Secondly, you don't need to open the file first, you can just pass the filename to the pandas function: pd.read_table(filename, ...)
But to answer your question:
If you specify the column names explicitly with names=[...] and they don't match the header of the file, pandas assumes there is no header. You therefore have to skip an additional row (skiprows=3), or else pandas will assume that line is part of the table data and thus set the data type to object (i.e. strings) for all columns.
answered Oct 26 '18 at 11:32
jhansenjhansen
590313
590313
add a comment |
add a comment |
I have found a workaround solution but I am open to better solutions if they are out there.
I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
df.rename(columns={'# id':'id'}, inplace=True)
Then I used print(df.dtypes) to get the desired output:
id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
add a comment |
I have found a workaround solution but I am open to better solutions if they are out there.
I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
df.rename(columns={'# id':'id'}, inplace=True)
Then I used print(df.dtypes) to get the desired output:
id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
add a comment |
I have found a workaround solution but I am open to better solutions if they are out there.
I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
df.rename(columns={'# id':'id'}, inplace=True)
Then I used print(df.dtypes) to get the desired output:
id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
I have found a workaround solution but I am open to better solutions if they are out there.
I loaded the table without specifying the names or dtypes and then renamed the problematic column name as:
df = pd.read_table(f, skiprows=2, skipfooter=2, engine='python')
df.rename(columns={'# id':'id'}, inplace=True)
Then I used print(df.dtypes) to get the desired output:
id int64
col1 object
col2 int64
col3 int64
col4 float64
dtype: object
edited Nov 1 '18 at 12:13
Sandeep Kadapa
6,092429
6,092429
answered Oct 26 '18 at 10:25
berkelemberkelem
8702720
8702720
add a comment |
add a comment |
Use astype
df['id'] = df['id'].astype(int)
add a comment |
Use astype
df['id'] = df['id'].astype(int)
add a comment |
Use astype
df['id'] = df['id'].astype(int)
Use astype
df['id'] = df['id'].astype(int)
answered Oct 26 '18 at 11:06
VishnudevVishnudev
1,131517
1,131517
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53006661%2floading-pandas-table-with-column-names-and-dtypes%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown