Read large text files in Python, line by line, without loading them into memory
I need to read a large file, line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I do not want to use readlines()
because it will create a very large list in memory.
How will the code below work for this case? Is xreadlines
itself reading one line at a time into memory? Is the generator expression needed?
f = (line for line in open("log.txt").xreadlines()) # how much is loaded in memory?
f.next()
Plus, what can I do to read this in reverse order, just as the Linux tail
command?
I found:
http://code.google.com/p/pytailer/
and
"python head, tail and backward read by lines of a text file"
Both worked very well!
python
And what can I do to read this from the tail? Line by line, starting at the last line.
– Bruno Rocha - rochacbruno
Jun 25 '11 at 2:11
this should be a separate question
– cmcginty
Jun 25 '11 at 2:35
duplicate stackoverflow.com/questions/5896079/…
– cmcginty
Jun 25 '11 at 2:38
13 Answers
I provided this answer because Keith's, while succinct, doesn't close the file explicitly:
with open("log.txt") as infile:
for line in infile:
do_something_with(line)
The question still is: will "for line in infile" load my 5 GB of lines into memory? And how can I read from the tail?
– Bruno Rocha - rochacbruno
Jun 25 '11 at 2:31
@rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
– John La Rooy
Jun 25 '11 at 2:33
@rochacbruno, Reading the lines in reverse order is not as easy to do efficiently, unfortunately. Generally you would want to read from the end of the file in sensibly sized chunks (kilobytes to megabytes, say) and split on newline characters (or whatever the line-ending character is on your platform).
– John La Rooy
Jun 25 '11 at 2:36
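A minimal sketch of that chunked backward read (my addition, not part of the answer or its comments; it assumes '\n' line endings and reads the file in binary mode):
import os

def reverse_lines(path, chunk_size=64 * 1024):
    # Read fixed-size blocks starting from the end of the file and split
    # them on newlines; only one chunk plus a partial line is in memory.
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        buffer = b''
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            lines = buffer.split(b'\n')
            # The first piece may be an incomplete line; keep it for the next chunk.
            buffer = lines.pop(0)
            for line in reversed(lines):
                yield line
        yield buffer

# for line in reverse_lines("log.txt"):
#     print(line.decode("utf-8", errors="replace"))
Note that, like str.split, a file ending in a newline yields one empty final "line" first.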
Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
– Bruno Rocha - rochacbruno
Jun 25 '11 at 3:09
@bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using a for
loop which iterates over the lines, you can use chunk = infile.read(chunksize)
to read limited-size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
– John La Rooy
Jan 9 '18 at 21:50
All you need to do is use the file object as an iterator.
for line in open("log.txt"):
do_something_with(line)
Even better is using a context manager in recent Python versions:
with open("log.txt") as fileobject:
for line in fileobject:
do_something_with(line)
This will automatically close the file as well.
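The generator expression in the question is unnecessary, by the way: the file object is already a lazy iterator. A minimal sketch of pulling lines on demand (my addition, assuming log.txt has at least two lines; Python 3 spells it next(f) rather than f.next()):
with open("log.txt") as f:
    first_line = next(f)   # reads only up to the first newline
    second_line = next(f)  # reads just the next line, on demand
    print(first_line, end="")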
That is not loading the whole file into memory?
– Bruno Rocha - rochacbruno
Jun 25 '11 at 2:10
Nope. It reads it line by line.
– Keith
Jun 25 '11 at 6:43
An old school approach:
fh = open(file_name, 'rt')
line = fh.readline()
while line:
    # do stuff with line
    line = fh.readline()
fh.close()
minor remark: for exception safety it is recommended to use the 'with' statement, in your case "with open(filename, 'rt') as fh:"
– prokher
Jan 15 '15 at 14:44
@prokher: Yeah, but I did call this "old school".
– PTBNL
Jan 16 '15 at 13:40
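A sketch of prokher's suggestion applied to the loop above (my addition, not PTBNL's code):
with open(file_name, 'rt') as fh:
    # The file is closed automatically, even if an exception is raised
    # while processing a line.
    line = fh.readline()
    while line:
        # do stuff with line
        line = fh.readline()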
You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html
From the docs:
import fileinput

for line in fileinput.input("filename"):
    process(line)
This will avoid copying the whole file into memory at once.
Although the docs show the snippet as "typical use", using it does not call the close()
method of the returned FileInput
class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput
compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
– martineau
Jul 24 '12 at 3:50
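A sketch of the context-manager form martineau describes (my addition; requires Python 3.2 or later, and process() is the placeholder from the answer above):
import fileinput

# FileInput is a context manager from Python 3.2 on, so the underlying
# file is closed when the block exits.
with fileinput.input("filename") as f:
    for line in f:
        process(line)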
I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp
command using line by line reading and writing. It's CRAZY FAST.
#!/usr/bin/env python3.6
import sys

with open(sys.argv[2], 'w') as outfile:
    with open(sys.argv[1]) as infile:
        for line in infile:
            outfile.write(line)
NOTE: Because Python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of \r\n to Unix line endings of \n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
– Bruno Bronosky
Aug 11 '17 at 13:13
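If you wanted to keep the original line endings instead, here is a sketch of one way to do it (my addition, not part of the answer): opening both files with newline='' disables the translation.
#!/usr/bin/env python3
import sys

# newline='' turns off newline translation on both read and write,
# so \r\n endings pass through unchanged.
with open(sys.argv[2], 'w', newline='') as outfile:
    with open(sys.argv[1], newline='') as infile:
        for line in infile:
            outfile.write(line)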
Here's what you do if you don't have newlines in the file:
with open('large_text.txt') as f:
    while True:
        c = f.read(1024)
        if not c:
            break
        print(c)
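An equivalent sketch using iter() with a sentinel (my addition, not part of the answer):
from functools import partial

# iter(callable, sentinel) keeps calling f.read(1024) until it returns
# the empty string at end of file.
with open('large_text.txt') as f:
    for chunk in iter(partial(f.read, 1024), ''):
        print(chunk)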
The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.
dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.
import dask.dataframe as dd
df = dd.read_csv('filename.csv')
df.head(10) # return first 10 rows
df.tail(10) # return last 10 rows
# iterate rows
for idx, row in df.iterrows():
    ...
# group by my_field and return mean
df.groupby(df.my_field).value.mean().compute()
# slice by column
df[df.my_field=='XYZ'].compute()
How about this?
Divide your file into chunks and then read it line by line, because when you read a file, your operating system will cache the next line. If you are reading the file line by line, you are not making efficient use of the cached information.
Instead, divide the file into chunks and load the whole chunk into memory and then do your processing.
import os

def chunks(fh, size=1024):
    while 1:
        startat = fh.tell()
        print startat #file object's current position from the start
        fh.seek(size, 1) #offset from current position -->1
        data = fh.readline()
        yield startat, fh.tell() - startat #doesn't store the whole list in memory
        if not data:
            break

if os.path.isfile(fname): #fname is the path to the file to read
    try:
        fh = open(fname, 'rb')
    except IOError as e: #file --> permission denied
        print "I/O error({0}): {1}".format(e.errno, e.strerror)
    except Exception as e1: #handle other exceptions such as attribute errors
        print "Unexpected error: {0}".format(e1)
    for ele in chunks(fh):
        fh.seek(ele[0]) #startat
        data = fh.read(ele[1]) #endat
        print data
This looks promising. Is this loading by bytes or by lines? I'm afraid of lines being broken if it's by bytes.. how can we load say 1000 lines at a time and process that?
– Nikhil VJ
Mar 31 '18 at 3:59
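One way to answer that, batching a fixed number of whole lines with itertools.islice, is sketched below (my addition, not part of the answer above; process_batch is a hypothetical callback):
from itertools import islice

def line_batches(path, batch_size=1000):
    # Yields lists of up to batch_size complete lines; only one batch is
    # held in memory at a time, and no line is ever split mid-way.
    with open(path) as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                break
            yield batch

# for batch in line_batches("log.txt"):
#     process_batch(batch)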
Thank you! I have recently converted to Python 3 and have been frustrated by using readlines() to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess meant it was in binary format. Using decode('utf-8') changed it to ASCII.
Then I had to remove a "=\n" in the middle of each line.
Then I split the lines at the newline.
import binascii

b_data = fh.read(ele[1]) #endat -- this is one chunk of ASCII data in binary format
a_data = (binascii.b2a_qp(b_data)).decode('utf-8') #data chunk in 'split' ASCII format
data_chunk = a_data.replace('=\n', '').strip() #splitting characters removed
data_list = data_chunk.split('\n') #list containing lines in chunk
#print(data_list, '\n')
#time.sleep(1)
for j in range(len(data_list)): #iterate through data_list to get each item
    i += 1 # i is a running line counter defined earlier in the surrounding code
    line_of_data = data_list[j]
    print(line_of_data)
Here is the code starting just above "print data" in Arohi's code.
I demonstrated a parallel, byte-level random-access approach in this other question:
Getting number of lines in a text file without readlines
Some of the answers already provided are nice and concise. I like some of them. But it really depends what you want to do with the data that's in the file. In my case I just wanted to count lines, as fast as possible on big text files. My code can be modified to do other things too of course, like any code.
Here's the code for loading text files of any size without causing memory issues.
It supports gigabyte-sized files.
https://gist.github.com/iyvinjose/e6c1cb2821abd5f01fd1b9065cbc759d
Download the file data_loading_utils.py and import it into your code.
Usage:
import data_loading_utils

file_name = 'file_name.ext'
CHUNK_SIZE = 1000000

def process_lines(data, eof, file_name):
    # check if end of file reached
    if not eof:
        # process data, data is one single line of the file
        pass
    else:
        # end of file reached
        pass

data_loading_utils.read_lines_from_file_as_data_chunks(file_name, chunk_size=CHUNK_SIZE, callback=process_lines)
The process_lines method is the callback function. It will be called for every line, with the parameter data representing one single line of the file at a time.
You can configure the CHUNK_SIZE variable depending on your machine's hardware configuration.
Please try this:
with open('filename', 'r', buffering=100000) as f:
    for line in f:
        print line
please explain?
– Nikhil VJ
Mar 31 '18 at 4:00
From Python's official documentation: link The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used
– jyoti das
Apr 19 '18 at 5:26
f = open('filename', 'r').read()
f1 = f.split('\n')
for i in range(len(f1)):
    do_something_with(f1[i])
hope this helps.
Wouldn't this read the whole file in memory? The question asks explicitly how to avoid that, therefore this doesn't answer the question.
– Fermi paradox
Apr 12 '16 at 8:43