Read large text files in Python, line by line, without loading them into memory












171














I need to read a large file, line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I do not want to use readlines(), because it would create a very large list in memory.

How will the code below work for this case? Does xreadlines itself read the file one line at a time into memory? Is the generator expression needed?

f = (line for line in open("log.txt").xreadlines())  # how much is loaded in memory?

f.next()

Plus, what can I do to read this in reverse order, just as the Linux tail command does?

I found:

http://code.google.com/p/pytailer/

and

"python head, tail and backward read by lines of a text file"

Both worked very well!







python






asked Jun 25 '11 at 2:04 by Bruno Rocha - rochacbruno

  • And what can I do to read this from the tail? Line by line, starting at the last line.
    – Bruno Rocha - rochacbruno
    Jun 25 '11 at 2:11










  • this should be a separate question
    – cmcginty
    Jun 25 '11 at 2:35








  • 1




    duplicate stackoverflow.com/questions/5896079/…
    – cmcginty
    Jun 25 '11 at 2:38






























13 Answers


















234














I provided this answer because Keith's, while succinct, doesn't close the file explicitly:



with open("log.txt") as infile:
    for line in infile:
        do_something_with(line)
























  • 15




    the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
    – Bruno Rocha - rochacbruno
    Jun 25 '11 at 2:31






  • 37




    @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
    – John La Rooy
    Jun 25 '11 at 2:33










  • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
    – John La Rooy
    Jun 25 '11 at 2:36






  • 3




    Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
    – Bruno Rocha - rochacbruno
    Jun 25 '11 at 3:09






  • 1




    @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
    – John La Rooy
    Jan 9 '18 at 21:50
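A minimal sketch of the chunked reverse-reading idea John La Rooy describes in the comments above: read fixed-size blocks from the end of the file and split on newlines. The chunk size, file name, and UTF-8 decoding are assumptions for illustration, not part of the original answer.

import os

def reverse_lines(path, chunk_size=64 * 1024):
    # Yield the lines of a text file from last to first without loading it all.
    # Sketch only: chunk_size and utf-8 decoding are assumptions.
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        remainder = b""
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            chunk = f.read(read_size) + remainder
            lines = chunk.split(b"\n")
            remainder = lines.pop(0)  # possibly incomplete line; prepend to the next chunk
            for line in reversed(lines):
                yield line.decode("utf-8")
        if remainder:
            yield remainder.decode("utf-8")

# Usage: print the last 10 lines, tail-style.
# import itertools
# for line in itertools.islice(reverse_lines("log.txt"), 10):
#     print(line)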



















43














All you need to do is use the file object as an iterator.



for line in open("log.txt"):
    do_something_with(line)


Even better is to use a context manager, in recent Python versions:



with open("log.txt") as fileobject:
    for line in fileobject:
        do_something_with(line)


This will automatically close the file as well.





























  • That is not loading whole file in to the memory?
    – Bruno Rocha - rochacbruno
    Jun 25 '11 at 2:10






  • 1




    Nope. It reads it line by line.
    – Keith
    Jun 25 '11 at 6:43
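To make the laziness concrete, here is a minimal sketch (reusing the asker's log.txt name) showing that the file object hands out one line per next() call, so only the current line is held in memory.

with open("log.txt") as fileobject:
    first = next(fileobject)   # reads only the first line
    second = next(fileobject)  # reads only the second line
    # the rest of the file has not been touched yet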



















14














An old school approach:



fh = open(file_name, 'rt')
line = fh.readline()
while line:
    # do stuff with line
    line = fh.readline()
fh.close()






















  • 2




    minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
    – prokher
    Jan 15 '15 at 14:44






  • 12




    @prokher: Yeah, but I did call this "old school".
    – PTBNL
    Jan 16 '15 at 13:40
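A minimal sketch of the same loop wrapped in the with statement prokher suggests; it assumes Python 3.8+ for the := operator and that file_name is defined as in the answer.

with open(file_name, 'rt') as fh:
    while line := fh.readline():
        pass  # do stuff with line; the file is closed automatically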



















12














You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



From the docs:



import fileinput
for line in fileinput.input("filename"):
    process(line)


This will avoid copying the whole file into memory at once.



























  • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
    – martineau
    Jul 24 '12 at 3:50
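A minimal sketch of the context-manager form martineau mentions (Python 3.2+), so the FileInput object is closed when the block exits; "filename" and process() are placeholders carried over from the answer.

import fileinput

with fileinput.input("filename") as f:
    for line in f:
        process(line)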



















3














I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



#!/usr/bin/env python3.6

import sys

with open(sys.argv[2], 'w') as outfile:
    with open(sys.argv[1]) as infile:
        for line in infile:
            outfile.write(line)




























  • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of \r\n to Unix line endings of \n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
    – Bruno Bronosky
    Aug 11 '17 at 13:13
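If you do want to keep the original line endings instead, a minimal Python 3 sketch is to open both files with newline='' so nothing is translated on read or write (in.log and out.log are placeholder names, not from the answer).

with open("in.log", newline='') as infile, open("out.log", "w", newline='') as outfile:
    for line in infile:
        outfile.write(line)  # line endings pass through unchanged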



















3














Here's what you do if you don't have newlines in the file:



with open('large_text.txt') as f:
    while True:
        c = f.read(1024)
        if not c:
            break
        print(c)


































    2














The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.

dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.

import dask.dataframe as dd

df = dd.read_csv('filename.csv')
df.head(10)  # return first 10 rows
df.tail(10)  # return last 10 rows

# iterate rows
for idx, row in df.iterrows():
    ...

# group by my_field and return mean
df.groupby(df.my_field).value.mean().compute()

# slice by column
df[df.my_field == 'XYZ'].compute()


































      0














How about this? Divide your file into chunks and then read it chunk by chunk: when you read a file, your operating system will cache the data that follows, and reading strictly line by line does not make efficient use of that cached information.

Instead, divide the file into chunks, load the whole chunk into memory, and then do your processing.

import os

def chunks(fh, size=1024):
    while 1:
        startat = fh.tell()
        print startat  # file object's current position from the start
        fh.seek(size, 1)  # offset from the current position --> 1
        data = fh.readline()
        yield startat, fh.tell() - startat  # doesn't store the whole list in memory
        if not data:
            break

if os.path.isfile(fname):
    try:
        fh = open(fname, 'rb')
    except IOError as e:  # file --> permission denied
        print "I/O error({0}): {1}".format(e.errno, e.strerror)
    except Exception as e1:  # handle other exceptions such as attribute errors
        print "Unexpected error: {0}".format(e1)
    for ele in chunks(fh):
        fh.seek(ele[0])  # startat
        data = fh.read(ele[1])  # endat
        print data


























      • This looks promising. Is this loading by bytes or by lines? I'm afraid of lines being broken if it's by bytes.. how can we load say 1000 lines at a time and process that?
        – Nikhil VJ
        Mar 31 '18 at 3:59
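A minimal sketch of the batching Nikhil VJ asks about: itertools.islice pulls a fixed number of complete lines per batch, so only one batch is in memory at a time. The batch size, file name, and process() are placeholders, not part of the original answer.

from itertools import islice

def batched_lines(path, batch_size=1000):
    # Yield lists of up to batch_size complete lines, one batch at a time.
    with open(path) as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                break
            yield batch

for batch in batched_lines("log.txt"):
    process(batch)  # process() is a placeholder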



















      0














Thank you! I have recently converted to Python 3 and have been frustrated by using readlines(0) to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess means it was in binary format. Using decode('utf-8') changed it to ASCII.

Then I had to remove an "=\n" in the middle of each line.

Then I split the lines at the newline.

b_data = fh.read(ele[1])  # endat; this is one chunk of ascii data in binary format
a_data = (binascii.b2a_qp(b_data)).decode('utf-8')  # data chunk in 'split' ascii format
data_chunk = a_data.replace('=\n', '').strip()  # splitting characters removed
data_list = data_chunk.split('\n')  # list containing lines in chunk
#print(data_list, '\n')
#time.sleep(1)
for j in range(len(data_list)):  # iterate through data_list to get each item
    i += 1
    line_of_data = data_list[j]
    print(line_of_data)

The code above starts just above the "print data" line in Arohi's code.





































        0














I demonstrated a parallel byte-level random access approach here in this other question:

Getting number of lines in a text file without readlines

Some of the answers already provided are nice and concise. I like some of them. But it really depends on what you want to do with the data that's in the file. In my case I just wanted to count lines, as fast as possible on big text files. My code can be modified to do other things too, of course, like any code.
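As a concrete illustration of the line-counting use case (a minimal single-threaded sketch, not the author's parallel implementation; the path and chunk size are placeholders): count newline bytes chunk by chunk, so the whole file never sits in memory.

def count_lines(path, chunk_size=1024 * 1024):
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count

print(count_lines("log.txt"))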



































          0














Here's the code for loading text files of any size without causing memory issues. It supports gigabyte-sized files:

https://gist.github.com/iyvinjose/e6c1cb2821abd5f01fd1b9065cbc759d

Download the file data_loading_utils.py and import it into your code.

Usage:

import data_loading_utils

file_name = 'file_name.ext'
CHUNK_SIZE = 1000000


def process_lines(data, eof, file_name):
    # check if end of file reached
    if not eof:
        # process data; data is one single line of the file
        pass
    else:
        # end of file reached
        pass


data_loading_utils.read_lines_from_file_as_data_chunks(file_name, chunk_size=CHUNK_SIZE, callback=process_lines)

process_lines is the callback function. It will be called for all the lines, with the data parameter representing one single line of the file at a time.

You can configure the CHUNK_SIZE variable depending on your machine's hardware configuration.



































            -1














Please try this:

with open('filename', 'r', buffering=100000) as f:
    for line in f:
        print line




























            • please explain?
              – Nikhil VJ
              Mar 31 '18 at 4:00










            • From Python's official documentation (link): The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.
              – jyoti das
              Apr 19 '18 at 5:26





















            -9














f = open('filename', 'r').read()
f1 = f.split('\n')
for i in range(len(f1)):
    do_something_with(f1[i])

Hope this helps.

























            • 3




              Wouldn't this read the whole file in memory? The question asks explicitly how to avoid that, therefore this doesn't answer the question.
              – Fermi paradox
              Apr 12 '16 at 8:43











            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f6475328%2fread-large-text-files-in-python-line-by-line-without-loading-it-in-to-memory%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            13 Answers
            13






            active

            oldest

            votes








            13 Answers
            13






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            234














            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)





            share|improve this answer



















            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50
















            234














            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)





            share|improve this answer



















            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50














            234












            234








            234






            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)





            share|improve this answer














            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jun 25 '11 at 3:28

























            answered Jun 25 '11 at 2:26









            John La Rooy

            208k39273428




            208k39273428








            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50














            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50








            15




            15




            the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:31




            the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:31




            37




            37




            @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
            – John La Rooy
            Jun 25 '11 at 2:33




            @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
            – John La Rooy
            Jun 25 '11 at 2:33












            @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
            – John La Rooy
            Jun 25 '11 at 2:36




            @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
            – John La Rooy
            Jun 25 '11 at 2:36




            3




            3




            Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 3:09




            Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 3:09




            1




            1




            @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
            – John La Rooy
            Jan 9 '18 at 21:50




            @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
            – John La Rooy
            Jan 9 '18 at 21:50













            43














            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.






            share|improve this answer























            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43
















            43














            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.






            share|improve this answer























            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43














            43












            43








            43






            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.






            share|improve this answer














            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jun 25 '11 at 6:45

























            answered Jun 25 '11 at 2:07









            Keith

            30.5k74361




            30.5k74361












            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43


















            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43
















            That is not loading whole file in to the memory?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:10




            That is not loading whole file in to the memory?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:10




            1




            1




            Nope. It reads it line by line.
            – Keith
            Jun 25 '11 at 6:43




            Nope. It reads it line by line.
            – Keith
            Jun 25 '11 at 6:43











            14














            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()





            share|improve this answer

















            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40
















            14














            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()





            share|improve this answer

















            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40














            14












            14








            14






            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()





            share|improve this answer












            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Jun 25 '11 at 2:31









            PTBNL

            4,12742431




            4,12742431








            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40














            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40








            2




            2




            minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
            – prokher
            Jan 15 '15 at 14:44




            minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
            – prokher
            Jan 15 '15 at 14:44




            12




            12




            @prokher: Yeah, but I did call this "old school".
            – PTBNL
            Jan 16 '15 at 13:40




            @prokher: Yeah, but I did call this "old school".
            – PTBNL
            Jan 16 '15 at 13:40











            12














            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.






            share|improve this answer





















            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50
















            12














            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.






            share|improve this answer





















            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50














            12












            12








            12






            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.






            share|improve this answer












            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Jun 25 '11 at 2:06









            Mikola

            7,5522638




            7,5522638












            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50


















            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50
















            Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
            – martineau
            Jul 24 '12 at 3:50




            Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
            – martineau
            Jul 24 '12 at 3:50











            3














            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)





            share|improve this answer























            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13
















            3














            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)





            share|improve this answer























            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13














            3












            3








            3






            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)





            share|improve this answer














            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Aug 11 '17 at 3:43

























            answered Aug 10 '17 at 21:48









            Bruno Bronosky

            33.6k47780




            33.6k47780












            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13


















            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13
















            NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
            – Bruno Bronosky
            Aug 11 '17 at 13:13




            NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
            – Bruno Bronosky
            Aug 11 '17 at 13:13











            3














            Here's what you do if you dont have newlines in the file:



            with open('large_text.txt') as f:
            while True:
            c = f.read(1024)
            if not c:
            break
            print(c)





            share|improve this answer


























              3














              Here's what you do if you dont have newlines in the file:



              with open('large_text.txt') as f:
              while True:
              c = f.read(1024)
              if not c:
              break
              print(c)





              share|improve this answer
























                3












                3








                3






                Here's what you do if you dont have newlines in the file:



                with open('large_text.txt') as f:
                while True:
                c = f.read(1024)
                if not c:
                break
                print(c)





                share|improve this answer












                Here's what you do if you dont have newlines in the file:



                with open('large_text.txt') as f:
                while True:
                c = f.read(1024)
                if not c:
                break
                print(c)






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered May 6 '18 at 15:20









                Ariel Cabib

                1,532139




                1,532139























                    2














                    The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                    dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                    import dask.dataframe as dd

                    df = dd.read_csv('filename.csv')
                    df.head(10) # return first 10 rows
                    df.tail(10) # return last 10 rows

                    # iterate rows
                    for idx, row in df.iterrows():
                    ...

                    # group by my_field and return mean
                    df.groupby(df.my_field).value.mean().compute()

                    # slice by column
                    df[df.my_field=='XYZ'].compute()





                    share|improve this answer


























                      2














                      The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                      dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                      import dask.dataframe as dd

                      df = dd.read_csv('filename.csv')
                      df.head(10) # return first 10 rows
                      df.tail(10) # return last 10 rows

                      # iterate rows
                      for idx, row in df.iterrows():
                      ...

                      # group by my_field and return mean
                      df.groupby(df.my_field).value.mean().compute()

                      # slice by column
                      df[df.my_field=='XYZ'].compute()





                      share|improve this answer
























                        2












                        2








                        2






                        The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                        dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                        import dask.dataframe as dd

                        df = dd.read_csv('filename.csv')
                        df.head(10) # return first 10 rows
                        df.tail(10) # return last 10 rows

                        # iterate rows
                        for idx, row in df.iterrows():
                        ...

                        # group by my_field and return mean
                        df.groupby(df.my_field).value.mean().compute()

                        # slice by column
                        df[df.my_field=='XYZ'].compute()





                        share|improve this answer












                        The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                        dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                        import dask.dataframe as dd

                        df = dd.read_csv('filename.csv')
                        df.head(10) # return first 10 rows
                        df.tail(10) # return last 10 rows

                        # iterate rows
                        for idx, row in df.iterrows():
                        ...

                        # group by my_field and return mean
                        df.groupby(df.my_field).value.mean().compute()

                        # slice by column
                        df[df.my_field=='XYZ'].compute()






                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Jan 22 '18 at 20:51









                        jpp

                        92.2k2053103




                        92.2k2053103























                            0














                            How about this?
                            Divide your file into chunks and then read it line by line, because when you read a file, your operating system will cache the next line. If you are reading the file line by line, you are not making efficient use of the cached information.



                            Instead, divide the file into chunks and load the whole chunk into memory and then do your processing.



                            def chunks(file,size=1024):
                            while 1:

                            startat=fh.tell()
                            print startat #file's object current position from the start
                            fh.seek(size,1) #offset from current postion -->1
                            data=fh.readline()
                            yield startat,fh.tell()-startat #doesnt store whole list in memory
                            if not data:
                            break
                            if os.path.isfile(fname):
                            try:
                            fh=open(fname,'rb')
                            except IOError as e: #file --> permission denied
                            print "I/O error({0}): {1}".format(e.errno, e.strerror)
                            except Exception as e1: #handle other exceptions such as attribute errors
                            print "Unexpected error: {0}".format(e1)
                            for ele in chunks(fh):
                            fh.seek(ele[0])#startat
                            data=fh.read(ele[1])#endat
                            print data





                            share|improve this answer





















                            answered Oct 25 '17 at 0:30









                            Arohi Gupta

                            176
















                            • This looks promising. Is this loading by bytes or by lines? I'm afraid of lines being broken if it's by bytes.. how can we load say 1000 lines at a time and process that?
                              – Nikhil VJ
                              Mar 31 '18 at 3:59
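
To the question in the comment above (processing, say, 1000 lines at a time without breaking lines), a minimal sketch using itertools.islice -- the function name, chunk size and process() call are illustrative placeholders, and this is an alternative to the byte-offset approach in the answer, not part of it:

from itertools import islice

def line_chunks(path, lines_per_chunk=1000):
    """Yield lists of complete lines, lines_per_chunk at a time."""
    with open(path) as fh:
        while True:
            chunk = list(islice(fh, lines_per_chunk))
            if not chunk:
                break
            yield chunk

for chunk in line_chunks('log.txt'):
    process(chunk)  # 'process' stands in for your own handling of 1000 lines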





























                            0














Thank you! I have recently converted to Python 3 and had been frustrated by using readlines() to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess means it was still in binary (bytes) format. Using decode('utf-8') changed it to ASCII text.



Then I had to remove a "=\n" in the middle of each line.



Then I split the lines at the newline.



# requires `import binascii`; `i` is a line counter initialised earlier
b_data = fh.read(ele[1])  # endat -- one chunk of data, still as bytes
a_data = binascii.b2a_qp(b_data).decode('utf-8')  # chunk as quoted-printable ASCII text
data_chunk = a_data.replace('=\n', '').strip()  # soft line breaks removed
data_list = data_chunk.split('\n')  # list containing the lines in this chunk
# print(data_list, '\n')
# time.sleep(1)
for j in range(len(data_list)):  # iterate through data_list to get each line
    i += 1
    line_of_data = data_list[j]
    print(line_of_data)


This code replaces the part starting just above "print data" in Arohi's answer.
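
As a side note, binascii.b2a_qp() encodes the chunk as quoted-printable, which is what introduces the "=\n" soft line breaks that then have to be stripped. If the data is plain UTF-8 text, a shorter alternative sketch (assuming, as the offsets produced by chunks() should guarantee, that each chunk ends on a line boundary) would be:

b_data = fh.read(ele[1])  # one chunk of raw bytes
for line_of_data in b_data.decode('utf-8').splitlines():
    print(line_of_data)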






                            share|improve this answer




































                                edited Jan 18 '18 at 16:30









                                WhatsThePoint

                                2,15842036














                                answered Jan 18 '18 at 15:28









                                John Haynes

                                1



























                                    0














                                    I demonstrated a parallel byte level random access approach here in this other question:



                                    Getting number of lines in a text file without readlines



                                    Some of the answers already provided are nice and concise. I like some of them. But it really depends what you want to do with the data that's in the file. In my case I just wanted to count lines, as fast as possible on big text files. My code can be modified to do other things too of course, like any code.
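
A minimal single-threaded sketch of that counting idea (reading fixed-size binary blocks and counting newlines, rather than the parallel version linked above; the function name, block size and file name are illustrative):

def count_lines(path, block_size=1024 * 1024):
    """Count newline characters by reading the file in fixed-size binary blocks."""
    count = 0
    with open(path, 'rb') as fh:
        while True:
            block = fh.read(block_size)
            if not block:
                break
            count += block.count(b'\n')
    return count

print(count_lines('log.txt'))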






                                    share|improve this answer




































                                        answered May 4 '18 at 14:17









                                        Geoffrey Anderson

                                        549514



























                                            0














Here's code for loading text files of any size without causing memory issues.
It supports gigabyte-sized files.



https://gist.github.com/iyvinjose/e6c1cb2821abd5f01fd1b9065cbc759d



Download the file data_loading_utils.py and import it into your code.



Usage:



import data_loading_utils

file_name = 'file_name.ext'
CHUNK_SIZE = 1000000


def process_lines(data, eof, file_name):
    # check if end of file reached
    if not eof:
        # process data -- data is one single line of the file
        pass
    else:
        # end of file reached
        pass

data_loading_utils.read_lines_from_file_as_data_chunks(file_name, chunk_size=CHUNK_SIZE, callback=process_lines)


The process_lines method is the callback function. It will be called for every line, with the parameter data representing one single line of the file at a time.



You can configure the variable CHUNK_SIZE depending on your machine's hardware configuration.
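
The gist itself is not reproduced in the answer; a rough sketch of what a callback-driven reader with the interface shown above could look like (this is an assumption based on the usage example, not the gist's actual implementation):

def read_lines_from_file_as_data_chunks(file_name, chunk_size, callback):
    """Read file_name in chunk_size blocks, but hand the callback one complete line at a time."""
    with open(file_name) as fh:
        leftover = ''
        while True:
            block = fh.read(chunk_size)
            if not block:
                break
            lines = (leftover + block).split('\n')
            leftover = lines.pop()  # the last piece may be an incomplete line
            for line in lines:
                callback(data=line, eof=False, file_name=file_name)
        if leftover:
            callback(data=leftover, eof=False, file_name=file_name)
        callback(data=None, eof=True, file_name=file_name)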






                                            share|improve this answer




































                                                answered Jul 25 '18 at 2:32









                                                Iyvin Jose

                                                31419



























                                                    -1














Please try this:



with open('filename', 'r', buffering=100000) as f:
    for line in f:
        print line





                                                    share|improve this answer























                                                    • please explain?
                                                      – Nikhil VJ
                                                      Mar 31 '18 at 4:00










• From Python's official documentation: The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.
                                                      – jyoti das
                                                      Apr 19 '18 at 5:26


























                                                    edited Jan 25 '18 at 15:14









                                                    Daniel Trugman

                                                    4,7671032














                                                    answered Jan 25 '18 at 14:48









                                                    jyoti das

                                                    11





























                                                    -9














f = open('filename', 'r').read()
f1 = f.split('\n')
for i in range(len(f1)):
    do_something_with(f1[i])


hope this helps.






                                                    share|improve this answer



















                                                    • 3




                                                      Wouldn't this read the whole file in memory? The question asks explicitly how to avoid that, therefore this doesn't answer the question.
                                                      – Fermi paradox
                                                      Apr 12 '16 at 8:43
























                                                    edited Apr 12 '16 at 7:54

























                                                    answered Apr 12 '16 at 7:47









                                                    Sainik Kr Mahata

                                                    152





























