Read large text files in Python, line by line, without loading them into memory












171














I need to read a large file, line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I do not want to use readlines() because it will create a very large list in memory.



How will the code below work for this case? Is xreadlines itself reading one by one into memory? Is the generator expression needed?



f = (line for line in open("log.txt").xreadlines())  # how much is loaded in memory?

f.next()


Plus, what can I do to read it in reverse order, just like the Linux tail command?



I found:



http://code.google.com/p/pytailer/



and



"python head, tail and backward read by lines of a text file"



Both worked very well!










python






asked Jun 25 '11 at 2:04 by Bruno Rocha - rochacbruno, edited May 23 '17 at 10:31 by Community












  • And what can I do to read this from the tail? Line by line, starting at the last line. – Bruno Rocha - rochacbruno, Jun 25 '11 at 2:11

  • this should be a separate question – cmcginty, Jun 25 '11 at 2:35

  • duplicate: stackoverflow.com/questions/5896079/… – cmcginty, Jun 25 '11 at 2:38


















13 Answers


















234














I provided this answer because Keith's, while succinct, doesn't close the file explicitly



with open("log.txt") as infile:
for line in infile:
do_something_with(line)





– John La Rooy, answered Jun 25 '11 at 2:26, edited Jun 25 '11 at 3:28



















  • the question still is, will "for line in infile" load my 5 GB of lines into memory? And how can I read from the tail? – Bruno Rocha - rochacbruno, Jun 25 '11 at 2:31

  • @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else. – John La Rooy, Jun 25 '11 at 2:33

  • @rochacbruno, reading the lines in reverse order is not as easy to do efficiently, unfortunately. Generally you would want to read from the end of the file in sensibly sized chunks (kilobytes to megabytes, say) and split on newline characters (or whatever the line-ending character is on your platform). – John La Rooy, Jun 25 '11 at 2:36

  • Thanks! I found the tail solution: stackoverflow.com/questions/5896079/… – Bruno Rocha - rochacbruno, Jun 25 '11 at 3:09

  • @bawejakunal, do you mean a line too long to load into memory at once? That is unusual for a text file. Instead of using a for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited-size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself. – John La Rooy, Jan 9 '18 at 21:50
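
For the reverse-order (tail-like) reading discussed in the comments above, here is a minimal sketch of the chunked approach John La Rooy describes: read from the end of the file in fixed-size chunks and split on newlines. This is only an illustration of that idea, not code from the answer; reverse_lines and chunk_size are hypothetical names, and it skips blank lines for simplicity.

import os

def reverse_lines(path, chunk_size=8192):
    # Yield the lines of a file from last to first by reading fixed-size chunks backwards.
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b''
        while pos > 0:
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            buf = f.read(read_size) + buf
            lines = buf.split(b'\n')
            buf = lines.pop(0)  # possibly incomplete first line; keep it for the next chunk
            for line in reversed(lines):
                if line:  # skip empty pieces (e.g. the trailing newline) for simplicity
                    yield line
        if buf:
            yield buf

Used as "for line in reverse_lines('log.txt'): ..." it yields raw bytes, so decode them as needed.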



















43














All you need to do is use the file object as an iterator.



for line in open("log.txt"):
    do_something_with(line)


Even better is using a context manager, in recent Python versions.



with open("log.txt") as fileobject:
for line in fileobject:
do_something_with(line)


This will automatically close the file as well.






– Keith, answered Jun 25 '11 at 2:07, edited Jun 25 '11 at 6:45























  • That is not loading the whole file into memory? – Bruno Rocha - rochacbruno, Jun 25 '11 at 2:10

  • Nope. It reads it line by line. – Keith, Jun 25 '11 at 6:43



















14














An old school approach:



fh = open(file_name, 'rt')
line = fh.readline()
while line:
    # do stuff with line
    line = fh.readline()
fh.close()





– PTBNL, answered Jun 25 '11 at 2:31

















  • minor remark: for exception safety it is recommended to use the 'with' statement, in your case "with open(filename, 'rt') as fh:" – prokher, Jan 15 '15 at 14:44

  • @prokher: Yeah, but I did call this "old school". – PTBNL, Jan 16 '15 at 13:40
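
For completeness, a sketch of the with-based variant the comment suggests; the loop is the same, but the file is closed even if an exception is raised:

with open(file_name, 'rt') as fh:
    line = fh.readline()
    while line:
        # do stuff with line
        line = fh.readline()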



















12














You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



From the docs:



import fileinput
for line in fileinput.input("filename"):
    process(line)


This will avoid copying the whole file into memory at once.






– Mikola, answered Jun 25 '11 at 2:06





















  • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown). – martineau, Jul 24 '12 at 3:50
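
A minimal sketch of the context-manager form the comment mentions (Python 3.2+), which closes the FileInput object when the block exits; process is the same placeholder used in the answer:

import fileinput

with fileinput.input(files=("filename",)) as f:
    for line in f:
        process(line)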



















3














I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



#!/usr/bin/env python3.6

import sys

with open(sys.argv[2], 'w') as outfile:
    with open(sys.argv[1]) as infile:
        for line in infile:
            outfile.write(line)





– Bruno Bronosky, answered Aug 10 '17 at 21:48, edited Aug 11 '17 at 3:43























  • NOTE: Because Python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of \r\n to Unix line endings of \n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect! – Bruno Bronosky, Aug 11 '17 at 13:13



















3














Here's what you can do if you don't have newlines in the file:



with open('large_text.txt') as f:
    while True:
        c = f.read(1024)
        if not c:
            break
        print(c)





– Ariel Cabib, answered May 6 '18 at 15:20





























2














The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.

dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



import dask.dataframe as dd

df = dd.read_csv('filename.csv')
df.head(10)  # return first 10 rows
df.tail(10)  # return last 10 rows

# iterate rows
for idx, row in df.iterrows():
    ...

# group by my_field and return mean
df.groupby(df.my_field).value.mean().compute()

# slice by column
df[df.my_field == 'XYZ'].compute()





– jpp, answered Jan 22 '18 at 20:51





























0














How about this? Divide your file into chunks rather than reading it line by line. When you read a file, your operating system will cache the data that follows, but if you read the file strictly line by line you are not making efficient use of that cached information.

Instead, divide the file into chunks, load the whole chunk into memory, and then do your processing.



import os

def chunks(fh, size=1024):
    while 1:
        startat = fh.tell()
        print startat  # file object's current position from the start
        fh.seek(size, 1)  # offset from current position --> 1
        data = fh.readline()
        yield startat, fh.tell() - startat  # doesn't store whole list in memory
        if not data:
            break

if os.path.isfile(fname):
    try:
        fh = open(fname, 'rb')
    except IOError as e:  # file --> permission denied
        print "I/O error({0}): {1}".format(e.errno, e.strerror)
    except Exception as e1:  # handle other exceptions such as attribute errors
        print "Unexpected error: {0}".format(e1)
    for ele in chunks(fh):
        fh.seek(ele[0])  # startat
        data = fh.read(ele[1])  # endat
        print data


























  • This looks promising. Is this loading by bytes or by lines? I'm afraid of lines being broken if it's by bytes.. how can we load say 1000 lines at a time and process that? – Nikhil VJ, Mar 31 '18 at 3:59
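
Regarding the comment's question about loading, say, 1000 lines at a time: one possible approach (my own sketch, not part of the original answer) is to batch the file's line iterator with itertools.islice, so only one batch of complete lines is held in memory at a time; batched_lines is a hypothetical helper name.

from itertools import islice

def batched_lines(path, batch_size=1000):
    # Yield lists of up to batch_size complete lines at a time.
    with open(path) as f:
        while True:
            batch = list(islice(f, batch_size))
            if not batch:
                break
            yield batch

# usage:
# for batch in batched_lines('log.txt'):
#     for line in batch:
#         process(line)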



















0














Thank you! I have recently converted to Python 3 and have been frustrated by using readlines(0) to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess meant it was in binary format. Using decode('utf-8') changed it to ASCII.

Then I had to remove an "=\n" in the middle of each line.

Then I split the lines at the newline.



b_data = fh.read(ele[1])  # endat; this is one chunk of ascii data in binary format
a_data = (binascii.b2a_qp(b_data)).decode('utf-8')  # data chunk in 'split' ascii format
data_chunk = a_data.replace('=\n', '').strip()  # splitting characters removed
data_list = data_chunk.split('\n')  # list containing lines in chunk
#print(data_list, '\n')
#time.sleep(1)
for j in range(len(data_list)):  # iterate through data_list to get each item
    i += 1
    line_of_data = data_list[j]
    print(line_of_data)


Here is the code starting just above "print data" in Arohi's code.





































0














I demonstrated a parallel byte-level random access approach here in this other question:

Getting number of lines in a text file without readlines

Some of the answers already provided are nice and concise. I like some of them. But it really depends on what you want to do with the data that's in the file. In my case I just wanted to count lines, as fast as possible on big text files. My code can be modified to do other things too, of course, like any code.
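
As a simple illustration of the byte-level idea, here is a single-threaded sketch of counting lines, not the answerer's parallel code; count_lines is a hypothetical name:

def count_lines(path, chunk_size=1 << 20):
    # Count newline characters by reading the file in 1 MiB binary chunks.
    count = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b'\n')
    return count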



































0














Here's code for loading text files of any size without causing memory issues. It supports gigabyte-sized files:



https://gist.github.com/iyvinjose/e6c1cb2821abd5f01fd1b9065cbc759d



Download the file data_loading_utils.py and import it into your code.

Usage:



import data_loading_utils

file_name = 'file_name.ext'
CHUNK_SIZE = 1000000


def process_lines(data, eof, file_name):
    # check if end of file reached
    if not eof:
        # process data; data is one single line of the file
        pass
    else:
        # end of file reached
        pass


data_loading_utils.read_lines_from_file_as_data_chunks(file_name, chunk_size=CHUNK_SIZE, callback=process_lines)


process_lines is the callback function. It will be called for all the lines, with the data parameter representing one single line of the file at a time.

You can configure the CHUNK_SIZE variable depending on your machine's hardware configuration.



































-1














Please try this:



with open('filename', 'r', buffering=100000) as f:
    for line in f:
        print line




























  • please explain? – Nikhil VJ, Mar 31 '18 at 4:00

  • From Python's official documents: link The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used. – jyoti das, Apr 19 '18 at 5:26





















-9














f = open('filename', 'r').read()
f1 = f.split('\n')
for i in range(len(f1)):
    do_something_with(f1[i])

Hope this helps.

























  • Wouldn't this read the whole file in memory? The question asks explicitly how to avoid that, therefore this doesn't answer the question. – Fermi paradox, Apr 12 '16 at 8:43











            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f6475328%2fread-large-text-files-in-python-line-by-line-without-loading-it-in-to-memory%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            13 Answers
            13






            active

            oldest

            votes








            13 Answers
            13






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            234














            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)





            share|improve this answer



















            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50
















            234














            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)





            share|improve this answer



















            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50














            234












            234








            234






            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)





            share|improve this answer














            I provided this answer because Keith's, while succinct, doesn't close the file explicitly



            with open("log.txt") as infile:
            for line in infile:
            do_something_with(line)






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jun 25 '11 at 3:28

























            answered Jun 25 '11 at 2:26









            John La Rooy

            208k39273428




            208k39273428








            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50














            • 15




              the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:31






            • 37




              @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
              – John La Rooy
              Jun 25 '11 at 2:33










            • @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
              – John La Rooy
              Jun 25 '11 at 2:36






            • 3




              Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 3:09






            • 1




              @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
              – John La Rooy
              Jan 9 '18 at 21:50








            15




            15




            the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:31




            the question still is, "for line in infile" will load my 5GB of lines in to the memory? and, How can I read from tail?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:31




            37




            37




            @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
            – John La Rooy
            Jun 25 '11 at 2:33




            @rochacbruno, it only reads one line at a time. When the next line is read, the previous one will be garbage collected unless you have stored a reference to it somewhere else
            – John La Rooy
            Jun 25 '11 at 2:33












            @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
            – John La Rooy
            Jun 25 '11 at 2:36




            @rochacbruno, Reading the lines in reverse order is not as easy to do efficiently unfortunately. Generally you would want to read from the end of the file in sensible sized chunks (kilobytes to megabytes say) and split on newline characters ( or whatever the line ending char is on your platform)
            – John La Rooy
            Jun 25 '11 at 2:36




            3




            3




            Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 3:09




            Thanks! I found the tail solution stackoverflow.com/questions/5896079/…
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 3:09




            1




            1




            @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
            – John La Rooy
            Jan 9 '18 at 21:50




            @bawejakunal, Do you mean if a line is too long to load into memory at once? That is unusual for a text file. Instead of using for loop which iterates over the lines, you can use chunk = infile.read(chunksize) to read limited size chunks regardless of their content. You'll have to search inside the chunks for newlines yourself.
            – John La Rooy
            Jan 9 '18 at 21:50













            43














            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.






            share|improve this answer























            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43
















            43














            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.






            share|improve this answer























            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43














            43












            43








            43






            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.






            share|improve this answer














            All you need to do is use the file object as an iterator.



            for line in open("log.txt"):
            do_something_with(line)


            Even better is using context manager in recent Python versions.



            with open("log.txt") as fileobject:
            for line in fileobject:
            do_something_with(line)


            This will automatically close the file as well.







            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Jun 25 '11 at 6:45

























            answered Jun 25 '11 at 2:07









            Keith

            30.5k74361




            30.5k74361












            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43


















            • That is not loading whole file in to the memory?
              – Bruno Rocha - rochacbruno
              Jun 25 '11 at 2:10






            • 1




              Nope. It reads it line by line.
              – Keith
              Jun 25 '11 at 6:43
















            That is not loading whole file in to the memory?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:10




            That is not loading whole file in to the memory?
            – Bruno Rocha - rochacbruno
            Jun 25 '11 at 2:10




            1




            1




            Nope. It reads it line by line.
            – Keith
            Jun 25 '11 at 6:43




            Nope. It reads it line by line.
            – Keith
            Jun 25 '11 at 6:43











            14














            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()





            share|improve this answer

















            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40
















            14














            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()





            share|improve this answer

















            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40














            14












            14








            14






            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()





            share|improve this answer












            An old school approach:



            fh = open(file_name, 'rt')
            line = fh.readline()
            while line:
            # do stuff with line
            line = fh.readline()
            fh.close()






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Jun 25 '11 at 2:31









            PTBNL

            4,12742431




            4,12742431








            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40














            • 2




              minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
              – prokher
              Jan 15 '15 at 14:44






            • 12




              @prokher: Yeah, but I did call this "old school".
              – PTBNL
              Jan 16 '15 at 13:40








            2




            2




            minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
            – prokher
            Jan 15 '15 at 14:44




            minor remark: for exception safety it is recommended to use 'with' statement, in your case "with open(filename, 'rt') as fh:"
            – prokher
            Jan 15 '15 at 14:44




            12




            12




            @prokher: Yeah, but I did call this "old school".
            – PTBNL
            Jan 16 '15 at 13:40




            @prokher: Yeah, but I did call this "old school".
            – PTBNL
            Jan 16 '15 at 13:40











            12














            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.






            share|improve this answer





















            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50
















            12














            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.






            share|improve this answer





















            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50














            12












            12








            12






            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.






            share|improve this answer












            You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html



            From the docs:



            import fileinput
            for line in fileinput.input("filename"):
            process(line)


            This will avoid copying the whole file into memory at once.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Jun 25 '11 at 2:06









            Mikola

            7,5522638




            7,5522638












            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50


















            • Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
              – martineau
              Jul 24 '12 at 3:50
















            Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
            – martineau
            Jul 24 '12 at 3:50




            Although the docs show the snippet as "typical use", using it does not call the close() method of the returned FileInput class object when the loop finishes -- so I would avoid using it this way. In Python 3.2 they've finally made fileinput compatible with the context manager protocol which addresses this issue (but the code still wouldn't be written quite the way shown).
            – martineau
            Jul 24 '12 at 3:50











            3














            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)





            share|improve this answer























            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13
















            3














            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)





            share|improve this answer























            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13














            3












            3








            3






            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)





            share|improve this answer














            I couldn't believe that it could be as easy as @john-la-rooy's answer made it seem. So, I recreated the cp command using line by line reading and writing. It's CRAZY FAST.



            #!/usr/bin/env python3.6

            import sys

            with open(sys.argv[2], 'w') as outfile:
            with open(sys.argv[1]) as infile:
            for line in infile:
            outfile.write(line)






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Aug 11 '17 at 3:43

























            answered Aug 10 '17 at 21:48









            Bruno Bronosky

            33.6k47780




            33.6k47780












            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13


















            • NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
              – Bruno Bronosky
              Aug 11 '17 at 13:13
















            NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
            – Bruno Bronosky
            Aug 11 '17 at 13:13




            NOTE: Because python's readline standardizes line endings, this has the side effect of converting documents with DOS line endings of rn to Unix line endings of n. My whole reason for searching out this topic was that I needed to convert a log file that receives a jumble of line endings (because the developer blindly used various .NET libraries). I was shocked to find that after my initial speed test, I didn't need to go back and rstrip the lines. It was already perfect!
            – Bruno Bronosky
            Aug 11 '17 at 13:13











            3














            Here's what you do if you dont have newlines in the file:



            with open('large_text.txt') as f:
            while True:
            c = f.read(1024)
            if not c:
            break
            print(c)





            share|improve this answer


























              3














              Here's what you do if you dont have newlines in the file:



              with open('large_text.txt') as f:
              while True:
              c = f.read(1024)
              if not c:
              break
              print(c)





              share|improve this answer
























                3












                3








                3






                Here's what you do if you dont have newlines in the file:



                with open('large_text.txt') as f:
                while True:
                c = f.read(1024)
                if not c:
                break
                print(c)





                share|improve this answer












                Here's what you do if you dont have newlines in the file:



                with open('large_text.txt') as f:
                while True:
                c = f.read(1024)
                if not c:
                break
                print(c)






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered May 6 '18 at 15:20









                Ariel Cabib

                1,532139




                1,532139























                    2














                    The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                    dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                    import dask.dataframe as dd

                    df = dd.read_csv('filename.csv')
                    df.head(10) # return first 10 rows
                    df.tail(10) # return last 10 rows

                    # iterate rows
                    for idx, row in df.iterrows():
                    ...

                    # group by my_field and return mean
                    df.groupby(df.my_field).value.mean().compute()

                    # slice by column
                    df[df.my_field=='XYZ'].compute()





                    share|improve this answer


























                      2














                      The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                      dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                      import dask.dataframe as dd

                      df = dd.read_csv('filename.csv')
                      df.head(10) # return first 10 rows
                      df.tail(10) # return last 10 rows

                      # iterate rows
                      for idx, row in df.iterrows():
                      ...

                      # group by my_field and return mean
                      df.groupby(df.my_field).value.mean().compute()

                      # slice by column
                      df[df.my_field=='XYZ'].compute()





                      share|improve this answer
























                        2












                        2








                        2






                        The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                        dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                        import dask.dataframe as dd

                        df = dd.read_csv('filename.csv')
                        df.head(10) # return first 10 rows
                        df.tail(10) # return last 10 rows

                        # iterate rows
                        for idx, row in df.iterrows():
                        ...

                        # group by my_field and return mean
                        df.groupby(df.my_field).value.mean().compute()

                        # slice by column
                        df[df.my_field=='XYZ'].compute()





                        share|improve this answer












                        The blaze project has come a long way over the last 6 years. It has a simple API covering a useful subset of pandas features.



                        dask.dataframe takes care of chunking internally, supports many parallelisable operations and allows you to export slices back to pandas easily for in-memory operations.



                        import dask.dataframe as dd

                        df = dd.read_csv('filename.csv')
                        df.head(10) # return first 10 rows
                        df.tail(10) # return last 10 rows

                        # iterate rows
                        for idx, row in df.iterrows():
                        ...

                        # group by my_field and return mean
                        df.groupby(df.my_field).value.mean().compute()

                        # slice by column
                        df[df.my_field=='XYZ'].compute()






                        share|improve this answer












                        share|improve this answer



                        share|improve this answer










                        answered Jan 22 '18 at 20:51









                        jpp

                        92.2k2053103




                        92.2k2053103























                            0














                            How about this?
                            Divide your file into chunks and then read it line by line, because when you read a file, your operating system will cache the next line. If you are reading the file line by line, you are not making efficient use of the cached information.



                            Instead, divide the file into chunks and load the whole chunk into memory and then do your processing.



                            def chunks(file,size=1024):
                            while 1:

                            startat=fh.tell()
                            print startat #file's object current position from the start
                            fh.seek(size,1) #offset from current postion -->1
                            data=fh.readline()
                            yield startat,fh.tell()-startat #doesnt store whole list in memory
                            if not data:
                            break
                            if os.path.isfile(fname):
                            try:
                            fh=open(fname,'rb')
                            except IOError as e: #file --> permission denied
                            print "I/O error({0}): {1}".format(e.errno, e.strerror)
                            except Exception as e1: #handle other exceptions such as attribute errors
                            print "Unexpected error: {0}".format(e1)
                            for ele in chunks(fh):
                            fh.seek(ele[0])#startat
                            data=fh.read(ele[1])#endat
                            print data





answered Oct 25 '17 at 0:30
Arohi Gupta


• This looks promising. Is this loading by bytes or by lines? I'm afraid of lines being broken if it's by bytes.. how can we load say 1000 lines at a time and process that?
  – Nikhil VJ
  Mar 31 '18 at 3:59
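
One way to do what the comment above asks (read a fixed number of complete lines at a time) is to batch the file iterator with itertools.islice. This is only an illustrative sketch, not part of the original answer; the file name and batch size are placeholders:

from itertools import islice

def line_batches(path, batch_size=1000):
    # yield lists of up to batch_size complete lines; only one batch is in memory at a time
    with open(path) as fh:
        while True:
            batch = list(islice(fh, batch_size))
            if not batch:
                break
            yield batch

for batch in line_batches('log.txt', 1000):
    for line in batch:
        pass  # process each complete line here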











0

Thank you! I have recently converted to Python 3 and have been frustrated by using readlines(0) to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess means it was in binary format. Using decode('utf-8') changed it to ASCII.

Then I had to remove a "=\n" in the middle of each line.

Then I split the lines at the newline.

import binascii

b_data = fh.read(ele[1])                            # endat: one chunk of ASCII data in binary format
a_data = binascii.b2a_qp(b_data).decode('utf-8')    # data chunk in quoted-printable ASCII format
data_chunk = a_data.replace('=\n', '').strip()      # soft line-break characters removed
data_list = data_chunk.split('\n')                  # list containing the lines in the chunk
# print(data_list, '\n')
# time.sleep(1)
for j in range(len(data_list)):                     # iterate through data_list to get each item
    i += 1                                          # i is a line counter assumed to be initialised earlier
    line_of_data = data_list[j]
    print(line_of_data)

Here is the code, starting just above "print data" in Arohi's code.
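
A simpler variant (an illustrative sketch, not from the answer above): decode each chunk directly and split it with splitlines(), which avoids the quoted-printable round trip; fh and ele come from the chunks() example in Arohi's answer:

b_data = fh.read(ele[1])
for line_of_data in b_data.decode('utf-8', errors='replace').splitlines():
    print(line_of_data)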






edited Jan 18 '18 at 16:30
WhatsThePoint
answered Jan 18 '18 at 15:28
John Haynes
0

I demonstrated a parallel, byte-level random access approach in this other question:

Getting number of lines in a text file without readlines

Some of the answers already provided are nice and concise. I like some of them. But it really depends on what you want to do with the data that's in the file. In my case I just wanted to count lines, as fast as possible, on big text files. My code can of course be modified to do other things, like any code.
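
For the line-counting case specifically, a minimal single-process sketch (not the parallel code from the linked answer) is to read fixed-size binary chunks and count newline bytes, so the whole file never sits in memory at once:

def count_lines(path, chunk_size=1 << 20):
    # read ~1 MB binary chunks and count b'\n' occurrences
    count = 0
    with open(path, 'rb') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b'\n')
    return count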






answered May 4 '18 at 14:17
Geoffrey Anderson
0

Here's code for loading text files of any size without causing memory issues. It supports gigabyte-sized files:

https://gist.github.com/iyvinjose/e6c1cb2821abd5f01fd1b9065cbc759d

Download the file data_loading_utils.py and import it into your code.

Usage:

import data_loading_utils

file_name = 'file_name.ext'
CHUNK_SIZE = 1000000

def process_lines(data, eof, file_name):
    # check if end of file reached
    if not eof:
        # process data; data is one single line of the file
        pass
    else:
        # end of file reached
        pass

data_loading_utils.read_lines_from_file_as_data_chunks(file_name, chunk_size=CHUNK_SIZE, callback=process_lines)

The process_lines method is the callback function. It is called for every line, with the data parameter representing one single line of the file at a time.

You can configure the variable CHUNK_SIZE depending on your machine's hardware configuration.
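
For readers who prefer not to add a dependency, here is a minimal sketch of how such a callback-based, line-at-a-time reader might look (an illustration only, not the actual code from the gist above):

def read_lines_with_callback(file_name, callback):
    # stream the file one line at a time and hand each line to the callback
    with open(file_name, 'r') as fh:
        for line in fh:
            callback(line, eof=False, file_name=file_name)
    callback(None, eof=True, file_name=file_name)  # signal end of file

read_lines_with_callback('file_name.ext', process_lines)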






answered Jul 25 '18 at 2:32
Iyvin Jose
-1

Please try this:

with open('filename', 'r', buffering=100000) as f:
    for line in f:
        print line
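
For context (an illustrative note, not part of the answer above), the buffering argument of the built-in open() accepts a few distinct values:

open('filename', 'rb', buffering=0)       # unbuffered (only allowed in binary mode on Python 3)
open('filename', 'r', buffering=1)        # line buffered
open('filename', 'r', buffering=100000)   # buffer of roughly that many bytes, as in the answer above
open('filename', 'r', buffering=-1)       # negative value: use the system default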





edited Jan 25 '18 at 15:14
Daniel Trugman
answered Jan 25 '18 at 14:48
jyoti das


• please explain?
  – Nikhil VJ
  Mar 31 '18 at 4:00

• From Python's official documentation (link): The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.
  – jyoti das
  Apr 19 '18 at 5:26

-9

f = open('filename', 'r').read()
f1 = f.split('\n')
for i in range(len(f1)):
    do_something_with(f1[i])

hope this helps.
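
As the comment below points out, this reads the entire file into memory first. A memory-friendly sketch of the same loop (illustrative only; do_something_with is the same hypothetical helper as above) iterates the file object directly:

with open('filename', 'r') as f:
    for line in f:
        do_something_with(line.rstrip('\n'))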






edited Apr 12 '16 at 7:54
answered Apr 12 '16 at 7:47
Sainik Kr Mahata


• 3
  Wouldn't this read the whole file in memory? The question asks explicitly how to avoid that, therefore this doesn't answer the question.
  – Fermi paradox
  Apr 12 '16 at 8:43
