Listing files in Google Cloud bucket with thousands of files.

I'm running a Node.js script to get the number of files in a bucket on Google Cloud Storage.



In a bucket with about 30K files, I get a result in a few seconds. In a bucket with about 300K files, I get the following error:






<--- Last few GCs --->

[10508:0000014DB738ADB0] 2053931 ms: Mark-sweep 1400.6 (1467.7) -> 1400.6 (1437.2) MB, 1292.2 / 0.0 ms (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1292 ms) last resort GC in old space requested
[10508:0000014DB738ADB0] 2055233 ms: Mark-sweep 1400.6 (1437.2) -> 1400.6 (1437.2) MB, 1301.9 / 0.0 ms last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 000001A6B8025EE1 <JSObject>
1: /* anonymous */(aka /* anonymous */) [D:\Libraries\Documents\project-name\node_modules\@google-cloud\storage\src\acl.js:~717] [pc=0000005E62D95DCF](this=0000016DB7602311 <undefined>,accessMethod=0000016DB7602AC1 <String[3]: add>)
2: arguments adaptor frame: 3->1
3: forEach(this=00000335A20E8891 <JSArray[2]>)
4: /* anonymous */(a...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory





Below is the code I'm using. Is there a better way?






// Imports the Google Cloud client library
const Storage = require('@google-cloud/storage');

function listFiles(bucketName) {
  // [START storage_list_files]
  // Creates a client
  const storage = new Storage();

  /**
   * TODO(developer): Uncomment the following line before running the sample.
   */
  // const bucketName = 'Name of a bucket, e.g. my-bucket';

  // Lists files in the bucket
  return storage
    .bucket(bucketName)
    .getFiles(); // const files = results[0];
  // [END storage_list_files]
}

listFiles('bucket-name')
  .then(x => {
    console.log('Number of files: ', x[0].length);
  });

  • Your problem is that you are running out of memory to store the returned results. You will need to use the APIs and page the output. This will limit the number of items returned for each response.
    – John Hanley, Nov 13 '18 at 20:55

  • How would I use the API to get a list of files? Can you link to the documentation?
    – markkazanski, Nov 14 '18 at 17:31

javascript node.js google-app-engine google-cloud-platform google-cloud-storage






asked Nov 13 '18 at 19:49 by markkazanski

2 Answers
Most of the methods that return lists also offer a streaming version. In this case, you'll want to use bucket.getFilesStream():

bucket.getFilesStream()
  .on('error', console.error)
  .on('data', function(file) {
    // file is a File object.
  })
  .on('end', function() {
    // All files retrieved.
  });
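
To get just the count the question asks for, the stream can be wrapped in a Promise. This is a minimal sketch, assuming `bucket` is a Bucket object such as `storage.bucket('bucket-name')`; the helper name `countFilesStreaming` is hypothetical and not part of the library.

```javascript
// Hypothetical helper (not part of @google-cloud/storage): counts objects
// by consuming bucket.getFilesStream() one File at a time, so the full
// listing is never collected into a single array.
function countFilesStreaming(bucket) {
  return new Promise((resolve, reject) => {
    let count = 0;
    bucket.getFilesStream()
      .on('error', reject)
      .on('data', () => {
        count += 1; // Each 'data' event carries one File object.
      })
      .on('end', () => resolve(count));
  });
}

// Usage (assumes a client created as in the question):
// countFilesStreaming(storage.bucket('bucket-name'))
//   .then(n => console.log('Number of files: ', n))
//   .catch(console.error);
```

Because only a counter is kept, memory use should stay roughly flat regardless of how many objects the bucket holds.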


Alternatively, you can disable auto-pagination and manually page through the results:

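const callback = function(err, files, nextQuery, apiResponse) {
  if (err) {
    // Stop paging if a request fails.
    console.error(err);
    return;
  }
  // files holds only the current page of File objects.
  if (nextQuery) {
    // More results exist.
    bucket.getFiles(nextQuery, callback);
  }
};

bucket.getFiles({
  autoPaginate: false
}, callback);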
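
The same manual paging can also be written as a loop with async/await. This is a sketch under two assumptions: that the promise form of `getFiles` resolves to `[files, nextQuery, apiResponse]` when `autoPaginate` is false, and that `maxResults` caps the page size. The helper name `countFilesByPage` and the default page size of 1000 are illustrative choices, not something the answer specifies.

```javascript
// Hypothetical helper: pages through a bucket and counts objects,
// keeping only one page of File objects in memory at a time.
async function countFilesByPage(bucket, pageSize = 1000) {
  let count = 0;
  let query = { autoPaginate: false, maxResults: pageSize };
  while (query) {
    // Assumption: with autoPaginate disabled, the promise resolves to
    // [files, nextQuery, apiResponse]; nextQuery is empty on the last page.
    const [files, nextQuery] = await bucket.getFiles(query);
    count += files.length;
    // Re-apply autoPaginate: false so the next call also returns a single page.
    query = nextQuery ? { ...nextQuery, autoPaginate: false } : null;
  }
  return count;
}

// Usage (assumes a client created as in the question):
// countFilesByPage(storage.bucket('bucket-name'))
//   .then(n => console.log('Number of files: ', n))
//   .catch(console.error);
```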





As pointed out in the comments, you should use the Objects: list API to list large buckets.

Also, if I am reading the library documentation correctly, you can set the autoPaginate option to false and manually iterate over the results, without having to talk to the JSON API directly.
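
For completeness, talking to the JSON API directly might look like the sketch below. It pages through the Objects: list endpoint using `maxResults` and `pageToken`, and follows `nextPageToken` until none is returned. `countObjectsViaJsonApi` and `fetchJson` are hypothetical names: `fetchJson` stands for whatever function you supply to perform an authenticated GET and return the parsed JSON body, since authentication is outside the scope of this example.

```javascript
// Sketch of paging through the JSON API's Objects: list endpoint directly.
// `fetchJson` is a hypothetical caller-supplied function: it takes a URL,
// performs an authenticated GET, and resolves to the parsed JSON body.
async function countObjectsViaJsonApi(bucketName, fetchJson) {
  const base =
    `https://storage.googleapis.com/storage/v1/b/${encodeURIComponent(bucketName)}/o`;
  let count = 0;
  let pageToken;
  do {
    const params = new URLSearchParams({
      maxResults: '1000',
      // Partial response: request only object names and the paging token.
      fields: 'items(name),nextPageToken',
    });
    if (pageToken) params.set('pageToken', pageToken);
    const page = await fetchJson(`${base}?${params}`);
    count += (page.items || []).length; // `items` may be absent on an empty page.
    pageToken = page.nextPageToken;
  } while (pageToken);
  return count;
}
```

Requesting only the fields needed for counting keeps each response small, which is the main advantage over building full File objects for every entry.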






First answer (bucket.getFilesStream()) answered Nov 20 '18 at 18:06 by callmehiphop

Second answer (Objects: list API) answered Nov 15 '18 at 21:07 by coryan