Listing files in Google Cloud bucket with thousands of files.
I'm running a Node.js script to get the number of files in a bucket on Google Cloud Storage.
In a bucket with about 30K files, I get a result in a few seconds. In a bucket with about 300K files, I get the following error:
<--- Last few GCs --->
[10508:0000014DB738ADB0] 2053931 ms: Mark-sweep 1400.6 (1467.7) -> 1400.6 (1437.2) MB, 1292.2 / 0.0 ms (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 1292 ms) last resort GC in old space requested
[10508:0000014DB738ADB0] 2055233 ms: Mark-sweep 1400.6 (1437.2) -> 1400.6 (1437.2) MB, 1301.9 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 000001A6B8025EE1 <JSObject>
1: /* anonymous */(aka /* anonymous */) [D:\Libraries\Documents\project-name\node_modules\@google-cloud\storage\src\acl.js:~717] [pc=0000005E62D95DCF](this=0000016DB7602311 <undefined>,accessMethod=0000016DB7602AC1 <String[3]: add>)
2: arguments adaptor frame: 3->1
3: forEach(this=00000335A20E8891 <JSArray[2]>)
4: /* anonymous */(a...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
Below is the code I'm using. Is there a better way?
const Storage = require('@google-cloud/storage');

function listFiles(bucketName) {
  // Creates a client
  const storage = new Storage();

  // Lists files in the bucket; the promise resolves with an array
  // whose first element is the array of File objects.
  return storage
    .bucket(bucketName)
    .getFiles();
}

listFiles('bucket-name')
  .then(results => {
    console.log('Number of files: ', results[0].length);
  });
Tags: javascript, node.js, google-app-engine, google-cloud-platform, google-cloud-storage
Your problem is that you are running out of memory to store the returned results. You will need to use the APIs and page the output. This will limit the number of items returned for each response.
– John Hanley
Nov 13 '18 at 20:55
How would I use the API to get a list of files? Can you link to the documentation?
– markkazanski
Nov 14 '18 at 17:31
asked Nov 13 '18 at 19:49 – markkazanski
2 Answers
Most of the methods that return lists offer a streaming version. In this case you'll want to use bucket.getFilesStream():
bucket.getFilesStream()
  .on('error', console.error)
  .on('data', function(file) {
    // file is a File object.
  })
  .on('end', function() {
    // All files retrieved.
  });
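Because the stream never materializes the whole list, counting with it keeps memory use flat. Here is a minimal sketch; the `countStream` helper is my own addition, not part of the library, and only `getFilesStream()` is assumed from `@google-cloud/storage`:

```javascript
// Count items emitted by any object-mode stream, such as the one
// returned by bucket.getFilesStream(). Only the running counter is
// kept in memory, never the full list of File objects.
function countStream(stream) {
  return new Promise((resolve, reject) => {
    let count = 0;
    stream
      .on('error', reject)
      .on('data', () => count++) // discard each file after counting it
      .on('end', () => resolve(count));
  });
}

// Against a real bucket (untested sketch):
// countStream(storage.bucket('bucket-name').getFilesStream())
//   .then(n => console.log('Number of files:', n));
```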
Alternatively, you can disable auto-pagination and manually page through the results:
const callback = function(err, files, nextQuery, apiResponse) {
  if (nextQuery) {
    // More results exist; fetch the next page.
    bucket.getFiles(nextQuery, callback);
  }
};

bucket.getFiles({
  autoPaginate: false
}, callback);
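That callback pattern can be wrapped into a counter. `countViaPaging` below is a hypothetical helper (not a library function) that accepts anything shaped like `bucket.getFiles(query, callback)`, so the paging logic can be exercised without a real bucket:

```javascript
// Count files one page at a time, so only the current page of
// results is ever held in memory. `getFiles` is any function with
// bucket.getFiles's (query, callback) signature.
function countViaPaging(getFiles) {
  return new Promise((resolve, reject) => {
    let total = 0;
    const onPage = (err, files, nextQuery) => {
      if (err) return reject(err);
      total += files.length; // only this page is in memory
      if (nextQuery) {
        getFiles(nextQuery, onPage); // fetch the next page
      } else {
        resolve(total);
      }
    };
    getFiles({ autoPaginate: false }, onPage);
  });
}

// Against a real bucket (untested sketch):
// countViaPaging(bucket.getFiles.bind(bucket))
//   .then(n => console.log('Number of files:', n));
```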
answered Nov 20 '18 at 18:06 – callmehiphop
As pointed out in the comments, you should use the Objects: list API to list large buckets. Also, if I am reading the library documentation correctly, you can set the autoPaginate option to false and manually iterate over the results, without having to talk to the JSON API directly.
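For illustration, here is a hedged sketch of what each request looks like if you do talk to the JSON API directly: every page of Objects: list carries the pageToken from the previous response, and a fields mask keeps the payload small. The endpoint and parameter names follow the JSON API, but the helper itself is hypothetical and authentication is omitted:

```javascript
// Build an Objects: list request URL for one page of results.
// `pageToken` comes from the previous response's nextPageToken;
// the fields mask requests only object names and the next token.
function objectsListUrl(bucket, pageToken) {
  const params = new URLSearchParams({
    fields: 'items(name),nextPageToken', // only what we need
  });
  if (pageToken) params.set('pageToken', pageToken);
  return `https://storage.googleapis.com/storage/v1/b/${encodeURIComponent(bucket)}/o?${params}`;
}
```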
answered Nov 15 '18 at 21:07 – coryan