Using is.na with the sapply function in R
Can anyone tell me what the line of code below does?
sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100
My understanding is that it drops NAs when it applies the sum function but keeps them in the matrix.
Any help is appreciated.
Thank you
r lapply na sapply
1
You are looping through the columns (with sapply, assuming X is a data.frame), getting the number of NA elements (by taking the sum of the logical vector is.na(x)), and dividing by the number of rows of airports.
– akrun
Nov 12 '18 at 21:35
1
I think it counts the percentage of NA entries by row
– Enrique Pérez Herrero
Nov 12 '18 at 21:38
2
@EnriquePérezHerrero not by row, by column. (And assuming that X is the same as airports, or is a subset of the airports columns, or has the same number of rows. That part isn't clear at all.)
– Gregor
Nov 12 '18 at 21:39
@srkale it is not dropping anything. Dropping NAs in sum would look like sum(..something.., na.rm = TRUE). In this case what is being summed is whether or not each value is NA; sum(is.na(x)) essentially means "count the number of NAs in x".
– Gregor
Nov 12 '18 at 21:42
edited Nov 12 '18 at 21:46 by Joe
asked Nov 12 '18 at 21:30 by srkale
1 Answer
1
Enough comments, time for an answer:
sapply(X,                    # apply to each item of X (each column, if X is a data frame)
       function(x)           # this function:
         sum(is.na(x))       # count the NAs
) / nrow(airports) * 100     # then divide the result by the number of rows in the airports object
                             # and multiply by 100
In words, it counts the number of missing values in each column of X, then divides the result by the number of rows in airports and multiplies by 100. That gives the percentage of missing values in each column, assuming X has the same number of rows as airports.
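As a rough illustration, here is the same expression run on a tiny made-up data frame. The column names and values are hypothetical stand-ins (the real airports data is not shown in the question), and X is simply assumed to be the same object as airports.

```r
# Hypothetical stand-in for the airports data; the real columns are unknown.
airports <- data.frame(
  name = c("A", "B", NA, "D"),
  lat  = c(1.5, NA, NA, 4.0),
  lon  = c(10, 20, 30, 40)
)
X <- airports  # assumption: X and airports are the same object

# Count the NAs in each column, then turn the counts into percentages.
na_pct <- sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100
print(na_pct)
# name  lat  lon
#   25   50    0
```

The result is a named numeric vector with one entry per column, not a modified copy of the data.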
It's strange to mix the columns of X with nrow(airports); I would expect those to be the same object, that is, either sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X).
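To see why the mismatch matters, here is a small sketch in which X is assumed (purely for illustration) to hold only the first two rows of a made-up airports data frame. Dividing by the wrong row count quietly gives a different percentage.

```r
# Hypothetical data; X is assumed to be a row subset of airports.
airports <- data.frame(
  lat = c(1.5, NA, NA, 4.0),
  lon = c(10, 20, 30, 40)
)
X <- airports[1:2, ]

# Mixed objects: NA counts from X, row count from airports.
mixed <- sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

# Consistent objects: NA counts and row count both from X.
consistent <- sapply(X, function(x) sum(is.na(x))) / nrow(X) * 100

print(mixed)       # lat is 25
print(consistent)  # lat is 50
```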
As I mentioned in the comments, nothing is being "dropped". If you wanted a sum that ignores the NA values, you would use sum(foo, na.rm = TRUE). Here, instead, what is being summed is is.na(x); that is, we are summing whether or not each value is missing, which counts the missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.
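The difference between the two is easy to check on a short vector. The vector foo below is made up for the example.

```r
foo <- c(3, NA, 5, NA, 2)  # hypothetical values

plain_sum <- sum(foo)                # NA: any missing value makes the sum missing
total     <- sum(foo, na.rm = TRUE)  # 10: sums the values, ignoring the NAs
n_missing <- sum(is.na(foo))         # 2: sums TRUE/FALSE flags, i.e. counts the NAs
```

is.na(foo) is a logical vector, and sum() treats TRUE as 1 and FALSE as 0, which is why the sum is a count.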
In this case, where the goal is a percent not a count, we can simplify by using mean() instead of sum() / n:
# slightly simpler, consistent object
sapply(airports, function(x) mean(is.na(x))) * 100
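We could also call is.na() on the entire data frame so we don't need the "anonymous function". Because is.na() on a data frame returns a logical matrix, and sapply() over a matrix visits each cell rather than each column, use colMeans() for the column-wise mean:
# rearrange for more simplicity
colMeans(is.na(airports)) * 100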
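A quick check on a made-up data frame (hypothetical columns, not the real airports data) suggests that the sum-based and mean-based forms agree, and that a column-wise summary such as colMeans() is the matching tool once is.na() has been applied to the whole data frame, since that call returns a logical matrix.

```r
# Hypothetical stand-in for the airports data.
airports <- data.frame(
  name = c("A", "B", NA, "D"),
  lat  = c(1.5, NA, NA, 4.0),
  lon  = c(10, 20, 30, 40)
)

by_sum  <- sapply(airports, function(x) sum(is.na(x))) / nrow(airports) * 100
by_mean <- sapply(airports, function(x) mean(is.na(x))) * 100

# is.na() on a data frame gives a logical matrix, so summarise it by column.
na_flags <- is.na(airports)
by_col   <- colMeans(na_flags) * 100

print(is.matrix(na_flags))                   # TRUE
print(isTRUE(all.equal(by_sum, by_mean)))    # TRUE
print(isTRUE(all.equal(by_mean, by_col)))    # TRUE
```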
Thank you for the explanation! I appreciate it!
– srkale
Nov 21 '18 at 18:15
edited Nov 12 '18 at 22:04
answered Nov 12 '18 at 21:47 by Gregor