Using is.na with the sapply function in R

Can anyone tell me what the line of code written below does?

sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

My understanding is that it drops NAs when it applies the sum function but keeps them in the matrix.

Any help is appreciated.

Thank you

asked Nov 12 '18 at 21:30 by srkale; edited Nov 12 '18 at 21:46 by Joe

1

You are looping through the columns (with sapply, assuming X is a data.frame), getting the number of NA elements (by summing the logical vector is.na(x)), and dividing by the number of rows of airports.

– akrun
Nov 12 '18 at 21:35

1

I think it computes the percentage of NA entries by row

– Enrique Pérez Herrero
Nov 12 '18 at 21:38

2

@EnriquePérezHerrero not by row, by column. (And assuming that X is the same as airports, or is a subset of the airports columns, or has the same number of rows. That part isn't clear at all.)

– Gregor
Nov 12 '18 at 21:39

@srkale it is not dropping anything. Dropping NAs in sum would look like sum(..something.., na.rm = TRUE). In this case what is being summed is whether or not each value is NA; sum(is.na(x)) essentially means "count the number of NAs in x".

– Gregor
Nov 12 '18 at 21:42

r lapply na sapply

1 Answer

Enough comments, time for an answer:

sapply(X,      # apply to each item of X (each column, if X is a data frame)
  function(x)  # this function:
    sum(is.na(x))  # count the NAs
) / nrow(airports) * 100  # then divide the result by the number of rows in the airports object
                          # and multiply by 100

In words, it counts the number of missing values in each column of X, then divides the result by the number of rows in airports and multiplies by 100. This calculates the percentage of missing values in each column, assuming X has the same number of rows as airports.
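As a small illustration of that description, here is a sketch on a made-up data frame. The name `airports_demo` and its columns are hypothetical stand-ins, not the data from the question.

```r
# Hypothetical stand-in for the airports data (not the asker's real data)
airports_demo <- data.frame(
  code      = c("AAA", "BBB", NA, "DDD"),
  elevation = c(10, NA, NA, 40),
  city      = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Count the NAs in each column, then express the count as a percentage of rows
na_pct <- sapply(airports_demo, function(x) sum(is.na(x))) / nrow(airports_demo) * 100
print(na_pct)
# code has 1 NA in 4 rows (25), elevation has 2 (50), city has none (0)
```

The result is a named numeric vector with one entry per column.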

It's strange to mix and match the columns of X with nrow(airports). I would expect those to be the same object, that is, either sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X).
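To see why the mismatch matters, here is a sketch assuming X holds only some of the rows of airports. The objects `airports_demo` and `X_demo` are hypothetical; the question does not say how X relates to airports.

```r
# Hypothetical data: X_demo is just the first two rows of airports_demo
airports_demo <- data.frame(elevation = c(10, NA, NA, 40))
X_demo <- airports_demo[1:2, , drop = FALSE]

# NA count from X_demo divided by the row count of a different object
mixed <- sapply(X_demo, function(x) sum(is.na(x))) / nrow(airports_demo) * 100

# NA count and row count taken from the same object
consistent <- sapply(X_demo, function(x) sum(is.na(x))) / nrow(X_demo) * 100

print(mixed)       # 1 NA out of 4 rows: 25
print(consistent)  # 1 NA out of 2 rows: 50
```

When the two objects have different numbers of rows, the first form no longer gives the percentage of missing values in X.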

As I mentioned in comments, nothing is being "dropped". If you wanted to do a sum ignoring the NA values, you would do sum(foo, na.rm = TRUE). Instead, here, what is being summed is is.na(x); that is, we are summing whether or not each value is missing, which counts the missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.
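The difference between the two is easy to see on a short vector. This is a minimal sketch; the values in `foo` are made up for illustration.

```r
foo <- c(2, NA, 5, NA, 1)  # hypothetical example vector

plain_sum         <- sum(foo)                # NA: one missing value makes the sum missing
total_ignoring_na <- sum(foo, na.rm = TRUE)  # 8: NAs are dropped, the rest are added
na_flags          <- is.na(foo)              # FALSE TRUE FALSE TRUE FALSE
na_count          <- sum(is.na(foo))         # 2: TRUE counts as 1, so this counts the NAs

print(total_ignoring_na)
print(na_count)
```

Only the na.rm = TRUE call drops anything; sum(is.na(foo)) adds up a logical vector that contains no NAs at all.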

In this case, where the goal is a percent not a count, we can simplify by using mean() instead of sum() / n:

# slightly simpler, consistent object

sapply(airports, function(x) mean(is.na(x))) * 100

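We could also call is.na() on the entire data frame so we don't need the "anonymous function". Because is.na() on a data frame returns a logical matrix, and sapply() would visit a matrix one element at a time, use colMeans() to get the per-column means:

# rearrange for more simplicity

colMeans(is.na(airports)) * 100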
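The different spellings can be checked against each other on a small example. This is only a sketch: `airports_demo` is a hypothetical data frame, and colMeans() is used for the whole-data version because is.na() on a data frame returns a logical matrix.

```r
# Hypothetical stand-in for the airports data
airports_demo <- data.frame(
  code      = c("AAA", "BBB", NA, "DDD"),
  elevation = c(10, NA, NA, 40),
  stringsAsFactors = FALSE
)

# Original form: count per column, divide by the number of rows
via_sum <- sapply(airports_demo, function(x) sum(is.na(x))) / nrow(airports_demo) * 100

# mean() of a logical vector is the proportion of TRUE values
via_mean <- sapply(airports_demo, function(x) mean(is.na(x))) * 100

# Column means of the logical matrix returned by is.na()
via_colmeans <- colMeans(is.na(airports_demo)) * 100

print(all.equal(via_sum, via_mean))
print(all.equal(via_mean, via_colmeans))
```

All three give one percentage per column, so the choice between them is mostly a matter of readability.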

answered Nov 12 '18 at 21:47 by Gregor; edited Nov 12 '18 at 22:04

Thank you for the explanation! I appreciate it!

– srkale
Nov 21 '18 at 18:15

