Random Forest missing values in cases where the variables do not apply

SOME BACKGROUND

I am training a Random Forest regressor to predict crop yield. Some of my predictor variables apply only to some cases. For example, I have a variable for the number of rows, which applies only to crops grown in a polytunnel. If the crops are grown in a glasshouse, the number of rows does not apply, so it is left as a null value. I also have another variable that denotes whether the crop is grown in a polytunnel or a glasshouse.

THE PROBLEM

As Random Forest implementations generally do not accept missing values, is there a strategy for dealing with variables that are null because they do not apply? Tutorials and papers on the topic suggest imputing the values, but in the scenarios they consider the variables still apply and are missing because of some external factor (e.g. rich people generally don't want to reveal their salaries).

asked Nov 13 '18 at 9:09

Bodwin

Yes, the best way to approach the problem is to give those cases a special value. For example, if the number of rows for polytunnel crops ranges over [0, 100], give all glasshouse samples -1. Ideally, the tree will then use the polytunnel/glasshouse variable to split the data. The polytunnel data will be evaluated according to the number of rows, while in the glasshouse data the number of rows will be ignored, since it is constant there.

– Roberto
Nov 13 '18 at 13:54

Thank you for the answer - I have now applied your method to my data. My only worry is whether it will actually split on glasshouse/polytunnel - for all I know, random forest might decide to use the number of rows first, in which case the -1 fill values could have interesting consequences. I recognise this depends on the underlying data, so as long as I am taking the best approach in the current circumstances, I am happy!

– Bodwin
Nov 14 '18 at 8:49

That is a fair concern. I suggest checking what happens by plotting the tree structure. If you have a small dataset, you could compute the entropy/Gini values to check manually what happens. I will post the comment as an answer.

– Roberto
Nov 14 '18 at 8:52
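
The manual check suggested in the comment above can be sketched in plain Python. For a regressor the split criterion is usually variance (MSE) reduction rather than entropy or Gini, so that is what this sketch computes. The toy data, the column layout and the `best_split` helper are all hypothetical, and a real forest adds randomness (bootstrap samples, feature subsampling) that is ignored here.

```python
from statistics import pvariance

# Hypothetical toy data: (is_polytunnel, n_rows, crop_yield).
# n_rows is filled with -1 for glasshouse crops, where it does not apply.
samples = [
    (0, -1, 10.0), (0, -1, 11.0), (0, -1, 9.0),
    (1, 20, 20.0), (1, 40, 24.0), (1, 60, 28.0),
]


def best_split(samples, feature):
    """Return (threshold, variance_decrease) of the best split on one feature."""
    ys = [s[-1] for s in samples]
    parent = pvariance(ys)
    values = sorted({s[feature] for s in samples})
    best = (None, 0.0)
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [s[-1] for s in samples if s[feature] <= threshold]
        right = [s[-1] for s in samples if s[feature] > threshold]
        # Weighted average of the child variances.
        child = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(ys)
        if parent - child > best[1]:
            best = (threshold, parent - child)
    return best


for name, index in (("is_polytunnel", 0), ("n_rows", 1)):
    print(name, best_split(samples, index))
```

In this toy data the best split on `n_rows` falls at 9.5, between the -1 fill and the smallest real row count, and it produces exactly the same partition as splitting on `is_polytunnel`. So a first split on the number of rows is not necessarily harmful: as long as the fill value lies outside the real range, one threshold can still separate the two groups. Thresholds inside the real range, on the other hand, lump glasshouse crops together with low-row polytunnel crops, which is the case worth looking for when inspecting the trees.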

machine-learning null regression missing-data imputation

1 Answer

The best way to approach the problem is to give those cases a special value.

For example, if the number of rows for polytunnel crops ranges over [0, 100], give all glasshouse samples -1.

Ideally, the tree will then use the polytunnel/glasshouse variable to split the data. The polytunnel data will be evaluated according to the number of rows, while in the glasshouse data the number of rows will be ignored, since it is constant there.
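
A minimal sketch of this encoding in plain Python. The records, the field names (`structure`, `n_rows`, `crop_yield`) and the `encode` helper are hypothetical; the only parts taken from the answer are the -1 fill and the [0, 100] range for real row counts.

```python
# Hypothetical records; n_rows is None where the variable does not apply.
records = [
    {"structure": "polytunnel", "n_rows": 40, "crop_yield": 24.0},
    {"structure": "glasshouse", "n_rows": None, "crop_yield": 10.0},
    {"structure": "polytunnel", "n_rows": 20, "crop_yield": 20.0},
    {"structure": "glasshouse", "n_rows": None, "crop_yield": 11.0},
]

# Fill value for "does not apply". It must lie outside the real range
# of the variable (assumed here to be [0, 100]).
NOT_APPLICABLE = -1


def encode(record):
    """Turn one record into a numeric feature row [is_polytunnel, n_rows]."""
    is_polytunnel = 1 if record["structure"] == "polytunnel" else 0
    n_rows = record["n_rows"]
    if n_rows is None:
        n_rows = NOT_APPLICABLE
    elif n_rows < 0:
        # A real negative value would collide with the fill value.
        raise ValueError("n_rows must be non-negative when it applies")
    return [is_polytunnel, n_rows]


X = [encode(r) for r in records]
y = [r["crop_yield"] for r in records]
print(X)
```

The resulting `X` and `y` contain no nulls, so they can be passed to a random forest implementation that expects a dense numeric matrix, for example the `fit(X, y)` method of scikit-learn's `RandomForestRegressor`. Keeping the explicit `is_polytunnel` column alongside the filled `n_rows` column gives the trees a direct way to separate the two groups.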

answered Nov 14 '18 at 8:57

Roberto
