ActionView::Helpers::SanitizeHelper fails with multiple wbr tags



























Sanitizing HTML with the Rails ActionView::Helpers::SanitizeHelper fails when the input contains many 'wbr' tags.



Text pasted from other applications contains HTML with literally hundreds of 'wbr' elements.



When the combined 'depth' of the 'wbr' elements and the outer elements in which they appear reaches 255, all further text in the document appears to be removed.



This can mean that important information is lost.
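To check pasted HTML for this failure mode before sanitizing, the nesting depth can be estimated up front. The sketch below is a hypothetical helper (estimated_max_depth and VOID_TAGS_WITHOUT_WBR are illustrative names, not Rails or Nokogiri APIs) built on a naive regex tag scanner rather than a real parser. It assumes, as the depth observation above suggests, that the parser nests each 'wbr' inside the previous one, so 'wbr' is deliberately left out of the list of void elements.

```ruby
# Hypothetical helper, not a Rails or Nokogiri API.
# HTML void elements, deliberately WITHOUT "wbr", to model a parser that
# does not know <wbr> is void and therefore keeps it open.
VOID_TAGS_WITHOUT_WBR = %w[area base br col embed hr img input link meta param source track].freeze

# Naive tag scanner (regex based, not a real HTML parser): returns the
# deepest level of open, non-void elements seen anywhere in the fragment.
def estimated_max_depth(html, void_tags: VOID_TAGS_WITHOUT_WBR)
  stack = []
  max_depth = 0
  html.scan(%r{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>}) do |closing, tag, self_closing|
    tag = tag.downcase
    if closing == "/"
      # Close the most recent matching open element and anything opened after it.
      idx = stack.rindex(tag)
      stack.slice!(idx..) if idx
    elsif self_closing == "/" || void_tags.include?(tag)
      next # void or self-closing: never holds children
    else
      stack.push(tag)
      max_depth = [max_depth, stack.size].max
    end
  end
  max_depth
end
```

For the example fragment in this question the estimate is 256 (the div plus 255 'wbr' elements kept open), right around the limit observed above; passing void_tags: VOID_TAGS_WITHOUT_WBR + ["wbr"] brings it down to 1. Input whose estimate approaches that level, whatever the tags involved, may be at risk of the same kind of loss.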



As an example, run sanitize on the fragment below (255 'wbr' tags inside a div):



<div>
Some text here

<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>
<wbr><wbr><wbr><wbr><wbr>

More important text here
</div>
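To reproduce this without pasting 255 tags by hand, the same fragment can be generated in Ruby. example_fragment is a hypothetical helper introduced here for convenience; it only builds the string and does not call Rails.

```ruby
# Rebuild the example fragment: 25 lines of ten <wbr> tags followed by a
# line of five (255 tags in total), between the two pieces of text.
def example_fragment(wbr_count = 255)
  wbr_lines = Array.new(wbr_count, "<wbr>").each_slice(10).map(&:join)
  ["<div>", "Some text here", "", *wbr_lines, "", "More important text here", "</div>"].join("\n")
end

html = example_fragment
puts html.scan("<wbr>").size # prints 255
```

In a Rails console, helper.sanitize(example_fragment) can then be checked for the trailing text. The outcome may depend on the installed Nokogiri and libxml2 versions, so results can differ from the output shown here.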


The result does not contain the second piece of text, i.e. the result is:



<div>
Some text here



</div>


I am looking for a way to safely sanitize the html that is pasted from other applications without losing any of the content.
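Short of fixing the parser, one way to make this failure loud instead of silent is to compare the visible words before and after sanitizing and refuse a result that lost any. The sketch below assumes a crude regex-based text extraction (tags replaced by spaces, entities decoded with CGI.unescapeHTML); sanitize_without_loss, visible_words and ContentLostError are hypothetical names, and the sanitizer is passed in as a block so the check is not tied to one Rails API.

```ruby
require "cgi"

# Hypothetical error raised when sanitizing changed the visible text.
class ContentLostError < StandardError; end

# Crude text extraction: replace every tag with a space, decode entities,
# split on whitespace. Not a real HTML text renderer.
def visible_words(html)
  CGI.unescapeHTML(html.gsub(/<[^>]*>/, " ")).split
end

# Runs the given sanitizer block and returns its result only if the visible
# words are unchanged; otherwise raises ContentLostError.
def sanitize_without_loss(html)
  sanitized = yield(html)
  before = visible_words(html)
  after = visible_words(sanitized)
  return sanitized if before == after

  raise ContentLostError,
        "sanitizer changed the visible text (#{before.size} words in, #{after.size} out)"
end
```

In a Rails app the block could wrap the helper, for example sanitize_without_loss(raw_html) { |s| ActionController::Base.helpers.sanitize(s) }. The check may also fire when the sanitizer drops text on purpose (for example, if it removes a disallowed element together with its contents), so a failure means "inspect this input" rather than proof of a parser bug.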



I could obviously replace all 'wbr' elements with spaces using gsub before sanitizing, but I would like to be sure that no other scenarios cause data loss in the same way.
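A minimal sketch of that gsub pre-processing step follows; strip_wbr and WBR_TAG are hypothetical names. The regex assumes 'wbr' only ever appears as a plain tag (<wbr>, <wbr/>, <wbr />, or a stray </wbr>), in any letter case.

```ruby
# Matches <wbr>, <wbr/>, <wbr />, and stray </wbr> tags, case-insensitively.
# \b stops it from matching longer tag names such as <wbrx>.
WBR_TAG = %r{</?wbr\b[^>]*>}i

# Replace every wbr tag before the HTML reaches the sanitizer.
def strip_wbr(html, replacement: " ")
  html.gsub(WBR_TAG, replacement)
end
```

Since 'wbr' only marks an optional line-break point, replacement: "" keeps words that were split by it joined ("long<wbr>word" becomes "longword"), whereas the space suggested above splits them. Either way this removes only the one trigger seen here; it does not protect against other deeply nested markup.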



Note that Rails::Html::TargetScrubber has a similar issue: if you use it to remove the 'wbr' elements from the example fragment, it removes the last text as well.

































  • I think you should raise an issue in the Rails issue tracker. Seems like a bug.

    – rubyprince
    Nov 13 '18 at 6:47













  • @rubyprince. Opened an issue here

    – giorgio
    Nov 14 '18 at 1:12













  • Hmm, looks like the bug is with Nokogiri (xml/html parsing gem) which Rails uses internally. The rabbit hole goes very deep.

    – rubyprince
    Nov 14 '18 at 19:41


















ruby-on-rails sanitize
















asked Nov 13 '18 at 3:49









giorgio



























