Swift - Regex to extract value











I want to extract a value from a string in which the value is marked by unique start and end delimiters. In my case, the delimiters are the <em> and </em> tags.

"Fully <em>Furni</em>shed |Downtown and Canal Views"

Expected result:

Furnished











  • Is that <em> tag the only HTML content in your string? In general, you should not parse HTML with regex.
    – Tim Biegeleisen
    Nov 8 at 9:51

  • Yeah, that's the only tag. My thinking was to use a regex to extract it, since it's a pattern. Is there a better way?
    – Taimur Ajmal
    Nov 8 at 9:53

  • What's the logic? I would have expected the result Furni (the only text enclosed by the em tags), but you seem to expect Furnished. What if it's Fur<em>ni</em>shed? Do you expect nished? Furnished? Furni? ni?
    – Larme
    Nov 8 at 10:16

  • Try let res = s.replacingOccurrences(of: ".*<em>(\\S*?)</em>(\\S*).*", with: "$1$2", options: .regularExpression)
    – Wiktor Stribiżew
    Nov 8 at 20:49

  • @Larme It will always be in this format: <em>Furni</em>shed <em>balc</em>ony <em>gard</em>en
    – Taimur Ajmal
    Nov 10 at 21:46
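Wiktor Stribiżew's one-liner from the comments can be tried directly. This sketch wraps it in a hypothetical helper, extractEmphasizedWord, writes the regex token \S as \\S (a Swift string literal needs the doubled backslash), and shows one behavior worth knowing before relying on it for the multi-word format described above: because the leading .* is greedy, only the last tagged word survives, and a string without tags comes back unchanged.

```swift
import Foundation

// Wiktor Stribiżew's suggestion from the comments, wrapped in a hypothetical helper.
// In a Swift string literal, the regex token \S has to be written as \\S.
func extractEmphasizedWord(_ s: String) -> String {
    return s.replacingOccurrences(of: ".*<em>(\\S*?)</em>(\\S*).*",
                                  with: "$1$2",
                                  options: .regularExpression)
}

// Single tag: group 1 is "Furni", group 2 is "shed", and the whole string is replaced
print(extractEmphasizedWord("Fully <em>Furni</em>shed |Downtown and Canal Views")) // Furnished

// The leading .* is greedy, so with several tags only the last tagged word survives
print(extractEmphasizedWord("<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en")) // garden
```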















ios swift regex






asked Nov 8 at 9:49 by Taimur Ajmal (edited Nov 10 at 21:05)








6 Answers

















I guess you want to remove the tags.



If the backslash is only an artifact of how the string is displayed (the actual text contains </em>), the pattern is pretty simple: basically <em> with an optional slash, /?



let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)


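If the string really contains the backslash (<\/em>), the optional backslash has to be escaped once for the regex and once more for the Swift string literal:



let trimmedString = string.replacingOccurrences(of: "<\\\\?/?em>", with: "", options: .regularExpression)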
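To compare the two tag-stripping patterns, here is a small sketch. The escaped sample string is an assumption (JSON-style text that really contains <\/em>). Every backslash the regex engine should see has to be doubled in a Swift string literal, so an optional literal backslash is written \\\\? in source; writing only \\? would make the regex look for a literal question mark instead.

```swift
import Foundation

// The text exactly as the question shows it
let plain = "Fully <em>Furni</em>shed |Downtown and Canal Views"
// Hypothetical JSON-escaped variant that really contains a backslash: <\/em>
let escaped = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"

// Regex </?em>: "<", an optional "/", then "em>"
let simple = plain.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)

// Regex <\\?/?em>: additionally allows one literal backslash right after "<"
let tolerant = escaped.replacingOccurrences(of: "<\\\\?/?em>", with: "", options: .regularExpression)

print(simple)   // Fully Furnished |Downtown and Canal Views
print(tolerant) // Fully Furnished |Downtown and Canal Views
```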




If you want to extract only Furnished, you have to capture two groups: the string between the tags, and everything after the closing tag up to the next whitespace character.



import Foundation

let string = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
// Group 1: the text between the tags (lazy, so it stops at the first closing tag)
// Group 2: the rest of the word after the closing tag, up to the next whitespace
let pattern = "<em>(.*?)<\\\\?/em>(\\S+)"
do {
    let regex = try NSRegularExpression(pattern: pattern)
    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {
        let part1 = string[Range(match.range(at: 1), in: string)!]
        let part2 = string[Range(match.range(at: 2), in: string)!]
        print(String(part1 + part2)) // Furnished
    }
} catch { print(error) }
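The answer reads only the first match. A sketch that applies the same two-group idea to every match, for example the <em>Furni</em>shed <em>balc</em>ony <em>gard</em>en format mentioned in the comments, could look like this. completedWords is a hypothetical name, and the sketch assumes tags without attributes or inner spaces.

```swift
import Foundation

// Group 1 = text inside the tags, group 2 = rest of the word after the closing tag.
// The closing tag may be written </em> or <\/em>.
func completedWords(in text: String) -> [String] {
    let pattern = "<em>(.*?)<\\\\?/em>(\\S*)"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the UTF-16 based NSRanges back to String ranges
        guard let inside = Range(match.range(at: 1), in: text),
              let rest = Range(match.range(at: 2), in: text) else { return nil }
        return String(text[inside]) + String(text[rest])
    }
}

print(completedWords(in: "Fully <em>Furni</em>shed |Downtown and Canal Views"))
// ["Furnished"]
print(completedWords(in: "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"))
// ["Furnished", "balcony", "garden"]
```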





  • I don't want to only remove the tags; I want to extract the word between the em tags.
    – Taimur Ajmal
    Nov 10 at 21:55

  • This won't catch valid HTML variations like <em > (a space before the closing angle bracket).
    – Cristik
    Nov 14 at 6:36


















Not a regex, but to obtain all the words inside the tags, e.g. [Furni, sma]:



import Foundation

let text = "Fully <em>Furni</em>shed <em>sma</em>shed |Downtown and Canal Views"
let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("</em>") }.compactMap { $0.components(separatedBy: "</em>").first }


For full words, e.g. [Furnished, smashed]:



let emphasizedWords = text.components(separatedBy: " ").filter { $0.contains("<em>") }.map { $0.replacingOccurrences(of: "</em>", with: "").replacingOccurrences(of: "<em>", with: "") }





  • This won't catch valid HTML variations like <em > (a space before the closing angle bracket).
    – Cristik
    Nov 14 at 6:38

  • It's not a requirement, and you can always preprocess the string first.
    – Luzo
    Nov 14 at 19:34
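The preprocessing mentioned here could be a single regex replacement that normalizes whitespace inside the tags before splitting. A sketch with hypothetical helper names, normalizeEmTags and emphasizedParts(in:), assuming that only whitespace variations of <em> and </em> need handling (no attributes):

```swift
import Foundation

// Hypothetical preprocessing: rewrite "<em >", "< em>", "</ em >" and similar
// variants to the canonical "<em>" / "</em>". Group 1 keeps the optional slash.
func normalizeEmTags(_ text: String) -> String {
    return text.replacingOccurrences(of: "<\\s*(/?)\\s*em\\s*>",
                                     with: "<$1em>",
                                     options: .regularExpression)
}

// The non-regex splitting approach from the answer, run on the normalized text
func emphasizedParts(in text: String) -> [String] {
    return normalizeEmTags(text)
        .components(separatedBy: "<em>")
        .filter { $0.contains("</em>") }
        .compactMap { $0.components(separatedBy: "</em>").first }
}

print(emphasizedParts(in: "Fully <em >Furni</ em>shed |Downtown and Canal Views"))
// ["Furni"]
```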


















Regex:



If you want to achieve that by regex, you can use Valexa's answer:



import Foundation

public extension String {
    func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        let regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
        // NSRange lengths are in UTF-16 units, so measure the string the same way
        let fullRange = NSRange(startIndex..., in: self)
        let matches = regex.matches(in: self, options: [], range: fullRange)

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            let capturedGroupIndex = match.range(at: i)
            // Skip optional groups that did not take part in the match
            guard capturedGroupIndex.location != NSNotFound else { continue }
            let matchedString = (self as NSString).substring(with: capturedGroupIndex)
            results.append(matchedString)
        }

        return results
    }
}


like this:



let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
print(text.capturedGroups(withRegex: "<em>([a-zA-Z]+)</em>"))


result:




["Furni"]
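NSRegularExpression measures ranges in UTF-16 code units, while String.count counts Characters, so building the search range from count can silently cut off the end of non-ASCII text. A small sketch with a hypothetical sample string that starts with an emoji (one Character, two UTF-16 units):

```swift
import Foundation

// Hypothetical sample: the emoji is one Character but two UTF-16 code units
let text = "🏠 <em>Furni</em>"
let regex = try! NSRegularExpression(pattern: "<em>([a-zA-Z]+)</em>")

// Wrong: length measured in Characters, so the final ">" falls outside the range
let tooShort = NSRange(location: 0, length: text.count)
// Right: let Foundation convert the whole String range to UTF-16 offsets
let correct = NSRange(text.startIndex..., in: text)

print(text.count, text.utf16.count)                        // 16 17
print(regex.firstMatch(in: text, range: tooShort) == nil)  // true
print(regex.firstMatch(in: text, range: correct) != nil)   // true
```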




NSAttributedString:



If you want to do some highlighting, only need to strip the tags, or can't use the first solution for some other reason, you can also do it with NSAttributedString:



import UIKit // on iOS, UIKit provides NSAttributedString's HTML document import

extension String {
    var attributedStringAsHTML: NSAttributedString? {
        do {
            return try NSAttributedString(data: Data(utf8),
                                          options: [
                                            .documentType: NSAttributedString.DocumentType.html,
                                            .characterEncoding: String.Encoding.utf8.rawValue],
                                          documentAttributes: nil)
        } catch {
            print("error: ", error)
            return nil
        }
    }
}

func getTextSections(_ text: String) -> [String] {
    guard let attributedText = text.attributedStringAsHTML else {
        return []
    }
    var sections: [String] = []
    let range = NSRange(location: 0, length: attributedText.length)

    // we don't need to enumerate any special attribute here,
    // but for example, if you want to just extract links you can use `NSAttributedString.Key.link` instead
    let attribute = NSAttributedString.Key(rawValue: "")

    attributedText.enumerateAttribute(attribute,
                                      in: range,
                                      options: .longestEffectiveRangeNotRequired) { _, range, _ in
        let section = attributedText.attributedSubstring(from: range).string
        sections.append(section)
    }
    return sections
}

let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
print(getTextSections(text))


result:




["Fully ", "Furni", "shed |Downtown and Canal Views"]







    Given this string:



    import Foundation

    let str = "Fully <em>Furni</em>shed |Downtown and Canal Views"


    and the corresponding NSRange:



    let range = NSRange(location: 0, length: (str as NSString).length)


    Let's construct a regular expression that matches word characters between <em> and </em>, or preceded by </em>:



    let regex = try NSRegularExpression(pattern: "(?<=<em>)\\w+(?=<\\/em>)|(?<=<\\/em>)\\w+")


    What it does is:



    • look for one or more word characters: \w+,

    • that are preceded by <em>: (?<=<em>) (positive lookbehind),

    • and followed by </em>: (?=<\/em>) (positive lookahead),

    • or: |

    • word characters: \w+,

    • that are preceded by </em>: (?<=<\/em>) (positive lookbehind).


    Let's get the matches:



    let matches = regex.matches(in: str, range: range)


    Which we can turn into substrings:



    let strings: [String] = matches.compactMap { match in
        // NSRange offsets are UTF-16 based; Range(_:in:) converts them safely
        guard let r = Range(match.range, in: str) else { return nil }
        return String(str[r])
    }


    Now we can join each string at an even index with the one at the following odd index:



    let evenStride = stride(from: strings.startIndex,
    to: strings.index(strings.endIndex, offsetBy: -1),
    by: 2)
    let result = evenStride.map { strings[$0] + strings[strings.index($0, offsetBy: 1)]}

    print(result) //["Furnished"]




    We can test it with another string:



    let str2 = "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"


    the result would be:



    ["Furnished", "balcony", "garden"]
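The steps above can be collected into one function, which makes the pairing assumption easy to test. This is a sketch with a hypothetical name, joinedEmphasizedWords, using the answer's pattern (backslashes doubled for the Swift literal) and Range(_:in:) for the UTF-16 to String conversion:

```swift
import Foundation

// Sketch: this answer's lookaround approach gathered into one function.
// joinedEmphasizedWords is a hypothetical name, not part of any library.
func joinedEmphasizedWords(in str: String) -> [String] {
    // A word inside <em>...</em>, or a word directly after </em>
    let pattern = "(?<=<em>)\\w+(?=<\\/em>)|(?<=<\\/em>)\\w+"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(str.startIndex..., in: str)

    // Convert each UTF-16 based NSRange back into a String
    let strings: [String] = regex.matches(in: str, range: range).compactMap { match in
        Range(match.range, in: str).map { String(str[$0]) }
    }

    // Join element 0 with 1, 2 with 3, ...; a trailing unpaired element is dropped
    return stride(from: 0, to: strings.count - 1, by: 2).map { strings[$0] + strings[$0 + 1] }
}

print(joinedEmphasizedWords(in: "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"))
// ["Furnished", "balcony", "garden"]
```

Because the pairing relies on every tagged fragment being followed by a suffix, a fully tagged word such as <em>Fully</em> furnished yields a single unpaired match, which the stride drops, so the result is an empty array.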





    • This won't catch valid HTML variations like <em > (a space before the closing angle bracket).
      – Cristik
      Nov 14 at 6:36










    • @Cristik That’s an easy fix, the main idea would still be the same. The OP hasn’t mentioned such a requirement.
      – Carpsen90
      Nov 14 at 8:23












    • @TaimurAjmal Does my answer work for you?
      – Carpsen90
      Nov 16 at 9:30


















    Here is a basic implementation in PHP (yes, I know you asked about Swift, but it's to demonstrate the regex part):



    <?php

    $in = "Fully <em>Furni</em>shed |Downtown and Canal Views";

    // Single quotes keep \1 as a back-reference; in double quotes PHP would turn it into chr(1)
    $m = preg_match('/<([^>]+)>([^>]+)<\/\1>([^ ]+|$)/i', $in, $t);

    $s = $t[2] . $t[3];

    echo $s;


    Output:



    ZC-MGMT-04:~ jv$ php -q regex.php
    Furnished


    The most important part is the regular expression, which matches any tag, finds the corresponding closing tag (via the back-reference \1), and captures the remainder of the word after it.
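The same pattern works with NSRegularExpression, since ICU regular expressions support back-references. A Swift sketch of the PHP code, using a hypothetical function name, firstTaggedWord(in:), and, like preg_match, only the first match:

```swift
import Foundation

// Group 1: tag name, group 2: tagged text, \1: matching closing tag,
// group 3: rest of the word (or the end of the string).
func firstTaggedWord(in input: String) -> String? {
    let pattern = "<([^>]+)>([^>]+)</\\1>([^ ]+|$)"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]),
          let match = regex.firstMatch(in: input, range: NSRange(input.startIndex..., in: input)),
          let inner = Range(match.range(at: 2), in: input),
          let rest = Range(match.range(at: 3), in: input) else { return nil }
    return String(input[inner]) + String(input[rest])
}

print(firstTaggedWord(in: "Fully <em>Furni</em>shed |Downtown and Canal Views") ?? "no match")
// Furnished
```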






      If you just want to extract the text between <em> and <\/em> tags (note that these are not normal HTML tags, since those would be <em> and </em>), we can simply capture this pattern and replace the whole match with the value captured in group 1. We don't need to worry about what surrounds the matching text; we just replace it with whatever was captured between the tags, which could even be an empty string, because the OP hasn't mentioned any constraint on that. The regex for matching this pattern is:



      <em>(.*?)<\/em>


      Or, to be more robust and tolerate optional spaces within the tags (as someone pointed out in comments on other answers), we can use this regex:



      <\s*em\s*>(.*?)<\s*\/em\s*>


      Then replace each match with \1 or $1, depending on where you are doing it. Whether the tags contain an empty string or actual text doesn't really matter, as shown in my demo on regex101.



      Here is the demo



      Let me know whether this meets your requirements, or if any of them remain unsatisfied.
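In Swift, this replacement can be done with replacingOccurrences(of:with:options:) and the .regularExpression option, which understands $1 templates. A sketch with a hypothetical sample string that has spaces inside the tags:

```swift
import Foundation

// The answer's tolerant pattern, with each backslash doubled for the Swift literal
let pattern = "<\\s*em\\s*>(.*?)<\\s*\\/em\\s*>"

let input = "Fully <em >Furni</em >shed |Downtown and Canal Views"
// "$1" keeps only the captured text, so the surrounding tags disappear
let output = input.replacingOccurrences(of: pattern, with: "$1", options: .regularExpression)
print(output) // Fully Furnished |Downtown and Canal Views

// An empty pair of tags is simply removed
let empty = "Fully <em></em>furnished".replacingOccurrences(of: pattern, with: "$1", options: .regularExpression)
print(empty) // Fully furnished
```

Note that this pattern tolerates spaces around em and before >, but not between / and em, so </ em> would not match.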






The first answer above (removing the tags or capturing groups with NSRegularExpression) was answered Nov 8 at 10:01 by vadian and edited Nov 10 at 22:27.












The second answer above (splitting with components(separatedBy:) instead of a regex) was answered Nov 10 at 22:42 by Luzo and edited Nov 10 at 22:59.












        • This won't catch valid html variations like <em > (space before the closing angular bracket).
          – Cristik
          Nov 14 at 6:38










        • Not a requirement + you can always do preprocessing.
          – Luzo
          Nov 14 at 19:34


















        • This won't catch valid html variations like <em > (space before the closing angular bracket).
          – Cristik
          Nov 14 at 6:38










        • Not a requirement + you can always do preprocessing.
          – Luzo
          Nov 14 at 19:34
















        This won't catch valid html variations like <em > (space before the closing angular bracket).
        – Cristik
        Nov 14 at 6:38




        This won't catch valid html variations like <em > (space before the closing angular bracket).
        – Cristik
        Nov 14 at 6:38












        Not a requirement + you can always do preprocessing.
        – Luzo
        Nov 14 at 19:34




        Not a requirement + you can always do preprocessing.
        – Luzo
        Nov 14 at 19:34










        up vote
        1
        down vote













        Regex:



        If you want to achieve that by regex, you can use Valexa's answer:



        public extension String {
        public func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        var regex: NSRegularExpression
        do {
        regex = try NSRegularExpression(pattern: pattern, options: )
        } catch {
        return results
        }
        let matches = regex.matches(in: self, options: , range: NSRange(location:0, length: self.count))

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
        let capturedGroupIndex = match.range(at: i)
        let matchedString = (self as NSString).substring(with: capturedGroupIndex)
        results.append(matchedString)
        }

        return results
        }
        }


        like this:



        let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
        print(text.capturedGroups(withRegex: "<em>([a-zA-z]+)</em>"))


        result:




        ["Furni"]




        NSAttributedString:



        If you want to do some highlighting or you only need to get rid of tags or any other reason that you can't use the first solution, you can also do that using NSAttributedString:



        import UIKit // on macOS, import AppKit instead

        extension String {
            var attributedStringAsHTML: NSAttributedString? {
                do {
                    return try NSAttributedString(data: Data(utf8),
                                                  options: [
                                                      .documentType: NSAttributedString.DocumentType.html,
                                                      .characterEncoding: String.Encoding.utf8.rawValue],
                                                  documentAttributes: nil)
                } catch {
                    print("error: ", error)
                    return nil
                }
            }
        }

        func getTextSections(_ text: String) -> [String] {
            guard let attributedText = text.attributedStringAsHTML else {
                return []
            }
            var sections: [String] = []
            let range = NSRange(location: 0, length: attributedText.length)

            // We don't need to enumerate any special attribute here,
            // but if you only want to extract links, for example, you can use `NSAttributedString.Key.link` instead.
            let attribute = NSAttributedString.Key(rawValue: "")

            attributedText.enumerateAttribute(attribute,
                                              in: range,
                                              options: .longestEffectiveRangeNotRequired) { _, range, _ in
                let text = attributedText.attributedSubstring(from: range).string
                sections.append(text)
            }
            return sections
        }

        let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"
        print(getTextSections(text))


        result:




        ["Fully ", "Furni", "shed |Downtown and Canal Views"]










































































            answered Nov 11 at 0:29









            Amir Khorsandi




































                Given this string:



                let str = "Fully <em>Furni</em>shed |Downtown and Canal Views"


                and the corresponding NSRange:



                let range = NSRange(location: 0, length: (str as NSString).length)


                Let's construct a regular expression that matches letters that are either between <em> and </em>, or directly preceded by </em>:



                let regex = try NSRegularExpression(pattern: "(?<=<em>)\\w+(?=<\\/em>)|(?<=<\\/em>)\\w+")


                What it does:




                • look for one or more word characters (letters, digits or underscore): \w+,

                • that are preceded by <em>: (?<=<em>) (positive lookbehind),

                • and followed by </em>: (?=<\\/em>) (positive lookahead),

                • or: |

                • one or more word characters: \w+,

                • that are preceded by </em>: (?<=<\\/em>) (positive lookbehind).


                Let's get the matches:



                let matches = regex.matches(in: str, range: range)


                We can then turn them into substrings:



                let strings: [String] = matches.compactMap { match in
                    // NSRange offsets count UTF-16 code units, so convert with Range(_:in:)
                    guard let matchRange = Range(match.range, in: str) else { return nil }
                    return String(str[matchRange])
                }


                Now we can join each string at an even index with the one at the following odd index:



                let evenStride = stride(from: strings.startIndex,
                to: strings.index(strings.endIndex, offsetBy: -1),
                by: 2)
                let result = evenStride.map { strings[$0] + strings[strings.index($0, offsetBy: 1)]}

                print(result) //["Furnished"]




                We can test it with another string:



                let str2 = "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"


                The result would be:



                ["Furnished", "balcony", "garden"]




























                • This won't catch valid html variations like <em > (space before the closing angular bracket).
                  – Cristik
                  Nov 14 at 6:36










                • @Cristik That’s an easy fix, the main idea would still be the same. The OP hasn’t mentioned such a requirement.
                  – Carpsen90
                  Nov 14 at 8:23












                • @TaimurAjmal Does my answer work for you?
                  – Carpsen90
                  Nov 16 at 9:30


























































                edited Nov 14 at 20:47

























                answered Nov 12 at 21:46









                Carpsen90




































                Here is a basic implementation in PHP (yes, I know you asked about Swift, but it is only meant to demonstrate the regex part):



                <?php

                $in = "Fully <em>Furni</em>shed |Downtown and Canal Views";

                $m = preg_match('/<([^>]+)>([^>]+)<\/\1>([^ ]+|$)/i', $in, $t);

                $s = $t[2] . $t[3];

                echo $s;


                Output:



                ZC-MGMT-04:~ jv$ php -q regex.php
                Furnished


                The most important bit is the regular expression: it matches any opening tag, uses the \1 backreference to find the corresponding closing tag, and captures the remainder of the word that follows it.
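NSRegularExpression uses ICU syntax, which also supports backreferences, so the PHP pattern can be carried over to Swift almost unchanged. A hedged sketch, assuming case-insensitive matching to mirror the /i modifier. Unlike preg_match, which stops at the first match, it collects every match. The name taggedWords(in:) is hypothetical:

```swift
import Foundation

// Hypothetical Swift port of the PHP pattern above, not part of the original answer.
// Group 1 = tag name, \1 = backreference that forces the matching closing tag,
// group 2 = text inside the tags, group 3 = non-space run after the closing tag.
func taggedWords(in text: String) -> [String] {
    let pattern = #"<([^>]+)>([^>]+)</\1>([^ ]+|$)"#
    // .caseInsensitive mirrors PHP's /i modifier.
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    // Unlike preg_match, collect every match rather than only the first one.
    return regex.matches(in: text, range: fullRange).compactMap { match in
        guard let inner = Range(match.range(at: 2), in: text),
              let rest = Range(match.range(at: 3), in: text) else { return nil }
        return String(text[inner]) + String(text[rest])
    }
}

print(taggedWords(in: "Fully <em>Furni</em>shed |Downtown and Canal Views"))
// ["Furnished"]
print(taggedWords(in: "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"))
// ["Furnished", "balcony", "garden"]
```

Because of the backreference, a mismatched pair such as <b>x</em>y produces no match at all.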









































                    answered Nov 13 at 21:40









                    jancha




































                        If you just want to extract the text between the <em> and </em> tags (note this is not normal HTML tags as then it would have been <em> and </em>), we can simply capture that text in a group and replace the whole match with the value captured by group 1. We don't need to worry about what surrounds the matching text: we just replace it with whatever was captured between the tags, which could even be an empty string, since the OP hasn't mentioned any constraint on that. The regex for matching this pattern is:



                        <em>(.*?)<\/em>


                        Or, to be more robust against optional spaces anywhere within the tags (as someone pointed out in the comments on another answer), we can use this regex:



                        <s*ems*>(.*?)<s*\/ems*>


                        Then replace the match with \1 or $1, depending on the tool or language you are using. Whether the tags contain an empty string or some actual text doesn't matter, as shown in my demo on regex101.



                        Here is the demo



                        Let me know whether this meets your requirements, or if any of them remain unsatisfied.
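In Swift, the $1 template can be passed straight to replacingOccurrences(of:with:options:) together with the .regularExpression option. A minimal sketch using the whitespace-tolerant pattern <\s*em\s*>(.*?)<\s*/em\s*>. In Swift the / needs no escaping, since it is not a pattern delimiter there. The helper name strippingEmphasisTags(from:) is hypothetical:

```swift
import Foundation

// Hypothetical helper: removes <em>...</em> markup by replacing each match with
// capture group 1 (the "$1" template). \s* tolerates spaces inside the tags, e.g. "<em >".
func strippingEmphasisTags(from text: String) -> String {
    let pattern = #"<\s*em\s*>(.*?)<\s*/em\s*>"#
    return text.replacingOccurrences(of: pattern, with: "$1", options: .regularExpression)
}

print(strippingEmphasisTags(from: "Fully <em>Furni</em>shed |Downtown and Canal Views"))
// Fully Furnished |Downtown and Canal Views
print(strippingEmphasisTags(from: "Fully <em >Furni</em >shed"))
// Fully Furnished
```

This only removes the tags and keeps their content; it does not isolate the single word the question asks for.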









































                            edited Nov 17 at 15:37

























                            answered Nov 17 at 15:24









                            Pushpesh Kumar Rajwanshi































                                 
