Swift - Regex to extract value

Score: -3

I want to extract a value from a string that has a unique starting and ending marker. In my case the markers are the <em> and </em> tags.

"Fully <em>Furni</em>shed |Downtown and Canal Views",

result

Furnished

edited Nov 10 at 21:05

asked Nov 8 at 9:49

Taimur Ajmal

1,61052947

1

Is that <em> tag the only HTML content in your string? In general you should not be parsing HTML using regex.
– Tim Biegeleisen
Nov 8 at 9:51

Yeah, that's the only tag. My thinking was to use a regex to extract it, since it's a pattern. Is there a better way?
– Taimur Ajmal
Nov 8 at 9:53

3

What's the logic? I would have expected the result Furni (the only text enclosed by the em tag), but you seem to expect Furnished. So what if it's Fur<em>ni</em>shed? Do you expect nished? Furnished? Furni? ni?
– Larme
Nov 8 at 10:16

1

Try let res = s.replacingOccurrences(of: ".*<em>(\\S*?)</em>(\\S*).*", with: "$1$2", options: .regularExpression)
– Wiktor Stribiżew
Nov 8 at 20:49

@Larme it would always be in this format: <em>Furni</em>shed <em>balc</em>ony <em>gard</em>en
– Taimur Ajmal
Nov 10 at 21:46

ios swift regex

6 Answers

Score: 2

I guess you want to remove the tags.

If the backslash is only virtual, the pattern is pretty simple: basically <em> with an optional slash, /?

let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)

Considering also the backslash it's

let trimmedString = string.replacingOccurrences(of: "<\\\\?/?em>", with: "", options: .regularExpression)

If you want to extract only Furnished you have to capture groups: The string between the tags and everything after the closing tag until the next whitespace character.

import Foundation

let string = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"
// Group 1: text between the tags; group 2: non-whitespace characters after the closing tag.
// "\\\\?" in the literal is the regex \\? (an optional literal backslash).
let pattern = "<em>(.*)<\\\\?/em>(\\S+)"
do {
    let regex = try NSRegularExpression(pattern: pattern)
    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {
        let part1 = string[Range(match.range(at: 1), in: string)!]
        let part2 = string[Range(match.range(at: 2), in: string)!]
        print(String(part1 + part2))
    }
} catch { print(error) }
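
If the string can contain more than one tagged word, the same capture-group idea can be applied to every match. The following is only a sketch under a few assumptions: the helper name `emphasizedWords(in:)` is made up for illustration, the pattern uses a lazy `.*?` so each match stops at the nearest closing tag, and a raw string literal (Swift 5+) is used to avoid double escaping.

```swift
import Foundation

// Hypothetical helper (not from the answer): joins the text inside each
// <em>...</em> pair with the non-whitespace characters that follow it.
func emphasizedWords(in text: String) -> [String] {
    // Raw string literal, so backslashes reach the regex engine as written.
    // `\\?` makes a literal backslash before `/em>` optional.
    let pattern = #"<em>(.*?)<\\?/em>(\S*)"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match -> String? in
        guard let inside = Range(match.range(at: 1), in: text),
              let after = Range(match.range(at: 2), in: text) else { return nil }
        return String(text[inside]) + String(text[after])
    }
}

print(emphasizedWords(in: "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"))
// ["Furnished", "balcony", "garden"]
```

Using `Range(_:in:)` with `guard` instead of force unwrapping means an unexpected range is skipped rather than crashing.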

edited Nov 10 at 22:27

answered Nov 8 at 10:01

vadian

138k13144165

I don't want to only remove the tags; I want to extract the word between the em tags.
– Taimur Ajmal
Nov 10 at 21:55

This won't catch valid HTML variations like </em > (space before the closing angle bracket).
– Cristik
Nov 14 at 6:36


Score: 1

Not a regex, but for obtaining all the parts inside tags, e.g. [Furni, sma]:

let text = "Fully <em>Furni<\\/em>shed <em>sma<\\/em>shed |Downtown and Canal Views"

let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\\/em>") }.compactMap { $0.components(separatedBy: "<\\/em>").first }

For full words, e.g. [Furnished, smashed]:

let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>") }.map { $0.replacingOccurrences(of: "<\\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }

edited Nov 10 at 22:59

answered Nov 10 at 22:42

Luzo

994514

This won't catch valid HTML variations like </em > (space before the closing angle bracket).
– Cristik
Nov 14 at 6:38

Not a requirement + you can always do preprocessing.
– Luzo
Nov 14 at 19:34
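
The preprocessing mentioned in the comment above could look roughly like this. It is a sketch, not something from the answer: the function name `normalizedEmTags(in:)` is hypothetical, and it assumes the only variations to handle are extra whitespace inside the tags, letter case, and an escaped slash (`<\/em>`).

```swift
import Foundation

// Hypothetical preprocessing step: rewrites tag variants such as `<em >`,
// `</em >` or `<\/em>` to the canonical `<em>` / `</em>` forms.
func normalizedEmTags(in text: String) -> String {
    // Closing tags first: optional backslash before `/`, optional whitespace around the name.
    let closed = text.replacingOccurrences(of: #"<\s*\\?/\s*em\s*>"#,
                                           with: "</em>",
                                           options: [.regularExpression, .caseInsensitive])
    // Then opening tags with optional whitespace.
    return closed.replacingOccurrences(of: #"<\s*em\s*>"#,
                                       with: "<em>",
                                       options: [.regularExpression, .caseInsensitive])
}

print(normalizedEmTags(in: "Fully <em >Furni</em >shed"))
// Fully <em>Furni</em>shed
```

After this step, the plain-string splitting shown in the answer only has to deal with `<em>` and `</em>`.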


Score: 1

Regex:

If you want to achieve that by regex, you can use Valexa's answer:

public extension String {
    /// Returns the capture groups of the first match of `pattern`, or an empty array.
    func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        let regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
        // NSRange counts UTF-16 code units, so derive it from the string's indices rather than `count`.
        let matches = regex.matches(in: self, options: [], range: NSRange(startIndex..., in: self))

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            let capturedGroupIndex = match.range(at: i)
            let matchedString = (self as NSString).substring(with: capturedGroupIndex)
            results.append(matchedString)
        }

        return results
    }
}

like this:

let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"

print(text.capturedGroups(withRegex: "<em>([a-zA-Z]+)</em>"))

result:

["Furni"]
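
One thing to watch with helpers like this: `NSRange` is measured in UTF-16 code units, while `String.count` counts Characters. The two differ as soon as the text contains emoji or other characters outside the Basic Multilingual Plane. The sketch below illustrates the difference; the function name `firstEmphasizedWord(in:range:)` and the sample string are made up for the demonstration.

```swift
import Foundation

// Hypothetical helper: returns the tagged text plus the rest of the word,
// searching only inside the given NSRange.
func firstEmphasizedWord(in text: String, range: NSRange) -> String? {
    guard let regex = try? NSRegularExpression(pattern: #"<em>(\w+)</em>(\S*)"#),
          let match = regex.firstMatch(in: text, range: range) else { return nil }
    let ns = text as NSString
    return ns.substring(with: match.range(at: 1)) + ns.substring(with: match.range(at: 2))
}

let text = "🏠🏠 <em>Furni</em>shed"

// Each house emoji is one Character but two UTF-16 code units.
let short = NSRange(location: 0, length: text.count)   // too short by two units
let full = NSRange(text.startIndex..., in: text)       // covers the whole string

print(firstEmphasizedWord(in: text, range: short) ?? "nil")  // cut off early
print(firstEmphasizedWord(in: text, range: full) ?? "nil")   // Furnished
```

Building the range with `NSRange(text.startIndex..., in: text)` (or `text.utf16.count`) avoids the truncation.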

NSAttributedString:

If you want to do some highlighting, only need to get rid of the tags, or cannot use the first solution for some other reason, you can also do this with NSAttributedString:

extension String {
    var attributedStringAsHTML: NSAttributedString? {
        do {
            return try NSAttributedString(data: Data(utf8),
                                          options: [
                                            .documentType: NSAttributedString.DocumentType.html,
                                            .characterEncoding: String.Encoding.utf8.rawValue],
                                          documentAttributes: nil)
        } catch {
            print("error: ", error)
            return nil
        }
    }
}

func getTextSections(_ text: String) -> [String] {
    guard let attributedText = text.attributedStringAsHTML else {
        return []
    }
    var sections: [String] = []
    let range = NSMakeRange(0, attributedText.length)

    // we don't need to enumerate any special attribute here,
    // but for example, if you want to just extract links you can use `NSAttributedString.Key.link` instead
    let attribute: NSAttributedString.Key = .init(rawValue: "")

    attributedText.enumerateAttribute(attribute,
                                      in: range,
                                      options: .longestEffectiveRangeNotRequired) { attribute, range, pointer in
        let text = attributedText.attributedSubstring(from: range).string
        sections.append(text)
    }
    return sections
}



let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"

print(getTextSections(text))

result:

["Fully ", "Furni", "shed |Downtown and Canal Views"]

answered Nov 11 at 0:29

Amir Khorsandi

8051917


Score: 1

Given this string:

let str = "Fully <em>Furni<\\/em>shed |Downtown and Canal Views"

and the corresponding NSRange:

let range = NSRange(location: 0, length: (str as NSString).length)

Let's construct a regular expression that would match letters between <em> and <\/em>, or preceded by <\/em>:

let regex = try NSRegularExpression(pattern: "(?<=<em>)\\w+(?=<\\\\/em>)|(?<=<\\\\/em>)\\w+")

What it does (patterns shown as written in the Swift string literal):

look for 1 or more word characters (letters, digits, underscore): \\w+,

that are preceded by <em>: (?<=<em>) (positive lookbehind),

and followed by <\/em>: (?=<\\\\/em>) (positive lookahead),

or: |

word characters: \\w+,

that are preceded by <\/em>: (?<=<\\\\/em>) (positive lookbehind)

Let's get the matches:

let matches = regex.matches(in: str, range: range)

Which we can turn into substrings:

let strings: [String] = matches.compactMap { match -> String? in
    // NSRange is measured in UTF-16 units, so convert it instead of offsetting by Characters
    guard let swiftRange = Range(match.range, in: str) else { return nil }
    return String(str[swiftRange])
}

Now we can join the strings in even indices, with the ones in odd indices:

let evenStride = stride(from: strings.startIndex,
                        to: strings.index(strings.endIndex, offsetBy: -1),
                        by: 2)
let result = evenStride.map { strings[$0] + strings[strings.index($0, offsetBy: 1)] }

print(result)  // ["Furnished"]

We can test it with another string:

let str2 = "<em>Furni<\\/em>shed <em>balc<\\/em>ony <em>gard<\\/em>en"

the result would be:

["Furnished", "balcony", "garden"]
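
For reference, the steps above can be wrapped into one function and run against the second string. This is a sketch: the function name `joinedWords(in:)` is an assumption, and raw string literals (Swift 5+) are used so that the backslash in `<\/em>` appears exactly once in the source. Note that the even/odd pairing assumes every tagged part is followed by a suffix; a tagged word with nothing after it would shift the pairs.

```swift
import Foundation

// Hypothetical wrapper around the steps above.
func joinedWords(in str: String) throws -> [String] {
    // Regex: word characters between <em> and <\/em>, or directly after <\/em>.
    let regex = try NSRegularExpression(pattern: #"(?<=<em>)\w+(?=<\\/em>)|(?<=<\\/em>)\w+"#)
    let range = NSRange(str.startIndex..., in: str)
    let strings = regex.matches(in: str, range: range).compactMap { match -> String? in
        guard let r = Range(match.range, in: str) else { return nil }
        return String(str[r])
    }
    // Pair element 0 with 1, 2 with 3, and so on.
    return stride(from: 0, to: strings.count - 1, by: 2).map { strings[$0] + strings[$0 + 1] }
}

let sample = #"<em>Furni<\/em>shed <em>balc<\/em>ony <em>gard<\/em>en"#
let words = (try? joinedWords(in: sample)) ?? []
print(words)  // ["Furnished", "balcony", "garden"]
```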

edited Nov 14 at 20:47

answered Nov 12 at 21:46

Carpsen90

6,49062557

This won't catch valid HTML variations like </em > (space before the closing angle bracket).
– Cristik
Nov 14 at 6:36

@Cristik That’s an easy fix, the main idea would still be the same. The OP hasn’t mentioned such a requirement.
– Carpsen90
Nov 14 at 8:23

@TaimurAjmal Does my answer work for you?
– Carpsen90
Nov 16 at 9:30


Score: 0

Here is a basic implementation in PHP (yes, I know you asked about Swift, but it's to demonstrate the regex part):

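<?php

$in = "Fully <em>Furni</em>shed |Downtown and Canal Views";

// Single quotes keep the backreference \1 and the escaped slash \/ intact.
$m = preg_match('/<([^>]+)>([^>]+)<\/\1>([^ ]+|$)/i', $in, $t);

// $t[2] is the text inside the tag, $t[3] the remainder of the word after it.
$s = $t[2] . $t[3];

echo $s;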

Output:

ZC-MGMT-04:~ jv$ php -q regex.php

Furnished

Obviously, the most important bit is the regular expression, which matches any tag, finds the respective closing tag, and captures the remainder of the word afterward.
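
Since the question is about Swift, here is a rough Swift counterpart of the same backreference idea. It is only a sketch: the function name `wordAroundFirstTag(in:)` is made up, the inner group uses `[^<]+` instead of `[^>]+`, and the tail is captured with `\S*` rather than `[^ ]+|$`.

```swift
import Foundation

// Hypothetical helper: matches any tag, its matching closing tag (via the \1
// backreference), and the rest of the word after the closing tag.
func wordAroundFirstTag(in text: String) -> String? {
    let pattern = #"<([^>]+)>([^<]+)</\1>(\S*)"#
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let inner = Range(match.range(at: 2), in: text),
          let tail = Range(match.range(at: 3), in: text) else { return nil }
    return String(text[inner]) + String(text[tail])
}

print(wordAroundFirstTag(in: "Fully <em>Furni</em>shed |Downtown and Canal Views") ?? "no match")
// Furnished
```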

answered Nov 13 at 21:40

jancha

4,23911533


Score: 0

If you just want to extract the text between the <em> and <\/em> tags (note these are not normal HTML tags, as then they would have been <em> and </em>), we can simply capture this pattern and replace it with the value captured in group 1. We don't need to worry about what surrounds the matching text; we just replace the match with whatever was captured between the tags, which could even be an empty string, because the OP hasn't mentioned any constraint for that. The regex for matching this pattern would be:

<em>(.*?)<\/em>

Or, to be technically more robust in taking care of optional spaces anywhere within the tags (as I saw someone point out in the comments on other answers), we can use this regex:

<\s*em\s*>(.*?)<\s*\/em\s*>

And replace it with \1 or $1, depending on where you are doing it. Whether the tags contain an empty string or some actual string doesn't really matter, as shown in my demo on regex101.

Here is the demo
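
In Swift, the replacement described above might be written as follows. This is a sketch with two assumptions of my own: the helper name `strippingEmTags(from:)`, and an optional literal backslash before the slash (`\\?`) so that both `</em>` and `<\/em>` are handled.

```swift
import Foundation

// Hypothetical helper: replaces each <em>...</em> pair (with optional
// whitespace inside the tags) by the text captured between the tags.
func strippingEmTags(from text: String) -> String {
    return text.replacingOccurrences(of: #"<\s*em\s*>(.*?)<\s*\\?/em\s*>"#,
                                     with: "$1",
                                     options: .regularExpression)
}

print(strippingEmTags(from: "Fully < em >Furni</em >shed |Downtown and Canal Views"))
// Fully Furnished |Downtown and Canal Views
```

This keeps the rest of the string intact, so extracting just the word would still need one of the capture-group approaches from the other answers.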

Let me know if this meets your requirements and further, if any of your requirement remains unsatisfied.

edited Nov 17 at 15:37

answered Nov 17 at 15:24

Pushpesh Kumar Rajwanshi

2,7231821

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

up vote
2
down vote

I guess you want to remove the tags.

If the backslash is only virtual the pattern is pretty simple: Basically  with optional slash /?

let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)

Considering also the backslash it's

let trimmedString = string.replacingOccurrences(of: "<\\?/?em>", with: "", options: .regularExpression)

If you want to extract only Furnished you have to capture groups: The string between the tags and everything after the closing tag until the next whitespace character.

let string = "Fully <em>Furni<\/em>shed |Downtown and Canal Views"

let pattern = "<em>(.*)<\\?/em>(\S+)"

do {

    let regex = try NSRegularExpression(pattern: pattern)

    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {

        let part1 = string[Range(match.range(at: 1), in: string)!]

        let part2 = string[Range(match.range(at: 2), in: string)!]

        print(String(part1 + part2))

    }

} catch { print(error) }

edited Nov 10 at 22:27

answered Nov 8 at 10:01

vadian

138k13144165

I dont want to remove the tags only - I want to extra word between em
– Taimur Ajmal
Nov 10 at 21:55

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:36

add a comment |

up vote
2
down vote

I guess you want to remove the tags.

If the backslash is only virtual the pattern is pretty simple: Basically  with optional slash /?

let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)

Considering also the backslash it's

let trimmedString = string.replacingOccurrences(of: "<\\?/?em>", with: "", options: .regularExpression)

If you want to extract only Furnished you have to capture groups: The string between the tags and everything after the closing tag until the next whitespace character.

let string = "Fully <em>Furni<\/em>shed |Downtown and Canal Views"

let pattern = "<em>(.*)<\\?/em>(\S+)"

do {

    let regex = try NSRegularExpression(pattern: pattern)

    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {

        let part1 = string[Range(match.range(at: 1), in: string)!]

        let part2 = string[Range(match.range(at: 2), in: string)!]

        print(String(part1 + part2))

    }

} catch { print(error) }

edited Nov 10 at 22:27

answered Nov 8 at 10:01

vadian

138k13144165

I dont want to remove the tags only - I want to extra word between em
– Taimur Ajmal
Nov 10 at 21:55

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:36

add a comment |

up vote
2
down vote

I guess you want to remove the tags.

If the backslash is only virtual the pattern is pretty simple: Basically  with optional slash /?

let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)

Considering also the backslash it's

let trimmedString = string.replacingOccurrences(of: "<\\?/?em>", with: "", options: .regularExpression)

If you want to extract only Furnished you have to capture groups: The string between the tags and everything after the closing tag until the next whitespace character.

let string = "Fully <em>Furni<\/em>shed |Downtown and Canal Views"

let pattern = "<em>(.*)<\\?/em>(\S+)"

do {

    let regex = try NSRegularExpression(pattern: pattern)

    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {

        let part1 = string[Range(match.range(at: 1), in: string)!]

        let part2 = string[Range(match.range(at: 2), in: string)!]

        print(String(part1 + part2))

    }

} catch { print(error) }

edited Nov 10 at 22:27

answered Nov 8 at 10:01

vadian

138k13144165

I guess you want to remove the tags.

If the backslash is only virtual the pattern is pretty simple: Basically  with optional slash /?

let trimmedString = string.replacingOccurrences(of: "</?em>", with: "", options: .regularExpression)

Considering also the backslash it's

let trimmedString = string.replacingOccurrences(of: "<\\?/?em>", with: "", options: .regularExpression)

If you want to extract only Furnished you have to capture groups: The string between the tags and everything after the closing tag until the next whitespace character.

let string = "Fully <em>Furni<\/em>shed |Downtown and Canal Views"

let pattern = "<em>(.*)<\\?/em>(\S+)"

do {

    let regex = try NSRegularExpression(pattern: pattern)

    if let match = regex.firstMatch(in: string, range: NSRange(string.startIndex..., in: string)) {

        let part1 = string[Range(match.range(at: 1), in: string)!]

        let part2 = string[Range(match.range(at: 2), in: string)!]

        print(String(part1 + part2))

    }

} catch { print(error) }

edited Nov 10 at 22:27

answered Nov 8 at 10:01

vadian

138k13144165

edited Nov 10 at 22:27

answered Nov 8 at 10:01

vadian

138k13144165

answered Nov 8 at 10:01

vadian

138k13144165

answered Nov 8 at 10:01

vadian

138k13144165

I dont want to remove the tags only - I want to extra word between em
– Taimur Ajmal
Nov 10 at 21:55

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:36

add a comment |

I dont want to remove the tags only - I want to extra word between em
– Taimur Ajmal
Nov 10 at 21:55

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:36

I dont want to remove the tags only - I want to extra word between em
– Taimur Ajmal
Nov 10 at 21:55

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:36

add a comment |

up vote
1
down vote

Not a regex but, for obtaining all words in tags, e.g [Furni, sma]:

let text = "Fully <em>Furni<\/em>shed <em>sma<\/em>shed |Downtown and Canal Views"

let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\/em>")}.flatMap { $0.components(separatedBy: "<\/em>").first }

For full words, e.g [Furnished, smashed]:

let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>")}.map { $0.replacingOccurrences(of: "<\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }

edited Nov 10 at 22:59

answered Nov 10 at 22:42

Luzo

994514

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:38

Not a requirement + you can always do preprocessing.
– Luzo
Nov 14 at 19:34

add a comment |

up vote
1
down vote

Not a regex but, for obtaining all words in tags, e.g [Furni, sma]:

let text = "Fully <em>Furni<\/em>shed <em>sma<\/em>shed |Downtown and Canal Views"

let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\/em>")}.flatMap { $0.components(separatedBy: "<\/em>").first }

For full words, e.g [Furnished, smashed]:

let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>")}.map { $0.replacingOccurrences(of: "<\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }

edited Nov 10 at 22:59

answered Nov 10 at 22:42

Luzo

994514

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:38

Not a requirement + you can always do preprocessing.
– Luzo
Nov 14 at 19:34

add a comment |

up vote
1
down vote

Not a regex but, for obtaining all words in tags, e.g [Furni, sma]:

let text = "Fully <em>Furni<\/em>shed <em>sma<\/em>shed |Downtown and Canal Views"

let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\/em>")}.flatMap { $0.components(separatedBy: "<\/em>").first }

For full words, e.g [Furnished, smashed]:

let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>")}.map { $0.replacingOccurrences(of: "<\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }

edited Nov 10 at 22:59

answered Nov 10 at 22:42

Luzo

994514

Not a regex but, for obtaining all words in tags, e.g [Furni, sma]:

let text = "Fully <em>Furni<\/em>shed <em>sma<\/em>shed |Downtown and Canal Views"

let emphasizedParts = text.components(separatedBy: "<em>").filter { $0.contains("<\/em>")}.flatMap { $0.components(separatedBy: "<\/em>").first }

For full words, e.g [Furnished, smashed]:

let emphasizedParts = text.components(separatedBy: " ").filter { $0.contains("<em>")}.map { $0.replacingOccurrences(of: "<\/em>", with: "").replacingOccurrences(of: "<em>", with: "") }

edited Nov 10 at 22:59

answered Nov 10 at 22:42

Luzo

994514

edited Nov 10 at 22:59

answered Nov 10 at 22:42

Luzo

994514

answered Nov 10 at 22:42

Luzo

994514

answered Nov 10 at 22:42

Luzo

994514

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:38

Not a requirement + you can always do preprocessing.
– Luzo
Nov 14 at 19:34

add a comment |

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:38

Not a requirement + you can always do preprocessing.
– Luzo
Nov 14 at 19:34

This won't catch valid html variations like  (space before the closing angular bracket).
– Cristik
Nov 14 at 6:38

Not a requirement + you can always do preprocessing.
– Luzo
Nov 14 at 19:34

add a comment |

up vote
1
down vote

Regex:

If you want to achieve that by regex, you can use Valexa's answer:

public extension String {

    public func capturedGroups(withRegex pattern: String) -> [String] {

        var results = [String]()



        var regex: NSRegularExpression

        do {

            regex = try NSRegularExpression(pattern: pattern, options: )

        } catch {

            return results

        }

        let matches = regex.matches(in: self, options: , range: NSRange(location:0, length: self.count))



        guard let match = matches.first else { return results }



        let lastRangeIndex = match.numberOfRanges - 1

        guard lastRangeIndex >= 1 else { return results }



        for i in 1...lastRangeIndex {

            let capturedGroupIndex = match.range(at: i)

            let matchedString = (self as NSString).substring(with: capturedGroupIndex)

            results.append(matchedString)

        }



        return results

    }

}

like this:

let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"

print(text.capturedGroups(withRegex: "<em>([a-zA-z]+)</em>"))

result:

["Furni"]

NSAttributedString:

If you want to do some highlighting or you only need to get rid of tags or any other reason that you can't use the first solution, you can also do that using NSAttributedString:

extension String {

    var attributedStringAsHTML: NSAttributedString? {

        do{

            return try NSAttributedString(data: Data(utf8),

                                          options: [

                                            .documentType: NSAttributedString.DocumentType.html,

                                            .characterEncoding: String.Encoding.utf8.rawValue],

                                          documentAttributes: nil)

        }

        catch {

            print("error: ", error)

            return nil

        }

    }



}



func getTextSections(_ text:String) -> [String] {

    guard let attributedText = text.attributedStringAsHTML else {

        return 

    }

    var sections:[String] = 

    let range = NSMakeRange(0, attributedText.length)



    // we don't need to enumerate any special attribute here,

    // but for example, if you want to just extract links you can use `NSAttributedString.Key.link` instead

    let attribute: NSAttributedString.Key = .init(rawValue: "")



    attributedText.enumerateAttribute(attribute,

                                      in: range,

                                      options: .longestEffectiveRangeNotRequired) {attribute, range, pointer in



                                        let text = attributedText.attributedSubstring(from: range).string

                                        sections.append(text)

    }

    return sections

}



let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"

print(getTextSections(text))

result:

["Fully ", "Furni", "shed |Downtown and Canal Views"]

answered Nov 11 at 0:29

Amir Khorsandi

8051917

add a comment |

up vote
1
down vote

Regex:

If you want to achieve that by regex, you can use Valexa's answer:

public extension String {

    public func capturedGroups(withRegex pattern: String) -> [String] {

        var results = [String]()



        var regex: NSRegularExpression

        do {

            regex = try NSRegularExpression(pattern: pattern, options: )

        } catch {

            return results

        }

        let matches = regex.matches(in: self, options: , range: NSRange(location:0, length: self.count))



        guard let match = matches.first else { return results }



        let lastRangeIndex = match.numberOfRanges - 1

        guard lastRangeIndex >= 1 else { return results }



        for i in 1...lastRangeIndex {

            let capturedGroupIndex = match.range(at: i)

            let matchedString = (self as NSString).substring(with: capturedGroupIndex)

            results.append(matchedString)

        }



        return results

    }

}

like this:

let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"

print(text.capturedGroups(withRegex: "<em>([a-zA-z]+)</em>"))

result:

["Furni"]

NSAttributedString:

If you want to do some highlighting or you only need to get rid of tags or any other reason that you can't use the first solution, you can also do that using NSAttributedString:

extension String {

    var attributedStringAsHTML: NSAttributedString? {

        do{

            return try NSAttributedString(data: Data(utf8),

                                          options: [

                                            .documentType: NSAttributedString.DocumentType.html,

                                            .characterEncoding: String.Encoding.utf8.rawValue],

                                          documentAttributes: nil)

        }

        catch {

            print("error: ", error)

            return nil

        }

    }



}



func getTextSections(_ text:String) -> [String] {

    guard let attributedText = text.attributedStringAsHTML else {

        return 

    }

    var sections:[String] = 

    let range = NSMakeRange(0, attributedText.length)



    // we don't need to enumerate any special attribute here,

    // but for example, if you want to just extract links you can use `NSAttributedString.Key.link` instead

    let attribute: NSAttributedString.Key = .init(rawValue: "")



    attributedText.enumerateAttribute(attribute,

                                      in: range,

                                      options: .longestEffectiveRangeNotRequired) {attribute, range, pointer in



                                        let text = attributedText.attributedSubstring(from: range).string

                                        sections.append(text)

    }

    return sections

}



let text = "Fully <em>Furni</em>shed |Downtown and Canal Views"

print(getTextSections(text))

result:

["Fully ", "Furni", "shed |Downtown and Canal Views"]

answered Nov 11 at 0:29

Amir Khorsandi

8051917


up vote
1
down vote

Given this string:

let str = "Fully <em>Furni</em>shed |Downtown and Canal Views"

and the corresponding NSRange:

let range = NSRange(location: 0, length: (str as NSString).length)

Let's construct a regular expression that matches word characters between <em> and </em>, or immediately preceded by </em>:

let regex = try NSRegularExpression(pattern: "(?<=<em>)\\w+(?=<\\/em>)|(?<=<\\/em>)\\w+")

What it does:

look for one or more word characters (letters, digits, or underscores): \\w+,

that are preceded by <em>: (?<=<em>) (positive lookbehind),

and followed by </em>: (?=<\\/em>) (positive lookahead),

or: |

word characters: \\w+,

that are preceded by </em>: (?<=<\\/em>) (positive lookbehind)

Let's get the matches:

let matches = regex.matches(in: str, range: range)

Which we can turn into substrings:

let strings: [String] = matches.compactMap { match in

    // NSRange counts UTF-16 units, so convert it to a Range<String.Index>

    // instead of offsetting by Character counts

    guard let matchRange = Range(match.range, in: str) else { return nil }

    return String(str[matchRange])

}

Now we can join the strings in even indices, with the ones in odd indices:

let evenStride = stride(from: strings.startIndex,

               to: strings.index(strings.endIndex, offsetBy: -1),

               by: 2)

let result = evenStride.map { strings[$0] + strings[strings.index($0, offsetBy: 1)]}



print(result)  //["Furnished"]

We can test it with another string:

let str2 = "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"

the result would be:

["Furnished", "balcony", "garden"]
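
As a possible alternative to pairing even and odd matches, the same result can be sketched with two capture groups: one for the text inside the tag and one for the word characters that directly follow the closing tag. The function name `emphasizedWords(in:)` is hypothetical and not part of the answer above; this is a minimal sketch that assumes the tag is always written exactly as <em>...</em>.

```swift
import Foundation

// Hypothetical helper (not from the answer above): joins the emphasized fragment
// with the word characters that immediately follow the closing tag.
func emphasizedWords(in text: String) -> [String] {
    // Group 1: word characters inside <em>...</em>; group 2: the rest of the word (may be empty).
    let pattern = "<em>(\\w+)</em>(\\w*)"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }

    // Build the NSRange from String indices so UTF-16 offsets stay correct.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        guard let inside = Range(match.range(at: 1), in: text),
              let after = Range(match.range(at: 2), in: text) else { return nil }
        return String(text[inside]) + String(text[after])
    }
}

print(emphasizedWords(in: "Fully <em>Furni</em>shed |Downtown and Canal Views"))
// ["Furnished"]
print(emphasizedWords(in: "<em>Furni</em>shed <em>balc</em>ony <em>gard</em>en"))
// ["Furnished", "balcony", "garden"]
```

Because each match carries both halves of the word, a tag that is not followed by any word characters simply yields the emphasized fragment on its own instead of shifting the pairing.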

edited Nov 14 at 20:47

answered Nov 12 at 21:46

Carpsen90

6,49062557

This won't catch valid HTML variations like <em > (space before the closing angle bracket).
– Cristik
Nov 14 at 6:36

@Cristik That’s an easy fix, the main idea would still be the same. The OP hasn’t mentioned such a requirement.
– Carpsen90
Nov 14 at 8:23

@TaimurAjmal Does my answer work for you?
– Carpsen90
Nov 16 at 9:30


up vote
0
down vote

Here is a basic implementation in PHP (yes, I know you asked about Swift, but it demonstrates the regex part):

<?php



$in = "Fully <em>Furni</em>shed |Downtown and Canal Views";



$m = preg_match('/<([^>]+)>([^>]+)<\/\1>([^ ]+|$)/i', $in, $t);



$s = $t[2] . $t[3];



echo $s;

Output:

ZC-MGMT-04:~ jv$ php -q regex.php

Furnished

The most important bit is the regular expression, which matches any tag, finds its corresponding closing tag, and captures the remainder of the word after it.
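
For readers who want the Swift equivalent, the same idea might be ported roughly as follows. This is only a sketch: the function name `wordAroundFirstTag(in:)` is made up for illustration, and it relies on NSRegularExpression supporting the `\1` backreference to the opening tag name.

```swift
import Foundation

// Hypothetical helper; the pattern mirrors the PHP idea above:
// group 1 = tag name, group 2 = text inside the tag, group 3 = rest of the word.
func wordAroundFirstTag(in text: String) -> String? {
    let pattern = "<([^>]+)>([^>]+)</\\1>([^ ]+|$)"
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]),
          let match = regex.firstMatch(in: text, options: [], range: fullRange),
          let inside = Range(match.range(at: 2), in: text),
          let after = Range(match.range(at: 3), in: text) else {
        return nil // no tag pair found
    }
    return String(text[inside]) + String(text[after])
}

print(wordAroundFirstTag(in: "Fully <em>Furni</em>shed |Downtown and Canal Views") ?? "no match")
// Furnished
```

Only the first tag pair is considered here, matching what preg_match does in the PHP version.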

answered Nov 13 at 21:40

jancha

4,23911533


up vote
0
down vote

<em>(.*?)<\/em>

Or, to be more robust against optional spaces anywhere within the tags (as someone pointed out in the comments on other answers), we can use this regex:

<\s*em\s*>(.*?)<\s*\/em\s*>

Here is the demo

Let me know if this meets your requirements, or if any requirement remains unsatisfied.
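
In Swift, the whitespace-tolerant variant (written with explicit `\s*` around the tag name) could be applied roughly like this. The helper name `emContents(in:)` is an assumption for illustration; note that it returns only the text inside the tags, not the rest of the word.

```swift
import Foundation

// Hypothetical helper: returns the text captured between <em> and </em>,
// tolerating optional whitespace inside the angle brackets.
func emContents(in text: String) -> [String] {
    // Backslashes are doubled because this is a regular Swift string literal.
    let pattern = "<\\s*em\\s*>(.*?)<\\s*/em\\s*>"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }

    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        // Group 1 is the lazily matched content between the tags.
        guard let captured = Range(match.range(at: 1), in: text) else { return nil }
        return String(text[captured])
    }
}

print(emContents(in: "Fully <em>Furni</em>shed |Downtown and Canal Views"))
// ["Furni"]
print(emContents(in: "a < em >x</em > b"))
// ["x"]
```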

edited Nov 17 at 15:37

answered Nov 17 at 15:24

Pushpesh Kumar Rajwanshi

2,7231821
