Web Crawling: get shares of Youtube Video from statistics tab























Does anybody know a way to get the share counts of YouTube videos (not my own)? I would like to store them in a database. I could not get this to work with the YouTube API. Another problem is that not every YouTube video has the statistics tab.



So far I have tried the YouTube API, the jsoup HTML parser (the div showing the shares wasn't in the fetched HTML, although it appears when inspecting the page in Firefox, for example), and the import.io demo, which worked but is definitely too expensive.



That's what I would like to extract.


































  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
    – DaImTo
    Jun 20 '17 at 7:56















web-scraping youtube youtube-api web-crawler extract














edited Nov 11 at 16:36 by Bertrand Martel
asked Jun 20 '17 at 7:40 by neodymium
























1 Answer




















accepted










The best way is to inspect the browser's network log. In this case it shows a POST request to:



https://www.youtube.com/insight_ajax?action_get_statistics_and_data=1&v=$video_id


The request body carries an XSRF token, which is available in the HTML of the video page https://www.youtube.com/watch?v=$video_id, in a JavaScript object like:



yt.setConfig({
'XSRF_TOKEN': "QUFFLUhqbnNvZUx4THR3eV80dHlacV9tRkRxc2NwSjlXQXxBQ3Jtc0ttd0JLWENnMjdYNE5IRWhibE9ZdDJTSk1aMktxTDR5d3JjSnkzVUtQWVcwdnp3X0tSOXEtM3hZdzVFdjNPeGpPRGtLVU5pVXV0SmtfdWJSUHNqTVg2WXBndjZpa3d6U25ja2FTelBBVWRlT0lZZkRDaDV6SU94VWE3cnpERHhWNVlUYWdyRjFqN1hvc0VLRmVwcEY3ZWdJMWgyUmc=",
'XSRF_FIELD_NAME': "session_token",
'XSRF_REDIRECT_TOKEN': "VlhMkn6F56dGGYcm4Rg7jCZR0vJ8MTQ5ODA1NzIwMkAxNDk3OTcwODAy"
});


The request also needs some of the cookies set by that same video page.
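The token lookup described above can be tried in isolation before any network request is made. The following is a minimal sketch, not part of the original answer: the function name `extract_xsrf_token` and the shortened sample page are hypothetical, and it assumes the page still contains a `yt.setConfig` object of the form shown above. Returning `None` instead of raising makes it easy to skip pages where the object is missing.

```python
import re
from typing import Optional

# Matches the 'XSRF_TOKEN': "..." entry of the yt.setConfig({...}) object.
# [^"]* stops at the closing quote instead of running to the end of the line.
XSRF_PATTERN = re.compile("'XSRF_TOKEN'" r'\s*:\s*"([^"]*)"')


def extract_xsrf_token(page_html: str) -> Optional[str]:
    """Return the XSRF token found in the page source, or None if absent."""
    match = XSRF_PATTERN.search(page_html)
    return match.group(1) if match else None


# Hypothetical page fragment with a shortened, made-up token value.
sample_html = """
yt.setConfig({
'XSRF_TOKEN': "QUFFLUhq=",
'XSRF_FIELD_NAME': "session_token"
});
"""

print(extract_xsrf_token(sample_html))
print(extract_xsrf_token("<html>no config here</html>"))
```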



Using Python

with BeautifulSoup & python-requests:



import re

import requests
from bs4 import BeautifulSoup

video_id = "CPkU0dF4JKo"

# A session keeps the cookies set by the video page for the second request.
s = requests.Session()

r = s.get('https://www.youtube.com/watch?v={}'.format(video_id))

# Extract the XSRF token from the yt.setConfig({...}) object in the page source.
xsrf_token = re.search(r"'XSRF_TOKEN'\s*:\s*\"(.*)\"", r.text, re.IGNORECASE).group(1)

r = s.post(
    'https://www.youtube.com/insight_ajax?action_get_statistics_and_data=1&v={}'.format(video_id),
    data={'session_token': xsrf_token},
)

# Drop non-ASCII characters from each metric and keep the leading number.
metrics = [
    int(t.text.encode('ascii', 'ignore').decode('ascii').split(' ', 1)[0])
    for t in BeautifulSoup(r.content, "lxml")
        .find('html_content').find("tr")
        .findAll("div", {"class": "bragbar-metric"})
]
print(metrics)
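The parsing step can also be exercised offline with only the standard library. This is a rough sketch under stated assumptions: the sample response below is invented for illustration (only the `html_content` element and the `bragbar-metric` class come from the answer), and the names `MetricParser` and `parse_metrics` are hypothetical. The real response layout may differ.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class MetricParser(HTMLParser):
    """Collect the text inside every div whose class list contains 'bragbar-metric'."""

    def __init__(self):
        super().__init__()
        self.metrics = []  # text of each metric div, in document order
        self._depth = 0    # > 0 while inside a metric div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1  # nested div inside a metric div
        elif "bragbar-metric" in classes:
            self.metrics.append("")
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.metrics[-1] += data


def parse_metrics(response_xml):
    """Return the leading integer of each metric found in the XML response."""
    # ElementTree exposes CDATA content as ordinary element text.
    html_content = ET.fromstring(response_xml).findtext(".//html_content")
    if html_content is None:
        return []
    parser = MetricParser()
    parser.feed(html_content)
    parser.close()
    numbers = []
    for text in parser.metrics:
        leading = re.match(r"\s*(\d+)", text)
        if leading:
            numbers.append(int(leading.group(1)))
    return numbers


# Hypothetical response: HTML wrapped in CDATA inside <html_content>.
sample_response = """<root><html_content><![CDATA[
<div><table><tr>
<td><div class="bragbar-metric">120862 views</div></td>
<td><div class="bragbar-metric">454 days watched</div></td>
<td><div class="bragbar-metric">18 subscriptions</div></td>
<td><div class="bragbar-metric">213 shares</div></td>
</tr></table></div>
]]></html_content></root>"""

print(parse_metrics(sample_response))
```

A response without an `html_content` element yields an empty list, which is one way to detect videos that have no statistics tab.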


Using bash

with curl, sed, pup & xml_grep:

The following bash script will:

  • request the video page https://www.youtube.com/watch?v=$video_id with curl

  • store the cookies in a file called cookie.txt

  • extract the XSRF_TOKEN with sed (it is sent as session_token in the following request)

  • request the video statistics page https://www.youtube.com/insight_ajax?action_get_statistics_and_data=1&v=$video_id with curl, sending the cookies stored previously

  • parse the XML result and extract the CDATA part with xml_grep

  • parse the HTML with pup to extract the text of the divs with the bragbar-metric class, using text{}

  • use sed to remove the non-ASCII characters and everything after the leading number

The script:



video_id=CPkU0dF4JKo

# Fetch the video page, save its cookies and extract the XSRF token.
session_token=$(curl -s -c cookie.txt "https://www.youtube.com/watch?v=$video_id" | \
    sed -rn "s/.*'XSRF_TOKEN'\s*:\s*\"(.*)\".*/\1/p")

# Post the token with the saved cookies, then reduce the response to the numbers.
curl -s -b cookie.txt -d "session_token=$session_token" \
    "https://www.youtube.com/insight_ajax?action_get_statistics_and_data=1&v=$video_id" | \
    xml_grep --text_only 'html_content' | \
    pup 'div table tr .bragbar-metric text{}' | \
    sed 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//' | \
    sed 's/\s.*$//'


It prints the number of views, the time watched, the subscriptions and the shares:



120862
454
18
213
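Since the question asks about storing the values in a database, here is one possible sketch using SQLite from the standard library. The table name `video_stats`, the column names and the function `store_metrics` are all assumptions made for illustration; the column order simply follows the order of the four numbers above.

```python
import sqlite3

# Order of the four values as listed above (assumed column names).
FIELDS = ("views", "time_watched", "subscriptions", "shares")


def store_metrics(conn, video_id, metrics):
    """Insert or update one row of statistics for a video."""
    if len(metrics) != len(FIELDS):
        # For example, a video without a statistics tab yields no metrics.
        raise ValueError(
            "expected {} metrics, got {}".format(len(FIELDS), len(metrics))
        )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS video_stats ("
        "video_id TEXT PRIMARY KEY, views INTEGER, time_watched INTEGER, "
        "subscriptions INTEGER, shares INTEGER)"
    )
    # Re-crawling the same video overwrites its previous row.
    conn.execute(
        "INSERT OR REPLACE INTO video_stats VALUES (?, ?, ?, ?, ?)",
        (video_id, *metrics),
    )
    conn.commit()


# In-memory database for demonstration; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
store_metrics(conn, "CPkU0dF4JKo", [120862, 454, 18, 213])
row = conn.execute(
    "SELECT shares FROM video_stats WHERE video_id = ?", ("CPkU0dF4JKo",)
).fetchone()
print(row[0])
```

Using the video ID as the primary key keeps one row per video; a history of values over time would need an extra timestamp column instead.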





edited Nov 11 at 14:30
answered Jun 20 '17 at 15:31 by Bertrand Martel





























