Searching an Instagram HTML page with BeautifulSoup

Date: 2023-08-30
This post looks at how to search an Instagram HTML page with BeautifulSoup in order to find and extract the image URLs it contains.

Problem description

I'm trying to automatically find some URLs in an Instagram HTML page and (knowing that I'm a Python noob) I can't find a way to search the HTML source automatically for the URLs that come after "display_url": in the example below.

I want my script to search for the multiple URLs that appear after "display_url" and download them. They have to be extracted as many times as they appear in the source code.


With bs4 I tried:

    import urllib.request
    from bs4 import BeautifulSoup as bs

    f = urllib.request.urlopen(fileURL)  # fileURL is the Instagram post URL
    htmlSource = f.read()
    soup = bs(htmlSource, 'html.parser')
    # this only finds the og:image meta tag, i.e. a single image URL
    metaTag = soup.find_all('meta', {'property': 'og:image'})
    imgURL = metaTag[0]['content']
    urllib.request.urlretrieve(imgURL, 'fileName.jpg')

But I can't get soup.find_all(...) to work / to search for it. Is there a way for me to find this part of the page with bs4?

Thank you very much for your help.

Here is an example of my little (Python) code as it is now: https://repl.it/@ClementJpn287/bs

                <!-- cropped ............... -->
                
                <body class="">
                
                  <span id="react-root"><svg width="50" height="50" viewBox="0 0 50 50" style="position:absolute;top:50%;left:50%;margin:-25px 0 0 -25px;fill:#c7c7c7">
                      <path
                        d="
                
                        <!-- deleted part for privacy -->
                
                         " />
                      </svg></span>
                
                
                  <script type="text/javascript">
                    window._sharedData = {
                      "config": {
                        "csrf_token": "",
                        "viewer": {
                        
                        <!-- deleted part for privacy -->
                   
                        "viewerId": ""
                      },
                      "supports_es6": true,
                      "country_code": "FR",
                      "language_code": "fr",
                      "locale": "fr_FR",
                      "entry_data": {
                        "PostPage": [{
                          "graphql": {
                            "shortcode_media": {
                              "__typename": "GraphSidecar",
                     
                     <!-- deleted part for privacy -->
                     
                              "dimensions": {
                                "height": 1080,
                                "width": 1080
                              },
                              "gating_info": null,
                              "media_preview": null,
                
                <!-- here is the important part, which has to be extracted as many times as it appears in the source code -->
                
                              "display_url": "https://scontent-cdt1-1.cdninstagram.com/vp/",
                              "display_resources": [{
                                "src": "https://scontent-cdt1-1.cdninstagram.com/vp/",
                                "config_width": 640,
                                "config_height": 640
                              }, {
                                "src": "https://scontent-cdt1-1.cdninstagram.com/vp/",
                                "config_width": 750,
                                "config_height": 750
                              }, {
                                "src": "https://scontent-cdt1-1.cdninstagram.com/vp/",
                                "config_width": 1080,
                                "config_height": 1080
                              }],
                              "is_video": false,
                       
                <!-- cropped ............... -->
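The values the question is after live inside that window._sharedData <script> block, so one way to get at this part of the page with bs4 is to take the script text and pull the "display_url" values out with a regular expression. This is only a minimal sketch under that assumption; the function name and the page_html argument are placeholders:

    # Minimal sketch: locate the <script> carrying window._sharedData and collect
    # every "display_url" value it contains, one match per occurrence.
    import re
    from bs4 import BeautifulSoup

    def extract_display_urls(page_html):
        soup = BeautifulSoup(page_html, 'html.parser')
        for script in soup.find_all('script'):
            text = script.string or ''
            if 'display_url' in text:
                return re.findall(r'"display_url"\s*:\s*"([^"]+)"', text)
        return []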

My latest code

Recommended answer

Problem solved

Here's the code to download multiple images from an Instagram URL with Pythonista 3 on iOS:

    import urllib.request
    from bs4 import BeautifulSoup
    import photos  # Pythonista module used to save images to the iOS photo library

    thepage = "your url"

    # p.1 - fetch the page and write the <script> that carries the post data
    # (the 4th text/javascript script here) out to tet.txt
    page = urllib.request.urlopen(thepage)
    soup = BeautifulSoup(page, "html.parser")
    print(soup.title.text)
    txt = soup.select('script[type="text/javascript"]')[3]
    texte = txt.get_text()
    fille = open("tet.txt", 'w')
    fille.write(texte)
    fille.close()

    # p.2 - cut that text into chunks running from each occurrence of 'play_url'
    # (the tail of "display_url") to the next 'play_resources' (the tail of
    # "display_resources") and write them to url.html
    g = open('tet.txt', 'r')
    data = ''.join(g.readlines())
    le1 = 0
    le2 = 0
    hturl = open('url.html', 'w')
    still_looking = True
    while still_looking:
        still_looking = False
        dat = data.find('play_url', le1)
        det = data.find('play_resources', le2)
        if dat >= le1:
            le1 = dat + 1
            still_looking = True
        if det >= le2:
            hturl.write(data[dat:det])
            le2 = det + 1
            still_looking = True
    hturl.close()

    # p.3 - scan the chunks for the image URLs themselves: each one starts with
    # "https://scontent-" and ends just before '","dis'
    hturl2 = open('url.html', 'r')
    dete = ''.join(hturl2.readlines())
    le11 = 0
    le22 = 0
    urls = []
    still_looking2 = True
    while still_looking2:
        still_looking2 = False
        dat2 = dete.find('https://scontent-', le11)
        det2 = dete.find('","dis', le22)
        if dat2 >= le11:
            urls.append(dat2)
            le11 = dat2 + 1
            still_looking2 = True
        if det2 >= le22:
            urls.append(dete[dat2:det2])
            le22 = det2 + 1
            still_looking2 = True
    hturl2.close()

    # p.4 - urls alternates character positions and URL strings; walk the string
    # entries, download each one and add it to the photo library
    imgs = len(urls)
    nbindr = 3
    images = 1
    while nbindr < imgs:
        urllib.request.urlretrieve(urls[nbindr], 'photo.jpg')
        photos.create_image_asset('photo.jpg')
        print('Image ' + str(images) + ' downloaded')
        nbindr = nbindr + 2
        images += 1
    print("OK")

It's a bit tedious, but it works, and quickly too. Thanks for your help.
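For comparison, a shorter route to the same result would be to parse the window._sharedData blob with json.loads and walk the parsed structure for every "display_url", rather than slicing the text by hand. This is only a sketch under the assumption that the page still embeds its data that way; page_url and the photo file names are placeholders, and the Pythonista-specific photos step is left out:

    # Alternative sketch: load the embedded JSON instead of scanning text with find().
    import json
    import urllib.request
    from bs4 import BeautifulSoup

    def collect_display_urls(node, found=None):
        # Recursively gather every "display_url" value, however often it appears.
        if found is None:
            found = []
        if isinstance(node, dict):
            for key, value in node.items():
                if key == 'display_url':
                    found.append(value)
                else:
                    collect_display_urls(value, found)
        elif isinstance(node, list):
            for item in node:
                collect_display_urls(item, found)
        return found

    def download_post_images(page_url):
        html = urllib.request.urlopen(page_url).read()
        soup = BeautifulSoup(html, 'html.parser')
        for script in soup.find_all('script'):
            text = script.string or ''
            if text.strip().startswith('window._sharedData'):
                # Drop the "window._sharedData = " prefix and the trailing ";".
                shared = json.loads(text.split('=', 1)[1].strip().rstrip(';'))
                urls = collect_display_urls(shared)
                for number, url in enumerate(urls, start=1):
                    urllib.request.urlretrieve(url, 'photo%d.jpg' % number)
                    print('Image %d downloaded' % number)
                return urls
        return []

Walking the parsed JSON sidesteps the intermediate tet.txt/url.html files and picks up every occurrence of "display_url" regardless of how the surrounding keys are ordered.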
