Using PDFMiner (Python) with online PDF files. Encode the URL?

Date: 2023-08-30


Question

I want to extract the content of PDF files that are available online, using PDFMiner.

My code is based on the example in the documentation, which extracts the content of PDF files stored on the hard disk:

    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument

    # Open a PDF file in binary mode.
    fp = open('mypdf.pdf', 'rb')
    # Create a PDF parser object associated with the file object.
    parser = PDFParser(fp)
    # Create a PDF document object that stores the document structure.
    document = PDFDocument(parser)
                  

With a few small changes, this works quite well.

Now I have tried urllib2.urlopen for online PDFs, but that doesn't work. I get this error message: coercing to Unicode: need string or buffer, instance found.

How can I get something (a string or otherwise) from urllib2.urlopen that behaves like what the open function returns when I give it a PDF file name rather than a URL?

Please tell me if my question is not clear.

Recommended answer

Well, I finally found a solution.

I resorted to Request and StringIO and got rid of the open('my_file', 'rb') call:

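    # Python 2: urllib2 and StringIO are the Python 2 module names.
    from urllib2 import Request, urlopen
    from StringIO import StringIO
    from pdfminer.pdfparser import PDFParser

    url = 'my_url'

    # Download the whole PDF into memory; avoid shadowing the built-in open().
    pdf_data = urlopen(Request(url)).read()
    # Wrap the data in a seekable file-like object that PDFParser can read.
    memoryFile = StringIO(pdf_data)

    parser = PDFParser(memoryFile)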
                  

That way, Python treats the downloaded content as if it were a file, so to speak.
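The same idea can be sketched for Python 3, where urllib2 and StringIO no longer exist under those names. This is a minimal sketch, not the original author's code: it assumes urllib.request.urlopen and io.BytesIO as the Python 3 counterparts, and it assumes the pdfminer.six package for the parsing step. The helper names fetch_as_file and parse_online_pdf are hypothetical and chosen only for illustration.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_as_file(url, timeout=30):
    """Download `url` and return its bytes in a seekable in-memory file.

    Hypothetical helper: the name and the timeout default are assumptions.
    """
    with urlopen(url, timeout=timeout) as response:
        return BytesIO(response.read())


def parse_online_pdf(url):
    """Build a PDFMiner document from a URL (assumes pdfminer.six is installed)."""
    # Imported lazily so fetch_as_file stays usable without PDFMiner.
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdfparser import PDFParser

    memory_file = fetch_as_file(url)
    return PDFDocument(PDFParser(memory_file))
```

The whole response is read into memory first because the parser needs to seek within the file, which a raw HTTP response stream does not support. For very large PDFs, downloading to a temporary file may be a better fit than BytesIO.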
