
        How to pass large numpy arrays between python subprocesses without saving to disk?

        Date: 2023-07-22


                  Problem description

                  Is there a good way to pass a large chunk of data between two python subprocesses without using the disk? Here's a cartoon example of what I'm hoping to accomplish:

                  import sys, subprocess, numpy
                  
                  cmdString = """
                  import sys, numpy
                  
                  done = False
                  while not done:
                      cmd = input()
                      if cmd == 'done':
                          done = True
                      elif cmd == 'data':
                          ##Fake data. In real life, get data from hardware.
                          data = numpy.zeros(1000000, dtype=numpy.uint8)
                          data.dump('data.pkl')
                          sys.stdout.write('data.pkl' + '\n')
                          sys.stdout.flush()"""
                  
                  proc = subprocess.Popen( #python vs. pythonw on Windows?
                      [sys.executable, '-c', cmdString],
                      stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE,
                      text=True)
                  
                  for i in range(3):
                      proc.stdin.write('data\n')
                      proc.stdin.flush()
                      print(proc.stdout.readline().rstrip())
                      a = numpy.load('data.pkl', allow_pickle=True)
                      print(a.shape)
                  
                  proc.stdin.write('done\n')
                  proc.stdin.flush()
                  

                  This creates a subprocess which generates a numpy array and saves the array to disk. The parent process then loads the array from disk. It works!

                  The problem is, our hardware can generate data 10x faster than the disk can read/write. Is there a way to transfer data from one python process to another purely in-memory, maybe even without making a copy of the data? Can I do something like passing-by-reference?

                  My first attempt at transferring data purely in-memory is pretty lousy:

                  import sys, subprocess, numpy
                  
                  cmdString = """
                  import sys, numpy
                  
                  done = False
                  while not done:
                      cmd = input()
                      if cmd == 'done':
                          done = True
                      elif cmd == 'data':
                          ##Fake data. In real life, get data from hardware.
                          data = numpy.zeros(1000000, dtype=numpy.uint8)
                          ##Note that this breaks if there's a 10 (newline byte) in the array:
                          sys.stdout.buffer.write(data.tobytes() + b'\n')
                          sys.stdout.buffer.flush()"""
                  
                  proc = subprocess.Popen( #python vs. pythonw on Windows?
                      [sys.executable, '-c', cmdString],
                      stdin=subprocess.PIPE,
                      stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE)
                  
                  for i in range(3):
                      proc.stdin.write(b'data\n')
                      proc.stdin.flush()
                      a = numpy.frombuffer(proc.stdout.readline().rstrip(b'\n'),
                                           dtype=numpy.uint8)
                      print(a.shape)
                  
                  proc.stdin.write(b'done\n')
                  proc.stdin.flush()
                  

                  This is extremely slow (much slower than saving to disk) and very, very fragile. There's got to be a better way!
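One way to make the in-memory pipe approach both fast and robust is to replace newline-delimited output with a length-prefixed binary protocol. The sketch below is not from the original post; it assumes Python 3 and has the child write an 8-byte little-endian length header followed by the raw array bytes, so the payload can contain any byte value, including 10:

```python
import struct
import subprocess
import sys

import numpy as np

child_code = """
import struct, sys
import numpy as np

data = np.zeros(1000000, dtype=np.uint8)   # fake data; real code reads hardware
payload = data.tobytes()
sys.stdout.buffer.write(struct.pack('<Q', len(payload)))  # 8-byte length header
sys.stdout.buffer.write(payload)                          # raw array bytes
sys.stdout.buffer.flush()
"""

proc = subprocess.Popen([sys.executable, '-c', child_code],
                        stdout=subprocess.PIPE)
header = proc.stdout.read(8)               # read exactly the length header
(length,) = struct.unpack('<Q', header)
a = np.frombuffer(proc.stdout.read(length), dtype=np.uint8)
proc.wait()
print(a.shape)
```

This still copies the data once through the pipe, but it avoids both the disk round-trip and the fragility of `readline()` on arbitrary binary data.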

                  I'm not married to the 'subprocess' module, as long as the data-taking process doesn't block the parent application. I briefly tried 'multiprocessing', but without success so far.

                  Background: We have a piece of hardware that generates up to ~2 GB/s of data in a series of ctypes buffers. The python code to handle these buffers has its hands full just dealing with the flood of information. I want to coordinate this flow of information with several other pieces of hardware running simultaneously in a 'master' program, without the subprocesses blocking each other. My current approach is to boil the data down a little bit in the subprocess before saving to disk, but it'd be nice to pass the full monty to the 'master' process.
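Since the hardware already delivers data in ctypes buffers, those buffers can at least be viewed as numpy arrays without copying: `np.frombuffer` wraps the existing memory rather than duplicating it. A minimal sketch (the buffer here is an illustrative stand-in, not the actual hardware interface):

```python
import ctypes

import numpy as np

buf = (ctypes.c_uint8 * 1000000)()       # stand-in for a hardware ctypes buffer
a = np.frombuffer(buf, dtype=np.uint8)   # zero-copy view of the same memory
a[:] = 7                                 # writes go straight into the buffer
print(buf[0], buf[999999])
```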

                  Recommended answer

                  While googling around for more information about the code Joe Kington posted, I found the numpy-sharedmem package. Judging from this numpy/multiprocessing tutorial it seems to share the same intellectual heritage (maybe largely the same authors? -- I'm not sure).

                  Using the sharedmem module, you can create a shared-memory numpy array (awesome!), and use it with multiprocessing like this:

                  import sharedmem as shm
                  import numpy as np
                  import multiprocessing as mp
                  
                  def worker(q,arr):
                      done = False
                      while not done:
                          cmd = q.get()
                          if cmd == 'done':
                              done = True
                          elif cmd == 'data':
                              ##Fake data. In real life, get data from hardware.
                              rnd=np.random.randint(100)
                              print('rnd={0}'.format(rnd))
                              arr[:]=rnd
                          q.task_done()
                  
                  if __name__=='__main__':
                      N=10
                      arr=shm.zeros(N,dtype=np.uint8)
                      q=mp.JoinableQueue()    
                      proc = mp.Process(target=worker, args=[q,arr])
                      proc.daemon=True
                      proc.start()
                  
                      for i in range(3):
                          q.put('data')
                          # Wait for the computation to finish
                          q.join()   
                          print(arr.shape)
                          print(arr)
                      q.put('done')
                      proc.join()
                  

                  Running it yields:

                  rnd=53
                  (10,)
                  [53 53 53 53 53 53 53 53 53 53]
                  rnd=15
                  (10,)
                  [15 15 15 15 15 15 15 15 15 15]
                  rnd=87
                  (10,)
                  [87 87 87 87 87 87 87 87 87 87]
                  

