Returning a vector from a CUDA kernel

Date: 2023-09-26
This article describes a way to return a vector of match results from a CUDA kernel; readers facing the same problem may find it a useful reference.

Problem description

I have a list of words, and my goal is to match each word in a very, very long phrase. I have no problem matching each word; my only problem is returning a vector of structures containing information about each match.

In code:

                  typedef struct {
                      int A, B, C; } Match;
                  
                  __global__ void Find(veryLongPhrase * _phrase, Words * _word_list, vector<Match> * _matches)
                  {
                      int a, b, c;
                  
                      [...] //Parallel search for each word in the phrase
                  
                      if(match) //When an occurrence is found
                      {
                          _matches.push_back(new Match{ A = a, B = b, C = c }); //Here comes the unknown, what should I do here???
                      }
                  }
                  
                  main()
                  {
                      [...]
                  
                      veryLongPhrase * myPhrase = "The quick brown fox jumps over the lazy dog etc etc etc..."
                  
                      Words * wordList = {"the", "lazy"};
                  
                      vector<Match> * matches; //Obviously I can't pass a vector to a kernel
                  
                      Find<<< X, Y >>>(myPhrase, wordList, matches);
                  
                      [...]
                  
                  }
                  

I have tried the Thrust library but without any success; can you suggest any kind of solution?

Thank you very much.

Recommended answer

Something like this should work (coded in browser, not tested):

                  // N is the maximum number of structs to insert
                  #define N 10000
                  
                  typedef struct {
                      int A, B, C; } Match;
                  
                  __device__ Match dev_data[N];
                  __device__ int dev_count = 0;
                  
                  __device__ int my_push_back(Match * mt) {
                    int insert_pt = atomicAdd(&dev_count, 1);
                    if (insert_pt < N){
                      dev_data[insert_pt] = *mt;
                      return insert_pt;}
                    else return -1;}
                  
                  __global__ void Find(veryLongPhrase * _phrase, Words * _word_list)
                  {
                      int a, b, c;
                  
                      [...] //Parallel search for each word in the phrase
                  
                      if(match) //When an occurrence is found
                      {
                          Match m = {a, b, c};
                          my_push_back(&m);    }
                  }
                  
                  
                  int main()
                  {
                      [...]
                  
                      veryLongPhrase * myPhrase = "The quick brown fox jumps over the lazy dog etc etc etc..."
                  
                      Words * wordList = {"the", "lazy"};
                  
                      Find<<< X, Y >>>(myPhrase, wordList);
                  
                      int dsize;
                      cudaMemcpyFromSymbol(&dsize, dev_count, sizeof(int));
                      vector<Match> results(dsize);
                      cudaMemcpyFromSymbol(&(results[0]), dev_data, dsize*sizeof(Match));
                  
                      [...]
                  
                  }
                  

This will require compute capability 1.1 or better for the atomic operation:

                  nvcc -arch=sm_11 ...
                  

Here is a fully worked example:

                  $ cat t347.cu
                  #include <iostream>
                  #include <cstdio>
                  #include <vector>
                  
                  // N is the maximum number of structs to insert
                  #define N 10000
                  
                  typedef struct {
                      int A, B, C; } Match;
                  
                  __device__ Match dev_data[N];
                  __device__ int dev_count = 0;
                  
                  __device__ int my_push_back(Match & mt) {
                    int insert_pt = atomicAdd(&dev_count, 1);
                    if (insert_pt < N){
                      dev_data[insert_pt] = mt;
                      return insert_pt;}
                    else return -1;}
                  
                  __global__ void Find()
                  {
                  
                      if(threadIdx.x < 10) //Simulate a found occurrence
                      {
                          Match a = { .A = 1, .B = 2, .C = 3 };
                          my_push_back(a);    }
                  }
                  
                  
                  int main()
                  {
                  
                      Find<<< 2, 256 >>>();
                  
                      int dsize;
                      cudaMemcpyFromSymbol(&dsize, dev_count, sizeof(int));
                      if (dsize >= N) {printf("overflow error\n"); return 1;}
                      std::vector<Match> results(dsize);
                      cudaMemcpyFromSymbol(&(results[0]), dev_data, dsize*sizeof(Match));
                      std::cout << "number of matches = " << dsize << std::endl;
                      std::cout << "A  =  " << results[dsize-1].A << std::endl;
                      std::cout << "B  =  " << results[dsize-1].B << std::endl;
                      std::cout << "C  =  " << results[dsize-1].C << std::endl;
                  
                  }
                  $ nvcc -arch=sm_11 -o t347 t347.cu
                  $ ./t347
                  number of matches = 20
                  A  =  1
                  B  =  2
                  C  =  3
                  $
                  

Note that in this case my Match result struct creation is different, and I am passing by reference, but the concept is the same.


