<tfoot id='CjW9q'></tfoot>

      • <bdo id='CjW9q'></bdo><ul id='CjW9q'></ul>
      <i id='CjW9q'><tr id='CjW9q'><dt id='CjW9q'><q id='CjW9q'><span id='CjW9q'><b id='CjW9q'><form id='CjW9q'><ins id='CjW9q'></ins><ul id='CjW9q'></ul><sub id='CjW9q'></sub></form><legend id='CjW9q'></legend><bdo id='CjW9q'><pre id='CjW9q'><center id='CjW9q'></center></pre></bdo></b><th id='CjW9q'></th></span></q></dt></tr></i><div id='CjW9q'><tfoot id='CjW9q'></tfoot><dl id='CjW9q'><fieldset id='CjW9q'></fieldset></dl></div>

      <legend id='CjW9q'><style id='CjW9q'><dir id='CjW9q'><q id='CjW9q'></q></dir></style></legend>

      <small id='CjW9q'></small><noframes id='CjW9q'>

      在 Java 中有效地计算两个集合的交集?

      时间:2023-10-13
        <tbody id='FC4UW'></tbody>

      1. <legend id='FC4UW'><style id='FC4UW'><dir id='FC4UW'><q id='FC4UW'></q></dir></style></legend>

        <small id='FC4UW'></small><noframes id='FC4UW'>

          • <tfoot id='FC4UW'></tfoot>
                <bdo id='FC4UW'></bdo><ul id='FC4UW'></ul>

                <i id='FC4UW'><tr id='FC4UW'><dt id='FC4UW'><q id='FC4UW'><span id='FC4UW'><b id='FC4UW'><form id='FC4UW'><ins id='FC4UW'></ins><ul id='FC4UW'></ul><sub id='FC4UW'></sub></form><legend id='FC4UW'></legend><bdo id='FC4UW'><pre id='FC4UW'><center id='FC4UW'></center></pre></bdo></b><th id='FC4UW'></th></span></q></dt></tr></i><div id='FC4UW'><tfoot id='FC4UW'></tfoot><dl id='FC4UW'><fieldset id='FC4UW'></fieldset></dl></div>
                本文介绍了在 Java 中有效地计算两个集合的交集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

                问题描述

                在 Java 中查找两个非稀疏集合的交集大小最有效的方法是什么?这是我将在大型集合上调用很多次的操作,因此优化很重要.我无法修改原始集.

                What is the most efficient way to find the size of the intersection of two non-sparse Sets in Java? This is an operation I will be calling on large sets a very large number of times, so optimisation is important. I cannot modify the original sets.

                我查看了 Apache Commons CollectionUtils.intersection,它似乎很慢.我目前的方法是取两个集合中较小的一个,克隆它,然后在两个集合中较大的一个上调用 .retainAll.

                I have looked at Apache Commons CollectionUtils.intersection which appears to be quite slow. My current approach is to take the smaller of the two sets, clone it, and then call .retainAll on the larger of the two sets.

                public static int getIntersection(Set<Long> set1, Set<Long> set2) {
                    boolean set1IsLarger = set1.size() > set2.size();
                    Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1);
                    cloneSet.retainAll(set1IsLarger ? set1 : set2);
                    return cloneSet.size();
                }
                

                推荐答案

                使用发布的方法运行一些测试,而不是构造一个新的 HashSet.也就是说,令 A 为集合中较小的一个,B 为较大的集合,然后,对于 A 中的每个项目,如果它也存在于 B 中,然后将其添加到 C(一个新的 HashSet)——为了计算,可以跳过中间的 C 集.

                Run some tests with the posted approach and versus constructing a new HashSet. That is, let A be the smaller of the sets and B be the larger set and then, for each item in A, if it also exists in B then add it to C (a new HashSet) -- for just counting, the intermediate C set can be skipped.

                就像发布的方法一样,这应该是成本上的 O(|A|),因为迭代是 O(|A|) 并且对 B 的探测是O(1).我不知道它将如何与克隆和删除方法进行比较.

                Just as the posted approach, this should be a O(|A|) in cost as the iteration is O(|A|) and probe into B is O(1). I have no idea how it will compare vs. the clone-and-remove approach.

                编码愉快——并发布一些结果;-)

                Happy coding -- and post some results ;-)

                实际上,经过进一步思考,我相信这比帖子中的方法有更好的界限:O(|A|) vs O(|A| + |B|).我不知道这是否会在现实中产生任何差异(或改进),我只希望它在 |A|<<<|B|.

                Actually, on further thinking, I believe this has slightly better bounds than the method in the post: O(|A|) vs O(|A| + |B|). I have no idea if this will make any difference (or improvement) in actuality and I would only expect it to be relevant when |A| <<< |B|.

                好吧,我真的很无聊.至少在 JDK 7 (Windows 7 x64) 上,看来帖子中提出的方法比上述方法慢——好极了(尽管看起来大部分是恒定的)因素.我的眼球猜测表明,它比上述仅使用计数器的建议慢 四倍,而在创建新 HashSet 时,它慢 两倍.这似乎在不同的初始集合大小中大致一致".

                Okay, so I was really bored. At least on JDK 7 (Windows 7 x64), it appears the method in presented in the post is slower than the above approach -- by a good (albeit what appears to be mostly constant) factor. My eye-ball guesstimate says it is about four times as slow than the above suggestion that just uses a counter and twice as slow of when creating a new HashSet. This seems to be "roughly consistent" across the different initial set sizes.

                (请记住,正如 Voo 所指出的,上面的数字和这个微基准假设正在使用 HashSet! 而且,与往常一样,微基准存在危险.YMMV.)

                (Please keep in mind that, as Voo pointed out, the numbers above and this micro-benchmark assume a HashSet is being used! And, as always, there are dangers with micro-benchmarks. YMMV.)

                以下是丑陋的结果(以毫秒为单位):

                Here are the ugly results (times in milliseconds):

                Running tests for 1x1
                IntersectTest$PostMethod@6cc2060e took 13.9808544 count=1000000
                IntersectTest$MyMethod1@7d38847d took 2.9893732 count=1000000
                IntersectTest$MyMethod2@9826ac5 took 7.775945 count=1000000
                Running tests for 1x10
                IntersectTest$PostMethod@67fc9fee took 12.4647712 count=734000
                IntersectTest$MyMethod1@7a67f797 took 3.1567252 count=734000
                IntersectTest$MyMethod2@3fb01949 took 6.483941 count=734000
                Running tests for 1x100
                IntersectTest$PostMethod@16675039 took 11.3069326 count=706000
                IntersectTest$MyMethod1@58c3d9ac took 2.3482693 count=706000
                IntersectTest$MyMethod2@2207d8bb took 4.8687103 count=706000
                Running tests for 1x1000
                IntersectTest$PostMethod@33d626a4 took 10.28656 count=729000
                IntersectTest$MyMethod1@3082f392 took 2.3478658 count=729000
                IntersectTest$MyMethod2@65450f1f took 4.109205 count=729000
                Running tests for 10x2
                IntersectTest$PostMethod@55c4d594 took 10.4137618 count=736000
                IntersectTest$MyMethod1@6da21389 took 2.374206 count=736000
                IntersectTest$MyMethod2@2bb0bf9a took 4.9802039 count=736000
                Running tests for 10x10
                IntersectTest$PostMethod@7930ebb took 25.811083 count=4370000
                IntersectTest$MyMethod1@47ac1adf took 6.9409306 count=4370000
                IntersectTest$MyMethod2@74184b3b took 14.2603248 count=4370000
                Running tests for 10x100
                IntersectTest$PostMethod@7f423820 took 25.0577691 count=4251000
                IntersectTest$MyMethod1@5472fe25 took 6.1376042 count=4251000
                IntersectTest$MyMethod2@498b5a73 took 13.9880385 count=4251000
                Running tests for 10x1000
                IntersectTest$PostMethod@3033b503 took 25.0312716 count=4138000
                IntersectTest$MyMethod1@12b0f0ae took 6.0932898 count=4138000
                IntersectTest$MyMethod2@1e893918 took 13.8332505 count=4138000
                Running tests for 100x1
                IntersectTest$PostMethod@6366de01 took 9.4531628 count=700000
                IntersectTest$MyMethod1@767946a2 took 2.4284762 count=700000
                IntersectTest$MyMethod2@140c7272 took 4.7580235 count=700000
                Running tests for 100x10
                IntersectTest$PostMethod@3351e824 took 24.9788668 count=4192000
                IntersectTest$MyMethod1@465fadce took 6.1462852 count=4192000
                IntersectTest$MyMethod2@338bd37a took 13.1742654 count=4192000
                Running tests for 100x100
                IntersectTest$PostMethod@297630d5 took 193.0121077 count=41047000
                IntersectTest$MyMethod1@e800537 took 45.2652397 count=41047000
                IntersectTest$MyMethod2@76d66550 took 120.8494766 count=41047000
                Running tests for 100x1000
                IntersectTest$PostMethod@33576738 took 199.6269531 count=40966000
                IntersectTest$MyMethod1@2f39a7dd took 45.5255814 count=40966000
                IntersectTest$MyMethod2@723bb663 took 122.1704975 count=40966000
                Running tests for 1x1
                IntersectTest$PostMethod@35e3bdb5 took 9.5598373 count=1000000
                IntersectTest$MyMethod1@7abbd1b6 took 2.6359174 count=1000000
                IntersectTest$MyMethod2@40c542ad took 6.1091794 count=1000000
                Running tests for 1x10
                IntersectTest$PostMethod@3c33a0c5 took 9.4648528 count=733000
                IntersectTest$MyMethod1@61800463 took 2.302116 count=733000
                IntersectTest$MyMethod2@1ba03197 took 5.4803628 count=733000
                Running tests for 1x100
                IntersectTest$PostMethod@71b8da5 took 9.4971057 count=719000
                IntersectTest$MyMethod1@21f04f48 took 2.2983538 count=719000
                IntersectTest$MyMethod2@27e51160 took 5.3926902 count=719000
                Running tests for 1x1000
                IntersectTest$PostMethod@2fe7106a took 9.4702331 count=692000
                IntersectTest$MyMethod1@6ae6b7b7 took 2.3013066 count=692000
                IntersectTest$MyMethod2@51278635 took 5.4488882 count=692000
                Running tests for 10x2
                IntersectTest$PostMethod@223b2d85 took 9.5660879 count=743000
                IntersectTest$MyMethod1@5b298851 took 2.3481445 count=743000
                IntersectTest$MyMethod2@3b4ac99 took 4.8268489 count=743000
                Running tests for 10x10
                IntersectTest$PostMethod@51c700a0 took 23.0709476 count=4326000
                IntersectTest$MyMethod1@5ffa3251 took 5.5460785 count=4326000
                IntersectTest$MyMethod2@22fd9511 took 13.4853948 count=4326000
                Running tests for 10x100
                IntersectTest$PostMethod@46b49793 took 25.1295491 count=4256000
                IntersectTest$MyMethod1@7a4b5828 took 5.8520418 count=4256000
                IntersectTest$MyMethod2@6888e8d1 took 14.0856942 count=4256000
                Running tests for 10x1000
                IntersectTest$PostMethod@5339af0d took 25.1752685 count=4158000
                IntersectTest$MyMethod1@7013a92a took 5.7978328 count=4158000
                IntersectTest$MyMethod2@1ac73de2 took 13.8914112 count=4158000
                Running tests for 100x1
                IntersectTest$PostMethod@3d1344c8 took 9.5123442 count=717000
                IntersectTest$MyMethod1@3c08c5cb took 2.34665 count=717000
                IntersectTest$MyMethod2@63f1b137 took 4.907277 count=717000
                Running tests for 100x10
                IntersectTest$PostMethod@71695341 took 24.9830339 count=4180000
                IntersectTest$MyMethod1@39d90a92 took 5.8467864 count=4180000
                IntersectTest$MyMethod2@584514e9 took 13.2197964 count=4180000
                Running tests for 100x100
                IntersectTest$PostMethod@21b8dc91 took 195.1796213 count=41060000
                IntersectTest$MyMethod1@6f98c4e2 took 44.5775162 count=41060000
                IntersectTest$MyMethod2@16a60aab took 121.1754402 count=41060000
                Running tests for 100x1000
                IntersectTest$PostMethod@20b37d62 took 200.973133 count=40940000
                IntersectTest$MyMethod1@67ecbdb3 took 45.4832226 count=40940000
                IntersectTest$MyMethod2@679a6812 took 121.791293 count=40940000
                Running tests for 1x1
                IntersectTest$PostMethod@237aa07b took 9.2210288 count=1000000
                IntersectTest$MyMethod1@47bdfd6f took 2.3394042 count=1000000
                IntersectTest$MyMethod2@a49a735 took 6.1688936 count=1000000
                Running tests for 1x10
                IntersectTest$PostMethod@2b25ffba took 9.4103967 count=736000
                IntersectTest$MyMethod1@4bb82277 took 2.2976994 count=736000
                IntersectTest$MyMethod2@25ded977 took 5.3310813 count=736000
                Running tests for 1x100
                IntersectTest$PostMethod@7154a6d5 took 9.3818786 count=704000
                IntersectTest$MyMethod1@6c952413 took 2.3014931 count=704000
                IntersectTest$MyMethod2@33739316 took 5.3307998 count=704000
                Running tests for 1x1000
                IntersectTest$PostMethod@58334198 took 9.3831841 count=736000
                IntersectTest$MyMethod1@d178f65 took 2.3071236 count=736000
                IntersectTest$MyMethod2@5c7369a took 5.4062184 count=736000
                Running tests for 10x2
                IntersectTest$PostMethod@7c4bdf7c took 9.4040537 count=735000
                IntersectTest$MyMethod1@593d85a4 took 2.3584088 count=735000
                IntersectTest$MyMethod2@5610ffc1 took 4.8318229 count=735000
                Running tests for 10x10
                IntersectTest$PostMethod@46bd9fb8 took 23.004925 count=4331000
                IntersectTest$MyMethod1@4b410d50 took 5.5678172 count=4331000
                IntersectTest$MyMethod2@1bd125c9 took 14.6517184 count=4331000
                Running tests for 10x100
                IntersectTest$PostMethod@75d6ecff took 25.0114913 count=4223000
                IntersectTest$MyMethod1@716195c9 took 5.798676 count=4223000
                IntersectTest$MyMethod2@3db0f946 took 13.8064737 count=4223000
                Running tests for 10x1000
                IntersectTest$PostMethod@761d8e2a took 25.1910652 count=4292000
                IntersectTest$MyMethod1@e60a3fb took 5.8621189 count=4292000
                IntersectTest$MyMethod2@6aadbb1c took 13.8150282 count=4292000
                Running tests for 100x1
                IntersectTest$PostMethod@48a50a6e took 9.4141906 count=736000
                IntersectTest$MyMethod1@4b4fe104 took 2.3507252 count=736000
                IntersectTest$MyMethod2@693df43c took 4.7506854 count=736000
                Running tests for 100x10
                IntersectTest$PostMethod@4f7d29df took 24.9574096 count=4219000
                IntersectTest$MyMethod1@2248183e took 5.8628954 count=4219000
                IntersectTest$MyMethod2@2b2fa007 took 12.9836817 count=4219000
                Running tests for 100x100
                IntersectTest$PostMethod@545d7b6a took 193.2436192 count=40987000
                IntersectTest$MyMethod1@4551976b took 44.634367 count=40987000
                IntersectTest$MyMethod2@6fac155a took 119.2478037 count=40987000
                Running tests for 100x1000
                IntersectTest$PostMethod@7b6c238d took 200.4385174 count=40817000
                IntersectTest$MyMethod1@78923d48 took 45.6225227 count=40817000
                IntersectTest$MyMethod2@48f57fcf took 121.0602757 count=40817000
                Running tests for 1x1
                IntersectTest$PostMethod@102c79f4 took 9.0931408 count=1000000
                IntersectTest$MyMethod1@57fa8a77 took 2.3309466 count=1000000
                IntersectTest$MyMethod2@198b7c1 took 5.7627226 count=1000000
                Running tests for 1x10
                IntersectTest$PostMethod@8c646d0 took 9.3208571 count=726000
                IntersectTest$MyMethod1@11530630 took 2.3123797 count=726000
                IntersectTest$MyMethod2@61bb4232 took 5.405318 count=726000
                Running tests for 1x100
                IntersectTest$PostMethod@1a137105 took 9.387384 count=710000
                IntersectTest$MyMethod1@72610ca2 took 2.2938749 count=710000
                IntersectTest$MyMethod2@41849a58 took 5.3865938 count=710000
                Running tests for 1x1000
                IntersectTest$PostMethod@100001c8 took 9.4289031 count=696000
                IntersectTest$MyMethod1@7074f9ac took 2.2977923 count=696000
                IntersectTest$MyMethod2@fb3c4e2 took 5.3724119 count=696000
                Running tests for 10x2
                IntersectTest$PostMethod@52c638d6 took 9.4074124 count=775000
                IntersectTest$MyMethod1@53bd940e took 2.3544881 count=775000
                IntersectTest$MyMethod2@43434e15 took 4.9228549 count=775000
                Running tests for 10x10
                IntersectTest$PostMethod@73b7628d took 23.2110252 count=4374000
                IntersectTest$MyMethod1@ca75255 took 5.5877838 count=4374000
                IntersectTest$MyMethod2@3d0e50f0 took 13.5902641 count=4374000
                Running tests for 10x100
                IntersectTest$PostMethod@6d6bbba9 took 25.1999918 count=4227000
                IntersectTest$MyMethod1@3bed8c5e took 5.7879144 count=4227000
                IntersectTest$MyMethod2@689a8e0e took 13.9617882 count=4227000
                Running tests for 10x1000
                IntersectTest$PostMethod@3da3b0a2 took 25.1627329 count=4222000
                IntersectTest$MyMethod1@45a17b4b took 5.8319523 count=4222000
                IntersectTest$MyMethod2@6ca59ca3 took 13.8885479 count=4222000
                Running tests for 100x1
                IntersectTest$PostMethod@360202a5 took 9.5115367 count=705000
                IntersectTest$MyMethod1@3dfbba56 took 2.3470254 count=705000
                IntersectTest$MyMethod2@598683e4 took 4.8955489 count=705000
                Running tests for 100x10
                IntersectTest$PostMethod@21426d0d took 25.8234298 count=4231000
                IntersectTest$MyMethod1@1005818a took 5.8832067 count=4231000
                IntersectTest$MyMethod2@597b933d took 13.3676148 count=4231000
                Running tests for 100x100
                IntersectTest$PostMethod@6d59b81a took 193.676662 count=41015000
                IntersectTest$MyMethod1@1d45eb0c took 44.6519088 count=41015000
                IntersectTest$MyMethod2@594a6fd7 took 119.1646115 count=41015000
                Running tests for 100x1000
                IntersectTest$PostMethod@6d57d9ac took 200.1651432 count=40803000
                IntersectTest$MyMethod1@2293e349 took 45.5311168 count=40803000
                IntersectTest$MyMethod2@1b2edf5b took 120.1697135 count=40803000

                这是丑陋的(可能有缺陷的)微基准:

                And here is the ugly (and possibly flawed) micro-benchmark:

                import java.util.*;
                
                public class IntersectTest {
                
                    static Random rng = new Random();
                
                    static abstract class RunIt {
                        public long count;
                        public long nsTime;
                        abstract int Run (Set<Long> s1, Set<Long> s2);
                    }
                
                    // As presented in the post
                    static class PostMethod extends RunIt {
                        public int Run(Set<Long> set1, Set<Long> set2) {
                            boolean set1IsLarger = set1.size() > set2.size();
                            Set<Long> cloneSet = new HashSet<Long>(set1IsLarger ? set2 : set1);
                            cloneSet.retainAll(set1IsLarger ? set1 : set2);
                            return cloneSet.size();
                        }
                    }
                
                    // No intermediate HashSet
                    static class MyMethod1 extends RunIt {
                        public int Run (Set<Long> set1, Set<Long> set2) {
                            Set<Long> a;
                            Set<Long> b;
                            if (set1.size() <= set2.size()) {
                                a = set1;
                                b = set2;           
                            } else {
                                a = set2;
                                b = set1;
                            }
                            int count = 0;
                            for (Long e : a) {
                                if (b.contains(e)) {
                                    count++;
                                }           
                            }
                            return count;
                        }
                    }
                
                    // With intermediate HashSet
                    static class MyMethod2 extends RunIt {
                        public int Run (Set<Long> set1, Set<Long> set2) {
                            Set<Long> a;
                            Set<Long> b;
                            Set<Long> res = new HashSet<Long>();
                            if (set1.size() <= set2.size()) {
                                a = set1;
                                b = set2;           
                            } else {
                                a = set2;
                                b = set1;
                            }
                            for (Long e : a) {
                                if (b.contains(e)) {
                                    res.add(e);
                                }           
                            }
                            return res.size();
                        }
                    }
                
                    static Set<Long> makeSet (int count, float load) {
                        Set<Long> s = new HashSet<Long>();
                        for (int i = 0; i < count; i++) {
                            s.add((long)rng.nextInt(Math.max(1, (int)(count * load))));                     
                        }
                        return s;
                    }
                
                    // really crummy ubench stuff
                    public static void main(String[] args) {
                        int[][] bounds = {
                                {1, 1},
                                {1, 10},
                                {1, 100},
                                {1, 1000},
                                {10, 2},
                                {10, 10},
                                {10, 100},
                                {10, 1000},
                                {100, 1},
                                {100, 10},
                                {100, 100},
                                {100, 1000},
                        };
                        int totalReps = 4;
                        int cycleReps = 1000;
                        int subReps = 1000;
                        float load = 0.8f;
                        for (int tc = 0; tc < totalReps; tc++) {
                            for (int[] bound : bounds) {
                                int set1size = bound[0];
                                int set2size = bound[1];
                                System.out.println("Running tests for " + set1size + "x" + set2size);               
                                ArrayList<RunIt> allRuns = new ArrayList<RunIt>(
                                        Arrays.asList(
                                                new PostMethod(),
                                                new MyMethod1(),
                                                new MyMethod2()));
                                for (int r = 0; r < cycleReps; r++) {
                                    ArrayList<RunIt> runs = new ArrayList<RunIt>(allRuns);
                                    Set<Long> set1 = makeSet(set1size, load);
                                    Set<Long> set2 = makeSet(set2size, load);
                                    while (runs.size() > 0) {
                                        int runIdx = rng.nextInt(runs.size());
                                        RunIt run = runs.remove(runIdx);
                                        long start = System.nanoTime();
                                        int count = 0;
                                        for (int s = 0; s < subReps; s++) {
                                            count += run.Run(set1, set2); 
                                        }                       
                                        long time = System.nanoTime() - start;
                                        run.nsTime += time;
                                        run.count += count;
                                    }
                                }
                                for (RunIt run : allRuns) {
                                    double sec = run.nsTime / (10e6);
                                    System.out.println(run + " took " + sec + " count=" + run.count);
                                }
                            }
                        }       
                    }
                }
                

                这篇关于在 Java 中有效地计算两个集合的交集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

                <tfoot id='nstQ4'></tfoot>

                    <bdo id='nstQ4'></bdo><ul id='nstQ4'></ul>

                    <small id='nstQ4'></small><noframes id='nstQ4'>

                    <legend id='nstQ4'><style id='nstQ4'><dir id='nstQ4'><q id='nstQ4'></q></dir></style></legend>

                    1. <i id='nstQ4'><tr id='nstQ4'><dt id='nstQ4'><q id='nstQ4'><span id='nstQ4'><b id='nstQ4'><form id='nstQ4'><ins id='nstQ4'></ins><ul id='nstQ4'></ul><sub id='nstQ4'></sub></form><legend id='nstQ4'></legend><bdo id='nstQ4'><pre id='nstQ4'><center id='nstQ4'></center></pre></bdo></b><th id='nstQ4'></th></span></q></dt></tr></i><div id='nstQ4'><tfoot id='nstQ4'></tfoot><dl id='nstQ4'><fieldset id='nstQ4'></fieldset></dl></div>
                        <tbody id='nstQ4'></tbody>