        PySpark: Adding a column from a list of values

        Date: 2024-04-20

                  This article shows how to add a column to a PySpark DataFrame from a list of values.

                  Problem description

                  I need to add a column to a PySpark DataFrame based on a list of values.

                  a = spark.createDataFrame([("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")], ["Animal", "Enemy"])
                  

                  I have a list named rating that holds a rating for each pet.

                  rating = [5,4,1]
                  

                  I need to append a column named Rating to the DataFrame so that it looks like this:

                  +------+-----+------+
                  |Animal|Enemy|Rating|
                  +------+-----+------+
                  |   Dog|  Cat|     5|
                  |   Cat|  Dog|     4|
                  | Mouse|  Cat|     1|
                  +------+-----+------+
                  

                  I tried the following, but it fills the Rating column with only the first value from the list:

                  from pyspark.sql.functions import udf
                  from pyspark.sql.types import IntegerType

                  def add_labels():
                      return rating.pop(0)
                  
                  labels_udf = udf(add_labels, IntegerType())
                  
                  new_df = a.withColumn('Rating', labels_udf()).cache()
                  

                  Output:

                  +------+-----+------+
                  |Animal|Enemy|Rating|
                  +------+-----+------+
                  |   Dog|  Cat|     5|
                  |   Cat|  Dog|     5|
                  | Mouse|  Cat|     5|
                  +------+-----+------+
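
A likely explanation for the repeated 5, assuming each of the three rows ended up in its own partition: Spark serializes the UDF together with the variables it captures, so every task works on its own private copy of `rating`, and `pop(0)` on a fresh copy always returns the first element. The sketch below is a plain-Python model of that behavior, not Spark itself; `run_partition` and the partition layouts are hypothetical names chosen for illustration.

```python
import pickle

rating = [5, 4, 1]

def run_partition(serialized_rating, n_rows):
    # Each simulated task deserializes its own private copy of the list,
    # much like a worker receiving a serialized UDF closure.
    local_rating = pickle.loads(serialized_rating)
    return [local_rating.pop(0) for _ in range(n_rows)]

payload = pickle.dumps(rating)

# Three partitions with one row each: every task pops from a fresh copy.
one_row_each = [run_partition(payload, 1) for _ in range(3)]

# One partition holding all three rows: a single copy is popped three times.
single_partition = run_partition(payload, 3)

print(one_row_each)      # [[5], [5], [5]]
print(single_partition)  # [5, 4, 1]
print(rating)            # [5, 4, 1]  (the driver's list is never modified)
```

Under this model the result depends on how the rows happen to be partitioned, which is why a stateful UDF is not a reliable way to attach list values to rows.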
                  

                  Recommended answer

                  from pyspark.sql.functions import monotonically_increasing_id, row_number
                  from pyspark.sql import Window
                  
                  # Sample data (`spark` is the SparkSession used in the question)
                  a = spark.createDataFrame([("Dog", "Cat"), ("Cat", "Dog"), ("Mouse", "Cat")],
                                            ["Animal", "Enemy"])
                  a.show()
                  
                  # Convert the list to a single-column DataFrame
                  rating = [5, 4, 1]
                  b = spark.createDataFrame([(r,) for r in rating], ["Rating"])
                  
                  # Add a sequential index to both DataFrames
                  w = Window.orderBy(monotonically_increasing_id())
                  a = a.withColumn("row_idx", row_number().over(w))
                  b = b.withColumn("row_idx", row_number().over(w))
                  
                  # Join on the shared index, then drop it from the result
                  final_df = a.join(b, on="row_idx").drop("row_idx")
                  final_df.show()
                  

                  Input:

                  +------+-----+
                  |Animal|Enemy|
                  +------+-----+
                  |   Dog|  Cat|
                  |   Cat|  Dog|
                  | Mouse|  Cat|
                  +------+-----+
                  

                  Output (the join does not guarantee row order, but each animal is paired with the correct rating):

                  +------+-----+------+
                  |Animal|Enemy|Rating|
                  +------+-----+------+
                  |   Cat|  Dog|     4|
                  |   Dog|  Cat|     5|
                  | Mouse|  Cat|     1|
                  +------+-----+------+
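
The answer wraps `monotonically_increasing_id()` in `row_number()` because the raw IDs are only guaranteed to be unique and increasing, not consecutive. Spark's documentation describes the current implementation as putting the partition ID in the upper 31 bits and the record number within the partition in the lower 33 bits, so two DataFrames with different partitioning would produce IDs that do not line up. The following is a plain-Python model of that layout, not Spark code; `monotonic_ids`, `row_numbers`, and the partition sizes are hypothetical and chosen only for illustration.

```python
def monotonic_ids(partition_sizes):
    """Model the documented ID layout: partition index in the upper bits,
    record number within the partition in the lower 33 bits."""
    return [
        (pid << 33) + rec
        for pid, size in enumerate(partition_sizes)
        for rec in range(size)
    ]

def row_numbers(ids):
    """Model row_number() over a window ordered by the IDs: ranks 1..n."""
    order = sorted(range(len(ids)), key=lambda i: ids[i])
    ranks = [0] * len(ids)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Assumed layouts: the animals in three partitions, the ratings in two.
ids_a = monotonic_ids([1, 1, 1])
ids_b = monotonic_ids([2, 1])

print(ids_a)               # [0, 8589934592, 17179869184]
print(ids_b)               # [0, 1, 8589934592]
print(row_numbers(ids_a))  # [1, 2, 3]
print(row_numbers(ids_b))  # [1, 2, 3]
```

The raw IDs differ between the two layouts, while the row numbers derived from them match, which is what makes the join on `row_idx` work. One trade-off to keep in mind: a window ordered without `partitionBy` moves all rows into a single partition, so this approach suits small to moderate DataFrames better than very large ones.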
                  
