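Specifying Parquet properties in PySpark
How do I specify the Parquet block size and page size in PySpark? I've searched everywhere but can't find any documentation on which function to call or which library to import.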
In Scala, set them on the SparkContext's Hadoop configuration (some_value is a size in bytes):
sc.hadoopConfiguration.setInt("dfs.blocksize", some_value)
sc.hadoopConfiguration.setInt("parquet.block.size", some_value)
So in PySpark, go through the underlying Java SparkContext (sc._jsc):
sc._jsc.hadoopConfiguration().setInt("dfs.blocksize", some_value)
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", some_value)
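A minimal sketch wrapping the calls above, assuming you also want the page size, which the answer does not cover. For that it uses the parquet-mr property `parquet.page.size`. The helper names `set_parquet_sizes` and `parquet_size_spark_conf` are hypothetical, not part of PySpark. `sc._jsc` is a private attribute, so this relies on PySpark internals. The second helper shows an alternative: Spark copies any `spark.hadoop.<key>` setting into the Hadoop configuration, so the same keys can be given when building the session.

```python
# Hypothetical helpers (not a PySpark API); they only wrap the calls shown above.


def set_parquet_sizes(sc, block_size_bytes, page_size_bytes, hdfs_block_size_bytes=None):
    """Set Parquet sizes (in bytes) on a SparkContext's Hadoop configuration.

    sc._jsc is the underlying JavaSparkContext (private attribute); the call goes
    through py4j to the JVM-side Hadoop Configuration. setInt takes a Java int,
    so each value must be below 2**31.
    """
    conf = sc._jsc.hadoopConfiguration()
    conf.setInt("parquet.block.size", block_size_bytes)  # Parquet row group ("block") size
    conf.setInt("parquet.page.size", page_size_bytes)    # Parquet page size
    if hdfs_block_size_bytes is not None:
        conf.setInt("dfs.blocksize", hdfs_block_size_bytes)  # HDFS block size for written files


def parquet_size_spark_conf(block_size_bytes, page_size_bytes):
    """Return the same Parquet settings as spark.hadoop.* entries (string values).

    Spark copies every spark.hadoop.<key> property into the Hadoop configuration,
    so these pairs can be passed to SparkSession.builder.config(key, value).
    """
    return {
        "spark.hadoop.parquet.block.size": str(block_size_bytes),
        "spark.hadoop.parquet.page.size": str(page_size_bytes),
    }


# Example usage (requires PySpark and a DataFrame `df`; not executed here):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# set_parquet_sizes(spark.sparkContext, 128 * 1024 * 1024, 1024 * 1024)
# df.write.parquet("/tmp/out")
```

Setting the values before calling `df.write.parquet(...)` matters because the Parquet writer reads them from the Hadoop configuration when the write job starts.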