  1. scala - What is RDD in spark - Stack Overflow

    Dec 23, 2015 · An RDD is, essentially, the Spark representation of a set of data, spread across multiple machines, with APIs to let you act on it. An RDD could come from any data source, …
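    A minimal sketch of that idea (the master URL, app name, and sample data below are made up for illustration):

      import org.apache.spark.{SparkConf, SparkContext}

      // Build an RDD from a local collection; Spark splits it across partitions.
      val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("rdd-intro"))
      val numbers = sc.parallelize(1 to 10, numSlices = 4) // data spread over 4 partitions
      val doubled = numbers.map(_ * 2)                     // transformation: lazily defines a new RDD
      println(doubled.reduce(_ + _))                       // action: runs the computation and returns 110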

  2. Difference between DataFrame, Dataset, and RDD in Spark

    Feb 18, 2020 · I'm just wondering what the difference is between an RDD and a DataFrame (in Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row]) in Apache Spark? Can you convert …
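    A minimal conversion sketch, assuming Spark 2.x with spark.implicits._ in scope (the column name "value" is an arbitrary choice):

      import org.apache.spark.sql.SparkSession

      val spark = SparkSession.builder().master("local[*]").appName("rdd-vs-df").getOrCreate()
      import spark.implicits._

      val rdd  = spark.sparkContext.parallelize(Seq(1, 2, 3))
      val df   = rdd.toDF("value")   // RDD -> DataFrame (Dataset[Row] in Spark 2.x)
      val back = df.rdd              // DataFrame -> RDD[Row]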

  3. java - What are the differences between Dataframe, Dataset, and …

    Sep 27, 2021 · In Apache Spark, what are the differences between those APIs? Why and when should we choose one over the others?
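    One practical difference is type safety; a sketch with a made-up Person case class:

      import org.apache.spark.sql.SparkSession

      case class Person(name: String, age: Int)

      val spark = SparkSession.builder().master("local[*]").appName("api-compare").getOrCreate()
      import spark.implicits._

      val rdd = spark.sparkContext.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))
      val ds  = rdd.toDS()              // Dataset[Person]: fields checked at compile time
      val df  = ds.toDF()               // DataFrame = Dataset[Row]: columns resolved at runtime
      ds.filter(_.age > 26).show()      // typed lambda over the case class
      df.filter($"age" > 26).show()     // untyped column expression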

  4. What's the difference between RDD and Dataframe in Spark?

    Aug 20, 2019 · RDD stands for Resilient Distributed Datasets. It is a read-only, partitioned collection of records. RDD is the fundamental data structure of Spark. It allows a programmer to perform …
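    A small sketch of that read-only property, assuming an existing SparkContext named sc (as in spark-shell); transformations never modify an RDD, they derive new ones:

      val words   = sc.parallelize(Seq("spark", "rdd", "partition"))
      val upper   = words.map(_.toUpperCase)    // new RDD; `words` itself is unchanged
      val lengths = words.map(_.length)         // another RDD derived from the same parent
      println(upper.collect().mkString(", "))   // SPARK, RDD, PARTITION
      println(lengths.collect().mkString(", ")) // 5, 3, 9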

  5. Difference between RDD.foreach () and RDD.map () - Stack Overflow

    Jan 19, 2018 · I am learning Spark in Python and wondering whether anyone can explain the difference between the action foreach() and the transformation map()? rdd.map() returns a new RDD, like the …
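    A sketch of that action/transformation split, assuming an existing SparkContext named sc:

      val rdd     = sc.parallelize(Seq(1, 2, 3))
      val squared = rdd.map(x => x * x)   // transformation: returns a new RDD, nothing runs yet
      println(squared.count())            // an action is needed to actually execute it
      rdd.foreach(x => println(x))        // action: returns Unit; in cluster mode the output
                                          // appears in executor logs, not on the driver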

  6. scala - How to print the contents of RDD? - Stack Overflow

    Apr 20, 2014 · Example usage: val rdd = sc.parallelize(List(1,2,3,4)).map(_*2) p(rdd) // 1 rdd.print // 2 Output: 2 6 4 8 Important: This only makes sense if you are working in local mode …
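    A common alternative that also works outside local mode is to bring a small amount of the data back to the driver first:

      val rdd = sc.parallelize(List(1, 2, 3, 4)).map(_ * 2)
      rdd.collect().foreach(println)   // collect() pulls everything to the driver: small RDDs only
      rdd.take(3).foreach(println)     // take(n) is safer when the RDD may be large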

  7. lambda - Pyspark RDD column value selection - Stack Overflow

    Apr 15, 2022 · Pyspark RDD column value selection Asked 3 years ago Modified 3 years ago Viewed 803 times
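    The question is PySpark-specific, but the usual approach is the same in Scala: select one field per record with map. The (id, name, score) layout below is made up for illustration, and sc is an assumed existing SparkContext:

      val rows   = sc.parallelize(Seq((1, "ann", 9.5), (2, "bob", 7.0)))
      val names  = rows.map { case (_, name, _) => name } // keep only the second field
      val scores = rows.map(_._3)                         // tuple-accessor form
      println(names.collect().mkString(", "))             // ann, bob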

  8. Spark: produce RDD[(X, X)] of all possible combinations from RDD[X]

    Oct 24, 2014 · Cartesian product and combinations are two different things, the cartesian product will create an RDD of size rdd.size() ^ 2 and combinations will create an RDD of size rdd.size() …
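    A sketch of that size difference, assuming an existing SparkContext named sc; filtering the cartesian product down to pairs with a < b yields the combinations (this works here because the elements are distinct and orderable):

      val xs = sc.parallelize(Seq(1, 2, 3))
      val cartesian    = xs.cartesian(xs)                                 // 9 pairs, incl. (1,1) and both (1,2) and (2,1)
      val combinations = xs.cartesian(xs).filter { case (a, b) => a < b } // 3 pairs: (1,2), (1,3), (2,3)
      println(s"${cartesian.count()} vs ${combinations.count()}")         // 9 vs 3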

  9. How to create a Spark Dataset from an RDD - Stack Overflow

    May 29, 2016 · I have an RDD[LabeledPoint] intended to be used within a machine learning pipeline. How do we convert that RDD to a Dataset? Note that the newer spark.ml APIs require …
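    A general-pattern sketch using a made-up Point case class in place of LabeledPoint: with spark.implicits._ in scope, an encoder turns the RDD into a typed Dataset.

      import org.apache.spark.sql.SparkSession

      case class Point(label: Double, feature: Double)

      val spark = SparkSession.builder().master("local[*]").appName("rdd-to-ds").getOrCreate()
      import spark.implicits._

      val rdd = spark.sparkContext.parallelize(Seq(Point(1.0, 0.5), Point(0.0, 2.3)))
      val ds  = rdd.toDS()   // or spark.createDataset(rdd)
      ds.show()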

  10. Difference and use-cases of RDD and Pair RDD - Stack Overflow

    May 6, 2016 · I am new to Spark and trying to understand the difference between a normal RDD and a pair RDD. What are the use-cases where a pair RDD is used as opposed to a normal …
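    A sketch of the distinction, assuming an existing SparkContext named sc: once the elements are key/value tuples, the by-key operations (reduceByKey, groupByKey, join, ...) become available.

      val words  = sc.parallelize(Seq("a", "b", "a", "c", "b", "a")) // normal RDD[String]
      val pairs  = words.map(w => (w, 1))                            // pair RDD[(String, Int)]
      val counts = pairs.reduceByKey(_ + _)                          // only defined for pair RDDs
      counts.collect().foreach { case (w, n) => println(s"$w -> $n") }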