  1. Apache Spark™ - Unified Engine for large-scale data analytics

    Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

  2. Overview - Spark 4.0.0 Documentation - Apache Spark

    Running Spark Client Applications Anywhere with Spark Connect. Spark Connect is a new client-server architecture introduced in Spark 3.4 that decouples Spark client applications and allows … (a minimal client sketch follows these results).

  3. Quick Start - Spark 4.0.0 Documentation - Apache Spark

    Unlike the earlier examples with the Spark shell, which initializes its own SparkSession, we initialize a SparkSession as part of the program. To build the program, we also write a Maven … (a PySpark equivalent is sketched after these results).

  4. Documentation - Apache Spark

    The documentation linked to above covers getting started with Spark, as well as the built-in components MLlib, Spark Streaming, and GraphX. In addition, this page lists other resources …

  5. Downloads - Apache Spark

    Download Spark: Verify this release using the release signatures and checksums and the project release KEYS by following these procedures. Note that Spark 4 is pre-built with Scala 2.13, and support for Scala 2.12 has …

  6. Examples - Apache Spark

    Spark is a great engine for small and large datasets. It can be used with single-node/localhost environments, or distributed clusters. Spark’s expansive API, excellent performance, and …

  7. PySpark Overview — PySpark 4.0.0 documentation - Apache Spark

    May 19, 2025 · PySpark combines Python’s learnability and ease of use with the power of Apache Spark to enable processing and analysis of data at any size for everyone familiar with Python. …

  8. Spark SQL & DataFrames - Apache Spark

    Seamlessly mix SQL queries with Spark programs. Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API. Usable in Java, Scala, … (see the combined SQL/DataFrame sketch after these results).

  9. Spark SQL and DataFrames - Spark 4.0.0 Documentation - Apache …

    Spark SQL, DataFrames and Datasets Guide. Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide …

  10. Getting Started — PySpark 4.0.0 documentation - Apache Spark

    Quickstart: Spark Connect. Launch Spark server with Spark Connect; Connect to Spark Connect server; Create DataFrame; Quickstart: Pandas API on Spark. Object Creation; Missing Data; … (a pandas-on-Spark sketch follows these results).
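
Result 2 above describes Spark Connect, the client-server architecture added in Spark 3.4. The sketch below shows a minimal PySpark client, assuming a Spark Connect server is already running locally on the default port 15002; the server address is an assumption, not something the snippet specifies.

    # Minimal Spark Connect client sketch (assumes a server at localhost:15002).
    from pyspark.sql import SparkSession

    # builder.remote() makes PySpark talk to a Spark Connect server
    # instead of starting a local driver JVM.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

    df = spark.range(10)   # built on the client, executed on the server
    print(df.count())      # 10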
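
Result 3 covers the Quick Start's self-contained application, which creates its own SparkSession rather than relying on the shell; the Maven build mentioned there applies to the Scala/Java version. A rough PySpark equivalent, run with spark-submit, using an illustrative file path:

    # Self-contained application sketch: count lines containing "a" and "b".
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SimpleApp").getOrCreate()

    text = spark.read.text("README.md").cache()   # any local text file will do
    num_as = text.filter(text.value.contains("a")).count()
    num_bs = text.filter(text.value.contains("b")).count()
    print(f"Lines with a: {num_as}, lines with b: {num_bs}")

    spark.stop()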
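
Results 8 and 9 describe mixing SQL queries with the DataFrame API. A short sketch of both styles against the same data; the column names and rows are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-and-dataframes").getOrCreate()

    people = spark.createDataFrame(
        [("Alice", 34), ("Bob", 45), ("Carol", 29)],
        ["name", "age"],
    )

    # DataFrame API ...
    people.filter(people.age > 30).show()

    # ... and the same query in SQL, via a temporary view.
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age > 30").show()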
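
Result 10 lists the pandas API on Spark quickstart topics (object creation, missing data). A brief sketch of those two topics, assuming NumPy, pandas, and PyArrow are installed alongside PySpark; the values are illustrative:

    import numpy as np
    import pyspark.pandas as ps

    # Object creation: pandas-on-Spark Series and DataFrame.
    s = ps.Series([1, 3, 5, np.nan, 6, 8])
    psdf = ps.DataFrame({"a": [1.0, 2.0, None], "b": ["x", "y", "z"]})

    # Missing data: drop rows with missing values, or fill them in.
    print(psdf.dropna().to_pandas())
    print(psdf.fillna({"a": 0.0}).to_pandas())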
