Spark

Note

You need to install and configure Hadoop, the Google Cloud Storage connector, and the Hadoop-AWS module for this feature to work properly.

Note

You must be authenticated with the appropriate cloud provider for the cloud storage connectors to function.

Use mindlab.spark_session() to create Spark sessions that are automatically configured and authenticated with the appropriate cloud storage providers.

mindlab.spark_session(organization: Optional[str] = None, aws_auth: Optional[AWSAuth] = None, gcp_auth: Optional[GCPAuth] = None) → SparkSession

Create a new Spark session.

Returns

A pre-configured SparkSession instance.
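A minimal usage sketch, assuming mindlab, Hadoop, and the storage connectors are installed and you are already authenticated as described in the notes above. The bucket path and table name are hypothetical placeholders, not part of the mindlab API:

```python
import mindlab

# Create a pre-configured SparkSession. All keyword arguments are optional;
# pass explicit AWSAuth/GCPAuth objects only when the default credential
# discovery for your environment does not apply.
spark = mindlab.spark_session()

# Read directly from cloud storage (gs:// for GCS, s3a:// for S3).
# "gs://my-bucket/events/" is a hypothetical example path.
df = spark.read.parquet("gs://my-bucket/events/")
df.createOrReplaceTempView("events")
spark.sql("SELECT COUNT(*) AS n FROM events").show()
```

The same session can read from both providers in one job, since the returned SparkSession carries configuration for each connector that is set up.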