
[SPARK-50917][EXAMPLES] Add SparkConnectPi Scala example to work both for Connect and Classic #49617

Open
wants to merge 3 commits into
base: master

Member

@yaooqinn yaooqinn commented Jan 23, 2025

What changes were proposed in this pull request?

This PR adds a SparkConnectPi Scala example that works in both Spark Connect and Classic modes.

Why are the changes needed?

The SparkPi example is usually the first step for users getting to know Spark, so it should also be able to run in Spark Connect mode.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Manually built and tested:

bin/spark-submit --remote 'sc://localhost' --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar
WARNING: Using incubator modules: jdk.incubator.vector
25/01/23 15:00:03 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
25/01/23 15:00:03 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
25/01/23 15:00:03 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Pi is roughly 3.1388756943784717
25/01/23 15:00:04 INFO ShutdownHookManager: Shutdown hook called
25/01/23 15:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/84/dgr9ykwn6yndcmq1kjxqvk200000gn/T/spark-25ed842e-5888-47ce-bb0b-442385d643cb

Was this patch authored or co-authored using generative AI tooling?

no

@yaooqinn
Member Author

cc @cloud-fan @dongjoon-hyun @HyukjinKwon, thank you!

@github-actions github-actions bot added the SQL label Jan 23, 2025
@yaooqinn yaooqinn changed the title from "[SPARK-50917][EXAMPLES] Make SparkPi Scala example spark-connect compatible" to "[SPARK-50917][EXAMPLES] Add SparkSQLPi Scala example to work both for Connect and Classic" Jan 23, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment

I understand what you are aiming for, but this is not SQL from a user's perspective, @yaooqinn.

We should distinguish SQL from Spark Connect because Apache Spark already has Spark SQL modules and user interfaces like JDBC and the spark-sql shell. Could you revise the name? 😄

@yaooqinn
Member Author

I'd rename it, using the fully qualified class name org.apache.spark.examples.sql.connect.SparkConnectPi

@dongjoon-hyun
Member

dongjoon-hyun commented Jan 24, 2025

Thank you. Please revise the PR title and description accordingly too.

@yaooqinn yaooqinn changed the title from "[SPARK-50917][EXAMPLES] Add SparkSQLPi Scala example to work both for Connect and Classic" to "[SPARK-50917][EXAMPLES] Add SparkConnectPi Scala example to work both for Connect and Classic" Jan 24, 2025
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

/** Computes an approximation to pi with SparkSession/DataFrame APIs */
Contributor

@cloud-fan cloud-fan Jan 24, 2025

How is this different from the SQL example? My understanding is that the example should just use public SQL/DataFrame APIs and then it will work for both classic and Spark Connect. We should encourage users to use Spark SQL correctly (don't rely on private APIs), and in the example we can enable or disable Spark Connect w.r.t. the arguments.
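
For illustration, a Pi estimate written only against public SparkSession/DataFrame APIs might look like the sketch below. This is not the code under review: the object name `DataFramePiSketch`, the sample count per slice, and the column names `x` and `y` are assumptions made for the example. It only uses `SparkSession.builder()`, `range`, `select`, `where`, `count`, and `functions.rand`, with no RDD or SparkContext access.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical sketch, not the example added by this PR.
object DataFramePiSketch {
  def main(args: Array[String]): Unit = {
    // The session is obtained through the public builder only.
    val spark = SparkSession.builder().appName("DataFramePiSketch").getOrCreate()

    // Number of partitions to spread the samples over (assumed default: 2).
    val slices = if (args.length > 0) args(0).toInt else 2
    // Total number of random points to draw (assumed: 100000 per slice).
    val n = 100000L * slices

    // Draw n points uniformly in the square [-1, 1) x [-1, 1)
    // and count how many fall inside the unit circle.
    val inside = spark.range(0, n, 1, slices)
      .select((rand() * 2 - 1).as("x"), (rand() * 2 - 1).as("y"))
      .where(col("x") * col("x") + col("y") * col("y") <= 1)
      .count()

    // The circle-to-square area ratio is pi / 4.
    println(s"Pi is roughly ${4.0 * inside / n}")
    spark.stop()
  }
}
```

Because nothing here touches classic-only internals, the same class should be submittable with `--remote 'sc://localhost'`, as in the PR description, or without that flag for classic mode, assuming the examples jar is built with the matching dependencies.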

Member Author

IIUC, this example seems to be exactly what you described. Or were you just concerned about the class name?

@dongjoon-hyun
Member

@yaooqinn, the PR description still seems to be outdated:

bin/spark-submit --remote 'sc://localhost' --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.13-4.1.0-SNAPSHOT.jar
