[SPARK-50906][SS] Add nullability check for if inputs of to_avro align with schema #49590

Open
wants to merge 5 commits into
base: master

fanyue-xia

What changes were proposed in this pull request?

Previously, to_avro did not explicitly check for the case where an input value is null but the schema does not allow null. As a result, an NPE was raised in this situation. This PR adds a check during serialization, before writing to Avro, and raises a user-facing error when this case occurs.

Why are the changes needed?

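It makes the error easier for the user to understand and act on: instead of a bare NPE, the user sees an error stating that a null value was passed for a non-nullable field.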

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

Was this patch authored or co-authored using generative AI tooling?

No

@fanyue-xia fanyue-xia force-pushed the to_avro_improve_NPE branch 5 times, most recently from 01e48be to d5ad9b1 Compare January 23, 2025 20:34
row: InternalRow =>
  val result = new Record(avroStruct)
  var i = 0
  while (i < numFields) {
    if (row.isNullAt(i)) {
      val avroField = avroFields.get(i)
      if (!avroField.schema().isNullable) {
Member

Shall we put this nullable info in an array outside the lambda, so that it is computed once rather than per row? Then we can simply check

if isSchemaNullable(i)
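
A minimal sketch of that suggestion, not the PR's actual code: `FieldInfo`, `NullabilitySketch`, and the `Array[Any]` row are hypothetical stand-ins for the Avro field list and Spark's `InternalRow`, and `IllegalArgumentException` is a placeholder for the user-facing error the PR raises. It only illustrates computing the nullability flags once, outside the per-row lambda.

```scala
// Hypothetical stand-in for an Avro field: just a name and a nullability flag.
final case class FieldInfo(name: String, nullable: Boolean)

object NullabilitySketch {
  // Builds a row converter. Nullability is looked up once here, not once per row.
  def newConverter(fields: IndexedSeq[FieldInfo]): Array[Any] => Array[Any] = {
    val numFields = fields.length
    // Precomputed outside the lambda, as suggested in the review comment.
    val isSchemaNullable: Array[Boolean] = fields.map(_.nullable).toArray

    (row: Array[Any]) => {
      val result = new Array[Any](numFields)
      var i = 0
      while (i < numFields) {
        if (row(i) == null) {
          if (!isSchemaNullable(i)) {
            // Placeholder error type; the real change raises a Spark user-facing error.
            throw new IllegalArgumentException(
              s"Null value found for non-nullable field '${fields(i).name}'")
          }
          result(i) = null
        } else {
          result(i) = row(i)
        }
        i += 1
      }
      result
    }
  }
}
```

With fields `Name` (non-nullable) and `Age` (nullable), a row with a null `Age` passes through unchanged, while a row with a null `Name` fails with a message naming the field.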

assert(ex.getCause.isInstanceOf[java.lang.NullPointerException])
assert(ex.getCause.getMessage.contains(
  "null value for (non-nullable) string at test_schema.Name"))
checkError(
Member

Shall we also test null nested columns?
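
To make the nested case concrete, here is a rough sketch of what such a test would need to exercise. It does not use Spark or Avro: `Field`, `Struct`, `Leaf`, and `firstNullViolation` are hypothetical names, and rows are modeled as plain `Seq[Any]`. The point is only that a null inside a nested struct should be reported with its full path, while a null for a nullable struct as a whole is allowed.

```scala
// Hypothetical schema model: a field is either a leaf value or a nested struct.
sealed trait FieldType
case object Leaf extends FieldType
final case class Struct(fields: Vector[Field]) extends FieldType
final case class Field(name: String, nullable: Boolean, tpe: FieldType)

object NestedNullSketch {
  // Returns the dotted path of the first null found in a non-nullable field, if any.
  def firstNullViolation(
      schema: Struct,
      row: Seq[Any],
      prefix: String = ""): Option[String] = {
    var i = 0
    while (i < schema.fields.length) {
      val f = schema.fields(i)
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      val violation: Option[String] = (f.tpe, row(i)) match {
        // A null is a violation only when the field itself is non-nullable.
        case (_, null) => if (f.nullable) None else Some(path)
        // Non-null struct value: recurse into its fields.
        case (s: Struct, nested: Seq[_]) => firstNullViolation(s, nested, path)
        case _ => None
      }
      if (violation.isDefined) return violation
      i += 1
    }
    None
  }
}
```

For a schema with a non-nullable `Name` and a nullable `Address` struct containing a non-nullable `City`, the row `Seq("a", Seq(null))` yields `Some("Address.City")`, whereas `Seq("a", null)` yields `None` because `Address` itself may be null.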

3 participants