[SPARK-48841][SQL] Include collationName to sql() of Collate #47265

Closed

@panbingkun panbingkun commented Jul 9, 2024

What changes were proposed in this pull request?

This PR fixes the sql() method of the Collate expression so that its output includes the collationName argument.

Why are the changes needed?

The generated column name needs to include the collationName argument so that collate calls differing only in their collation get distinct names. Before this change, such columns could conflict, as in the example below, which could confuse users:

sql("CREATE TEMP VIEW tbl as (SELECT collate('A', 'UTF8_BINARY'), collate('A', 'UTF8_LCASE'))")
  • Before:
[COLUMN_ALREADY_EXISTS] The column `collate(a)` already exists. Choose another name or rename the existing column. SQLSTATE: 42711
org.apache.spark.sql.AnalysisException: [COLUMN_ALREADY_EXISTS] The column `collate(a)` already exists. Choose another name or rename the existing column. SQLSTATE: 42711
	at org.apache.spark.sql.errors.QueryCompilationErrors$.columnAlreadyExistsError(QueryCompilationErrors.scala:2595)
	at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:115)
	at org.apache.spark.sql.util.SchemaUtils$.checkColumnNameDuplication(SchemaUtils.scala:97)
  • After:
describe extended tbl;
+-----------------------+-------------------------+-------+
|col_name               |data_type                |comment|
+-----------------------+-------------------------+-------+
|collate(A, UTF8_BINARY)|string                   |NULL   |
|collate(A, UTF8_LCASE) |string collate UTF8_LCASE|NULL   |
+-----------------------+-------------------------+-------+
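
To make the naming conflict concrete, here is a minimal sketch in plain Scala. It does not use Spark: Expr, Lit, CollateBefore, CollateAfter and duplicateNames are hypothetical stand-ins invented for illustration, and the literal is rendered without quotes only to mimic the column names shown above. It assumes the duplicate check compares names case-insensitively, as the lowercase `collate(a)` in the error message suggests.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's Catalyst classes.
object CollateSqlSketch {
  sealed trait Expr { def sql: String }

  // A string literal, rendered without quotes to mimic the generated column names.
  final case class Lit(value: String) extends Expr {
    def sql: String = value
  }

  // Old behaviour: the collation name is dropped from the generated SQL text.
  final case class CollateBefore(child: Expr, collationName: String) extends Expr {
    def sql: String = s"collate(${child.sql})"
  }

  // New behaviour: the collation name is appended as a second argument.
  final case class CollateAfter(child: Expr, collationName: String) extends Expr {
    def sql: String = s"collate(${child.sql}, $collationName)"
  }

  // Returns the generated names that occur more than once (case-insensitive).
  def duplicateNames(exprs: Seq[Expr]): Seq[String] =
    exprs.map(_.sql.toLowerCase).groupBy(identity).collect {
      case (name, occurrences) if occurrences.size > 1 => name
    }.toSeq

  val before: Seq[Expr] =
    Seq(CollateBefore(Lit("A"), "UTF8_BINARY"), CollateBefore(Lit("A"), "UTF8_LCASE"))
  val after: Seq[Expr] =
    Seq(CollateAfter(Lit("A"), "UTF8_BINARY"), CollateAfter(Lit("A"), "UTF8_LCASE"))
}
```

In this sketch, duplicateNames(before) reports one clashing name, while duplicateNames(after) is empty because each name now carries its collation.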

Does this PR introduce any user-facing change?

Should not.

How was this patch tested?

Updated the existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jul 9, 2024
@panbingkun panbingkun marked this pull request as ready for review July 9, 2024 11:54
@panbingkun

The file conflict has been resolved; let's run GA again.
Thanks all.

@cloud-fan

thanks, merging to master!

@cloud-fan cloud-fan closed this in 73eab92 Jul 12, 2024
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024

Closes apache#47265 from panbingkun/SPARK-48841.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>