Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improve]Schema change parses ddl sql using jsqlparser framework #422

Merged
merged 9 commits into from
Jul 17, 2024

Conversation

DongLiang-0
Copy link
Contributor

@DongLiang-0 DongLiang-0 commented Jul 10, 2024

Proposed changes

Problem Summary:

Introduce the com.github.jsqlparser framework to parse DDL SQL statements when schema changes.
The original change principle is to infer the columns that need to be added, deleted or modified in this change based on the last schema structure. In some special scenarios, this method will have certain bugs.
After the jsqlparser framework is introduced, the content of this schema change can be parsed based on the ddl passed by the upstream flink-cdc. Very accurate and no need to rely on the last schema.
At the same time, we also deprecated org.apache.doris.flink.sink.writer.serializer.jsondebezium.JsonDebeziumSchemaChangeImpl, which uses regular expressions to match DDL. It is recommended to use org.apache.doris.flink.sink.writer.serializer.jsondebezium .SQLParserService.

Through schema-change-mode, you can specify which way to parse DDL:

  1. specify sql_parser to use JSQLParser to parse DDL statements;
  2. the default is the debezium_structure parameter, which infers schema change changes by parsing the upstream Debezium structure.

Checklist(Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know)
  2. Has unit tests been added: (Yes/No/No Need)
  3. Has document been added or modified: (Yes/No/No Need)
  4. Does it need to update dependencies: (Yes/No)
  5. Are there any changes that cannot be rolled back: (Yes/No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@DongLiang-0 DongLiang-0 marked this pull request as draft July 10, 2024 10:19
@DongLiang-0 DongLiang-0 marked this pull request as ready for review July 11, 2024 10:11
@JNSimba
Copy link
Member

JNSimba commented Jul 11, 2024

Thanks to @DongLiang-0 for his contribution. It is suggested to control it through a switch here.

@DongLiang-0 DongLiang-0 marked this pull request as draft July 15, 2024 03:38
@DongLiang-0 DongLiang-0 force-pushed the improve-schema branch 2 times, most recently from 2c60f2d to 4c5595e Compare July 15, 2024 09:40
@DongLiang-0 DongLiang-0 marked this pull request as ready for review July 16, 2024 06:34
Comment on lines +243 to +245
if (org.apache.commons.lang3.StringUtils.isEmpty(schemaChangeMode)) {
return this;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which schemachangemode is used when it is empty?

Copy link
Contributor Author

@DongLiang-0 DongLiang-0 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JsonDebeziumSchemaChangeImplV2 is created when empty.

image
@@ -90,6 +92,7 @@ public static void main(String[] args) throws Exception {
.setTableConfig(tableConfig)
.setCreateTableOnly(false)
.setNewSchemaChange(useNewSchemaChange)
.setSchemaChangeMode(schemaChangeMode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if this is not set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.


protected void extractSourceConnector(JsonNode record) {
if (Objects.isNull(sourceConnector)) {
sourceConnector =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will sourceConnector be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the program is null when it is started, because the sourceConnector is extracted from the schema change data.

Comment on lines -158 to -159
} else {
LOG.info("Unsupported event type {}", eventType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this branch is missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractEventType(JsonNode record)only returns three types of events, namely ALTER, CREATE, and null. where null has been handled here
image

}
List<String> ddlList = tryParserAlterDDLs(recordRoot);
status = executeAlterDDLs(ddlList, recordRoot, dorisTableTuple, status);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing an else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Comment on lines 160 to 165
if (columnSpecs.contains("default")) {
int defaultIndex = columnSpecs.indexOf("default");
defaultValue = removeQuotes(columnSpecs.get(defaultIndex + 1));
} else if (columnSpecs.contains("DEFAULT")) {
int defaultIndex = columnSpecs.indexOf("DEFAULT");
defaultValue = removeQuotes(columnSpecs.get(defaultIndex + 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be any problem with this judgment?

Copy link
Contributor Author

@DongLiang-0 DongLiang-0 Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, in this example, there is no special method to obtain default and comment, only through the columnSpecs collection.
In order to avoid NPE, I also added some prevention methods.
image

Comment on lines 170 to 181
private String extractComment(List<String> columnSpecs) {
String comment = null;
if (columnSpecs.contains("comment")) {
int commentIndex = columnSpecs.indexOf("comment");
comment = removeQuotes(columnSpecs.get(commentIndex + 1));
}
if (columnSpecs.contains("COMMENT")) {
int commentIndex = columnSpecs.indexOf("COMMENT");
comment = removeQuotes(columnSpecs.get(commentIndex + 1));
}
return comment;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same above, does JSqlParser have any suspicious methods to directly obtain the Default value and Comment?

Copy link
Member

@JNSimba JNSimba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JNSimba JNSimba merged commit 13f1fcd into apache:master Jul 17, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants