Datastream fails to read latin1-encoded tables

Hello,

I'm using Datastream to unload binlog files to Google Cloud Storage. Unfortunately, my source database is encoded in latin1:

 

> SHOW VARIABLES LIKE 'character_set_database';

> latin1

> SHOW VARIABLES LIKE 'collation_database';

> latin1_swedish_ci
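
For reference, the affected column is declared with that same charset; this can be double-checked through information_schema (the database, table, and column names are the ones from my config below):

-- Check the declared charset/collation of the column
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_db'
  AND TABLE_NAME = 'my_table'
  AND COLUMN_NAME = 'some_text_col';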

Knowing this, I started the stream using a "mysql-source-config" that explicitly declares the column's collation, like this:

 

{
    "includeObjects": {
        "mysqlDatabases": [
            {
                "database": "my_db",
                "mysqlTables": [
                    {
                        "table": "my_table",
                        "mysqlColumns": [
                            {
                                "column": "id",
                                "dataType": "int"
                            },
                            {
                                "column": "some_text_col",
                                "dataType": "varchar",
                                "primaryKey": false,
                                "collation": "latin1_swedish_ci"
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "excludeObjects": {}
}

 

The stream fails to read the content of the "some_text_col" column with the following error:

Discarded 2 unsupported events with reason code: MYSQL_DECODE_ERROR. Latest discarded event details: Discarded an event from my_db.my_table: Event Parsing Error: Failed to parse event: === UpdateRowsEvent === Date: 2024-07-03T13:26:08 Log position: 17343280 Event size: 839 Read bytes: 161. Successfully parsed rows: []., caused by: Row Parsing Error: Failed to parse row of table xxx ... [skipping because the full schema is written] 
, caused by:\n Column Parsing Error: Failed to parse bytes:0x312ee382b9e382abe382a4e383a9e383b3e383aae38383e382b8e58f82e58aa0e381aee3819fe38281e38081e4bba5e4b88be381aee98081e8bf8ee38292e3818ae9a198e38184e887b4e38197e381bee38199e380820a382f313728e59c9f2931373a3130e4b887e5baa7e9b9bfe6b2a2e58fa3e9a785e28692e3839be38386e383ab0a382f313828e697a52931373a3035e3839be38386e383abe28692e4b887e5baa7e9b9bfe6b2a2e58fa3e9a7850a322ee383ace382a4e38388e38381e382a7e38383e382afe382a2e382a6e3838831373a3030e381abe381a6e3818ae9a198e38184e887b4e38197e381bee38199e380820a332ee7a681e78599e381abe381a6e3818ae9a198e38184e887b4e38197e381bee38199e380823c62723e44696e6e65722a2030204a5059 as value of column {'type': 252, 'name': 'Comment', 'collation_name': 'latin1_swedish_ci', 'character_set_name': 'latin1', 'comment': '', 'unsigned': False, 'zerofill': False, 'type_is_bool': False, 'is_primary': False, 'fixed_binary_length': None, 'length_size': 2}., caused by:\n 'charmap' codec can't decode byte 0x8f in position 27: character maps to <undefined>",
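
In case it helps with diagnosis, here is a quick sanity check on the raw bytes: taking the first few bytes of the discarded value straight from the hex dump above, MySQL itself decodes them cleanly when told they are UTF-8, and produces mojibake when told they are latin1:

-- First 11 bytes of the discarded value, copied from the error above
SELECT CONVERT(UNHEX('312EE382B9E382ABE382A4') USING utf8mb4) AS as_utf8,
       CONVERT(UNHEX('312EE382B9E382ABE382A4') USING latin1)  AS as_latin1;
-- as_utf8 returns '1.スカイ'; as_latin1 returns mojibake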
 
Datastream seems to be ignoring the collation parameter provided through the configuration.
Since the error almost always involves the same columns, I also tried to explicitly exclude those columns as a dirty fix.

Unfortunately, Datastream also ignores the exclude-column parameter: it keeps trying to read the whole row (including all the columns, even the ones I purposely excluded) and keeps failing to parse it, so the full binary log event ends up being discarded. The exclude configuration I tried looked roughly like the sketch below.
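
If I've read the API docs correctly, excludeObjects takes the same structure as includeObjects; mine looked something like this (column name from my setup):

{
    "excludeObjects": {
        "mysqlDatabases": [
            {
                "database": "my_db",
                "mysqlTables": [
                    {
                        "table": "my_table",
                        "mysqlColumns": [
                            {
                                "column": "some_text_col"
                            }
                        ]
                    }
                ]
            }
        ]
    }
}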
 
Is there something I'm missing that would make Datastream honor the parameters it is given?
 
Or, at least, a way to replace a column value with NULL when Datastream fails to decode it?
 
Thanks in advance!

 
