07-30-2023, 01:05 PM
I have some JSON data stored in a JSON (not JSONB) column in my PostgreSQL database (version 9.4.1). Some of these JSON structures contain Unicode escape sequences in their attribute values. For example:
{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }
When I try to query this JSON column (even if I'm not directly trying to access the `device_name` attribute), I get the following error:
> ERROR: unsupported Unicode escape sequence
> Detail: `\u0000` cannot be converted to text.
You can recreate this error by executing the following command on a PostgreSQL server:
select '{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }'::json->>'client_id'
The error makes sense to me: there is simply no way to represent the Unicode character `NULL` (`\u0000`) in a textual result.
Is there any way for me to query the same JSON data without having to perform "sanitization" on the incoming data? These JSON structures change regularly, so scanning a specific attribute (`device_name` in this case) would not be a good solution, since there could easily be other attributes that might hold similar data.
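One workaround I have seen suggested (a sketch, not a full fix) is to strip the escaped NUL sequences from the raw JSON text before casting back to `json` and extracting. The table name `devices` and column name `data` below are hypothetical; note this also removes `\u0000` wherever it appears in the text, including inside doubly-escaped strings:

```sql
-- Strip the six-character sequence \u0000 from the JSON's text
-- representation, then cast back to json and extract as usual.
SELECT replace(data::text, '\u0000', '')::json ->> 'client_id'
FROM devices;
```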
---
After some more investigation, it seems that this behavior is new in version 9.4.1, as mentioned in the changelog:
> ...Therefore `\u0000` will now also be rejected in json values when conversion to de-escaped form is required. This change does not break the ability to store `\u0000` in json columns so long as no processing is done on the values...
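The distinction the changelog draws can be reproduced directly (assuming a 9.4.1 server): storing the value works, because no de-escaping is needed, while any operation that converts it to text fails:

```sql
-- Storing/round-tripping the raw json succeeds (no de-escaping needed):
SELECT '{"a": "x\u0000y"}'::json;

-- Extracting as text fails, because de-escaping to text is required:
SELECT '{"a": "x\u0000y"}'::json ->> 'a';
-- ERROR: unsupported Unicode escape sequence
```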
Was this really the intention? Is a downgrade to pre-9.4.1 a viable option here?
---
<sub>
As a side note, this property is taken from the name of the client's mobile device - it's the user who entered this text into the device. How on earth did a user insert `NULL` and `REPLACEMENT CHARACTER` values?!
</sub>