Handling Unicode sequences in PostgreSQL

#1
I have some JSON data stored in a JSON (not JSONB) column in my PostgreSQL database (9.4.1). Some of these JSON structures contain Unicode escape sequences in their attribute values. For example:

{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }

When I try to query this JSON column (even if I'm not directly trying to access the `device_name` attribute), I get the following error:

> ERROR: unsupported Unicode escape sequence
> Detail: `\u0000` cannot be converted to text.

You can recreate this error by executing the following command on a PostgreSQL server:

select '{"client_id": 1, "device_name": "FooBar\ufffd\u0000\ufffd\u000f\ufffd" }'::json->>'client_id'

The error makes sense to me - there is simply no way to represent the Unicode character `NULL` in a textual result.

Is there any way for me to query the same JSON data without having to perform "sanitization" on the incoming data? These JSON structures change regularly, so scanning for a specific attribute (`device_name` in this case) would not be a good solution, since there could easily be other attributes holding similar data.

---

After some more investigation, it seems that this behavior is new as of version 9.4.1, as mentioned in the changelog:

> ...Therefore `\u0000` will now also be rejected in json values when conversion to de-escaped form is required. This change does not break the ability to store `\u0000` in json columns so long as no processing is done on the values...
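
To spell out the distinction the changelog draws: the plain cast succeeds because it only validates the JSON text, while anything that needs the de-escaped value fails. You can check this with:

```
-- Storing the escape is fine: the cast only validates the JSON text.
select '{"device_name": "FooBar\u0000"}'::json;

-- Anything that has to de-escape the value fails.
select '{"device_name": "FooBar\u0000"}'::json ->> 'device_name';
-- ERROR: unsupported Unicode escape sequence
```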

Was this really the intention? Is a downgrade to pre-9.4.1 a viable option here?

---

<sub>
As a side note, this property is taken from the name of the client's mobile device - it was the user who entered this text into the device. How on earth did a user insert `NULL` and `REPLACEMENT CHARACTER` values?!
</sub>

#2
`\u0000` is the one Unicode code point that is not valid in a PostgreSQL `text` value. I see no other way than to sanitize the string.

Since `json` is just a string in a specific format, you can use the standard string functions, without worrying about the JSON structure. A one-line sanitizer to remove the code point would be:

SELECT (regexp_replace(the_string::text, '\\u0000', '', 'g'))::json;

But you can also insert any character of your liking, which would be useful if the zero code point is used as some form of delimiter.

Note also the subtle difference between what is stored in the database and how it is presented to the user. You can store the escaped code point in a `json` value, but you have to replace it with some other character before any processing that needs the de-escaped value.
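
Applied to a table, a minimal sketch (the table `devices` and its `json` column `payload` are placeholders here, not from the question) would look like:

```
-- Placeholder table: devices(payload json).
-- Strip the \u0000 escape from the raw JSON text, cast back to json,
-- then extract the attribute that previously raised the error.
SELECT (regexp_replace(payload::text, '\\u0000', '', 'g'))::json ->> 'device_name'
FROM devices;
```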

#3
The solution by Patrick didn't work out of the box for me; an error was still thrown regardless. I then researched a little more and was able to write a small custom function that fixed the issue for me.

First I could reproduce the error by writing:

select json '{ "a": "null \u0000 escape" }' ->> 'a' as fails

Then I added a custom function which I used in my query:

CREATE OR REPLACE FUNCTION null_if_invalid_string(json_input JSON, record_id UUID)
RETURNS JSON AS $$
DECLARE
    probe TEXT;  -- ->> returns text, so probe into a text variable
BEGIN
    BEGIN
        -- De-escape the attribute that may hold the bad sequence ('location' here);
        -- this raises "unsupported Unicode escape sequence" if it contains \u0000.
        probe := json_input ->> 'location';
    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'Invalid json value: "%". Returning NULL.', record_id;
        RETURN NULL;
    END;
    RETURN json_input;
END;
$$ LANGUAGE plpgsql;

Call the function like this; you should not receive an error:

select null_if_invalid_string('{ "a": "null \u0000 escape" }', id) from my_table


Whereas this should return the json as expected:

select null_if_invalid_string('{ "a": "null" }', id) from my_table
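
A variant of the same idea (just a sketch; the name `null_if_invalid_json_attr` and the extra parameter are placeholders, not part of the function above) takes the attribute to probe as an argument instead of hard-coding `'location'`:

```
-- Sketch: probe an arbitrary attribute instead of a hard-coded one.
CREATE OR REPLACE FUNCTION null_if_invalid_json_attr(json_input JSON, record_id UUID, probe_key TEXT)
RETURNS JSON AS $$
BEGIN
    -- De-escaping the probed attribute raises the error if it contains \u0000.
    PERFORM json_input ->> probe_key;
    RETURN json_input;
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'Invalid json value: "%". Returning NULL.', record_id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

Called with `'a'` as the third argument on the failing example above, it should likewise return NULL with a notice.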


#4
Just for web searchers who end up stranded here:

This is not a solution to the exact question, but in some similar cases it is the solution, if you just **don't want those datasets containing null bytes in your JSON**. Just add:

AND json NOT LIKE '%\u0000%'

in your WHERE clause.

You could also use SQL's REPLACE function to sanitize the data:

REPLACE(source_field, '\u0000', '' );
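
For example, a rough sketch with placeholder names (`my_table`, `settings`); note the doubled backslash in the LIKE pattern, since backslash is also LIKE's escape character:

```
-- Skip rows whose raw JSON text contains the \u0000 escape ...
SELECT settings ->> 'client_id'
FROM my_table
WHERE settings::text NOT LIKE '%\\u0000%';

-- ... or strip the escape on the fly before extracting a value.
SELECT REPLACE(settings::text, '\u0000', '')::json ->> 'device_name'
FROM my_table;
```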

#5
You can fix all entries with SQL like this:

```
update ___MY_TABLE___
set settings = REPLACE(settings::text, '\u0000', '' )::json
where settings::text like '%\u0000%'
```

#6
I found a [solution][1] that works for me:

SELECT (regexp_replace(the_string::text, '(?<!\\)\\u0000', '', 'g'))::json;

Note the match pattern `'(?<!\\)\\u0000'`.
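
The negative lookbehind keeps a legitimately escaped backslash followed by the literal text `u0000` from being mangled (lookbehind constraints need a reasonably recent PostgreSQL; 9.6 or later, if I remember correctly). A quick illustration:

```
-- "bad" holds a real \u0000 escape (stripped by the replace),
-- "ok" holds an escaped backslash followed by the text u0000 (left untouched).
SELECT (regexp_replace(
    '{"bad": "x\u0000y", "ok": "x\\u0000y"}',
    '(?<!\\)\\u0000', '', 'g'
))::json ->> 'ok';
```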

[1]:


