Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 508 Vote(s) - 3.48 Average
  • 1
  • 2
  • 3
  • 4
  • 5
ElasticSearch - string concat aggregation?

#1
I've got the following simple mapping:

"element": {
"dynamic": "false",
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"group": { "type": "string", "index": "not_analyzed" },
"type": { "type": "string", "index": "not_analyzed" }
}
}

Which basically is a way to store `Group` object:

{
id : "...",
elements : [
{id: "...", type: "..."},
...
{id: "...", type: "..."}
]
}

I want to find how many different groups exist sharing the same set of element types (ordered, including repetitions).

An obvious solution would be to change the schema to:

"element": {
"dynamic": "false",
"properties": {
"group": { "type": "string", "index": "not_analyzed" },
"concatenated_list_of_types": { "type": "string", "index": "not_analyzed" }
}
}

But, due to the requirements, we need to be able to exclude some types from group by (aggregation) :(

All fields of the document are mongo ids, so in SQL I would do something like this:

SELECT COUNT(id), concat_value FROM (
SELECT GROUP_CONCAT(type_id), group_id
FROM table
WHERE type_id != 'some_filtered_out_type_id'
GROUP BY group_id
) T GROUP BY concat_value

In Elastic with given mapping it's really easy to filter out, its also not a problem to count assuming we have a concated value. Needless to say, sum aggregation does not work for strings.

How can I get this working? :)

Thanks!
Reply

#2
Finally I solved this problem with [scripting][1] and by changing the mapping.

{
"mappings": {
"group": {
"dynamic": "false",
"properties": {
"id": { "type": "string", "index": "not_analyzed" },
"elements": { "type": "string", "index": "not_analyzed" }
}
}
}
}

There are still some issues with duplicate elements in array (ScriptDocValues.Strings) for some reason strips out dups, but here's an aggregation that counts by string concat:

{
"aggs": {
"path": {
"scripted_metric": {
"map_script": "key = doc['elements'].join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1",
"combine_script": "_agg",
"reduce_script": "_aggs.collectMany { it.entrySet() }.inject( [:] ) { result, e -> result << [ (e.key):e.value + ( result[ e.key ] ?: 0 ) ]}"
}
}
}
}

The result would be as follows:

"aggregations" : {
"path" : {
"value" : {
"5639abfb5cba47087e8b457e" : 362,
"568bfc495cba47fc308b4567" : 3695,
"5666d9d65cba47701c413c53" : 14,
"5639abfb5cba47087e8b4571-5639abfb5cba47087e8b457b" : 1,
"570eb97abe529e83498b473d" : 1
}
}
}


[1]:

[To see links please register here]


Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through