Better way to remove cycles from a path in a Neo4j graph

#1
I am using Neo4j graph database version 2.1.7. Brief details about the data:
2 million nodes of 6 different types and 5 million relationships of only 5 different types; the graph is mostly connected but contains a few isolated subgraphs.

While resolving paths, I get cycles in the path. To restrict that, I used a solution shared in another thread.

Here is the query I am using:

MATCH (n:nodeA{key:905728})
MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA)
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a)))
and (length(EXTRACT (p in NODES(path)| p.key)) > 1)
and ((exists ((c)-[:rel5]->(b)) and (not exists((b)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (b)-[]->(x))))
OR (not exists ((c)-[:rel5]->()) and (not exists ((c)-[:rel1|rel2|rel3|rel4]->(:nodeA)) OR ANY (x in nodes(path) where (c)-[]->(x)))))
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);
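For readers unfamiliar with the nested `filter`, the cycle check in the `WHERE` clause reads as "every node on the path occurs exactly once". The Python sketch below (not Cypher; node keys other than 905728 are made up) mirrors that predicate on a single path given as a list of node keys, next to an equivalent set-based check. Either check can only run on a path that has already been expanded, so as far as I can tell it discards cyclic paths without reducing how many paths get generated.

```python
from typing import Hashable, Sequence


def unique_nodes_like_cypher(nodes: Sequence[Hashable]) -> bool:
    # Mirrors ALL(a IN nodes(path) WHERE 1 = length(filter(m IN nodes(path) WHERE m = a))):
    # for each node a, count its occurrences in the path and require exactly one.
    # Rescanning the list for every node makes this O(k^2) for a path of k nodes.
    return all(sum(1 for m in nodes if m == a) == 1 for a in nodes)


def unique_nodes_with_set(nodes: Sequence[Hashable]) -> bool:
    # Same answer in O(k): the path has no cycle exactly when no key is duplicated.
    return len(set(nodes)) == len(nodes)


if __name__ == "__main__":
    acyclic = [905728, 11, 12, 13]      # hypothetical keys after the start node
    cyclic = [905728, 11, 12, 11, 14]   # revisits node 11
    for path in (acyclic, cyclic):
        print(path, unique_nodes_like_cypher(path), unique_nodes_with_set(path))
```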

The above query meets my requirement, but it is not cost-effective and keeps running when run on a huge subgraph. I have used the `PROFILE` command to improve the query's performance from where I started, but now I am stuck at this point. The performance has improved, but not as much as I expected from Neo4j :(


#2
I don't know that I have a solution, but I have a number of suggestions. Some might speed things up, some might just make the query easier to read.

Firstly, rather than putting `exists ((c)-[:rel5]->(b))` in your `WHERE`, I believe you can put it in your `MATCH` like this:

MATCH path = n-[:rel1|rel2|rel3|rel4*0..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA), (c)-[:rel5]->(b)

I don't think you need the `exists` keyword. I think you can just use the pattern itself as the condition, for example `(NOT (b)-[:rel1|rel2|rel3|rel4]->(:nodeA))`.

I'd also suggest thinking about the `WITH` clause for potential performance improvements.

A couple of notes about your variable-length paths: in `*0..`, the `0` means you're potentially matching a zero-length path, i.e. a self-reference where `c` is `n` itself. That may or may not be what you want. Also, leaving the variable-length path open-ended can often cause performance problems (as I think you're seeing). If you can cap it, that may help.
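To illustrate both points, here is a small Python sketch on a hypothetical four-node graph (the node names and the helper `var_length_paths` are made up for illustration). It enumerates paths the way a variable-length pattern does by default, never reusing a relationship within one path. With a minimum of 0, the start node on its own comes back as a match, and the path `a -> b -> c -> a` shows that relationship uniqueness alone does not stop a path from revisiting a node, which is why a node check in the `WHERE` clause is needed at all.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy graph: node -> outgoing neighbours (stands in for rel1|rel2|rel3|rel4).
Graph = Dict[str, List[str]]


def var_length_paths(graph: Graph, start: str, min_hops: int,
                     max_hops: Optional[int]) -> List[List[str]]:
    """Paths from `start` with min_hops..max_hops relationships, never reusing a
    relationship within one path. max_hops=None means open-ended, like *0.."""
    results: List[List[str]] = []

    def walk(node: str, path: List[str], used: Set[Tuple[str, str]]) -> None:
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(list(path))
        if max_hops is not None and hops == max_hops:
            return  # the cap stops expansion here
        for nxt in graph.get(node, []):
            rel = (node, nxt)  # assumes no parallel relationships in the toy graph
            if rel in used:
                continue
            used.add(rel)
            path.append(nxt)
            walk(nxt, path, used)
            path.pop()
            used.remove(rel)

    walk(start, [start], set())
    return results


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c", "d"], "c": ["a"], "d": []}
    for lo, hi in ((0, None), (1, None), (1, 2)):
        label = f"*{lo}..{'' if hi is None else hi}"
        print(label, var_length_paths(graph, "a", lo, hi))
```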

Also, if you upgrade to 2.2.1, you get a number of built-in performance improvements from the 2.2.x line, plus visual `PROFILE` output in the console and a new `EXPLAIN` command. `EXPLAIN` shows the query plan without running the query, while `PROFILE` runs it and reports the actual rows and db hits for each step.

One more thing to consider: I don't think you're hitting performance boundaries of Neo4j itself, but rather, perhaps, some boundaries of Cypher. If so, I might suggest doing your querying with the Java APIs that Neo4j provides, for better performance and more control. You can do this either by embedding your database, if you're using a JVM-compatible language, or by writing an unmanaged extension, which lets you do your own querying in Java while providing a custom REST API from the server.
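As a rough illustration of why a hand-written traversal can help, here is a Python sketch (not the Neo4j Java API; the graph and function names are hypothetical) contrasting two strategies on a toy graph. The first expands relationship-unique paths and only afterwards drops those that revisit a node, roughly like the `WHERE` clause in the question; the second refuses to step onto a node already on the current path, so cyclic prefixes are never extended. As far as I know, the Java traversal framework exposes the second behaviour as a uniqueness setting (`Uniqueness.NODE_PATH`).

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def expand_then_filter(graph: Graph, start: str, max_hops: int) -> Tuple[List[List[str]], int]:
    """Generate relationship-unique paths, then drop those that revisit a node.
    Returns (kept paths, number of paths generated before filtering)."""
    generated: List[List[str]] = []

    def walk(node: str, path: List[str], used: Set[Tuple[str, str]]) -> None:
        generated.append(list(path))
        if len(path) - 1 == max_hops:
            return
        for nxt in graph.get(node, []):
            rel = (node, nxt)
            if rel not in used:
                used.add(rel)
                path.append(nxt)
                walk(nxt, path, used)
                path.pop()
                used.remove(rel)

    walk(start, [start], set())
    kept = [p for p in generated if len(set(p)) == len(p)]
    return kept, len(generated)


def prune_while_expanding(graph: Graph, start: str, max_hops: int) -> Tuple[List[List[str]], int]:
    """Never extend a path onto a node it already contains (node-path uniqueness).
    Returns (paths, number of paths generated); every generated path is kept."""
    generated: List[List[str]] = []

    def walk(node: str, path: List[str], on_path: Set[str]) -> None:
        generated.append(list(path))
        if len(path) - 1 == max_hops:
            return
        for nxt in graph.get(node, []):
            if nxt not in on_path:
                on_path.add(nxt)
                path.append(nxt)
                walk(nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    walk(start, [start], {start})
    return generated, len(generated)


if __name__ == "__main__":
    toy = {"a": ["b"], "b": ["c"], "c": ["a", "b", "d"], "d": []}
    print(expand_then_filter(toy, "a", 10))
    print(prune_while_expanding(toy, "a", 10))
```

On this tiny graph both return the same four cycle-free paths, but the first generates 6 paths and the second only 4. On a densely cyclic graph the gap grows quickly, because in the first strategy every cyclic prefix keeps being extended until it runs out of unused relationships.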

#3
Made a couple more tweaks to my query as Brian suggested above, and found an improvement in query response time. It now takes about 20% of the execution time of my original query, and makes about 60% fewer db hits during execution than the query I shared earlier. Here is the updated query:

MATCH (n:nodeA{key:905728})
MATCH path = n-[:rel1|rel2|rel3|rel4*1..]->(c:nodeA)-[:rel5*0..1]->(b:nodeA)
WHERE ALL(a in nodes(path) where 1=length (filter (m in nodes(path) where m=a)))
and (length(path) > 0)
and ((exists ((c)-[:rel5]->(b)) and (not ((c)-[:rel1|rel2|rel3|rel4]->()) OR ANY (x in nodes(path) where (c)-[]->(x))))
OR (not exists ((c)-[:rel5]->()) and (not ((c)-[:rel1|rel2|rel3|rel4]->()) OR ANY (x in nodes(path) where (c)-[]->(x)))))
RETURN distinct EXTRACT (rp in Rels(path)| type(rp)), EXTRACT (p in NODES(path)| p.key);
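A small observation on the updated `WHERE` clause, assuming I am reading the pattern correctly: `length(path)` counts relationships, and a path always has one more node than it has relationships (the `EXTRACT` over `NODES(path)` in the original query yields one entry per node). So `length(path) > 0` is equivalent to the old `length(EXTRACT(...)) > 1`, and with the first segment now written as `*1..`, every matched path already has at least one relationship, which would make the condition always true:

```latex
|\mathrm{nodes}(p)| = \mathrm{length}(p) + 1
\quad\Longrightarrow\quad
\bigl(|\mathrm{nodes}(p)| > 1 \iff \mathrm{length}(p) > 0\bigr),
\qquad
\texttt{*1..} \;\Longrightarrow\; \mathrm{length}(p) \ge 1 .
```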

I observed a dramatic improvement when I capped the path from `*1..` to `*1..15`. I also removed one filter from the query, which was also taking a long time.
However, the query response time increased when querying nodes with relationship chains more than 18-20 levels deep.
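A back-of-the-envelope model of why depth hurts so much: assume (hypothetically) that every node has exactly two onward relationships and no path revisits a node. Then there are 2^d distinct paths of exactly d hops, so the total grows geometrically with the cap. The sketch below computes that total in closed form and cross-checks it by brute-force enumeration for a small depth; the branching factor of 2 is invented, and real graphs will differ.

```python
def paths_up_to_depth(branching: int, max_depth: int) -> int:
    # Tree-shaped model: branching**d distinct paths of exactly d hops, for d = 1..max_depth.
    return sum(branching ** d for d in range(1, max_depth + 1))


def count_by_enumeration(branching: int, max_depth: int) -> int:
    # Brute-force check on an implicit tree: a path is a tuple of child indices.
    count = 0
    stack = [()]
    while stack:
        path = stack.pop()
        if path:  # skip the zero-length path, like *1..
            count += 1
        if len(path) < max_depth:
            stack.extend(path + (i,) for i in range(branching))
    return count


if __name__ == "__main__":
    assert paths_up_to_depth(2, 10) == count_by_enumeration(2, 10)
    for depth in (15, 18, 20):
        print(f"*1..{depth}: {paths_up_to_depth(2, depth):,} paths")
```

In this model, raising the cap from 15 to 20 multiplies the number of candidate paths by roughly 2^5 = 32 (65,534 versus 2,097,150), which fits with response time climbing once real paths get that deep.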

I would advise using the `PROFILE` command often to find the pain points in your query; it helps you resolve issues faster.
Thanks, Brian.


