07-30-2023, 03:17 PM
I have a User model with the following attributes:
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model


class User(Model):
    user_id = columns.Integer(primary_key=True)
    username = columns.Text()
    email = columns.Text()
    fname = columns.Text()
    lname = columns.Text()
    age = columns.Text()
    state = columns.Text()
    city = columns.Text()
    country = columns.Text()
    gender = columns.Text()
    phone = columns.Text()
    school_name = columns.Text()
    created_at = columns.Text()
    race = columns.Boolean()  # the cqlengine column class is Boolean, not boolean
This is my normal RDBMS model. My queries are as follows:
1) Get all users with city = 'something'
2) Get a user with email = 'something'
3) Get a user with username = 'something'
4) Get all users with phone IN ('something')
5) Get all users with state = 'something'
6) Get all users with age > something
7) Get all users with gender = 'something'
8) Get all users with race = 'something'
9) Get count(*), school_name of users GROUP BY school_name
10) Get all users with created_at > 'something' LIMIT 1000
11) Get all users with username IN ('something') AND age IN ('something') AND phone IN ('something') AND state IN ('something') AND so on LIMIT 1000
In an RDBMS I can get these results with simple SELECT queries, but the problem lies in Cassandra.
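For reference, this is roughly what query 11 looks like on the RDBMS side. It is only a sketch: an in-memory SQLite database stands in for the real RDBMS, the table holds just a few of the columns, the sample rows are made up, and `find_users` is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite stands in for the RDBMS; only a few columns are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, username TEXT, age TEXT, phone TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [
        (1, "alice", "30", "111", "NY"),  # made-up sample rows
        (2, "bob", "25", "222", "CA"),
        (3, "carol", "30", "333", "NY"),
    ],
)


def find_users(conn, filters, limit=1000):
    """Query 11: AND together one IN (...) clause per filtered column."""
    allowed = {"username", "age", "phone", "state"}  # never interpolate unknown column names
    clauses, params = [], []
    for column, values in filters.items():
        if column not in allowed:
            raise ValueError(f"unsupported filter column: {column}")
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT user_id FROM users WHERE {where} ORDER BY user_id LIMIT ?"
    return [row[0] for row in conn.execute(sql, params + [limit])]


# Users aged 30 who live in NY or CA.
matches = find_users(conn, {"age": ["30"], "state": ["NY", "CA"]})
```

Any combination of columns can be filtered here with a single table, which is exactly what I do not know how to reproduce in Cassandra.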
In Cassandra, the recommendation is to have a separate model (table) per query, which speeds up reads. Disk is far cheaper these days than it used to be, although I understand that it isn't always easy to just throw more disk at a problem. The bigger problem I see is adjusting the DAO layer of the application to keep 10 different tables in sync. (Also, my instinct is not convinced that I should have 10 models for different queries. :P)
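To show what I mean by keeping the tables in sync, here is a rough in-memory sketch of such a DAO. It does not talk to Cassandra at all: plain dicts stand in for one lookup table per query, and `UserDao`, `LOOKUP_COLUMNS`, and the sample user are hypothetical names and data of my own. In a real cluster I assume each step would become a DELETE or INSERT against the corresponding table.

```python
from collections import defaultdict

# Hypothetical: one lookup "table" per queried column. Each maps a
# partition key value to {user_id: row}, loosely mimicking a Cassandra table.
LOOKUP_COLUMNS = ("email", "username", "city", "state")


class UserDao:
    def __init__(self):
        self.users = {}  # base table keyed by user_id
        self.by_column = {c: defaultdict(dict) for c in LOOKUP_COLUMNS}

    def save(self, user):
        """Insert or update a user and fan the write out to every lookup table.

        The user dict must contain user_id and every column in LOOKUP_COLUMNS.
        """
        old = self.users.get(user["user_id"])
        if old is not None:
            self._remove_from_lookups(old)  # stale entries must be removed first
        row = dict(user)
        self.users[row["user_id"]] = row
        for column in LOOKUP_COLUMNS:
            self.by_column[column][row[column]][row["user_id"]] = row

    def delete(self, user_id):
        old = self.users.pop(user_id, None)
        if old is not None:
            self._remove_from_lookups(old)

    def _remove_from_lookups(self, row):
        for column in LOOKUP_COLUMNS:
            partition = self.by_column[column][row[column]]
            partition.pop(row["user_id"], None)
            if not partition:  # drop partitions that became empty
                del self.by_column[column][row[column]]

    def find_by(self, column, value):
        # .get avoids creating an empty partition on a miss
        return sorted(self.by_column[column].get(value, {}))


dao = UserDao()
dao.save({"user_id": 1, "email": "a@example.com", "username": "alice",
          "city": "Pune", "state": "MH"})
# The user moves: the old city entry has to disappear and the new one appear.
dao.save({"user_id": 1, "email": "a@example.com", "username": "alice",
          "city": "Mumbai", "state": "MH"})
```

Even in this toy version, every update has to read the old row, remove it from each lookup table, and write it back to all of them. That is the part I expect to be painful with 10 tables.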
Can someone please explain the proper Cassandra model for getting the results of these queries?
PS: The actions on the above model can be Read/Write/Update/Delete. **Query 11** is the most important query.
The most important thing is to make these queries really fast on large amounts of data, keeping in mind that the information about a particular user can be updated.