
**<h1>My understanding of auto_bootstrap</h1>**

Below is my understanding of the `auto_bootstrap` property. Please correct me if I am wrong at any point.

By default, the `auto_bootstrap` property is not present in the `cassandra.yaml` file. When it is absent, the default value is `true`.

**true** - bootstrap the node, i.e. stream data to it from the other nodes, while starting/restarting <br>
**false** - do not stream data while starting/restarting
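Since the key is usually absent, it can help to check which value a node is effectively using. The following is a minimal sketch, assuming the key (if present) is an uncommented top-level line in `cassandra.yaml`; the function name and the example path are illustrative, not part of Cassandra.

```shell
# Print the effective auto_bootstrap value for a given cassandra.yaml file.
# Assumption: the key, if present, starts at column 0 and is not commented out.
effective_auto_bootstrap() {
    yaml_file=$1
    # Take the last uncommented "auto_bootstrap: <value>" line, if any.
    value=$(sed -n 's/^auto_bootstrap:[[:space:]]*\([A-Za-z]*\).*$/\1/p' "$yaml_file" | tail -n 1)
    # An absent key means the default applies, which is true.
    echo "${value:-true}"
}

# Example (hypothetical path):
# effective_auto_bootstrap install_location/conf/cassandra.yaml
```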

**Where do we need ‘auto_bootstrap: true’**

1) When a new node is added to an existing cluster, this needs to be set to **‘true’** so that the node bootstraps its data automatically from the other nodes in the cluster. Joining will take a considerable amount of time (depending on the current load of the cluster), but the load is then balanced across the cluster automatically.

**Where do we need ‘auto_bootstrap: false’**

1) When a new node needs to be added to an existing cluster quickly, without bootstrapping the data, this needs to be set to **‘false’**. The new node joins quickly regardless of the current load of the cluster. Later we need to stream the data to the new node manually to balance the cluster load.

2) When initializing a fresh cluster with no data, this needs to be set to **‘false’**. At least the first seed node started in the fresh cluster should have the value ‘false’.

**<h1>My question</h1>**

We are running a six-node Cassandra 2.0.3 cluster across two data centers (3 nodes each). Our Cassandra runs as a ***stand-alone process*** (not a service). I am going to change a few properties in the `cassandra.yaml` file on one node, so that node has to be restarted for the changes to take effect. Our cluster holds a large amount of data.

**How to restart the node**<br>
After killing the node, I can simply restart the node as below <br>

    $ cd install_location
    $ bin/cassandra

This restarts the node with no `auto_bootstrap` property set, so the default (true) applies.

**with 'true'**

1) The node to be restarted already holds a large amount of data. Will the node bootstrap all of its data again and replace the existing data? <br>
2) Will it take longer for the node to rejoin the cluster?

**with 'false'**

I do not want to bootstrap the data. So: <br>
3) Can I add the property `auto_bootstrap: false` and restart the node as shown above? <br>
4) After a successful restart I will delete the `auto_bootstrap` property again. Is that okay?

**Else** <br>
5) As I am restarting the node with the same IP address, will the cluster automatically recognize it as an existing node through gossip, and hence restart it without streaming any data, even if `auto_bootstrap` is set to true or is not present in the `cassandra.yaml` file?

As I am restarting an existing node with the same IP address, the restart will happen without streaming any data, regardless of the value of `auto_bootstrap`. So we can simply restart the existing node without touching any parameters. Option 5 is the one that applies here.

First of all, you should always run

> nodetool drain

on the node before killing Cassandra so that client connections/ongoing operations have a chance to gracefully complete.
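Put together, a drain-then-restart sequence for a stand-alone process might look like the sketch below. It assumes Cassandra was started with `-p` so that a pid file exists; `CASSANDRA_HOME`, `PID_FILE`, and the `DRY_RUN` switch are placeholders introduced for illustration and should be adapted to the actual install.

```shell
# Sketch of a drain-then-restart sequence for a stand-alone Cassandra process.
# Assumptions: CASSANDRA_HOME and PID_FILE are placeholders; the node was
# started with "bin/cassandra -p $PID_FILE". Set DRY_RUN=1 to only print commands.
CASSANDRA_HOME=${CASSANDRA_HOME:-install_location}
PID_FILE=${PID_FILE:-/tmp/cassandra.pid}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_node() {
    # Flush memtables to disk and stop accepting writes before shutdown.
    run "$CASSANDRA_HOME/bin/nodetool" drain

    # Stop the running process and wait until it has really exited.
    pid=$(cat "$PID_FILE")
    run kill "$pid"
    while [ "${DRY_RUN:-0}" != "1" ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done

    # Start Cassandra again; -p records the new pid in the pid file.
    run "$CASSANDRA_HOME/bin/cassandra" -p "$PID_FILE"
}
```

Running it once with `DRY_RUN=1` prints the three commands without touching the node, which is a cheap way to confirm the paths before doing it for real.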

Assuming that the node was fully bootstrapped and had status "Up" and "Joined": when you start Cassandra up again, the node will **not** need to bootstrap again, since it has already joined the cluster and taken ownership of its tokens. However, it will need to catch up with the data that was mutated while it was down. On startup it replays its own commitlog locally, and the other nodes then deliver the writes it missed as hints (hinted handoff). So it will take much less time to start up than the initial bootstrap did. Just don't leave it down for longer than the hint window, or you will need to run `nodetool repair` on it to catch up.
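As a rough rule of thumb for "too long": other nodes store hints for a down node only for a limited window (`max_hint_window_in_ms` in `cassandra.yaml`), and past that a `nodetool repair` is the usual way to bring the node back in sync. The sketch below compares a downtime against that window. It assumes the default of 10800000 ms (3 hours) unless a different value is passed; the function name is illustrative.

```shell
# Report whether a node was down longer than the hint window.
# Assumption: max_hint_window_in_ms is 10800000 (3 hours) unless a
# different value is given as the second argument.
needs_repair() {
    downtime_seconds=$1
    hint_window_ms=${2:-10800000}
    if [ $((downtime_seconds * 1000)) -gt "$hint_window_ms" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# needs_repair 600      (down for 10 minutes)
# needs_repair 14400    (down for 4 hours)
```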

You should not set auto_bootstrap to false unless you're creating the first seed node for a new cluster.

The node will be identified as a pre-existing node which has tokens assigned to it by virtue of the host id that is assigned to it when it joins the cluster. The IP address does not matter unless it is a seed node.
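To confirm after the restart that the node came back under the same host ID and in the "Up/Normal" state, the output of `nodetool status` can be filtered for its address. This is a small sketch, assuming the usual layout where the two-letter state is the first column and the address the second; the function name and the example address are placeholders.

```shell
# Read "nodetool status" output on stdin and print "<state> <host id>"
# for the node with the given address.
# Assumption: state is column 1, address is column 2, and the host ID is
# the only UUID-shaped field on the line.
node_state_and_id() {
    awk -v addr="$1" '
        $2 == addr {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/)
                    print $1, $i
        }'
}

# Usage (placeholder address):
# bin/nodetool status | node_state_and_id 10.0.0.1
```

Comparing the host ID printed before and after the restart shows that the cluster treats it as the same node; a state of `UN` means Up and Normal.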
