In this lab, we will set up a Presto cluster with one coordinator node and one worker node. You can follow the same settings to add any number of worker nodes to the coordinator.
Note: To understand the Presto architecture and how a cluster works in Presto, see Chapter 1.
Steps to perform on the coordinator node:
Prerequisites:
1. Download the Presto tarball from the link below: https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.250/presto-server-0.250.tar.gz
a. The tarball contains a single directory, presto-server-0.250, which we will call the installation directory.
b. This lab was run with Java OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.18.04) on Ubuntu 18.04.5 LTS.
2. Create a data directory for storing the logs. Create it outside the installation directory so that Presto upgrades are easier.
3. Create an “etc” directory inside the installation directory and create the following configuration files:
Node Properties: environmental configuration specific to each node
<presto_home_dir/etc/node.properties>
node.environment=my_first_presto_cluster
node.id=<hostname>
node.data-dir=<path to data directory where logs can be written>
catalog.config-dir=<recommend to define inside etc directory under presto installation>
node.server-log-file=<path/to/server_logs_dir/server.log>
node.launcher-log-file=</path/to/launcher_log/launcher.log>
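Since node.id has to be unique on every node while the rest of the file stays the same, node.properties is a good candidate for a small generator script. The sketch below is one possible way to do it, not an official Presto tool: write_node_properties is a hypothetical helper name, and the var/log layout under the data directory is only an assumed choice for the two log files.

```shell
# Sketch: write etc/node.properties for the current host.
# write_node_properties is a hypothetical helper, not part of Presto.
# Arguments: <presto_home> <data_dir> <environment> [node_id]
write_node_properties() {
  local presto_home="$1"   # Presto installation directory
  local data_dir="$2"      # data directory outside the installation
  local environment="$3"   # must be identical on every node of the cluster
  local node_id="${4:-$(hostname -s)}"   # defaults to the short hostname

  # Assumed layout: catalogs under etc/catalog, logs under <data_dir>/var/log
  mkdir -p "$presto_home/etc/catalog" "$data_dir/var/log" || return 1

  cat > "$presto_home/etc/node.properties" <<EOF
node.environment=$environment
node.id=$node_id
node.data-dir=$data_dir
catalog.config-dir=$presto_home/etc/catalog
node.server-log-file=$data_dir/var/log/server.log
node.launcher-log-file=$data_dir/var/log/launcher.log
EOF
}
```

For example, write_node_properties /opt/presto-server-0.250 /var/presto/data <environment name> would write the file using the machine's short hostname as node.id (both paths here are placeholders).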
JVM Config: command line options for the Java Virtual Machine
<presto_home_dir/etc/jvm.config>
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:+IgnoreUnrecognizedVMOptions
Config Properties: configuration for the Presto server
<presto_home_dir/etc/config.properties>
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://<IP of the coordinator node>:8080
Catalog Properties: configuration for Connectors (data sources)
Create a file hive.properties inside <presto_home_dir/etc/catalog> and paste the content below:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=<put your access key>
hive.s3.aws-secret-key=<put your secret key>
hive.non-managed-table-writes-enabled=true
Log Levels: the optional log levels file allows setting the minimum log level. Create <presto_home_dir/etc/log.properties> and paste the content below:
com.facebook.presto=INFO
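Before moving on, it can help to confirm that the configuration files from the steps above are actually in place. The following is a minimal sketch, assuming the catalog directory is etc/catalog under the installation directory; check_presto_etc is a hypothetical helper name. log.properties is not checked because it is optional.

```shell
# Sketch: report which required config files are missing under <presto_home>/etc.
# check_presto_etc is a hypothetical helper, not part of Presto.
# Returns 0 when every file is present, 1 otherwise.
check_presto_etc() {
  local presto_home="$1"
  local missing=0
  local f
  for f in node.properties jvm.config config.properties catalog/hive.properties; do
    if [ ! -f "$presto_home/etc/$f" ]; then
      echo "missing: etc/$f"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_presto_etc <presto_home_dir> prints one line per missing file and returns a non-zero status if anything is absent, so it can also guard a start script.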
To serve Presto catalog information such as table schemas and partition locations, we need a Hive Metastore. If you are launching the Hive Metastore for the first time, proceed as follows:
$ mkdir ~/hive-metastore
$ cd ~/hive-metastore
$ wget https://downloads.apache.org/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
$ tar -xvzf apache-hive-2.3.8-bin.tar.gz
$ cd apache-hive-2.3.8-bin
$ export HIVE_HOME=`pwd`
$ export JAVA_HOME=<path of java installation directory>
# copy the below lines in ~/hive-metastore/apache-hive-2.3.8-bin/conf/hive-env.sh
export HIVE_AUX_JARS_PATH=${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar
export AWS_ACCESS_KEY_ID=<access key>
export AWS_SECRET_ACCESS_KEY=<secret key>
$ mkdir ~/hadoop
$ cd ~/hadoop
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
$ tar -xvf hadoop-3.2.1.tar.gz
$ cd hadoop-3.2.1
$ export HADOOP_HOME=`pwd`
# go back to the Hive directory before running the remaining commands
$ cd $HIVE_HOME
$ cp conf/hive-default.xml.template conf/hive-site.xml
$ mkdir -p hcatalog/var/log/
$ bin/schematool -dbType derby -initSchema
$ hcatalog/sbin/hcat_server.sh start
Start Presto Server
$ cd <presto_home_dir>/bin
$ ./launcher start
$ cd ..
$ ./presto --server localhost:8080 --catalog hive
presto> use default;
USE
presto:default> select * from system.runtime.nodes;
               node_id                |           http_uri           | node_version | coordinator | state
--------------------------------------+------------------------------+--------------+-------------+--------
 ffffffff-ffff-ffff-ffff-ffffffffffff | http://<coordinator_IP>:8080 | 348          | true        | active
(1 row)

Query 20210411_094403_00021_54idy, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [1 rows, 71B] [4 rows/s, 352B/s]
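The launcher starts the server in the background, so the CLI may fail to connect for a short while after launcher start returns. A minimal sketch for waiting until the server is ready is shown below. It assumes the server reports its status as JSON at /v1/info with a "starting" field that turns false once startup is complete; as far as we know this is how the Presto REST endpoint behaves, but please verify it against your version. wait_for_presto and info_says_ready are hypothetical helper names, and curl must be installed.

```shell
# Sketch: wait for a Presto server to finish starting.
# Both functions are hypothetical helpers, not part of Presto.

# Returns 0 when the JSON text passed as $1 contains "starting": false.
info_says_ready() {
  printf '%s' "$1" | grep -Eq '"starting"[[:space:]]*:[[:space:]]*false'
}

# Polls <base_url>/v1/info every 2 seconds, up to <attempts> times.
# Returns 0 as soon as the server reports it is ready, 1 on timeout.
wait_for_presto() {
  local base_url="${1:-http://localhost:8080}"
  local attempts="${2:-30}"
  local i=0
  local body
  while [ "$i" -lt "$attempts" ]; do
    body=$(curl -fsS "$base_url/v1/info" 2>/dev/null) && info_says_ready "$body" && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

A typical use would be ./launcher start followed by wait_for_presto http://localhost:8080 before opening the CLI.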
Steps to perform on the worker node:
Follow all the steps described above for the coordinator node. The only change for the worker node is in the <presto_home_dir/etc/config.properties> file.
Paste the content below into this file:
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
#discovery-server.enabled=true
discovery.uri=http://<coordinator_node_IP>:8080
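Every worker gets the same config.properties, so when you add more workers it may be convenient to generate the file instead of pasting it by hand. The sketch below simply writes the worker settings shown above with the coordinator address filled in; write_worker_config is a hypothetical helper name, and the memory values are copied from this lab rather than tuned recommendations.

```shell
# Sketch: write the worker's etc/config.properties.
# write_worker_config is a hypothetical helper, not part of Presto.
# Arguments: <presto_home> <coordinator_ip>
write_worker_config() {
  local presto_home="$1"      # Presto installation directory on the worker
  local coordinator_ip="$2"   # IP or hostname of the coordinator node

  mkdir -p "$presto_home/etc" || return 1

  # Same values as the worker settings in this lab; the discovery server
  # itself runs only on the coordinator, so it is not enabled here.
  cat > "$presto_home/etc/config.properties" <<EOF
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://$coordinator_ip:8080
EOF
}
```

For example, write_worker_config <presto_home_dir> <coordinator_node_IP> can be run on each new worker before starting the launcher.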
Now check that the worker node shows up in the cluster:
$ ./presto --server localhost:8080 --catalog hive
presto> use default;
USE
presto:default> select * from system.runtime.nodes;
       node_id       |           http_uri           | node_version | coordinator | state
---------------------+------------------------------+--------------+-------------+--------
 i-049b73cfe3ce27289 | http://<worker_IP>:8080      | 350-e.1      | false       | active
 i-02e604adaf2f5052c | http://<coordinator_IP>:8080 | 350-e.1      | true        | active
You can add any number of worker nodes for faster query execution and more parallelism.
All the workers can be configured as shown above.
Hope this was helpful!
See you in the next chapter!
Happy Learning!
Shivani S.