Chapter 2. Deploy Presto Cluster From Scratch

by Indus os - presto

In this lab, we will set up a Presto cluster with one coordinator node and one worker node. You can follow the same steps to add any number of worker nodes alongside the coordinator.
Note: For an overview of Presto's architecture and how a Presto cluster works, see Chapter 1.

Steps to perform on the coordinator node:

  1. Prerequisites:

    Download the Presto tarball from the below link:

    a. The above tarball contains a single directory, presto-server-0.250, which we will call the installation directory.

    b. Java version: OpenJDK 64-Bit Server VM (build ),
    on Ubuntu 18.04.5 LTS

    2. Create a data directory for storing the logs. Create it outside the installation directory to make future Presto upgrades easier.

    3. Create an “etc” directory inside the installation path and create the following configurations:

    • Node Properties: environmental configuration specific to each node


    node.environment=my-first-presto-cluster
    node.id=<hostname>
    node.data-dir=<path to data directory where logs can be written>
    catalog.config-dir=<recommended to define inside the etc directory under the presto installation>
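    The node properties above can be generated with a small helper script. This is a sketch only: the environment name, data-directory location, and the use of the hostname as node.id are assumptions to adapt to your setup.

```shell
# Sketch: write etc/node.properties for this host.
# PRESTO_HOME and DATA_DIR defaults are assumptions -- adjust to your layout.
PRESTO_HOME=${PRESTO_HOME:-$PWD/presto-server-0.250}
DATA_DIR=${DATA_DIR:-$HOME/presto-data}
mkdir -p "$PRESTO_HOME/etc" "$DATA_DIR"
cat > "$PRESTO_HOME/etc/node.properties" <<EOF
node.environment=my-first-presto-cluster
node.id=$(hostname)
node.data-dir=$DATA_DIR
EOF
# Show what was written.
cat "$PRESTO_HOME/etc/node.properties"
```

    Running the same script on every node gives each one a distinct node.id automatically, since hostnames differ per machine.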

    • JVM Config: command line options for the Java Virtual Machine
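    The jvm.config contents were not shown above; the Presto deployment documentation suggests defaults along these lines (the heap size here is an assumption, size it to your machine's memory):

```
-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
```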



    • Config Properties: configuration specific to the Presto server process; on the coordinator, config.properties holds the discovery URI that workers use to find the cluster:

    discovery.uri=http://<IP of the coordinator node>:8080
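    For reference, a complete coordinator config.properties typically looks like the following. This is a sketch based on the Presto deployment documentation; the port and the memory limits are assumptions to tune for your workload:

```
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://<IP of the coordinator node>:8080
```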

    • Catalog Properties: configuration for Connectors (data sources)

    Create a file inside <presto_home_dir/etc/catalog> and paste the below content (connector.name is required for a Hive catalog; the S3 keys are only needed if your tables live on S3):

    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://localhost:9083
    hive.s3.aws-access-key=<put your access key>
    hive.s3.aws-secret-key=<put your secret key>

    • Log Levels: the optional log levels file sets the minimum log level per logger. Create it under <presto_home/etc/> and paste the below content
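    A minimal log levels file, per the Presto documentation, contains one logger-level pair per line, for example:

```
com.facebook.presto=INFO
```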

    To serve Presto catalog information such as table schemas and partition locations, we need the Hive metastore. To launch the Hive Metastore for the first time, proceed as follows:

    $ mkdir ~/hive-metastore
    $ cd ~/hive-metastore
    $ wget
    $ tar -xvzf apache-hive-2.3.8-bin.tar.gz
    $ cd apache-hive-2.3.8-bin
    $ export HIVE_HOME=`pwd`
    $ export JAVA_HOME=<path of java installation directory>
    # copy the below lines into ~/hive-metastore/apache-hive-2.3.8-bin/conf/hive-env.sh
    export HIVE_AUX_JARS_PATH=${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar
    export AWS_ACCESS_KEY_ID=<access key>
    export AWS_SECRET_ACCESS_KEY=<secret key>

    $ mkdir ~/hadoop
    $ cd ~/hadoop
    $ wget
    $ tar -xvf hadoop-3.2.1.tar.gz
    $ cd hadoop-3.2.1
    $ export HADOOP_HOME=`pwd`
    $ cd $HIVE_HOME
    $ cp conf/hive-default.xml.template conf/hive-site.xml
    $ mkdir -p hcatalog/var/log/
    $ bin/schematool -dbType derby -initSchema
    $ hcatalog/sbin/ start
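    Once the metastore service is started, you can check that its Thrift port is listening. A quick sketch, assuming the default metastore port 9083 (this only tests TCP connectivity, not the Thrift protocol itself):

```shell
# Probe the Hive metastore's default Thrift port (9083) using the
# shell's /dev/tcp pseudo-device; prints one status line either way.
if (exec 3<>/dev/tcp/localhost/9083) 2>/dev/null; then
  echo "metastore reachable"
else
  echo "metastore not reachable"
fi
```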

    Start Presto Server

    $ cd <presto_home_dir>/bin
    $ launcher start
    $ cd ..
    $ ./presto --server localhost:8080 --catalog hive
    presto> use default;
    presto:default> select * from system.runtime.nodes;
                   node_id                |          http_uri            | node_version | coordinator | state  
     ffffffff-ffff-ffff-ffff-ffffffffffff | http://<coordinator_IP>:8080 | 348          | true        | active 
    (1 row)

    Query 20210411_094403_00021_54idy, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:00 [1 rows, 71B] [4 rows/s, 352B/s]
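    Besides the CLI, you can check the coordinator from the shell via its REST API; /v1/info is the standard Presto server info endpoint. A sketch, assuming the default port 8080:

```shell
# Liveness check against the coordinator's REST API.
# curl -sf exits non-zero (silently) if the server is unreachable.
if curl -sf http://localhost:8080/v1/info > /dev/null; then
  echo "coordinator up"
else
  echo "coordinator down"
fi
```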

    Steps to perform on the worker node:

    Follow all the steps above as for the coordinator node; there is only one change for the worker node, in the <presto_home_dir/etc/> file.

    Paste the below content in this file:
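    The worker's configuration disables the coordinator role and points the discovery URI at the coordinator. A typical sketch, based on the Presto deployment documentation (the port and memory values are assumptions):

```
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery.uri=http://<coordinator_IP>:8080
```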


    Now check whether the worker node shows up in the cluster:

    $ ./presto --server localhost:8080 --catalog hive
    presto> use default;
    presto:default> select * from system.runtime.nodes;
           node_id       |           http_uri           | node_version | coordinator | state  
     i-049b73cfe3ce27289 | http://<worker_IP>:8080      | 350-e.1      | false       | active 
     i-02e604adaf2f5052c | http://<coordinator_IP>:8080 | 350-e.1      | true        | active

    You can add any number of worker nodes for faster query execution and more parallelism.
    All the workers can be configured as above.
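    Rolling the same configuration out to many workers is easy to script. A dry-run sketch: the worker hostnames and paths below are placeholders, and the scp/ssh commands are only echoed; drop the echo to run them for real, and remember that each worker still needs its own unique node.id.

```shell
# Dry run: print the copy/start commands for each worker instead of
# executing them (remove the leading 'echo' to run for real).
WORKERS="worker1 worker2 worker3"   # assumption: replace with real hosts
for host in $WORKERS; do
  echo scp -r presto-server-0.250/etc "$host":presto-server-0.250/
  echo ssh "$host" presto-server-0.250/bin/launcher start
done
```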

    Hope this was helpful!
    See you in next Chapter!
    Happy Learning!
    Shivani S.
