Chapter 2. Deploy Presto Cluster From Scratch

In this lab, we will set up a Presto cluster with one coordinator node and one worker node. You can follow the same settings to attach any number of worker nodes to the coordinator.
Note: To understand the Presto architecture and how a cluster works, see Chapter 1.

Steps to perform on the coordinator node:

  1. Prerequisites:

    Download the Presto tarball from the link below: https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.250/presto-server-0.250.tar.gz

    a. The tarball contains a single directory, presto-server-0.250, which we will call the installation directory.

    b. Java version: OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.18.04), running on Ubuntu 18.04.5 LTS.

  2. Create a data directory for storing the logs. Create it outside the installation directory so that it is preserved when you upgrade Presto.

  3. Create an “etc” directory inside the installation directory and create the following configuration files:

    • Node Properties: environmental configuration specific to each node

    <presto_home_dir>/etc/node.properties

    node.environment=my_first_presto_cluster
    node.id=<hostname>
    node.data-dir=<path to data directory where logs can be written>
    catalog.config-dir=<recommend to define inside etc directory under presto installation>
    node.server-log-file=<path/to/server_logs_dir/server.log>
    node.launcher-log-file=</path/to/launcher_log/launcher.log>
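
    As a rough sketch, the data directory from step 2 and the node.properties file could be created by one small shell function. The name write_node_properties and its argument order are illustrative and not part of Presto. The environment-name check assumes that Presto only accepts lowercase letters, digits, and underscores in node.environment.

```shell
#!/bin/sh
# Sketch: create the data directory and write etc/node.properties.
# write_node_properties is a hypothetical helper, not a Presto tool.
# Usage: write_node_properties <presto_home_dir> <data_dir> <environment> <node_id>
write_node_properties() {
    np_home=$1
    np_data=$2
    np_env=$3
    np_id=$4

    # Reject names outside [a-z0-9][_a-z0-9]* (assumed Presto rule).
    case $np_env in
        ''|[!a-z0-9]*|*[!_a-z0-9]*)
            echo "invalid node.environment: $np_env" >&2
            return 1
            ;;
    esac

    # The data directory lives outside the installation directory.
    mkdir -p "$np_data" "$np_home/etc/catalog" || return 1

    cat > "$np_home/etc/node.properties" <<EOF
node.environment=$np_env
node.id=$np_id
node.data-dir=$np_data
catalog.config-dir=$np_home/etc/catalog
EOF
}
```

    For example, with hypothetical paths: write_node_properties /opt/presto-server-0.250 /var/presto/data my_first_presto_cluster "$(hostname)". Every node in the cluster needs the same environment name but its own node.id.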

    • JVM Config: command line options for the Java Virtual Machine

    <presto_home_dir>/etc/jvm.config

    -server
    -Xmx16G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    -XX:+IgnoreUnrecognizedVMOptions

    • Config Properties: configuration for the Presto server

    <presto_home_dir>/etc/config.properties

    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8080
    query.max-memory=5GB
    query.max-memory-per-node=2GB
    discovery-server.enabled=true
    discovery.uri=http://<IP of the coordinator node>:8080
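
    The per-node memory limit in these properties has to fit inside the JVM heap set with -Xmx in jvm.config. The sketch below is one way to sanity-check the two values before starting the server. The helper names to_mb and fits_in_heap are made up for this example, and because Presto also reserves some heap headroom, passing this check is necessary rather than sufficient.

```shell
#!/bin/sh
# Sketch: check that a Presto memory limit fits inside the JVM heap.
# to_mb and fits_in_heap are hypothetical helper names.

# Convert sizes such as 16G, 2GB or 512MB to whole megabytes.
to_mb() {
    case $1 in
        *GB) echo $(( ${1%GB} * 1024 )) ;;
        *G)  echo $(( ${1%G} * 1024 )) ;;
        *MB) echo "${1%MB}" ;;
        *M)  echo "${1%M}" ;;
        *)   echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Succeeds when the limit ($1) is strictly smaller than the heap ($2).
fits_in_heap() {
    [ "$(to_mb "$1")" -lt "$(to_mb "$2")" ]
}
```

    With the values used in this chapter, fits_in_heap 2GB 16G succeeds, while a limit larger than the heap is rejected.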

    • Catalog Properties: configuration for Connectors (data sources)

    Create a file hive.properties inside <presto_home_dir>/etc/catalog and paste the content below:

    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://localhost:9083
    hive.s3.aws-access-key=<put your access key>
    hive.s3.aws-secret-key=<put your secret key>
    hive.non-managed-table-writes-enabled=true
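
    Instead of pasting the keys by hand, the catalog file could be generated from environment variables. This is only a sketch: write_hive_catalog is a hypothetical helper, and it assumes the keys are exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the same variable names used for the Hive Metastore later in this chapter.

```shell
#!/bin/sh
# Sketch: write etc/catalog/hive.properties from environment variables.
# write_hive_catalog is a hypothetical helper name.
# Usage: write_hive_catalog <catalog_dir> [metastore_uri]
write_hive_catalog() {
    hc_dir=$1
    hc_uri=${2:-thrift://localhost:9083}

    # Refuse to write a catalog with empty credentials.
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
        return 1
    fi

    mkdir -p "$hc_dir" || return 1
    cat > "$hc_dir/hive.properties" <<EOF
connector.name=hive-hadoop2
hive.metastore.uri=$hc_uri
hive.s3.aws-access-key=$AWS_ACCESS_KEY_ID
hive.s3.aws-secret-key=$AWS_SECRET_ACCESS_KEY
hive.non-managed-table-writes-enabled=true
EOF
    # The file holds secrets, so restrict it to the owning user.
    chmod 600 "$hc_dir/hive.properties"
}
```

    Calling write_hive_catalog <presto_home_dir>/etc/catalog keeps the secret out of your shell history and limits who can read the file.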

    • Log Levels: the optional log levels file allows setting the minimum log level for named loggers. Create <presto_home_dir>/etc/log.properties and paste the content below:

    com.facebook.presto=INFO

    To serve catalog information such as table schemas and partition locations, Presto needs a Hive Metastore. To launch the Hive Metastore for the first time, proceed as follows:

    $ mkdir ~/hive-metastore
    $ cd ~/hive-metastore
    $ wget https://downloads.apache.org/hive/hive-2.3.8/apache-hive-2.3.8-bin.tar.gz
    $ tar -xvzf apache-hive-2.3.8-bin.tar.gz
    $ cd apache-hive-2.3.8-bin
    $ export HIVE_HOME=`pwd`
    $ export JAVA_HOME=<path of java installation directory>
    # copy the lines below into ~/hive-metastore/apache-hive-2.3.8-bin/conf/hive-env.sh
    export HIVE_AUX_JARS_PATH=${HADOOP_HOME}/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.375.jar:${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-aws-3.2.1.jar
    export AWS_ACCESS_KEY_ID=<access key>
    export AWS_SECRET_ACCESS_KEY=<secret key>

    $ mkdir ~/hadoop
    $ cd ~/hadoop
    $ wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
    $ tar -xvf hadoop-3.2.1.tar.gz
    $ cd hadoop-3.2.1
    $ export HADOOP_HOME=`pwd`
    # return to the Hive directory; the remaining paths are relative to it
    $ cd $HIVE_HOME
    $ cp conf/hive-default.xml.template conf/hive-site.xml
    $ mkdir -p hcatalog/var/log/
    $ bin/schematool -dbType derby -initSchema
    $ hcatalog/sbin/hcat_server.sh start
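
    The metastore commands above depend on HIVE_HOME and HADOOP_HOME being exported in the current shell, which is easy to lose when switching terminals. A small pre-flight check along these lines can catch that before schematool runs. The function name check_metastore_env is hypothetical.

```shell
#!/bin/sh
# Sketch: verify HIVE_HOME and HADOOP_HOME before initializing the metastore.
# check_metastore_env is a hypothetical helper name.
check_metastore_env() {
    for cm_var in HIVE_HOME HADOOP_HOME; do
        # Look up the value of the variable whose name is in cm_var.
        eval "cm_val=\${$cm_var:-}"
        if [ -z "$cm_val" ] || [ ! -d "$cm_val" ]; then
            echo "$cm_var is unset or not a directory" >&2
            return 1
        fi
    done
    # schematool ships in the bin directory of the Hive distribution.
    if [ ! -e "$HIVE_HOME/bin/schematool" ]; then
        echo "bin/schematool not found under $HIVE_HOME" >&2
        return 1
    fi
}
```

    A possible use is check_metastore_env && cd "$HIVE_HOME" && bin/schematool -dbType derby -initSchema.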

    Start Presto Server

    $ cd <presto_home_dir>/bin
    $ ./launcher start
    $ cd ..
    $ ./presto --server localhost:8080 --catalog hive
    presto> use default;
    USE
    presto:default> select * from system.runtime.nodes;
                   node_id                |           http_uri           | node_version | coordinator | state
    --------------------------------------+------------------------------+--------------+-------------+--------
     ffffffff-ffff-ffff-ffff-ffffffffffff | http://<coordinator_IP>:8080 | 348          | true        | active
    (1 row)

    Query 20210411_094403_00021_54idy, FINISHED, 1 node
    Splits: 17 total, 17 done (100.00%)
    0:00 [1 rows, 71B] [4 rows/s, 352B/s]
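
    The server takes a little while to start, so connecting with the CLI immediately after launcher start can fail. One way to handle this is a generic retry helper like the sketch below. The name wait_until is hypothetical, and the probe in the usage note assumes the server answers on its /v1/info HTTP endpoint once it is up.

```shell
#!/bin/sh
# Sketch: retry a probe command until it succeeds or attempts run out.
# wait_until is a hypothetical helper name.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
    wu_attempts=$1
    wu_delay=$2
    shift 2
    while [ "$wu_attempts" -gt 0 ]; do
        # Stop as soon as the probe command reports success.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        wu_attempts=$((wu_attempts - 1))
        # Sleep only when another attempt remains.
        if [ "$wu_attempts" -gt 0 ]; then
            sleep "$wu_delay"
        fi
    done
    return 1
}
```

    For example, wait_until 30 2 curl -sf http://localhost:8080/v1/info would poll every two seconds for up to a minute before you open the CLI.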

Steps to perform on the worker node:

    Follow all the steps above for the coordinator node. The only change for the worker node is in the <presto_home_dir>/etc/config.properties file.

    Paste the content below into this file:

    coordinator=false
    http-server.http.port=8080
    query.max-memory=5GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    #discovery-server.enabled=true
    discovery.uri=http://<coordinator_node_IP>:8080
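
    Since the coordinator and the workers differ only in config.properties, a single function can write the file for either role. This is a minimal sketch using the property values from this chapter. The function name write_config_properties and its arguments are made up for the example.

```shell
#!/bin/sh
# Sketch: write etc/config.properties for either role.
# write_config_properties is a hypothetical helper name.
# Usage: write_config_properties <coordinator|worker> <coordinator_ip> <etc_dir>
write_config_properties() {
    wc_role=$1
    wc_ip=$2
    wc_etc=$3

    case $wc_role in
        coordinator|worker) ;;
        *) echo "role must be coordinator or worker" >&2; return 1 ;;
    esac
    mkdir -p "$wc_etc" || return 1

    if [ "$wc_role" = coordinator ]; then
        # The coordinator also runs the embedded discovery server.
        cat > "$wc_etc/config.properties" <<EOF
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://${wc_ip}:8080
EOF
    else
        # Workers only point at the coordinator's discovery URI.
        cat > "$wc_etc/config.properties" <<EOF
coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://${wc_ip}:8080
EOF
    fi
}
```

    Both roles use the same discovery.uri, which always points at the coordinator.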

    Now check whether the worker node shows up in the cluster:

    $ ./presto --server localhost:8080 --catalog hive
    presto> use default;
    USE
    presto:default> select * from system.runtime.nodes;
           node_id       |           http_uri           | node_version | coordinator | state
    ---------------------+------------------------------+--------------+-------------+--------
     i-049b73cfe3ce27289 | http://<worker_IP>:8080      | 350-e.1      | false       | active
     i-02e604adaf2f5052c | http://<coordinator_IP>:8080 | 350-e.1      | true        | active
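
    With more workers, reading this table by eye gets tedious. The sketch below counts the active workers from CSV output instead. It assumes the rows arrive on standard input in the column order shown above (node_id, http_uri, node_version, coordinator, state) with each value in double quotes. The function name count_active_workers is hypothetical.

```shell
#!/bin/sh
# Sketch: count active worker nodes from CSV rows on standard input.
# count_active_workers is a hypothetical helper name.
# Assumed columns: node_id,http_uri,node_version,coordinator,state
count_active_workers() {
    # Strip the quotes, then count rows with coordinator=false and state=active.
    tr -d '"' | awk -F, '$4 == "false" && $5 == "active" { n++ } END { print n + 0 }'
}
```

    It could be fed from the CLI in batch mode, for example: ./presto --server localhost:8080 --execute 'select * from system.runtime.nodes' --output-format CSV | count_active_workers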

    You can add any number of worker nodes for faster query execution and more parallelism.
    All the workers can be configured as shown above.

    Hope this was helpful!
    See you in the next chapter!
    Happy Learning!
    Shivani S.
