Installing GeoMesa

Installing GeoMesa mainly involves five components:

  • GeoMesa Accumulo installation
  • GeoMesa Kafka installation
  • GeoMesa HBase installation
  • GeoMesa Bigtable installation
  • GeoMesa Cassandra installation

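1 Installing GeoMesa Accumulo

Apache Accumulo is a reliable, scalable, high-performance, sorted and distributed key-value store with cell-level access control and customizable server-side processing. It follows the Google BigTable design and is built on Apache Hadoop, ZooKeeper, and Thrift.
GeoMesa supports Accumulo, so spatio-temporal data can be stored in Accumulo.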

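1.1 Requirements

The operating system is Ubuntu 14.04 Server. You need sudo privileges and at least 2 GB of swap space.
GeoMesa currently supports Accumulo 1.7. Because Accumulo depends on Hadoop and ZooKeeper, both must be installed before Accumulo. To keep the walkthrough simple, this tutorial installs Hadoop, ZooKeeper, and Accumulo in standalone mode on a single machine. For cluster installation and configuration, consult the respective documentation.
GeoMesa Accumulo can also be accessed through GeoServer; if you need that capability, install GeoServer as well. GeoMesa Accumulo currently supports GeoServer 2.9.1.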

1.2 Installing Hadoop

Installing Java

First install Java 1.8. Open an Ubuntu terminal, add the ppa:webupd8team/java repository, and update apt-get:

root@HDMachine:~$ sudo add-apt-repository ppa:webupd8team/java    
root@HDMachine:~$ sudo apt-get update

Then install Java 1.8:

root@HDMachine:~$ sudo apt-get install oracle-java8-installer

After installation, check the Java version with:

root@HDMachine:~$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Creating the hadoop user

Create the Hadoop user (hduser, in the group hadoop) with the following commands:

root@HDMachine:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1002) ...
Done.
root@HDMachine:~$ sudo adduser --ingroup hadoop hduser
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
    Full Name []: 
    Room Number []: 
    Work Phone []: 
    Home Phone []: 
    Other []: 
Is the information correct? [Y/n] Y

Add hduser to the sudo group:

hduser@HDMachine:~$ su root
Password: 

root@HDMachine:/home/hduser$ sudo adduser hduser sudo
[sudo] password for root: 
Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.

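Installing SSH

SSH consists of two parts, ssh and sshd:

  • ssh: the client, used to connect to remote machines.
  • sshd: the server, which accepts connection requests from clients.

The ssh client is installed by default on Linux, but to be able to start the sshd service, ssh needs to be reinstalled:

root@HDMachine:~$ sudo apt-get install ssh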

Check that ssh was installed successfully:

root@HDMachine:~$ which ssh
/usr/bin/ssh

root@HDMachine:~$ which sshd
/usr/sbin/sshd

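Creating and installing SSH keys

Hadoop uses SSH to manage its nodes. For a single-node installation, SSH must be configured for passwordless access to localhost. The -P "" option creates the key with an empty passphrase; when ssh-keygen asks for a file name, press Enter to accept the default: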

root@HDMachine:~$ su hduser
Password: 
hduser@HDMachine:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa): 
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
5c:9f:d5:64:8c:fa:2a:a0:a5:48:ff:5b:ed:9d:e0:85 hduser@HDMachine
The key's randomart image is:
+--[ RSA 2048]----+
|               oo|
|              .+.|
|          .  .. .|
|       . . ..o   |
|        S   o.   |
|    .   o  . ..  |
|   . o + .. E..  |
|    . +  ..o.+ . |
|       .o. .o o  |
+-----------------+


hduser@HDMachine:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

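The second command appends the newly created public key to the list of authorized keys so that Hadoop can log in over ssh without a password.
Check that ssh works with: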

hduser@HDMachine:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is e1:8b:a0:a5:75:ef:f4:b4:5e:a9:ed:be:64:be:5c:2f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-40-generic x86_64)
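
Running the `cat ... >> authorized_keys` step more than once appends the same key again. The following is a minimal sketch of an idempotent variant; the function name `authorize_key` is hypothetical and not part of any SSH tooling.

```shell
# Append a public key to an authorized_keys file only if it is not
# already present, then restrict the file's permissions.
# authorize_key is a hypothetical helper name used for illustration.
authorize_key() {
    local pub_file="$1" auth_file="$2"
    touch "$auth_file"
    # -x whole-line match, -F fixed strings, -f read the pattern from the key file
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}
```

For the setup above it could be called as `authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"`.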

Downloading Hadoop

Download the Hadoop 2.8.0 package from the Hadoop site with wget and extract it:

hduser@HDMachine:~$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
hduser@HDMachine:~$ tar xvzf hadoop-2.8.0.tar.gz

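For easier management, move Hadoop to /usr/local/hadoop (the target directory must exist before the move):

hduser@HDMachine:~/hadoop-2.8.0$ sudo mkdir -p /usr/local/hadoop
[sudo] password for hduser: 
hduser@HDMachine:~/hadoop-2.8.0$ sudo mv * /usr/local/hadoop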

Editing the Hadoop configuration files

Five configuration files need to be edited:

  • ~/.bashrc
  • /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  • /usr/local/hadoop/etc/hadoop/core-site.xml
  • /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
  • /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Configuring ~/.bashrc

Before editing .bashrc, find the Java installation path:

hduser@HDMachine:~$ update-alternatives --config java
There is only one alternative in link group java (providing /usr/bin/java):  /usr/lib/jvm/java-8-oracle/jre/bin/java 
Nothing to configure.

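Once you know the Java installation path, set the JAVA_HOME environment variable in .bashrc. Open .bashrc with vi and append the following environment variables to the end of the file: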

hduser@HDMachine:~$ vi ~/.bashrc

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

hduser@HDMachine:~$ source ~/.bashrc

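Finally, the source command makes the environment variables take effect.
Note that JAVA_HOME must point to the Java installation root, that is, the part of the path that precedes '/jre/bin/java': for '/usr/lib/jvm/java-8-oracle/jre/bin/java' it is '/usr/lib/jvm/java-8-oracle'.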
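
The path trimming described above can be sketched as a small shell function. The name `derive_java_home` is hypothetical, and the sketch assumes the binary path ends in `/bin/java`, optionally preceded by `/jre`.

```shell
# Derive JAVA_HOME from the full path of the java binary.
# derive_java_home is a hypothetical helper name used for illustration.
derive_java_home() {
    local java_bin="$1"
    local home="${java_bin%/bin/java}"   # drop the trailing /bin/java
    home="${home%/jre}"                  # drop a trailing /jre, if present
    printf '%s\n' "$home"
}
```

With the path reported by update-alternatives, `derive_java_home /usr/lib/jvm/java-8-oracle/jre/bin/java` yields the value used for JAVA_HOME in this tutorial.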

Configuring /usr/local/hadoop/etc/hadoop/hadoop-env.sh

hadoop-env.sh also needs the JAVA_HOME environment variable. Edit hadoop-env.sh with vi and set JAVA_HOME:

hduser@HDMachine:~$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Adding this line to hadoop-env.sh ensures that the value of JAVA_HOME is available to Hadoop whenever it starts.

Configuring /usr/local/hadoop/etc/hadoop/core-site.xml

/usr/local/hadoop/etc/hadoop/core-site.xml holds the configuration options Hadoop reads at startup; options set in this file override Hadoop's defaults.
Before editing it, create a temporary directory for Hadoop's temporary files:

hduser@HDMachine:~$ sudo mkdir -p /app/hadoop/tmp
hduser@HDMachine:~$ sudo chown hduser:hadoop /app/hadoop/tmp

Then open /usr/local/hadoop/etc/hadoop/core-site.xml with vi and enter the following:

hduser@HDMachine:~$ vi /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>

 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
 </property>
</configuration>

Remember the HDFS URI from this configuration, 'hdfs://localhost:54310'; it is needed later when configuring Accumulo.
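
Rather than copying the URI by hand, it can be read back from the file. The following is a rough sketch, assuming each `<name>` and `<value>` element sits on its own line as in the file above; `hadoop_conf_get` is a hypothetical helper name, not a Hadoop command.

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Assumes <name> and <value> are each on their own line.
# hadoop_conf_get is a hypothetical helper name used for illustration.
hadoop_conf_get() {
    local file="$1" prop="$2"
    awk -v prop="$prop" '
        index($0, "<name>" prop "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")      # strip everything up to the opening tag
            sub(/<\/value>.*/, "")    # strip the closing tag and what follows
            print
            exit
        }
    ' "$file"
}
```

For example, `hadoop_conf_get /usr/local/hadoop/etc/hadoop/core-site.xml fs.default.name` prints the HDFS URI configured above. This is line-oriented text matching, not a real XML parser, so it only suits files laid out like this one.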

Configuring /usr/local/hadoop/etc/hadoop/mapred-site.xml

By default, /usr/local/hadoop/etc/hadoop/ contains mapred-site.xml.template but no mapred-site.xml, so copy the template and name the copy mapred-site.xml:

hduser@HDMachine:~$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

mapred-site.xml specifies the framework used for MapReduce. Edit mapred-site.xml with vi and enter the following:

hduser@HDMachine:~$ vi /usr/local/hadoop/etc/hadoop/mapred-site.xml

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
 </property>
</configuration>

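Configuring /usr/local/hadoop/etc/hadoop/hdfs-site.xml

/usr/local/hadoop/etc/hadoop/hdfs-site.xml must be configured on every machine in the cluster. Its main purpose is to set the directories used by the NameNode and the DataNode. Before editing the file, create those two directories: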

hduser@HDMachine:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
hduser@HDMachine:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
hduser@HDMachine:~$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Then edit hdfs-site.xml with vi and enter the following:

hduser@HDMachine:~$ vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>
</configuration>

Formatting the Hadoop file system

Before Hadoop can be used, its file system must be formatted:

hduser@HDMachine:~$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/05/03 11:12:45 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hduser
STARTUP_MSG:   host = HDMachine/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.8.0
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009; compiled by 'jdu' on 2017-03-17T04:12Z
STARTUP_MSG:   java = 1.8.0_121
************************************************************/
17/05/03 11:12:45 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/05/03 11:12:45 INFO namenode.NameNode: createNameNode [-format]
17/05/03 11:12:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-ae42624b-2814-4b43-b473-a181d3075c2e
17/05/03 11:12:49 INFO namenode.FSEditLog: Edit logging is async:false
17/05/03 11:12:49 INFO namenode.FSNamesystem: KeyProvider: null
17/05/03 11:12:49 INFO namenode.FSNamesystem: fsLock is fair: true
17/05/03 11:12:49 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
17/05/03 11:12:49 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/05/03 11:12:49 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
17/05/03 11:12:49 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/05/03 11:12:49 INFO blockmanagement.BlockManager: The block deletion will start around 2017 May 03 11:12:49
17/05/03 11:12:49 INFO util.GSet: Computing capacity for map BlocksMap
17/05/03 11:12:49 INFO util.GSet: VM type       = 64-bit
17/05/03 11:12:49 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
17/05/03 11:12:49 INFO util.GSet: capacity      = 2^21 = 2097152 entries
17/05/03 11:12:49 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/05/03 11:12:49 INFO blockmanagement.BlockManager: defaultReplication         = 1
17/05/03 11:12:49 INFO blockmanagement.BlockManager: maxReplication             = 512
17/05/03 11:12:49 INFO blockmanagement.BlockManager: minReplication             = 1
17/05/03 11:12:49 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
17/05/03 11:12:49 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/05/03 11:12:49 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
17/05/03 11:12:49 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
17/05/03 11:12:49 INFO namenode.FSNamesystem: fsOwner             = hduser (auth:SIMPLE)
17/05/03 11:12:49 INFO namenode.FSNamesystem: supergroup          = supergroup
17/05/03 11:12:49 INFO namenode.FSNamesystem: isPermissionEnabled = true
17/05/03 11:12:49 INFO namenode.FSNamesystem: HA Enabled: false
17/05/03 11:12:49 INFO namenode.FSNamesystem: Append Enabled: true
17/05/03 11:12:50 INFO util.GSet: Computing capacity for map INodeMap
17/05/03 11:12:50 INFO util.GSet: VM type       = 64-bit
17/05/03 11:12:50 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
17/05/03 11:12:50 INFO util.GSet: capacity      = 2^20 = 1048576 entries
17/05/03 11:12:50 INFO namenode.FSDirectory: ACLs enabled? false
17/05/03 11:12:50 INFO namenode.FSDirectory: XAttrs enabled? true
17/05/03 11:12:50 INFO namenode.NameNode: Caching file names occurring more than 10 times
17/05/03 11:12:51 INFO util.GSet: Computing capacity for map cachedBlocks
17/05/03 11:12:51 INFO util.GSet: VM type       = 64-bit
17/05/03 11:12:51 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
17/05/03 11:12:51 INFO util.GSet: capacity      = 2^18 = 262144 entries
17/05/03 11:12:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/05/03 11:12:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/05/03 11:12:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
17/05/03 11:12:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
17/05/03 11:12:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
17/05/03 11:12:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
17/05/03 11:12:51 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/05/03 11:12:51 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/05/03 11:12:51 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/05/03 11:12:51 INFO util.GSet: VM type       = 64-bit
17/05/03 11:12:51 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
17/05/03 11:12:51 INFO util.GSet: capacity      = 2^15 = 32768 entries
17/05/03 11:12:51 INFO namenode.NNConf: ACLs enabled? false
17/05/03 11:12:51 INFO namenode.NNConf: XAttrs enabled? true
17/05/03 11:12:51 INFO namenode.NNConf: Maximum size of an xattr: 16384
17/05/03 11:12:52 INFO namenode.FSImage: Allocated new BlockPoolId: BP-130729900-192.168.1.1-1429393391595
17/05/03 11:12:52 INFO common.Storage: Storage directory /usr/local/hadoop_store/hdfs/namenode has been successfully formatted.
17/05/03 11:12:52 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/05/03 11:12:52 INFO util.ExitUtil: Exiting with status 0
17/05/03 11:12:52 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at HDMachine/192.168.1.1

Note that hadoop namenode -format should be run only once, before you start using Hadoop. Running it again after Hadoop is in use destroys all data stored in HDFS.
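
One way to guard against an accidental second format is to check for existing NameNode metadata first. The sketch below relies on a formatted NameNode directory containing `current/VERSION`; the function names `namenode_is_formatted` and `safe_format` are hypothetical. It calls `hdfs namenode -format`, the replacement that the deprecation notice in the output above points to.

```shell
# Return 0 if the NameNode directory already holds a formatted file system
# (a formatted NameNode directory contains current/VERSION).
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Format only when no existing metadata is found; otherwise refuse.
# Both function names are hypothetical helpers used for illustration.
safe_format() {
    local name_dir="$1"
    if namenode_is_formatted "$name_dir"; then
        echo "NameNode at $name_dir is already formatted; skipping." >&2
        return 1
    fi
    hdfs namenode -format
}
```

With the directories from hdfs-site.xml above, the call would be `safe_format /usr/local/hadoop_store/hdfs/namenode`.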

Starting Hadoop

To start Hadoop, run start-all.sh from the sbin directory:

hduser@HDMachine:~$ cd /usr/local/hadoop/sbin/
hduser@HDMachine:/usr/local/hadoop/sbin$ ls
distribute-exclude.sh  hdfs-config.sh           refresh-namenodes.sh  start-balancer.sh    start-yarn.cmd  stop-balancer.sh    stop-yarn.cmd
hadoop-daemon.sh       httpfs.sh                slaves.sh             start-dfs.cmd        start-yarn.sh   stop-dfs.cmd        stop-yarn.sh
hadoop-daemons.sh      kms.sh                   start-all.cmd         start-dfs.sh         stop-all.cmd    stop-dfs.sh         yarn-daemon.sh
hdfs-config.cmd        mr-jobhistory-daemon.sh  start-all.sh          start-secure-dns.sh  stop-all.sh     stop-secure-dns.sh  yarn-daemons.sh
hduser@HDMachine:/usr/local/hadoop/sbin$ start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
17/05/03 14:07:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
hduser@localhost's password: 
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-HDMachine.out
hduser@localhost's password: 
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-HDMachine.out
Starting secondary namenodes [0.0.0.0]
hduser@0.0.0.0's password: 
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-HDMachine.out
17/05/03 14:07:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-HDMachine.out
hduser@localhost's password: 
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-HDMachine.out
hduser@HDMachine:/usr/local/hadoop/sbin$ 

Check that Hadoop is running with the jps command:

hduser@HDMachine:/usr/local/hadoop/sbin$ jps
51633 Jps
50756 DataNode
50981 SecondaryNameNode
51318 NodeManager
50570 NameNode
51149 ResourceManager

Output like the above means Hadoop is running.
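
The check against the jps listing can also be scripted. This is a minimal sketch that takes the five daemon names from the output above as the expected set; `missing_daemons` is a hypothetical helper name.

```shell
# Read jps output on stdin and print the name of every expected Hadoop
# daemon that is missing. Returns 0 only if all daemons are present.
# missing_daemons is a hypothetical helper name used for illustration.
missing_daemons() {
    local jps_out daemon missing=0
    jps_out="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_out" | grep -qw "$daemon"; then
            echo "$daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use would be `jps | missing_daemons`, which prints nothing and returns success when every daemon is up.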
Another way to check that Hadoop is running is the netstat command:

hduser@HDMachine:/usr/local/hadoop/sbin$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      1003       119588      50570/java      
tcp        0      0 127.0.0.1:59447         0.0.0.0:*               LISTEN      1003       127666      50756/java      
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN      1003       127653      50756/java      
tcp        0      0 0.0.0.0:50075           0.0.0.0:*               LISTEN      1003       119763      50756/java      
tcp        0      0 0.0.0.0:50020           0.0.0.0:*               LISTEN      1003       128653      50756/java      
tcp        0      0 127.0.0.1:54310         0.0.0.0:*               LISTEN      1003       128405      50570/java      
tcp        0      0 0.0.0.0:50090           0.0.0.0:*               LISTEN      1003       130314      50981/java      
tcp6       0      0 :::8088                 :::*                    LISTEN      1003       129481      51149/java      
tcp6       0      0 :::8030                 :::*                    LISTEN      1003       131806      51149/java      
tcp6       0      0 :::8031                 :::*                    LISTEN      1003       131788      51149/java      
tcp6       0      0 :::8032                 :::*                    LISTEN      1003       131810      51149/java      
tcp6       0      0 :::8033                 :::*                    LISTEN      1003       137455      51149/java      
tcp6       0      0 :::60261                :::*                    LISTEN      1003       131852      51318/java      
tcp6       0      0 :::8040                 :::*                    LISTEN      1003       131858      51318/java      
tcp6       0      0 :::8042                 :::*                    LISTEN      1003       134564      51318/java  

Ports such as 50070, 50010, and 54310 are in use by Hadoop.

Stopping Hadoop

Run the stop-all.sh script in the sbin directory to stop Hadoop:

hduser@HDMachine:/usr/local/hadoop/sbin$ stop-all.sh 
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
17/05/03 14:25:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
hduser@localhost's password: 
localhost: stopping namenode
hduser@localhost's password: 
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
hduser@0.0.0.0's password: 
0.0.0.0: stopping secondarynamenode
17/05/03 14:25:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
hduser@localhost's password: 
localhost: stopping nodemanager
no proxyserver to stop

Hadoop web management page

Open http://localhost:50070 in a browser to view NameNode information, as shown in Figure 1:

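1.3 Installing ZooKeeper

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.
ZooKeeper's goal is to encapsulate complex, error-prone key services and offer users a simple interface on top of an efficient, stable system.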

Downloading ZooKeeper

Download the ZooKeeper 3.4.10 package from the ZooKeeper download site with wget and extract it:

hduser@HDMachine:~$ wget http://www.eu.apache.org/dist/zookeeper/stable/zookeeper-3.4.10.tar.gz
hduser@HDMachine:~$ tar xvzf zookeeper-3.4.10.tar.gz

For easier management, move ZooKeeper to /usr/local/zookeeper (the target directory must exist before the move):

hduser@HDMachine:~/zookeeper-3.4.10$ sudo mkdir -p /usr/local/zookeeper
[sudo] password for hduser: 
hduser@HDMachine:~/zookeeper-3.4.10$ sudo mv * /usr/local/zookeeper

Configuring ZooKeeper

Copy the ZooKeeper sample configuration in the conf directory and name the copy zoo.cfg:

hduser@HDMachine:~$ cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg

Starting ZooKeeper

Start ZooKeeper with the zkServer.sh script in the bin directory:

hduser@HDMachine:/usr/local/zookeeper$ bin/zkServer.sh start

The following log output indicates that ZooKeeper started successfully.

ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

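1.4 Installing Accumulo

Downloading Accumulo

Download the Accumulo 1.7 package from the Accumulo download site with wget and extract it. The closer.lua link may return a mirror-selection page rather than the archive itself; if so, download from one of the mirrors it lists: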

hduser@HDMachine:~$ wget https://www.apache.org/dyn/closer.lua/accumulo/1.7.3/accumulo-1.7.3-bin.tar.gz
hduser@HDMachine:~$ tar xvzf accumulo-1.7.3-bin.tar.gz

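For easier management, move Accumulo to /usr/local/accumulo (the target directory must exist before the move):

hduser@HDMachine:~/accumulo-1.7.3-bin$ sudo mkdir -p /usr/local/accumulo
[sudo] password for hduser: 
hduser@HDMachine:~/accumulo-1.7.3-bin$ sudo mv * /usr/local/accumulo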

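Configuring Accumulo

Accumulo ships example configurations for servers with various memory sizes: 512 MB, 1 GB, 2 GB, and 3 GB. This tutorial uses the 512 MB configuration; choose the one that fits your server.
Copy the 512 MB configuration files into the conf directory: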

hduser@HDMachine:~$ cp /usr/local/accumulo/conf/examples/512MB/standalone/* /usr/local/accumulo/conf/

Configuring ~/.bashrc

Edit ~/.bashrc with vi and set the HADOOP_HOME and ZOOKEEPER_HOME environment variables:

hduser@HDMachine:~$ vi ~/.bashrc

export HADOOP_HOME=/usr/local/hadoop/
export ZOOKEEPER_HOME=/usr/local/zookeeper/

Configuring accumulo-env.sh

Open accumulo-env.sh with vi and set the ACCUMULO_MONITOR_BIND_ALL option to true:

hduser@HDMachine:~$ sudo vi /usr/local/accumulo/conf/accumulo-env.sh

export ACCUMULO_MONITOR_BIND_ALL="true"

By default, Accumulo's HTTP monitor binds only to the local network interface. To make it reachable over the Internet, ACCUMULO_MONITOR_BIND_ALL must be set to true.

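Configuring accumulo-site.xml

Accumulo's worker processes authenticate to one another with a shared secret, which should be changed to a secure value in accumulo-site.xml. Find instance.secret in accumulo-site.xml and change its value; this tutorial uses PASS1234. The modified property looks like this: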

<property>
    <name>instance.secret</name>
    <value>PASS1234</value>
    <description>A secret unique to a given instance that all servers must know in order to communicate with one another.
      Change it before initialization. To
      change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret --old [oldpasswd] --new [newpasswd],
      and then update this file.
    </description>
</property>

Next, add the instance.volumes property to accumulo-site.xml. It sets the HDFS path where Accumulo stores its data:

<property>
    <name>instance.volumes</name>
    <value>hdfs://localhost:54310/accumulo</value>
</property>

Finally, find the trace.token.property.password option in accumulo-site.xml and change its value to a secure password. This password is used when Accumulo is initialized:

  <property>
    <name>trace.token.property.password</name>
    <value>mypassw</value>
  </property>

Initializing Accumulo

Initialize Accumulo with the accumulo script in the bin directory:

hduser@HDMachine:/usr/local/accumulo$ bin/accumulo init
2017-05-03 16:38:11,332 [conf.ConfigSanityCheck] WARN : Use of instance.dfs.uri and instance.dfs.dir are deprecated. Consider using instance.volumes instead.
2017-05-03 16:38:12,800 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2017-05-03 16:38:12,802 [init.Initialize] INFO : Hadoop Filesystem is hdfs://localhost:54310
2017-05-03 16:38:12,803 [init.Initialize] INFO : Accumulo data dirs are [hdfs://localhost:54310/accumulo]
2017-05-03 16:38:12,803 [init.Initialize] INFO : Zookeeper server is localhost:2181
2017-05-03 16:38:12,803 [init.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
Instance name : geomesa
Enter initial password for root (this may not be applicable for your security setup): ******
Confirm initial password for root: ******
2017-05-03 16:38:28,350 [Configuration.deprecation] INFO : dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
2017-05-03 16:38:33,501 [Configuration.deprecation] INFO : dfs.block.size is deprecated. Instead, use dfs.blocksize
2017-05-03 16:38:35,553 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthorizor
2017-05-03 16:38:35,568 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKAuthenticator
2017-05-03 16:38:35,574 [conf.AccumuloConfiguration] INFO : Loaded class : org.apache.accumulo.server.security.handler.ZKPermHandler

Initialization prompts for an instance name and a password. Here the instance name is geomesa, and the password is the trace.token.property.password value set in accumulo-site.xml.

Starting Accumulo

Start Accumulo with the start-all.sh script in the bin directory:

hduser@HDMachine:/usr/local/accumulo$ ./bin/start-all.sh
Starting monitor on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting tablet servers .... done
2017-05-03 16:44:46,682 [conf.ConfigSanityCheck] WARN : Use of instance.dfs.uri and instance.dfs.dir are deprecated. Consider using instance.volumes instead.
2017-05-03 16:44:48,422 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2017-05-03 16:44:48,426 [server.Accumulo] INFO : Attempting to talk to zookeeper
2017-05-03 16:44:48,578 [server.Accumulo] INFO : ZooKeeper connected and initialized, attempting to talk to HDFS
2017-05-03 16:44:48,720 [server.Accumulo] INFO : Connected to HDFS
Starting tablet server on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting master on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting garbage collector on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting tracer on localhost
WARN : Max open files on localhost is 1024, recommend 32768

Web management page

Once Accumulo is running, open the Accumulo web management page at http://localhost:50095, as shown in Figure 3-2:

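1.5 Installing GeoServer

GeoServer is a J2EE implementation of the OpenGIS Web Server specifications. It makes publishing map data straightforward, lets users update, delete, and insert feature data, and makes it easy to share geospatial information quickly among users. GeoServer is compatible with WMS and WFS; supports PostgreSQL, Shapefile, ArcSDE, Oracle, VPF, MySQL, and MapInfo; supports hundreds of projections; can output web maps as JPEG, GIF, PNG, SVG, KML, and other formats; runs in any J2EE/Servlet container; and embeds MapBuilder and supports the AJAX map client OpenLayers, among many other features.

Downloading GeoServer

Download the GeoServer 2.9.1 package from the GeoServer download site with wget and extract it: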

hduser@HDMachine:~$ wget https://sourceforge.net/projects/geoserver/files/GeoServer/2.9.1/geoserver-2.9.1-bin.zip
hduser@HDMachine:~$ unzip geoserver-2.9.1-bin.zip

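For easier management, move GeoServer to /usr/local/geoserver (the target directory must exist before the move):

hduser@HDMachine:~/geoserver-2.9.1-bin$ sudo mkdir -p /usr/local/geoserver
[sudo] password for hduser: 
hduser@HDMachine:~/geoserver-2.9.1-bin$ sudo mv * /usr/local/geoserver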

Configuring ~/.bashrc

Open ~/.bashrc with vi and set the GEOSERVER_HOME environment variable:

hduser@HDMachine:~$ vi ~/.bashrc 

export GEOSERVER_HOME=/usr/local/geoserver

Reload ~/.bashrc with the source command so the new environment variable takes effect:

hduser@HDMachine:~$ source ~/.bashrc

Changing the owner of the geoserver directory

Use chown to make the Hadoop user, hduser, the owner of the geoserver directory:

hduser@HDMachine:~$ sudo chown -R hduser /usr/local/geoserver/

Starting GeoServer

Change to the geoserver/bin directory and run the startup.sh script to start GeoServer:

hduser@HDMachine:~$ cd /usr/local/geoserver/bin
hduser@HDMachine:/usr/local/geoserver/bin$ ./startup.sh

Opening the GeoServer web console

Enter http://localhost:8080/geoserver in a web browser to open the GeoServer web console, as shown in Figure 3-3:
Log in with the default credentials admin/geoserver to administer it.

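1.6 Installing GeoMesa Accumulo

GeoMesa Accumulo can be installed in two ways: from a prebuilt binary distribution or by building from source.

1.6.1 Installing from the binary distribution

Installing from the prebuilt binary distribution is straightforward: download the package and extract it: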

$ wget http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa/geomesa-accumulo-dist_2.11/$VERSION/geomesa-accumulo-dist_2.11-$VERSION-bin.tar.gz
$ tar xvf geomesa-accumulo-dist_2.11-$VERSION-bin.tar.gz
$ cd geomesa-accumulo-dist_2.11-$VERSION
$ ls
bin/  conf/  dist/  docs/  emr4/  examples/  lib/  LICENSE.txt  logs/

Here $VERSION is the GeoMesa Accumulo version; at the time of writing the latest is 1.3.1.
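
Because $VERSION appears three times in the download URL, it can help to build the URL in one place. A small sketch, using only the URL pattern from the wget command above; `geomesa_dist_url` is a hypothetical helper name.

```shell
# Build the binary-distribution URL for a given GeoMesa version,
# following the URL pattern used in the wget command above.
# geomesa_dist_url is a hypothetical helper name used for illustration.
geomesa_dist_url() {
    local version="$1"
    local base="http://repo.locationtech.org/content/repositories/geomesa-releases/org/locationtech/geomesa"
    printf '%s/geomesa-accumulo-dist_2.11/%s/geomesa-accumulo-dist_2.11-%s-bin.tar.gz\n' \
        "$base" "$version" "$version"
}
```

It could then be used as `VERSION=1.3.1; wget "$(geomesa_dist_url "$VERSION")"`.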

1.6.2 Building from source

1) Prerequisites

Before building, make sure the following software is installed:

  • Java JDK 8
  • Apache Maven 3.2.2+
  • git client

2) Getting the source

Clone the repository with git:

$ git clone https://github.com/locationtech/geomesa.git
$ cd geomesa

Check out the latest release, currently 1.3.1 ($VERSION=1.3.1):

$ git checkout tags/geomesa-$VERSION -b geomesa-$VERSION

3) Building

Build with Maven; the project file pom.xml is in the root of the source tree:

$ mvn clean install

Setting the skipTests property to true speeds up the build:

$ mvn clean install -DskipTests=true

The build/mvn script performs incremental compilation based on Zinc:

$ build/mvn clean install

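1.6.3 Installing the Accumulo distributed runtime library

The geomesa-accumulo-dist_2.11-$VERSION/dist/accumulo directory contains the libraries Accumulo needs at runtime on the server side. They must be deployed to every tablet server in the Accumulo cluster.
Note that the directory holds two runtime libraries, one with raster support and one without. Install only one of them; installing both causes problems. The runtime library version must also match the version of the GeoMesa data store client library (used in GeoServer); otherwise queries may not work correctly.

1) Manual installation

Copy the runtime library into the $ACCUMULO_HOME/lib/ext directory on every tablet server in the cluster: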

# something like this for each tablet server
$ scp dist/accumulo/geomesa-accumulo-distributed-runtime_2.11-$VERSION.jar \
    tserver1:$ACCUMULO_HOME/lib/ext
# or for raster support
$ scp dist/accumulo/geomesa-accumulo-distributed-runtime-raster_2.11-$VERSION.jar \
    tserver1:$ACCUMULO_HOME/lib/ext
    

Note that the Accumulo master server does not need the runtime library.
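
For more than a handful of tablet servers, the scp step can be generated from a host list. The sketch below only prints the commands. It assumes a plain text file with one tablet server host per line (Accumulo 1.7 keeps such a list in conf/slaves, but verify this for your installation); `print_scp_cmds` is a hypothetical helper name.

```shell
# Print one scp command per tablet server listed in a hosts file
# (one host per line; blank lines and lines starting with # are skipped).
# print_scp_cmds is a hypothetical helper name used for illustration.
print_scp_cmds() {
    local jar="$1" hosts_file="$2" host
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # $ACCUMULO_HOME is printed literally and expanded when the command runs
        printf 'scp %s %s:$ACCUMULO_HOME/lib/ext\n' "$jar" "$host"
    done < "$hosts_file"
}
```

Review the printed commands first, then pipe them to `sh` to run them.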

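2) Namespace installation

Manual installation reliably gets GeoMesa Accumulo running. Starting with Accumulo 1.6, however, namespaces can be used to isolate the GeoMesa classpath from the rest of Accumulo.
The setup-namespace.sh script in the geomesa-accumulo-dist_2.11-$VERSION/bin directory performs a namespace-based installation: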

./setup-namespace.sh -u myUser -n myNamespace

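setup-namespace.sh accepts the following parameters:

  • -u <Accumulo username>
  • -n <Accumulo namespace>
  • -p <Accumulo password> (optional; prompted for if not provided)
  • -g <Path of GeoMesa distributed runtime JAR> (optional; defaults to the JAR in the distribution folder, without raster support)
  • -h <HDFS URI e.g. hdfs://localhost:54310> (optional; prompted for if not provided)

Alternatively, install the distributed runtime library manually with the following commands: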

$ accumulo shell -u root
> createnamespace myNamespace
> grant NameSpace.CREATE_TABLE -ns myNamespace -u myUser
> config -s general.vfs.context.classpath.myNamespace=hdfs://NAME_NODE_FDQN:54310/accumulo/classpath/myNamespace/[^.].*.jar
> config -ns myNamespace -s table.classpath.context=myNamespace

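After running these commands, copy the distributed runtime library manually into the specified HDFS directory. The directory above is only an example; you can use nested folders that include the project name, version number, and other information, so that different GeoMesa versions can coexist on the same Accumulo instance.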
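
The copy into HDFS might look like the following sketch, which prints the two `hdfs dfs` commands instead of running them. The directory argument is assumed to match the example classpath configured above; `print_runtime_upload` is a hypothetical helper name.

```shell
# Print the HDFS commands that would place the distributed runtime JAR
# in the namespace classpath directory. Prints only; nothing is executed.
# print_runtime_upload is a hypothetical helper name used for illustration.
print_runtime_upload() {
    local jar="$1" hdfs_dir="$2"
    printf 'hdfs dfs -mkdir -p %s\n' "$hdfs_dir"
    printf 'hdfs dfs -put %s %s/\n' "$jar" "$hdfs_dir"
}
```

For the example namespace, call it with the runtime JAR path and `/accumulo/classpath/myNamespace`, check the output, and pipe it to `sh` once it looks right.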

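1.6.4 Configuring the Accumulo command-line tools

GeoMesa provides command-line tools in the geomesa-accumulo_2.11-$VERSION/bin/ directory to help manage Accumulo. Environment variables can be set through the geomesa-env.sh script in geomesa-accumulo_2.11-$VERSION/bin/.
Run bin/geomesa configure from the geomesa-accumulo_2.11-$VERSION directory to configure the tools: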

### in geomesa-accumulo_2.11-$VERSION/:
$ bin/geomesa configure
Warning: GEOMESA_ACCUMULO_HOME is not set, using /path/to/geomesa-accumulo_2.11-$VERSION
Using GEOMESA_ACCUMULO_HOME as set: /path/to/geomesa-accumulo_2.11-$VERSION
Is this intentional? Y\n y
Warning: GEOMESA_LIB already set, probably by a prior configuration.
Current value is /path/to/geomesa-accumulo_2.11-$VERSION/lib.

Is this intentional? Y\n y

To persist the configuration please update your bashrc file to include:
export GEOMESA_ACCUMULO_HOME=/path/to/geomesa-accumulo_2.11-$VERSION
export PATH=${GEOMESA_ACCUMULO_HOME}/bin:$PATH

Then edit ~/.bashrc and add the following lines:

export GEOMESA_ACCUMULO_HOME=/path/to/geomesa-accumulo_2.11-$VERSION
export PATH=${GEOMESA_ACCUMULO_HOME}/bin:$PATH

Save .bashrc and reload it:

$ source ~/.bashrc

Because of licensing restrictions, the files needed for shapefile and raster support must be installed separately:

$ bin/install-jai.sh
$ bin/install-jline.sh

To test the GeoMesa command-line tools, run geomesa:

$ geomesa
Using GEOMESA_ACCUMULO_HOME = /path/to/geomesa-accumulo-dist_2.11-$VERSION
Usage: geomesa [command] [command options]
  Commands:
  ...

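1.6.5 Installing the GeoMesa Accumulo plugin in GeoServer

GeoMesa implements a GeoTools-compatible data store interface, so GeoMesa Accumulo can be used as a data source in GeoServer.
Once GeoServer is running, install the GeoServer WPS plugin; see the GeoServer installation documentation for details.
The manage-geoserver-plugins.sh script in the bin directory installs the GeoMesa Accumulo plugin into GeoServer: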

$ bin/manage-geoserver-plugins.sh --lib-dir /path/to/geoserver/WEB-INF/lib/ --install
Collecting Installed Jars
Collecting geomesa-gs-plugin Jars

Please choose which modules to install
Multiple may be specified, eg: 1 4 10
Type 'a' to specify all
--------------------------------------
0 | geomesa-accumulo-gs-plugin_2.11-$VERSION
1 | geomesa-blobstore-gs-plugin_2.11-$VERSION
2 | geomesa-process_2.11-$VERSION
3 | geomesa-stream-gs-plugin_2.11-$VERSION

Module(s) to install: 0 1
0 | Installing geomesa-accumulo-gs-plugin_2.11-$VERSION-install.tar.gz
1 | Installing geomesa-blobstore-gs-plugin_2.11-$VERSION-install.tar.gz
Done

To install manually instead, extract geomesa-accumulo-gs-plugin_2.11-$VERSION-install.tar.gz from the geomesa-accumulo_2.11-$VERSION/dist/geoserver/ directory into GeoServer's lib directory. If GeoServer is deployed in Tomcat:

$ tar -xzvf \
  geomesa-accumulo_2.11-$VERSION/dist/geoserver/geomesa-accumulo-gs-plugin_2.11-$VERSION-install.tar.gz \
  -C /path/to/tomcat/webapps/geoserver/WEB-INF/lib/

If you use GeoServer's built-in Jetty:

$ tar -xzvf \
  geomesa-accumulo_2.11-$VERSION/dist/geoserver/geomesa-accumulo-gs-plugin_2.11-$VERSION-install.tar.gz \
  -C /path/to/geoserver/webapps/geoserver/WEB-INF/lib/

A few other JARs, for Accumulo, ZooKeeper, Hadoop, and Thrift, also need to be copied into GeoServer's WEB-INF/lib directory. The install-hadoop-accumulo.sh script in $GEOMESA_ACCUMULO_HOME/bin installs these dependencies:

$ $GEOMESA_ACCUMULO_HOME/bin/install-hadoop-accumulo.sh /path/to/tomcat/webapps/geoserver/WEB-INF/lib/
Install accumulo and hadoop dependencies to /path/to/tomcat/webapps/geoserver/WEB-INF/lib/?
Confirm? [Y/n]y
fetching https://search.maven.org/remotecontent?filepath=org/apache/accumulo/accumulo-core/1.6.5/accumulo-core-1.6.5.jar
--2015-09-29 15:06:48--  https://search.maven.org/remotecontent?filepath=org/apache/accumulo/accumulo-core/1.6.5/accumulo-core-1.6.5.jar
Resolving search.maven.org (search.maven.org)... 207.223.241.72
Connecting to search.maven.org (search.maven.org)|207.223.241.72|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4646545 (4.4M) [application/java-archive]
Saving to: ‘/path/to/tomcat/webapps/geoserver/WEB-INF/lib/accumulo-core-1.6.5.jar’
...

With the GeoMesa Accumulo plugin installed, Accumulo can be used as a data source in GeoServer. Usage is covered in later chapters.
