Installing and Using Hive


1. Installing Hive

1.1 Download Hive

Mirror locations:
http://www.apache.org/dyn/closer.cgi/hive or https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/

wget https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

1.2 Extract to the installation path

sudo tar -zxvf ./apache-hive-3.1.2-bin.tar.gz -C /usr/local
cd /usr/local
sudo mv apache-hive-3.1.2-bin hive

1.3 Set the Hive environment variables

vi ~/.bash_profile
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf
export PATH=$PATH:$HIVE_HOME/bin

source ~/.bash_profile
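
After reloading the profile it can be worth confirming that the variables took effect. The following is a minimal sketch, not part of Hive itself: check_hive_env is a hypothetical helper name, and it only inspects HIVE_HOME and PATH as set above.

```shell
# Sketch: verify the Hive environment variables set in ~/.bash_profile.
# check_hive_env is a hypothetical helper, not a Hive command.
check_hive_env() {
    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set"
        return 1
    fi
    if [ ! -d "$HIVE_HOME" ]; then
        echo "HIVE_HOME does not exist: $HIVE_HOME"
        return 1
    fi
    # Wrap PATH in colons so that the first and last entries also match.
    case ":$PATH:" in
        *":$HIVE_HOME/bin:"*) echo "ok" ;;
        *) echo "$HIVE_HOME/bin is not on PATH"; return 1 ;;
    esac
}

check_hive_env || echo "fix ~/.bash_profile, then run: source ~/.bash_profile"
```

If the function prints ok, the hive command should resolve from any directory.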

1.4 Edit hive-site.xml under /usr/local/hive/conf

cd /usr/local/hive/conf
mv hive-default.xml.template hive-default.xml

vim hive-site.xml

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hiveMhxzKhl88!</value>
  </property>
  <!-- HDFS scratch directory for Hive jobs -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
  </property>
  <!-- Local scratch directory for Hive jobs -->
  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/xxx/hive/tmp</value>
  </property>
  <!-- Top-level directory for operation logs, if operation logging is enabled -->
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/xxx/hive/tmp/operation_logs</value>
  </property>
  <!-- Location of Hive's structured runtime log files -->
  <property>
    <name>hive.querylog.location</name>
    <value>/home/xxx/hive/tmp</value>
  </property>
  <!-- Temporary local directory for resources added from a remote file system -->
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/xxx/hive/tmp/${hive.session.id}_resources</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  <!-- This is an HDFS property; it is normally set in hdfs-site.xml -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>

1.5 Configure hive-env.sh

cp hive-env.sh.template hive-env.sh
# then edit hive-env.sh and add:

export HIVE_CONF_DIR=${HIVE_HOME}/conf
export HIVE_AUX_JARS_PATH=${HIVE_HOME}/conf/lib

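1.6 Copy the MySQL driver JAR

wget https://cdn.mysql.com//archives/mysql-connector-java-5.1/mysql-connector-java-5.1.46.zip
unzip mysql-connector-java-5.1.46.zip

# the archive unpacks into its own directory; create the target directory before copying
mkdir -p ${HIVE_HOME}/conf/lib
cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar ${HIVE_HOME}/conf/lib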

2. Installing MySQL

2.1 Download and install MySQL

https://downloads.mysql.com/archives/community/

yum -y install mysql-community-server
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
No package mysql-community-server available.
Error: Nothing to do

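If yum cannot find the package, download and install the MySQL repository package first (for example mysql-community-release-el7-5.noarch.rpm; the commands below use mysql57-community-release-el7-10.noarch.rpm):

wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
rpm -ivh mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server

Check that the installation succeeded:

rpm -qa|grep mysql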

Change the default character set:

vi /etc/my.cnf

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

systemctl restart mysqld.service

2.2 Start the MySQL service

systemctl  start mysqld.service

2.3 Set the login password

Look up the temporary password:

grep 'temporary password' /var/log/mysqld.log
2021-01-07T10:48:25.445856Z 1 [Note] A temporary password is generated for root@localhost: K:3q5rpb4+8R

Try logging in with the temporary password:

mysql -uroot -p
Enter password:

Set a new password:

ALTER USER 'root'@'localhost' IDENTIFIED BY 'MhxzKhl88!';

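2.4 Create the hive database

This hive database corresponds to the hive in localhost:3306/hive in hive-site.xml and stores the Hive metadata.

mysql> create database hive;

2.5 Allow Hive to connect to MySQL

Note: the user in the grant statement (hive here) is the account Hive connects as; replace it with your own user if it differs.

mysql> grant all on *.* to hive@localhost identified by 'hiveMhxzKhl88!';
# Grants all privileges on all tables in all databases to the hive user; the password after "by" is the connection password configured in hive-site.xml
mysql> flush privileges;  # reload the MySQL privilege tables

Initialize the metastore schema:

schematool -dbType mysql -initSchema

Without this initialization, Hive commands fail with an error like:

FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.me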

MySQL grant commands for reference

use mysql;

# Grant privileges to a user
# Format: grant <privileges> on <database>.* to <user>@<login host> identified by "<password>";
grant all privileges on testDB.* to test@localhost identified by '1234';

# To grant only some privileges to a user:
grant select,update on testDB.* to test@localhost identified by '1234';

# Remove a user
drop user 'test'@'localhost';
# Change a password (MySQL 5.7 no longer has a password column in mysql.user)
alter user 'test'@'localhost' identified by 'new_password';

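3. Getting started with Hive SQL

3.1 Hive data types

Hive supports primitive and complex data types. The primitive types are mainly numeric (INT, FLOAT, DOUBLE), boolean, and string; the three complex types are ARRAY, MAP, and STRUCT.

a. Primitive data types

TINYINT: 1 byte
SMALLINT: 2 bytes
INT: 4 bytes
BIGINT: 8 bytes
BOOLEAN: TRUE/FALSE
FLOAT: 4 bytes, single-precision floating point
DOUBLE: 8 bytes, double-precision floating point
STRING: character string

b. Complex data types

ARRAY: an ordered collection of elements of the same type
MAP: an unordered collection of key-value pairs
STRUCT: a group of named fields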

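3.2 Common HiveQL commands

3.2.1 Database operations

create database if not exists testdb;       -- create a database
show databases;                             -- list the databases in Hive
show databases like 'h.*';                  -- list databases whose names start with h
describe database testdb;                   -- show the database location and other details
alter database testdb set dbproperties ('key'='value');  -- set key-value properties (placeholder key and value)
use testdb;                                 -- switch to the testdb database
drop database if exists testdb;             -- drop a database that contains no tables
drop database if exists testdb cascade;     -- drop a database together with its tables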

3.2.2 Table operations

1. Create a table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

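CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; use the IF NOT EXISTS option to suppress it.
The EXTERNAL keyword creates an external table and lets you specify a path (LOCATION) that points to the actual data at creation time.
LIKE copies the structure of an existing table without copying its data.
COMMENT adds a description to a table or column.

For example, create a usr table: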

create table if not exists testdb.usr(
      uid string comment 'uid',
      cuid string comment 'cuid',
      type string comment 'type'
 ); 
 
create table person(name STRING,age INT);
 
create table if not exists testdb.usr2(
  id int,
  name string,
  address string
);

2. View tables

show tables in testdb;
show tables 'u.*';        -- list tables whose names start with u
describe testdb.usr;      -- show details of the usr table

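3. Alter tables

-- rename a table
alter table usr rename to custom;
-- change a column's name, type, and position
alter table usr change column pwd password string after address;
-- add a column
alter table usr add columns(hobby string);
-- replace all columns (the existing column definitions are dropped)
alter table usr replace columns(uname string);
-- drop a table
drop table if exists usr1;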

4. Load data

Create a table whose field delimiter is the tab character (\t):

create table if not exists testdb.testhive(unique_id string,uid string,cuid string,create_time int)
row format delimited fields terminated by '\t';

load data local inpath "/home/team/r.txt" overwrite into table testdb.testhive;
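
For the load to populate all four columns, each line of the input file needs four tab-separated fields. The sketch below builds a small sample file and checks its field count before loading; the rows and the helper name make_sample_tsv are made up for illustration.

```shell
# Sketch: build a tab-separated sample file shaped like the testhive table.
# make_sample_tsv and the sample rows are hypothetical.
make_sample_tsv() {
    # Columns: unique_id, uid, cuid, create_time
    printf '%s\t%s\t%s\t%s\n' 'a1' 'u1' 'c1' '1609459200' > "$1"
    printf '%s\t%s\t%s\t%s\n' 'a2' 'u2' 'c2' '1609545600' >> "$1"
}

sample=$(mktemp)
make_sample_tsv "$sample"

# Print the number of tab-separated fields on each line; every line should report 4.
awk -F '\t' '{ print NF }' "$sample"
```

A line with fewer fields than the table has columns would leave the trailing columns NULL after loading, so a quick check like this can save a debugging round trip.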

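Four ways to load data into Hive ([overwrite] is optional):

From the local file system: load data local inpath '/home/st.txt' [overwrite] into table student;

From HDFS: load data inpath '/user/hive/warehouse/st.txt' [overwrite] into table student;

Create a table from a query: create table student_a as select * from student; (the query can also select specific columns)

Insert query results: insert into table student select * from student_a; (use insert overwrite table to replace the existing rows)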

5. Export data

insert overwrite local directory '/usr/local/hadoop/tmp/stu'  select id,name from stu;

3.2.3 Queries

The syntax is largely the same as standard SQL. For example, to count daily active users:

select count(distinct cuid) from testhive;

Total MapReduce CPU Time Spent: 1 minutes 5 seconds 520 msec
OK
2942630
Time taken: 33.79 seconds, Fetched: 1 row(s)

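3.3 Ways to run Hive commands

1) Run statements directly in the CLI. 2) Pass the statements as a string from the shell with hive -e (-S enables silent mode, which suppresses "OK" and "Time taken").

When HQL is run as a string from a shell script, the query result can be redirected straight to a local file (the default field delimiter is \t):
hive -e "use ${database};select * from tb"  > tb.txt

For longer statements, the sql=$(cat <<endtag ... endtag) form assigns a multi-line string to the variable sql: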

file_path='/home/abc.txt'
sql=$(cat <<!EOF

USE pmp;
set mapreduce.job.queuename=queue3;

drop table if exists people_targeted_delivery;
create table people_targeted_delivery
( special_tag_id int,
  cnt bigint
);

INSERT OVERWRITE LOCAL DIRECTORY '$file_path'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

select special_tag_id,count(1) from t_pmp_special_user_tags group by special_tag_id;

!EOF
)

############  execute begin   ###########
echo "$sql"
$HIVE_HOME/bin/hive -e "$sql"

exitCode=$?
if [ $exitCode -ne 0 ];then
         echo "[ERROR] hive execute failed!"
         exit $exitCode
fi
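
Whether the here-document delimiter is quoted decides who expands variables such as $file_path. The sketch below contrasts the two forms without calling Hive; the ${hivevar:file_path} reference assumes the value would be supplied separately, for example with hive --hivevar file_path=...

```shell
file_path='/home/abc.txt'

# Unquoted delimiter: the shell substitutes $file_path before Hive sees the text.
expanded=$(cat <<EOF
INSERT OVERWRITE LOCAL DIRECTORY '$file_path'
EOF
)

# Quoted delimiter: the text is kept literally, which suits statements that
# carry their own Hive variable references.
literal=$(cat <<'EOF'
INSERT OVERWRITE LOCAL DIRECTORY '${hivevar:file_path}'
EOF
)

echo "$expanded"
echo "$literal"
```

The first echo shows the path already filled in by the shell; the second shows the reference left intact for Hive to resolve.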

3) Put the statements in a standalone file and run it from the shell with hive -f

mytest.hql contains the HiveQL statements we have written:

hive -f  mytest.hql
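
A script file can also be parameterized instead of hard-coding names. This is a rough sketch under a few assumptions: the file is generated in a temporary location rather than named mytest.hql, the variable name db is made up, and the database and table are the testdb and testhive used earlier. The hive call is guarded so the snippet does nothing harmful on a machine without Hive.

```shell
# Sketch: write a small HiveQL file and run it with hive -f.
hql_file=$(mktemp)

# Quoted delimiter so the shell leaves ${hivevar:db} for Hive to substitute.
cat > "$hql_file" <<'EOF'
USE ${hivevar:db};
SELECT count(*) FROM testhive;
EOF

if command -v hive >/dev/null 2>&1; then
    # -S runs in silent mode; --hivevar supplies the value for ${hivevar:db}.
    hive -S --hivevar db=testdb -f "$hql_file"
else
    echo "hive not found on PATH; would run: hive -S --hivevar db=testdb -f $hql_file"
fi
```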

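4. Configuring remote access

For resource isolation, most Hive operations are run from other machines rather than on the Hive server itself, so remote access needs to be configured.

4.1 Remote configuration

Remote mode requires adding the following properties to Hadoop's core-site.xml.

Here xxx is the user name that acts as a proxy for other users when accessing HDFS. My configuration is: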

<property>
    <name>hadoop.proxyuser.xxx.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.xxx.groups</name>
    <value>*</value>
</property>

Restart Hadoop:

# start
./hadoop/sbin/start-all.sh
#./hadoop/sbin/stop-all.sh
# leave safe mode
hdfs dfsadmin -safemode leave

Configure hive-site.xml:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://ip:9083</value>
</property>

Configure the firewall:

vi /etc/sysconfig/iptables

-A INPUT -m state --state NEW -m tcp -p tcp --dport 9083 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 10000 -j ACCEPT

systemctl restart iptables.service

Start the metastore or HiveServer2:

nohup hive --service metastore &
# HiveServer2 supports Beeline connections and is the officially recommended option
nohup hive --service hiveserver2 &
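
Both services can take a while to start listening, so it helps to confirm that ports 9083 and 10000 accept connections before trying a client. The following is a rough sketch that assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available; port_open is a hypothetical helper, and 127.0.0.1 should be replaced with the server address when checking from a client machine.

```shell
# Sketch: check whether a TCP port accepts connections.
# port_open is a hypothetical helper; it relies on bash's /dev/tcp feature.
port_open() {
    # $1 = host, $2 = port; gives up after 3 seconds if packets are silently dropped.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

for port in 9083 10000; do
    if port_open 127.0.0.1 "$port"; then
        echo "port $port is open"
    else
        echo "port $port is not reachable"
    fi
done
```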

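4.2 Client configuration

Make sure a Java environment is installed:

yum  localinstall jdk-8u151-linux-x64.rpm

Create a hadoopclient directory to hold the Hadoop and Hive clients, as shown below. The "client" is simply a copy of the corresponding directories from the remote cluster.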

# tree -L 1
.
├── hadoop
├── hive

Set the environment variables:

export JAVA_HOME=/usr/java/jdk1.8.0_151/
export HADOOP_HOME=/home/xxx/hadoopclient/hadoop

export HIVE_HOME=/home/xxx/hadoopclient/hive
export HIVE_CONF_DIR=${HIVE_HOME}/conf


export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
export PATH=$JAVA_HOME/bin:$HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$PATH

Connecting from the client

Option 1: hive

Continue to use the Hive CLI command:

hive

Option 2: beeline (recommended)

beeline
!connect jdbc:hive2://ip:10000

# or
beeline -u jdbc:hive2://ip:10000
beeline -u  "jdbc:hive2://ip:10000/testdb;"

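Beeline differs from some other tools: queries are entered as ordinary SQL, but administrative commands such as connecting, disconnecting, or exiting are Beeline commands that must be prefixed with "!" and need no terminator. Common commands:
1. !connect url: connect to a different HiveServer2 instance
2. !exit: exit the shell
3. !help: list all commands
4. !verbose: show additional details for queries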

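The Beeline CLI supports the following command-line options:
Option ---Description
--autoCommit=[true/false] ---enable or disable automatic transaction commit: beeline --autoCommit=true
--autosave=[true/false]   ---automatically save preferences: beeline --autosave=true
--color=[true/false]    ---control whether color is used for display: beeline --color=true
--delimiterForDSV=DELIMITER ---delimiter for the delimiter-separated values (dsv) output format; the default is the "|" character
--fastConnect=[true/false]  ---when connecting, skip building the list of tables and columns used for tab completion: beeline --fastConnect=false
--force=[true/false]    ---continue running a script even after errors: beeline --force=true
--headerInterval=ROWS   ---interval, in rows, at which column headers are redisplayed in table output; the default is 100: beeline --headerInterval=50
--help ---display help: beeline --help
--hiveconf property=value  ---set a configuration property; properties listed in hive.conf.restricted.list cannot be overridden this way: beeline --hiveconf prop1=value1
--hivevar name=value   ---set a Hive variable: beeline --hivevar var1=value1
--incremental=[true/false]  ---print rows incrementally as they are fetched
--isolation=LEVEL  ---set the transaction isolation level: beeline --isolation=TRANSACTION_SERIALIZABLE
--maxColumnWidth=MAXCOLWIDTH ---maximum width of string columns: beeline --maxColumnWidth=25
--maxWidth=MAXWIDTH ---maximum width to display before truncating data: beeline --maxWidth=150
--nullemptystring=[true/false]  ---print null as an empty string (true) or as NULL (false): beeline --nullemptystring=false
--numberFormat=[pattern]     ---format numbers with a DecimalFormat pattern: beeline --numberFormat="#,###,##0.00"
--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2] ---output format: beeline --outputformat=tsv
--showHeader=[true/false]   ---show column names in query results: beeline --showHeader=false
--showNestedErrs=[true/false] ---show nested errors: beeline --showNestedErrs=true
--showWarnings=[true/false] ---show warnings: beeline --showWarnings=true
--silent=[true/false]  ---reduce the amount of informational output: beeline --silent=true
--truncateTable=[true/false] ---truncate table columns on the client when they exceed the console width
--verbose=[true/false]  ---show verbose error messages and debug information: beeline --verbose=true
-d <driver class>  ---driver class to use: beeline -d driver_class
-e <query>  ---query to execute: beeline -e "query_string"
-f <file>  ---script file to execute: beeline -f filepath; for multiple files use -f file1 -f file2
-n <username>  ---user name to connect as: beeline -n valid_user
-p <password>  ---password to connect with: beeline -p valid_password
-u <database URL> ---JDBC URL to connect to: beeline -u db_URL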

4.3 Query and export results

There is a small pitfall here: --outputformat=tsv only takes effect when placed before -e.

beeline -u  "jdbc:hive2://ip:10000/testdb;"  --outputformat=tsv  -e "select count(*) from testhive;" > r.txt
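
The exported file may need light cleanup before other tools read it. The sketch below assumes, as the Beeline documentation describes for the older tsv format, that each value is wrapped in single quotes; the two input lines are a simulated stand-in for r.txt, and strip_tsv_quotes is a hypothetical helper that naively removes every single quote.

```shell
# Sketch: strip the single quotes that the tsv output format puts around values.
# strip_tsv_quotes is a hypothetical helper; it would also remove quotes that
# are part of the data, so it only suits simple numeric or identifier values.
strip_tsv_quotes() {
    sed "s/'//g"
}

# Simulated beeline output: a header line followed by one count value.
printf "'_c0'\n'2942630'\n" | strip_tsv_quotes
```

Adding --showHeader=false to the beeline command avoids the header line, and the tsv2 format is worth trying if the quoting is unwanted in the first place.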

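5. Errors encountered

Hive JDBC: Permission denied: user=anonymous, access=EXECUTE, inode="/tmp"

Solution: the error indicates that Hive lacks permission on the /tmp directory; granting permission fixes it. (Note: this /tmp is a directory in HDFS, not on the local Linux file system.)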

hadoop fs -chmod 777 /tmp
#hadoop fs -chmod -R 777 /tmp
