
Concept: The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
In essence, Hive translates SQL into MapReduce programs. Hive's storage layer is HDFS, its compute engine is MapReduce, and its resource scheduler is YARN.
Advantages: (1) SQL-like syntax that is easy to pick up; (2) SQL is translated to MapReduce automatically, lowering development cost; (3) well suited to large-scale data analytics; (4) users can define their own functions (UDFs) as needed.
Drawbacks: (1) query latency is high; (2) iterative algorithms cannot be expressed; (3) no advantage when processing small datasets; (4) the automatically generated MapReduce jobs are not always efficient; (5) tuning is difficult and coarse-grained.
Hive overview, architecture, and principles
10. Hive core concepts and architecture
Thrift Server
An optional Hive component: a software-framework service that lets clients access Hive remotely and programmatically from many languages, including Java, C++, Ruby, and others.
The Driver
(1) SQL Parser: converts the SQL string into an abstract syntax tree (AST), typically using a third-party library such as ANTLR, then analyzes the AST, checking e.g. whether the tables and columns exist and whether the SQL is semantically valid;
(2) Compiler: compiles the HQL statement lexically, syntactically, and semantically (consulting the metastore), producing a logical execution plan;
(3) Query Optimizer: optimizes the logical execution plan, e.g. pruning unneeded columns, using partitions, using indexes;
(4) Executor: converts the optimized logical plan into a runnable physical plan (for Hive, MapReduce/Spark jobs) and submits it to YARN on the Hadoop cluster.
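The four Driver stages form a simple pipeline: each stage's output is the next stage's input. The sketch below is purely illustrative Python (none of these function names come from Hive's source); it only shows the shape of the data flow from SQL string to submitted job.

```python
# Hypothetical sketch of the Driver pipeline: parse -> compile -> optimize -> execute.
# The names and data structures are invented for illustration; they are not Hive code.

def parse(sql: str) -> dict:
    """SQL Parser: turn the SQL string into a (toy) AST."""
    tokens = sql.replace(",", " , ").split()
    return {"type": "query", "tokens": tokens}

def compile_plan(ast: dict) -> list:
    """Compiler: turn the AST into a logical plan (here, a list of operators)."""
    return ["TableScan", "Filter", "Select"]

def optimize(plan: list) -> list:
    """Optimizer: e.g. prune columns or push predicates closer to the scan."""
    return [op for op in plan if op != "Filter"] + ["Filter(pushed-down)"]

def execute(plan: list) -> str:
    """Executor: translate the logical plan into jobs and submit them to YARN."""
    return "submitted %d stages to YARN" % len(plan)

def run(sql: str) -> str:
    return execute(optimize(compile_plan(parse(sql))))

print(run("select id from test"))
```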
The reason is that Hive's default metastore database is Derby. Once a Hive session opens, it locks the metastore exclusively and cannot share it with other clients, so opening a second session fails, which is quite limiting. We therefore point Hive's metastore at MySQL (installed and removed via rpm), which supports multiple concurrent sessions.
Remove any pre-installed MariaDB libraries:

[atguigu@hadoop102 ~]$ rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
[atguigu@hadoop102 ~]$ sudo rpm -e --nodeps mariadb-libs

Unpack the MySQL installer bundle
[atguigu@hadoop102 software]$ tar -zxvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
# The MySQL bundle is a plain (uncompressed) tar, so -xvf is enough
[atguigu@hadoop102 software]$ tar -xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
mysql-community-embedded-5.7.28-1.el7.x86_64.rpm
mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
mysql-community-devel-5.7.28-1.el7.x86_64.rpm
mysql-community-embedded-compat-5.7.28-1.el7.x86_64.rpm
mysql-community-libs-5.7.28-1.el7.x86_64.rpm
mysql-community-test-5.7.28-1.el7.x86_64.rpm
mysql-community-common-5.7.28-1.el7.x86_64.rpm
mysql-community-embedded-devel-5.7.28-1.el7.x86_64.rpm
mysql-community-client-5.7.28-1.el7.x86_64.rpm
mysql-community-server-5.7.28-1.el7.x86_64.rpm
[atguigu@hadoop102 software]$ ll
total 1720252
-rw-r--r--. 1 atguigu atguigu 609556480 Oct 17 17:24 mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
-rw-r--r--. 1 atguigu atguigu  45109364 Sep 30  2019 mysql-community-client-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu    318768 Sep 30  2019 mysql-community-common-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   7037096 Sep 30  2019 mysql-community-devel-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu  49329100 Sep 30  2019 mysql-community-embedded-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu  23354908 Sep 30  2019 mysql-community-embedded-compat-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 136837816 Sep 30  2019 mysql-community-embedded-devel-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   4374364 Sep 30  2019 mysql-community-libs-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   1353312 Sep 30  2019 mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 208694824 Sep 30  2019 mysql-community-server-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 133129992 Sep 30  2019 mysql-community-test-5.7.28-1.el7.x86_64.rpm
[atguigu@hadoop102 software]$

Install MySQL with RPM
RPM package names and what they contain:
mysql-community-server          the database server and related tools
mysql-community-client          MySQL client applications and tools
mysql-community-common          library files shared by server and client
mysql-community-devel           header and library files for developing MySQL client applications
mysql-community-libs            shared libraries for MySQL client applications
mysql-community-libs-compat     shared compatibility libraries for earlier MySQL installations
mysql-community-embedded        the embedded MySQL library
mysql-community-embedded-devel  header and library files for embedded MySQL development
mysql-community-test            the test suite for the MySQL server
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-common-5.7.28-1.e################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-libs-5.7.28-1.el7################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-libs-compat-5.7.2################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-client-5.7.28-1.e################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-server-5.7.28-1.e################################# [100%]

Configure the installed MySQL
# Check the directory that datadir points to
[atguigu@hadoop102 etc]$ cat /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# Delete everything under the directory datadir points to (required before re-initializing)
[atguigu@hadoop102 etc]$ cd /var/lib/mysql
[atguigu@hadoop102 mysql]$ sudo rm -rf ./*
# Initialize the database
[atguigu@hadoop102 mysql]$ sudo mysqld --initialize --user=mysql
# Look up the temporary root password generated in the log
[atguigu@hadoop102 mysql]$ sudo cat /var/log/mysqld.log
2022-01-03T09:05:10.990246Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-01-03T09:05:16.710989Z 0 [Warning] InnoDB: New log files created, LSN=45790
2022-01-03T09:05:17.853899Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2022-01-03T09:05:18.010178Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 4586a786-6c74-11ec-9fb9-000c2955b598.
2022-01-03T09:05:18.042571Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2022-01-03T09:05:18.451056Z 0 [Warning] CA certificate ca.pem is self signed.
2022-01-03T09:05:18.589675Z 1 [Note] A temporary password is generated for root@localhost: )9Shrzd/oirE
# Start the MySQL service
[atguigu@hadoop102 mysql]$ sudo systemctl start mysqld
# Log in to MySQL
[atguigu@hadoop102 mysql]$ mysql -uroot -p')9Shrzd/oirE'
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Change the root password first; other statements will fail until you do
mysql> set password = password("root");
Query OK, 0 rows affected, 1 warning (0.00 sec)
# Update the root user in the mysql.user table to allow connections from any IP
mysql> update mysql.user set host='%' where user='root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
# Quit MySQL
mysql> quit;
Bye
2.2 Installing Hive
# Unpack Hive
[atguigu@hadoop102 software]$ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/module/
# Add Hive to the environment variables
[atguigu@hadoop102 software]$ cat /etc/profile.d/my_env.sh
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/apache-zookeeper-3.5.7-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export PATH=$PATH:$ZOOKEEPER_HOME/sbin
# HIVE_HOME
HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
# Reload the environment variables
[atguigu@hadoop102 software]$ source /etc/profile
# Resolve the log4j jar conflict with Hadoop
[atguigu@hadoop102 software]$ mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar.bak

2.3 Configuring the Hive metastore to use MySQL
[atguigu@hadoop102 software]$ cp mysql-connector-java-5.1.37.jar $HIVE_HOME/lib
[atguigu@hadoop102 software]$ cat $HIVE_HOME/conf/hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>

2.4 Starting Hive
(myclusters.sh below is a custom helper script that starts the whole Hadoop cluster.)
# Log in to MySQL
[atguigu@hadoop102 software]$ mysql -uroot -proot
# Create the Hive metastore database
mysql> create database metastore;
mysql> quit;
# Initialize the Hive metastore schema
[atguigu@hadoop102 software]$ schematool -initSchema -dbType mysql -verbose
# Start the Hadoop cluster
[atguigu@hadoop102 software]$ myclusters.sh start
# Start Hive
[atguigu@hadoop102 hive]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = 0e138704-f46b-48e6-9b5f-67f1acda92a2
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 41ae884b-9d14-40eb-b53b-f58a1410049c
# List databases
hive> show databases;
OK
default
Time taken: 1.009 seconds, Fetched: 1 row(s)
# List tables in the default database
hive> show tables;
OK
Time taken: 0.045 seconds
# Create a table in the default database
hive> create table test (id int);
OK
Time taken: 0.864 seconds
# Insert a row into the table
hive> insert into test values(1);
Query ID = atguigu_20220103174139_894e6a7c-1997-41f5-8abe-34478acad198
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Interrupting... Be patient, this might take some time. Press Ctrl+C again to kill JVM
Starting Job = job_1641202744100_0001, Tracking URL = http://hadoop103:8088/proxy/application_1641202744100_0001/
Kill Command = /opt/module/hadoop-3.1.3/bin/mapred job -kill job_1641202744100_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-01-03 17:42:33,045 Stage-1 map = 0%, reduce = 0%
2022-01-03 17:42:59,762 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.01 sec
2022-01-03 17:43:25,209 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 22.36 sec
MapReduce Total cumulative CPU time: 22 seconds 360 msec
Ended Job = job_1641202744100_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoop102:9820/user/hive/warehouse/test/.hive-staging_hive_2022-01-03_17-41-40_004_8605542926052281340-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 22.36 sec   HDFS Read: 12765 HDFS Write: 199 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 360 msec
OK
Time taken: 112.741 seconds
# Query the table
hive> select * from test;
OK
1
Time taken: 0.211 seconds, Fetched: 1 row(s)
# Leave Hive
hive> quit;
[atguigu@hadoop102 software]$
Open another terminal and check the newly created table
[atguigu@hadoop102 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/home/atguigu/.local/bin:/home/atguigu/bin)
Hive Session ID = dcc0168b-a0ec-4510-a48a-861a82019407
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = ec1d304b-3988-40e5-8955-54e3bdfd0eb9
hive> show tables;
OK
test
Time taken: 0.836 seconds, Fetched: 1 row(s)
hive> show databases;
OK
default
Time taken: 0.026 seconds, Fetched: 1 row(s)
hive> quit;

2.5 Accessing Hive over JDBC
Add the following properties to hive-site.xml (the first two properties are new):
[atguigu@hadoop102 ~]$ cat $HIVE_HOME/conf/hive-site.xml
<configuration>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
Start hiveserver2 (it must stay running while clients use it)
[atguigu@hadoop102 ~]$ hive --service hiveserver2
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/home/atguigu/.local/bin:/home/atguigu/bin)
2022-01-03 20:14:16: Starting HiveServer2
Hive Session ID = a60c0358-d232-4d08-8f11-215017b6122c
Hive Session ID = 43853016-ce83-429f-9670-9d2a96c081ea
Hive Session ID = 496671ff-41d6-44f2-8f41-a838572d61cd
Hive Session ID = da1c1cb7-c0c0-49a0-82df-cf32fb6c6cd0
Hive Session ID = c506f9f9-6a36-4b8f-828b-fb5c9723e514
Start the beeline client (allow some time for hiveserver2 to come up)
# -n atguigu must match the login user name ([atguigu@hadoop102 software])
[atguigu@hadoop102 software]$ beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
# beeline prints command output in a nicer tabular format
0: jdbc:hive2://hadoop102:10000> show databases;
INFO  : Compiling command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c); Time taken: 1.023 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c); Time taken: 0.058 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.787 seconds)

Start hiveserver2 in the background
[atguigu@hadoop202 hive]$ nohup hive --service metastore 2>&1 &
[atguigu@hadoop202 hive]$ nohup hive --service hiveserver2 2>&1 &
2.6 Common interactive Hive commands

nohup: placed at the start of a command; "no hang-up", i.e. the process keeps running even after the terminal that started it is closed
0: standard input
1: standard output
2: standard error
2>&1: redirect standard error to standard output
&: placed at the end of a command; run it in the background
These are usually combined as nohup <command> > file 2>&1 &, which writes the command's output (both streams) to file and keeps the process running in the background.
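The effect of "2>&1" can be demonstrated without Hive at all: file descriptor 2 (stderr) is duplicated onto wherever descriptor 1 (stdout) currently points, so after "> file 2>&1" both streams land in the same file. A small self-contained Python demo (the log file name is arbitrary):

```python
# Demonstrate the pieces of "nohup cmd > file 2>&1 &" using plain shell redirection.
import os
import subprocess
import tempfile

# Without redirection, the two streams stay separate:
r = subprocess.run("echo to-stdout; echo to-stderr 1>&2",
                   shell=True, capture_output=True, text=True)
print(r.stdout.strip())  # came from fd 1
print(r.stderr.strip())  # came from fd 2

# "> file 2>&1" sends both streams into one log file, as in nohup's typical usage:
log = os.path.join(tempfile.gettempdir(), "redirect_demo.log")
subprocess.run("{ echo to-stdout; echo to-stderr 1>&2; } > %s 2>&1" % log,
               shell=True)
merged = open(log).read()
print(merged)
```

Note that the order matters: "> file 2>&1" merges both streams into file, while "2>&1 > file" would redirect stderr to the terminal first and only then move stdout.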
[atguigu@hadoop102 bin]$ hive -help
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = ab0b4235-084f-4d02-bc00-dc9dbc2a0acf
usage: hive
 -d,--define <key=value>          Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
    --database <databasename>    Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>  Use value for given property
    --hivevar <key=value>        Variable substitution to apply to Hive commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the console)
# Run an SQL statement without entering Hive's interactive shell
[atguigu@hadoop102 bin]$ hive -e "select id from test;"
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = b798f0f2-44d9-4429-b699-327a1f527e96
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 82850e5e-accf-4a6e-9208-eb8b21e596d3
OK
1
Time taken: 3.405 seconds, Fetched: 1 row(s)
[atguigu@hadoop102 datas]$ cat hivef.sql
select * from test;
# Run an SQL script file without entering Hive's interactive shell
[atguigu@hadoop102 datas]$ hive -f /opt/module/apache-hive-3.1.2-bin/datas/hivef.sql
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = 6c763bda-fb28-4d62-bdf0-54a2d9d3c31e
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 01eb09c5-52df-4579-bdc1-1f8ddd37eaa7
OK
1
Time taken: 2.594 seconds, Fetched: 1 row(s)
# Browse the HDFS file system from inside hive
hive> dfs -ls /;
Found 10 items
-rw-r--r--   2 atguigu supergroup          5 2021-12-14 23:17 /dage.txt
-rw-r--r--   3 atguigu supergroup      12562 2021-12-17 21:24 /fsimage.xml
drwxr-xr-x   - atguigu supergroup          0 2021-12-12 22:10 /hahahha
drwxr-xr-x   - atguigu supergroup          0 2021-12-12 22:09 /input
drwxr-xr-x   - atguigu supergroup          0 2021-12-26 23:19 /output
drwxr-xr-x   - atguigu supergroup          0 2021-12-26 23:23 /output1
drwxr-xr-x   - atguigu supergroup          0 2021-12-28 20:46 /output2
drwxr-xr-x   - atguigu supergroup          0 2021-12-14 21:21 /sanguo
drwxrwx---   - atguigu supergroup          0 2022-01-03 17:38 /tmp
drwxr-xr-x   - atguigu supergroup          0 2022-01-03 17:41 /user
# View the history of all commands entered in hive
[atguigu@hadoop102 ~]$ cd; pwd
/home/atguigu
[atguigu@hadoop102 ~]$ cat .hivehistory
show tables;
show databases;
quit;
quit ;
dfs -ls /;
cat .hivehistory ;
dfs -ls /;
quit;

2.7 Common Hive configuration properties
Print the current database name and column headers in the hive prompt
[atguigu@hadoop102 ~]$ cat $HIVE_HOME/conf/hive-site.xml
<configuration>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
[atguigu@hadoop102 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = 2974245c-c05f-49c0-aeff-dad02314e8fc
Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = ac9d8985-4539-4103-bec5-14f6e03676bb
hive (default)> select * FROM test;
OK
test.id
1
Time taken: 2.317 seconds, Fetched: 1 row(s)
Configuring where Hive writes its logs
[atguigu@hadoop102 ~]$ mv $HIVE_HOME/conf/hive-log4j2.properties.template $HIVE_HOME/conf/hive-log4j2.properties
[atguigu@hadoop102 ~]$ cat $HIVE_HOME/conf/hive-log4j2.properties
# By default Hive logs to /tmp/<current user>/hive.log (here /tmp/atguigu/hive.log)
# property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
property.hive.log.dir = /opt/module/apache-hive-3.1.2-bin/logs
Three ways to set Hive parameters
3. Hive data types

3.1 Basic and collection data types

[atguigu@hadoop102 ~]$ cat /opt/module/apache-hive-3.1.2-bin/datas/test.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
hive (default)> create table test(
> name string,
              > friends array<string>,
              > children map<string,int>,
              > address struct<street:string,city:string>
> )
> row format delimited fields terminated by ','
> collection items terminated by '_'
> map keys terminated by ':'
              > lines terminated by '\n';
OK
Time taken: 0.541 seconds
hive (default)> load data local inpath '/opt/module/apache-hive-3.1.2-bin/datas/test.txt' into table test;
Loading data to table default.test
OK
Time taken: 1.643 seconds
hive (default)> select friends[1],children['xiao song'],address.city from test
> where name='songsong';
OK
_c0 _c1 city
lili 18 beijing
Time taken: 1.915 seconds, Fetched: 1 row(s)
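The three delimiters declared in the DDL can be seen at work by hand-parsing one line of test.txt. This is a hypothetical Python sketch that mimics what Hive's delimited SerDe does with the ',' / '_' / ':' separators; it is not Hive code:

```python
# Split one row of test.txt using the same three delimiters the DDL declares:
# fields by ',', collection items by '_', map keys by ':'.
line = "songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing"

name, friends_s, children_s, address_s = line.split(",")
friends = friends_s.split("_")                                   # array<string>
children = dict(kv.split(":") for kv in children_s.split("_"))   # map (values kept as str here)
street, city = address_s.split("_")                              # struct<street,city>

# Mirrors: select friends[1], children['xiao song'], address.city ... where name='songsong'
print(friends[1], children["xiao song"], city)
```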
3.2 Type conversion
The implicit conversion rules are:
(1) Any integer type can be implicitly converted to a wider type, e.g. TINYINT to INT, and INT to BIGINT.
(2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
(4) BOOLEAN cannot be converted to any other type.
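The four rules can be encoded as a small lookup table. The sketch below is illustrative only (it is not how Hive implements coercion internally) and covers exactly the conversions listed above:

```python
# The implicit-widening rules above, encoded as a per-type set of legal targets.
# Illustrative only; not Hive's internal implementation.
WIDENS_TO = {
    "TINYINT":  {"SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE"},  # rules (1)-(3)
    "SMALLINT": {"INT", "BIGINT", "FLOAT", "DOUBLE"},
    "INT":      {"BIGINT", "FLOAT", "DOUBLE"},
    "BIGINT":   {"DOUBLE"},        # rule (2); BIGINT -> FLOAT is not listed
    "FLOAT":    {"DOUBLE"},        # rule (2)
    "STRING":   {"DOUBLE"},        # rule (2)
    "BOOLEAN":  set(),             # rule (4): converts to nothing
}

def implicitly_converts(src: str, dst: str) -> bool:
    return src == dst or dst in WIDENS_TO.get(src, set())

print(implicitly_converts("TINYINT", "INT"))   # True
print(implicitly_converts("BOOLEAN", "INT"))   # False
```

This also explains the `select true + 1` failure shown below: BOOLEAN has no legal widening target, so no numeric `+` overload matches.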
# cast() forces a conversion; it returns NULL if the conversion fails
hive (default)> select '1'+2, cast('1'as int) + 2;
OK
_c0 _c1
3.0 3
Time taken: 0.189 seconds, Fetched: 1 row(s)
hive (default)> select 1 + 2 + 2.1;
OK
_c0
5.1
Time taken: 0.21 seconds, Fetched: 1 row(s)
hive (default)> select 2.4 + 2.1;
OK
_c0
4.5
Time taken: 0.176 seconds, Fetched: 1 row(s)
hive (default)> select true + 1;
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '1': No matching method for class org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNumericPlus with (boolean, int)
4. DDL: data definition
4.1 Database operations
[atguigu@hadoop102 ~]$ hive --service hiveserver2
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/home/atguigu/.local/bin:/home/atguigu/bin)
2022-01-04 22:01:57: Starting HiveServer2
Hive Session ID = 44d9eacb-c941-4f6f-af92-b854323b3d21
Hive Session ID = e97640ae-e1ac-40b1-a84d-150a862043cd
Hive Session ID = 231724bb-f3d1-45ca-bd88-f8d0d7abe7dc
Hive Session ID = 45fc01d9-0b42-43cd-b6ec-bd800c7f084a
[atguigu@hadoop102 ~]$ beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://hadoop102:10000>
# Create a database
0: jdbc:hive2://hadoop102:10000> create database if not exists db_hive1;
INFO : Compiling command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b): create database if not exists db_hive1
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b); Time taken: 0.016 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b): create database if not exists db_hive1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b); Time taken: 0.031 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.066 seconds)
# Alter database properties
0: jdbc:hive2://hadoop102:10000> alter database db_hive1 set dbproperties('createtime'='20170830');
INFO : Compiling command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3): alter database db_hive1 set dbproperties('createtime'='20170830')
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3); Time taken: 0.024 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3): alter database db_hive1 set dbproperties('createtime'='20170830')
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3); Time taken: 0.072 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.115 seconds)
# Show detailed database information
0: jdbc:hive2://hadoop102:10000> describe database extended db_hive1;
INFO : Compiling command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298): describe database extended db_hive1
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:db_name, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer), FieldSchema(name:location, type:string, comment:from deserializer), FieldSchema(name:owner_name, type:string, comment:from deserializer), FieldSchema(name:owner_type, type:string, comment:from deserializer), FieldSchema(name:parameters, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298); Time taken: 0.021 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298): describe database extended db_hive1
INFO : Starting task [Stage-0:DDL] in serial mode
+-----------+----------+----------------------------------------------------+-------------+-------------+------------------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+-----------+----------+----------------------------------------------------+-------------+-------------+------------------------+
| db_hive1 | | hdfs://hadoop102:9820/user/hive/warehouse/db_hive1.db | atguigu | USER | {createtime=20170830} |
+-----------+----------+----------------------------------------------------+-------------+-------------+------------------------+
1 row selected (0.059 seconds)
INFO : Completed executing command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298); Time taken: 0.011 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
# Show database information
0: jdbc:hive2://hadoop102:10000> describe database db_hive1;
INFO : Compiling command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92): describe database db_hive1
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:db_name, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer), FieldSchema(name:location, type:string, comment:from deserializer), FieldSchema(name:owner_name, type:string, comment:from deserializer), FieldSchema(name:owner_type, type:string, comment:from deserializer), FieldSchema(name:parameters, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92); Time taken: 0.025 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92): describe database db_hive1
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92); Time taken: 0.012 seconds
INFO : OK
INFO : Concurrency mode is disabled, not creating a lock manager
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_hive1 | | hdfs://hadoop102:9820/user/hive/warehouse/db_hive1.db | atguigu | USER | |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.082 seconds)
4.2 Creating tables
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], …)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], …)]
[CLUSTERED BY (col_name, col_name, …)
[SORTED BY (col_name [ASC|DESC], …)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, …)]
[AS select_statement]
Clause-by-clause explanation
(01) CREATE TABLE: creates a table with the given name. If a table of the same name already exists, an error is thrown; the IF NOT EXISTS option suppresses it.
(02) EXTERNAL: creates an external table, optionally giving a path (LOCATION) pointing at the actual data. When a managed (internal) table is dropped, both its metadata and its data are deleted; dropping an external table deletes only the metadata and leaves the data in place.
(03) COMMENT: adds a comment to the table or a column.
(04) PARTITIONED BY: creates a partitioned table.
(05) CLUSTERED BY: creates a bucketed table.
(06) SORTED BY: rarely used; additionally sorts by one or more columns within each bucket.
(07) ROW FORMAT: lets the user specify a custom SerDe or use the built-in one. If neither ROW FORMAT nor ROW FORMAT DELIMITED is given, the built-in SerDe is used. When defining the table's columns the user can also name a custom SerDe, and Hive uses that SerDe to map the stored bytes onto the table's columns. SerDe is short for Serializer/Deserializer; Hive uses it to serialize and deserialize row objects.
(08) STORED AS: specifies the storage file format. Common formats are SEQUENCEFILE (binary sequence files), TEXTFILE (plain text), and RCFILE (a columnar format). For plain-text data use STORED AS TEXTFILE; when the data needs compression, use STORED AS SEQUENCEFILE.
(09) LOCATION: the table's storage path on HDFS.
(10) AS: followed by a query; creates the table from the query's result.
(11) LIKE: copies an existing table's schema without copying its data.
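To make the clause ordering concrete, here is a hypothetical helper that assembles the optional clauses in the order the syntax above requires. The table name, columns, and paths are invented for illustration:

```python
# Assemble a CREATE TABLE statement in the clause order shown above.
# All identifiers here (dept, deptno, ...) are made-up examples.
def create_table_ddl(name, cols, external=False, comment=None,
                     partitioned_by=None, row_format=None,
                     stored_as=None, location=None):
    parts = ["CREATE %sTABLE IF NOT EXISTS %s" % ("EXTERNAL " if external else "", name)]
    parts.append("(" + ", ".join("%s %s" % (c, t) for c, t in cols) + ")")
    if comment:
        parts.append("COMMENT '%s'" % comment)
    if partitioned_by:
        parts.append("PARTITIONED BY (" + ", ".join("%s %s" % (c, t) for c, t in partitioned_by) + ")")
    if row_format:
        parts.append("ROW FORMAT " + row_format)
    if stored_as:
        parts.append("STORED AS " + stored_as)
    if location:
        parts.append("LOCATION '%s'" % location)
    return "\n".join(parts) + ";"

ddl = create_table_ddl(
    "dept", [("deptno", "int"), ("dname", "string")],
    external=True,
    row_format="DELIMITED FIELDS TERMINATED BY '\\t'",
    stored_as="TEXTFILE",
    location="/user/hive/warehouse/dept",
)
print(ddl)
```

Each optional argument simply switches its clause on or off, which makes the required ordering (COMMENT before PARTITIONED BY, ROW FORMAT before STORED AS, and so on) hard to get wrong.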