请选择 进入手机版 | 继续访问电脑版

Hadoop大数据——用户行为分析Hive数据分析

[复制链接]
东方龙头 发表于 2021-1-1 17:48:41 | 显示全部楼层 |阅读模式 打印 上一主题 下一主题
1.本地数据集上传到数据堆栈Hive

1.1下令

  1. sudo mkdir -R /usr/local/bigdatacase/dataset //这里会提示你输入当前用户(本教程是hadoop用户名)的暗码  //下面给hadoop用户赋予针对bigdatacase目次的各种操作权限  cd /usr/local/sudo chown -R hadoop:hadoop ./bigdatacase  cd bigdatacase  //下面就可以解压缩user.zip文件  cd ~  //表现进入hadoop用户的目次  cd 下载  ls  unzip user.zip -d /usr/local/bigdatacase/dataset  cd /usr/local/bigdatacase/dataset  ls  head -5 raw_user.csv
复制代码
1.2数据集举行预处理惩罚

1.2.1删除首行

  1. cd /usr/local/bigdatacase/dataset  //下面删除raw_user中的第1行  //1d表现删除第1行,同理,3d表现删除第3行,nd表现删除第n行sed -i '1d' raw_user.csv   //下面删除small_user中的第1行  sed -i '1d' small_user.csv  //下面再用head下令去检察文件的前5行记载,就看不到字段名称这一行了  head -5 raw_user.csv  head -5 small_user.csv  
复制代码
1.2.2 添加字段

添加随机省份字段
1.3 导入HDFS中


1.4 Hive创建数据库

  1. //这个为小数据集,下面的为大数据集hive>  create database dblab;hive>  create database dblab_big;//创建表并映射入HDFS中的文件hive>  CREATE EXTERNAL TABLE dblab.bigdata_user(id INT,uid STRING,item_id STRING,behavior_type INT,item_category STRING,visit_date DATE,province STRING) COMMENT 'Welcome to xmu dblab!' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION '/bigdatacase/dataset';
复制代码
到现在为止就完成了数据的预处理惩罚,添加省份,创建Hive数据库,上传到HDFS,而且映射到Hive内里
2.Hive数据分析语句

Hive数据分析就是写SQL语句对刚刚映射的表举行查询、筛选、统计等等一系列操作。
2.1 检察表的属性

uiditem_idbehavior_typeitem_categoryvisit_dateprovince1用户ID商品ID用户行为商品分类产生时间用户地理位置 uiditem_idbehavior_typeitem_categoryvisit_dateprovince110001082285259775140762014-12-08广东2100010824368907155032014-12-08湖南310001082285259788440762014-12-08北京behaviour_type属性1234代表的意思
12341浏览收藏加购物车购买2.2 SQL语句

2.2.1 2014-12-11~12号有多少条购买商品的记载

  1. //首先查询表中的数据select * from bigdata_user limit 10;//然后查询10号之后select * from bigdata_user where visit_date >'2014-12-10' limit 10;//再添加限制 13号之前select * from bigdata_user where visit_date >'2014-12-10' and visit_date '2014-12-10' and visit_date 4 limit 10;  select uid,count(*) from bigdata_user where behavior_type='4'and visit_date='2014-12-12' group by uid having count(behavior_type='4')>4 limit 10;  
复制代码
2.2.4 查询商品的购买数量和购买用户的数量,排序购买数量

查询出每个商品的购买次数 和购买人数
  1. //查询所有购买商品聚集select * from bigdata_user where behavior_type='4' limit 10;//查询每个商品的购买次数select item_id,count(1) buy_number  from bigdata_user where behavior_type='4' group by item_id  order by buy_number desc  limit 10;//查询每个商品购买的用户数量select item_id,count(distinct uid) user_number from bigdata_user where behavior_type='4' group by item_id  order by user_number desc  limit 10;//查询每个商品购买的总数量和购买的用户数量select item_id,count(1) buy_number,count(distinct uid) user_number  from bigdata_user where behavior_type='4' group by item_id  order by buy_number desc  limit 10;
复制代码
2.2.5 查询广东每日的商品购买率(购买量/浏览量)

  1. //广东每日浏览商品数select visit_date, count(*) browse_number from bigdata_user where province='广东' and behavior_type='1'  group by visit_date order by browse_number desc ;//广东每日购买览商品数select  visit_date, count(*) buy_number from bigdata_user where province='广东' and behavior_type='4'  group by visit_date order by buy_number desc ;//为以上两个创建单独的表出来//创建表,每日用户浏览数量create table browse_user as select  visit_date, count(*) browse_number from bigdata_user where province='广东' and behavior_type='1'  group by visit_date order by browse_number desc ;//创建表,每日用户购买数量create table buy_user as select  visit_date, count(*) buy_number from bigdata_user where province='广东' and behavior_type='4'  group by visit_date order by buy_number desc ;//每日浏览购买乐成率//通过左联查询出每日的浏览量和购买量,而且算出每日购买率select browse_user.visit_date, browse_number,buy_number,buy_number/browse_number buy_rate from browse_user  left join buy_user  on browse_user.visit_date=buy_user.visit_date;
复制代码
2.2.6查询2019-11-11每个省的浏览数量,购买数量和比例

  1. //查询2019-12-12每个省的浏览数量,购买数量和比例//a查询2014-12-12每个省的浏览数量select province,count(1) browse_number from bigdata_user where visit_date='2014-12-12' and behavior_type='1' group by province order by browse_number desc;//b查询2014-12-12每个省的购买数量select province,count(1) buy_number from bigdata_user where visit_date='2014-12-12' and behavior_type='4' group by province order by buy_number desc;//以下语句太长格式为select a.province,a.browse_number,b.buy_number,两个相除 from (表格a) a left join (表格b) b on a.province=b.province;//一条语句//直接显示每个省浏览数量、购买数量、购买率select a.province,a.browse_number,b.buy_number,b.buy_number/a.browse_number rate from  ( select province,count(1) browse_number from bigdata_user where visit_date='2014-12-12' and behavior_type='1' group by province order by browse_number desc) a left join ( select province,count(1) buy_number from bigdata_user where visit_date='2014-12-12' and behavior_type='4' group by province order by buy_number desc ) b on a.province=b.province;
复制代码
2.2.7用户10001082在2014-12-12号当天生动度:该用户点击行为占该天所有点击行为的比例

  1. //该天所有点击行为数量select count(*) users from bigdata_user where visit_date='2014-12-12' and behavior_type='1';//该用户的点击量select count(*) user_number from bigdata_user where visit_date='2014-12-12' and behavior_type='1' and uid=10001082;//联合上面两个语句,显示用户点击量、当天点击量和比例select '10001082' uid,a.users,b.user_number,b.user_number/a.users rate from ( select count(*) users from bigdata_user where visit_date='2014-12-12' and behavior_type='1' ) a ,( select count(*) user_number  from bigdata_user where visit_date='2014-12-12' and behavior_type='1' and uid=10001082  ) b;
复制代码
2.2.8查询出用户最喜欢的商品集(收藏)

  1. //查询出用户最喜欢的商品集select item_id,count(1) collect from bigdata_user where behavior_type='2'  group by item_id order by collect desc limit 20 ;
复制代码
2.2.9某个地域的用户当天浏览网站的次数

  1. //查询出每个省浏览网站的次数select province,count(behavior_type) count from bigdata_user where behavior_type='1' group by province;//把效果存入表格create table browse_count as select province,count(behavior_type) count from bigdata_user where behavior_type='1' group by province;//检察效果select * from browse_count;
复制代码
2.2.10查询潜力最大的商品(购物车)

  1. //查询加入购物车最多的商品select item_id,count(behavior_type) cart_number from bigdata_user where behavior_type='3'  group by item_id order by cart_number desc limit 20;
复制代码
来源:https://blog.csdn.net/Luckly_smile/article/details/111773580
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

发布主题

专注素材教程免费分享
全国免费热线电话

18768367769

周一至周日9:00-23:00

反馈建议

27428564@qq.com 在线QQ咨询

扫描二维码关注我们

Powered by Discuz! X3.4© 2001-2013 Comsenz Inc.( 蜀ICP备2021001884号-1 )