Configuring and Verifying a Flume Agent to HDFS Data Storage Flow

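Target Scenario

This article shows how to configure a Flume Agent and use curl to send an HTTP POST request so that the posted data is stored in HDFS. By following the steps, you end up with data written to HDFS under directories named after the date taken from a timestamp.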

Flume Agent Configuration

The Flume Agent configuration consists of the following parts:

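Source Configuration

Type: http

Bind address: master

Port: 6666

Handler: JSONHandler

# Declare the source on agent a1; without this line r1 is not loaded
a1.sources = r1
a1.sources.r1.type = http
a1.sources.r1.bind = master
a1.sources.r1.port = 6666
a1.sources.r1.handler = org.apache.flume.source.http.JSONHandler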

Interceptor Configuration

Type: timestamp

preserveExisting: false

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
# false: overwrite any timestamp header already present on the event
a1.sources.r1.interceptors.i1.preserveExisting = false

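Sink Configuration

Type: hdfs

Path: hdfs://master:9000/flume/%Y-%m-%d

File type: DataStream

# Declare the sink on agent a1; without this line k1 is not loaded
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume/%Y-%m-%d
# true: resolve %Y-%m-%d from the agent's local time rather than from the
# event's timestamp header (the header set by the timestamp interceptor)
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.filePrefix = interceptor
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# Roll the file at 102400000 bytes or after 5 events; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollSize = 102400000
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.rollInterval = 0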

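Channel Configuration

Type: memory

Capacity: 1000

# Declare the channel on agent a1; without this line c1 is not loaded
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100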

Binding the Source and Sink to the Channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
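
The article does not show how the agent is started, so the following is only a minimal sketch of one way to launch it. It assumes that `flume-ng` is on the PATH, that Flume lives under `$FLUME_HOME` (defaulting to the hypothetical `/opt/flume`), and that the properties above were saved to the file passed as the first argument. The function name `start_agent` and the `DRY_RUN` switch are illustrative only, and the console logger option is the one commonly used with Flume 1.x; it may need adjusting for other versions.

```shell
# Sketch: start agent a1 with the configuration described above.
# Assumptions (not from the article): install location, config file path.
start_agent() {
  conf_file="$1"
  # When DRY_RUN is set and non-empty, ${DRY_RUN:+echo} expands to "echo",
  # so the command is printed instead of executed.
  ${DRY_RUN:+echo} flume-ng agent \
    --conf "${FLUME_HOME:-/opt/flume}/conf" \
    --conf-file "$conf_file" \
    --name a1 \
    -Dflume.root.logger=INFO,console
}

# Usage (hypothetical file name):
#   start_agent /opt/flume/conf/http-hdfs.conf
```

The value given to `--name` has to match the `a1` prefix used in every property; otherwise the agent starts without any components.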

curl Command: Simulating an HTTP Request (POST Method)

Use curl to send an HTTP POST request whose body is JSON:

curl -X POST -d '[{"headers":{}, "body":"timestamp teset 001"}]' http://master:6666

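Notes:

-X POST: use the HTTP POST method

-d: the JSON data to send; JSONHandler expects an array of events, each with headers and body

http://master:6666: the address and port the HTTP source listens on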
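
When several test events need to be sent, the request can be wrapped in small helpers. The sketch below is one possible way to do that, not the article's own method: `build_payload`, `send_event`, and the `FLUME_URL` variable are hypothetical names, the explicit Content-Type header is an optional addition the original command does not have, and no JSON escaping is performed, so the body is assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper: wrap a plain-text body in the JSON array of events
# that JSONHandler expects. No escaping is done on the body.
build_payload() {
  printf '[{"headers":{},"body":"%s"}]' "$1"
}

# Hypothetical helper: POST one event to the HTTP source.
# FLUME_URL defaults to the host and port from the source configuration.
send_event() {
  curl -sS -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1")" \
    "${FLUME_URL:-http://master:6666}"
}

# Usage (requires the agent to be running):
#   send_event 'timestamp teset 001'
```

Because the sink is configured with rollCount = 5, sending five events this way is enough to make the sink roll the current file.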

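Checking Whether the Timestamp-Based Directory Was Created on HDFS

After the curl command runs, the Flume Agent prints log output indicating that the timestamp-based HDFS directory is being created.

Checking the HDFS Directory

Assuming the configuration is correct, running the curl command should produce a directory on HDFS such as hdfs://master:9000/flume/2023-10-01, where the last path component is the date on which the event was processed.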

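Verifying the Stored Data

The data is stored in the directory above. With filePrefix = interceptor, the HDFS sink names the file interceptor.<number>, where <number> is a counter seeded from the current time in milliseconds. The file carries a .tmp suffix while it is still open, which here lasts until five events have been written (rollCount = 5) or the agent is stopped, and no .json extension is added because hdfs.fileSuffix is not set. Because fileType is DataStream and writeFormat is Text, the file contains only the event body, timestamp teset 001, rather than the full JSON payload.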
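
The check can also be done from the command line. The following sketch assumes the `hdfs` client is on the PATH and configured for the same cluster; `expected_dir` and `check_output` are hypothetical helper names. The default date comes from the machine running the script, so it only matches the sink's directory if that machine and the agent agree on the local date.

```shell
# Hypothetical helper: print the directory the sink should have created
# for a given date (YYYY-MM-DD); defaults to today's local date.
expected_dir() {
  printf 'hdfs://master:9000/flume/%s' "${1:-$(date +%Y-%m-%d)}"
}

# Hypothetical helper: list the directory, then print every file whose
# name starts with the configured prefix (including open .tmp files).
check_output() {
  dir="$(expected_dir "$1")"
  # The glob is quoted so that HDFS, not the local shell, expands it.
  hdfs dfs -ls "$dir" && hdfs dfs -cat "$dir/interceptor.*"
}

# Usage:
#   check_output              # today's directory
#   check_output 2023-10-01   # a specific date
```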

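Summary

The steps above implement a flow in which data posted to an HTTP source ends up stored in HDFS. The two key pieces are the Flume Agent configuration and the curl command. The flow only works if every configuration parameter is correct, in particular the agent name, the component declarations, and the channel bindings.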

Reposted from: http://oirfk.baihongyu.com/
