Flume monitoring

Each Flume agent can be monitored individually by adding two parameters to its start command:

flume-ng agent -n agent_name -c conf -f conf/config.conf -Dflume.monitoring.type=http -Dflume.monitoring.port=24105

The parameters flume.monitoring.type=http and flume.monitoring.port=24105 start an HTTP server on the given port that reports the agent's metrics as JSON.

The metrics are retrieved with the following URL: http://<ip-address-agent>:24105/metrics
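As a minimal sketch, the endpoint can be polled from Python using only the standard library. The names metrics_url and fetch_metrics and the default timeout are assumptions made for illustration; the port is the one used in this post.

```python
import json
from urllib.request import urlopen


def metrics_url(host, port=24105):
    """Build the monitoring URL for one agent (port taken from this post)."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host, port=24105, timeout=5.0):
    """Return the agent's metrics as a dict keyed by component name."""
    with urlopen(metrics_url(host, port), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage against an agent on the local machine:
#     for component, values in fetch_metrics("localhost").items():
#         print(component, values.get("Type"))
```

Because every agent exposes its own port, a monitoring script would call fetch_metrics once per agent.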

Example of a response:

{
    "SOURCE.http_traffic":{"OpenConnectionCount":"0","Type":"SOURCE","AppendBatchAcceptedCount":"2561700","AppendBatchReceivedCount":"2561700","EventAcceptedCount":"2561700","AppendReceivedCount":"0","StopTime":"0","StartTime":"1504012941615","EventReceivedCount":"2561700","AppendAcceptedCount":"0"},
    "SINK.k4":{"Type":"SINK","ConnectionClosedCount":"0","EventDrainSuccessCount":"2561700","KafkaEventSendTimer":"17461960","ConnectionFailedCount":"0","BatchCompleteCount":"0","EventDrainAttemptCount":"0","ConnectionCreatedCount":"0","BatchEmptyCount":"679409","StopTime":"0","RollbackCount":"0","StartTime":"1504012941570","BatchUnderflowCount":"3942"},
    "CHANNEL.c4":{"EventPutSuccessCount":"2561700","ChannelFillPercentage":"0.0","Type":"CHANNEL","StopTime":"0","EventPutAttemptCount":"2561700","ChannelSize":"0","StartTime":"1504012941382","EventTakeSuccessCount":"2561700","ChannelCapacity":"5000","EventTakeAttemptCount":"3245052"},
    "CHANNEL.c1":{"EventPutSuccessCount":"2561700","ChannelFillPercentage":"0.0","Type":"CHANNEL","StopTime":"0","EventPutAttemptCount":"2561700","ChannelSize":"0","StartTime":"1504012941382","EventTakeSuccessCount":"2561700","ChannelCapacity":"5000","EventTakeAttemptCount":"3242098"},
    "CHANNEL.c3":{"EventPutSuccessCount":"2561700","ChannelFillPercentage":"0.0","Type":"CHANNEL","StopTime":"0","EventPutAttemptCount":"2561700","ChannelSize":"0","StartTime":"1504012941381","EventTakeSuccessCount":"2561700","ChannelCapacity":"5000","EventTakeAttemptCount":"3245036"},
    "SINK.k3":{"BatchCompleteCount":"22260","ConnectionFailedCount":"15","EventDrainAttemptCount":"2561701","ConnectionCreatedCount":"2228","Type":"SINK","BatchEmptyCount":"679389","ConnectionClosedCount":"2223","EventDrainSuccessCount":"2561700","StopTime":"0","StartTime":"1504012941383","BatchUnderflowCount":"3942"}
}
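Note that every value in the response is a JSON string, including the counters. The following sketch converts the values to numbers and groups the components by their Type field. The helper names to_number and group_by_type are hypothetical, and the sample is a shortened excerpt of the response above.

```python
import json


def to_number(value):
    """Convert a metric string to int or float; leave other text unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def group_by_type(metrics):
    """Group components by their Type field, with numeric values converted."""
    grouped = {}
    for name, values in metrics.items():
        converted = {key: to_number(val) for key, val in values.items()}
        grouped.setdefault(converted.get("Type", "UNKNOWN"), {})[name] = converted
    return grouped


# Shortened excerpt of the response shown above.
sample = json.loads("""
{
    "SOURCE.http_traffic": {"Type": "SOURCE", "EventReceivedCount": "2561700"},
    "CHANNEL.c4": {"Type": "CHANNEL", "ChannelFillPercentage": "0.0", "ChannelSize": "0"},
    "SINK.k3": {"Type": "SINK", "ConnectionFailedCount": "15"}
}
""")
grouped = group_by_type(sample)
print(sorted(grouped))
print(grouped["SINK"]["SINK.k3"]["ConnectionFailedCount"])
```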

The source metrics are listed in the next table.

Table: Source metrics
Metric Description
EventReceivedCount The total number of events that the source has received until now.
EventAcceptedCount The total number of events where the event was successfully written out to the channel and the source returned success to the sink/RPC client/system that created the event.
AppendReceivedCount The total number of events that came in with only one event per batch (the equivalent of an append call in RPC calls).
AppendAcceptedCount The total number of events that came in individually that were written to the channel and returned successfully.
AppendBatchReceivedCount The total number of batches of events received.
AppendBatchAcceptedCount The total number of batches successfully committed to the channel.
StartTime Milliseconds since the epoch when the source was started.
StopTime Milliseconds since the epoch when the source was stopped.
OpenConnectionCount The number of connections currently open with clients/sinks (only an Avro Source currently exposes this).
Type For sources, this always returns SOURCE.
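Going by the descriptions in the table above, the gap between the received and accepted counters shows how many events or batches did not reach the channel. A small sketch of that check follows; the function name source_summary and the result keys are hypothetical, and the values are copied from SOURCE.http_traffic in the example response.

```python
def source_summary(source):
    """Compare received and accepted counters of one source entry."""
    events_gap = int(source["EventReceivedCount"]) - int(source["EventAcceptedCount"])
    batches_gap = int(source["AppendBatchReceivedCount"]) - int(
        source["AppendBatchAcceptedCount"]
    )
    return {
        "events_not_accepted": events_gap,
        "batches_not_accepted": batches_gap,
    }


# Values copied from SOURCE.http_traffic in the example response.
http_traffic = {
    "EventReceivedCount": "2561700",
    "EventAcceptedCount": "2561700",
    "AppendBatchReceivedCount": "2561700",
    "AppendBatchAcceptedCount": "2561700",
}
print(source_summary(http_traffic))
```

For the example response both differences are zero, so every event the source received was also committed to its channels.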

The next table gives more information on the channel metrics.

Table: Channel metrics
Metric Description
ChannelSize The total number of events currently in the channel.
EventPutAttemptCount The total number of events the source(s) attempted to write to the channel.
EventPutSuccessCount The total number of events that were successfully written and committed to the channel.
EventTakeAttemptCount The total number of times the sink(s) attempted to read events from the channel. This does not mean that events were returned each time, since sinks might poll and the channel might not have any data.
EventTakeSuccessCount The total number of events that were successfully taken by the sink(s).
StartTime Milliseconds since the epoch when the channel was started.
StopTime Milliseconds since the epoch when the channel was stopped.
ChannelCapacity The capacity of the channel.
ChannelFillPercentage The percentage of the channel that is full.
Type For channels, this always returns CHANNEL.
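The channel metrics above lend themselves to a simple health check. The sketch below flags a channel whose fill percentage reaches a threshold. The function name channel_summary and the 80 percent threshold are assumptions, not Flume defaults; the values are copied from CHANNEL.c4 in the example response.

```python
def channel_summary(channel, fill_threshold=80.0):
    """Summarize one channel entry; fill_threshold is an arbitrary assumption."""
    size = int(channel["ChannelSize"])
    capacity = int(channel["ChannelCapacity"])
    fill = float(channel["ChannelFillPercentage"])
    puts = int(channel["EventPutSuccessCount"])
    takes = int(channel["EventTakeSuccessCount"])
    return {
        "fill_percentage": fill,
        "free_slots": capacity - size,
        # Events written minus events taken, according to the counters.
        "put_minus_take": puts - takes,
        "above_threshold": fill >= fill_threshold,
    }


# Values copied from CHANNEL.c4 in the example response.
c4 = {
    "ChannelSize": "0",
    "ChannelCapacity": "5000",
    "ChannelFillPercentage": "0.0",
    "EventPutSuccessCount": "2561700",
    "EventTakeSuccessCount": "2561700",
}
print(channel_summary(c4))
```

A channel that stays close to its capacity suggests that the sinks are not draining events as fast as the sources write them.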

The sink metrics are listed in the next table.

Table: Sink metrics
Metric Description
ConnectionCreatedCount The number of connections created with the next hop or storage system (like when a new file is created on HDFS).
ConnectionClosedCount The number of connections closed with the next hop or storage system (like when a file on HDFS is closed).
ConnectionFailedCount The number of connections that were closed due to an error with the next hop or storage system (like when a new file on HDFS is closed because of timeouts).
BatchEmptyCount The number of batches that were empty—a high number indicates that the sources are writing data slower than the sinks are clearing it.
BatchUnderflowCount The number of batches that were smaller than the maximum batch size this sink is configured to use—this also indicates sinks are faster than sources if it’s high.
BatchCompleteCount The number of batches that were equal to the maximum batch size.
EventDrainAttemptCount The total number of events the sink tried to write out to storage.
EventDrainSuccessCount The total number of events that the sink successfully wrote out to storage.
StartTime Milliseconds since the epoch when the sink was started.
StopTime Milliseconds since the epoch when the sink was stopped.
Type For sinks, this always returns SINK.
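The sink counters above can be condensed in the same way. This sketch reports the events that were attempted but not drained, the failed connections, and the share of empty batches. The function name sink_summary is hypothetical, and the sketch assumes that the three batch counters together cover every batch; the values are copied from SINK.k3 in the example response.

```python
def sink_summary(sink):
    """Summarize one sink entry from the /metrics response."""
    attempted = int(sink["EventDrainAttemptCount"])
    succeeded = int(sink["EventDrainSuccessCount"])
    empty = int(sink["BatchEmptyCount"])
    underflow = int(sink["BatchUnderflowCount"])
    complete = int(sink["BatchCompleteCount"])
    # Assumption: empty, underflow and complete batches add up to all batches.
    total_batches = empty + underflow + complete
    return {
        "events_not_drained": attempted - succeeded,
        "connection_failures": int(sink["ConnectionFailedCount"]),
        "empty_batch_share": empty / total_batches if total_batches else 0.0,
    }


# Values copied from SINK.k3 in the example response.
k3 = {
    "EventDrainAttemptCount": "2561701",
    "EventDrainSuccessCount": "2561700",
    "BatchEmptyCount": "679389",
    "BatchUnderflowCount": "3942",
    "BatchCompleteCount": "22260",
    "ConnectionFailedCount": "15",
}
print(sink_summary(k3))
```

In the example response, SINK.k4 reports an EventDrainAttemptCount of 0 next to a non-zero EventDrainSuccessCount, so a negative difference may simply mean that a sink type does not maintain that counter.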

Note that Flume monitoring is also available through Cloudera Manager or, on Hortonworks, through Ganglia.
