
Classes

Package org.apache.hadoop.hbase.mapreduce – Provides HBase MapReduce Input/OutputFormats, a table indexing MapReduce job, and utility methods.

Interface

VisibilityExpressionResolver – Interface to convert visibility expressions into Tags for storing along with Cells in HFiles.

Reading from HBase

Job Configuration

The following is an example of using HBase as a MapReduce source in a read-only manner:

Configuration config = HBaseConfiguration.create();
config.set(
    "mapred.map.tasks.speculative.execution",  // speculative execution will decrease
    "false");                                  // performance or damage the data
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);  // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs

TableMapReduceUtil.initTableMapperJob(
    tableName,        // input HBase table name
    scan,             // Scan instance to control CF and attribute selection
    MyMapper.class,   // mapper
    null,             // mapper output key
    null,             // mapper output value
    job);
job.setOutputFormatClass(NullOutputFormat.class);  // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

The mapper instance would extend TableMapper, like this:

public static class MyMapper extends TableMapper<Text, Text> {

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // process data for the row from the Result instance
  }
}
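For illustration, here is a minimal sketch of a mapper body that counts processed rows with a Hadoop counter instead of emitting output, which matches the NullOutputFormat configured above. The column family cf and qualifier qual are hypothetical names, and the class is assumed to be nested in the job class with the usual HBase/Hadoop imports, as in the snippet above:

public static class MyRowCountingMapper extends TableMapper<Text, Text> {

  // Counter reported in the job's output when it finishes.
  public static enum Counters { ROWS }

  @Override
  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // "cf" and "qual" are assumed column family/qualifier names for this sketch.
    byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
    if (cell != null) {
      // Count the row; nothing is written to the context.
      context.getCounter(Counters.ROWS).increment(1);
    }
  }
}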

Number of Map Tasks

When TableInputFormat is used to read an HBase table as input to a MapReduce job (it is the default set by TableMapReduceUtil.initTableMapperJob(…)), its splitter makes one map task per region of the table. Thus, if the table has 100 regions, the job will have 100 map tasks, regardless of how many column families are selected in the Scan. To implement different behavior (a custom splitter), see the getSplits method in TableInputFormatBase; you can either override it in a custom-splitter class or use it as an example, as sketched below.
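The following is a minimal sketch of such a custom splitter. The class name MyTableInputFormat is hypothetical, and the override only delegates to the default one-split-per-region behavior before any custom post-processing you might add:

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class MyTableInputFormat extends TableInputFormat {

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    // Start from the default behavior: one split per region.
    List<InputSplit> perRegionSplits = super.getSplits(context);
    // Custom logic (merging, filtering, or subdividing splits) would go here;
    // this sketch simply returns the defaults unchanged.
    return perRegionSplits;
  }
}

A job would then select it with job.setInputFormatClass(MyTableInputFormat.class) instead of relying on the default.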

Writing to HBase

Job Configuration

The following is an example of using HBase both as a source and as a sink with MapReduce:

Configuration config = ...;              // configuring reading from the HBase table
Job job = ...;                           // is the same as in the read-only
Scan scan = ...;                         // example above
TableMapReduceUtil.initTableMapperJob(...);

TableMapReduceUtil.initTableReducerJob(
    targetTable,           // output table
    MyTableReducer.class,  // reducer class
    job);
job.setNumReduceTasks(1);  // at least one, adjust as required

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

And the reducer instance would extend TableReducer, as shown here:

public static class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    Put put = ...;  // data to be written
    context.write(null, put);
  }
}
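To make the reducer concrete, here is a sketch that sums the IntWritable values for each key and writes the total into one cell. The column family cf, the qualifier count, the use of the reduce key as the row key, and the HBase 1.0+ Put.addColumn(…) call are all assumptions for illustration; the class is assumed to be nested in the job class with the usual imports, as above:

public static class MySummingReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  // Hypothetical column family and qualifier for the summed value.
  private static final byte[] CF = Bytes.toBytes("cf");
  private static final byte[] COUNT = Bytes.toBytes("count");

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    // Use the reduce key as the row key and store the sum as the cell value.
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.addColumn(CF, COUNT, Bytes.toBytes(sum));
    context.write(null, put);  // null key: the Put itself carries the row key
  }
}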
