Package org.apache.hadoop.hbase.mapreduce – Provides HBase MapReduce Input/OutputFormats, a table indexing MapReduce job, and utility methods.
Interface
VisibilityExpressionResolver – Interface to convert visibility expressions into Tags for storing along with Cells in HFiles.
Class
- CellCounter – A job with a map and reduce phase to count cells in a table.
- CellCreator – Facade to create Cells for HFileOutputFormat.
- CellSerialization
- CellSortReducer – Emits sorted Cells.
- CopyTable – Tool used to copy a table to another one which can be on a different setup.
- Export – Export an HBase table.
- GroupingTableMapper – Extract grouping columns from input record.
- HFileOutputFormat2 – Writes HFiles.
- HRegionPartitioner<KEY,VALUE> – This is used to partition the output keys into groups of keys.
- IdentityTableMapper – Pass the given key and record as-is to the reduce phase.
- IdentityTableReducer – Convenience class that simply writes all values (which must be Put or Delete instances) passed to it out to the configured HBase table.
- Import – Import data written by Export.
- ImportTsv – Tool to import data from a TSV file.
- LoadIncrementalHFiles – Deprecated. As of release 2.0.0, this will be removed in HBase 3.0.0.
- LoadQueueItem – Deprecated. As of release 2.0.0, this will be removed in HBase 3.0.0.
- MultiTableHFileOutputFormat – Creates a three-level directory tree: the table name is the parent directory, each column family name is a child directory, and all HFiles for one family sit under that family's directory:
    -tableName1
      -columnFamilyName1
      -columnFamilyName2
        -HFiles
    -tableName2
      -columnFamilyName1
        -HFiles
      -columnFamilyName2
- MultiTableInputFormat – Convert HBase tabular data from multiple scanners into a format that is consumable by Map/Reduce.
- MultiTableInputFormatBase – A base for MultiTableInputFormats.
- MultiTableOutputFormat – Hadoop output format that writes to one or more HBase tables.
- MultiTableSnapshotInputFormat – MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each.
- MutationSerialization
- PutCombiner<K> – Combine Puts.
- PutSortReducer – Emits sorted Puts.
- ResultSerialization
- RowCounter – A job with just a map phase to count rows.
- SimpleTotalOrderPartitioner<VALUE> – A partitioner that takes start and end keys and uses BigDecimal to figure out which reducer a key belongs to.
- TableInputFormat – Convert HBase tabular data into a format that is consumable by Map/Reduce.
- TableInputFormatBase – A base for TableInputFormats.
- TableMapper<KEYOUT,VALUEOUT> – Extends the base Mapper class to add the required input key and value classes.
- TableMapReduceUtil – Utility for TableMapper and TableReducer.
- TableOutputCommitter – Small committer class that does not do anything.
- TableOutputFormat<KEY> – Convert Map/Reduce output and write it to an HBase table.
- TableRecordReader – Iterates over HBase table data and returns (ImmutableBytesWritable, Result) pairs.
- TableRecordReaderImpl – Iterates over HBase table data and returns (ImmutableBytesWritable, Result) pairs.
- TableReducer<KEYIN,VALUEIN,KEYOUT> – Extends the basic Reducer class to add the required key and value input/output classes.
- TableSnapshotInputFormat – TableSnapshotInputFormat allows a MapReduce job to run over a table snapshot.
- TableSplit – A table split corresponds to a key range (low, high) and an optional scanner.
- TextSortReducer – Emits Sorted KeyValues.
- TsvImporterMapper – Write table content out to files in hdfs.
- TsvImporterTextMapper – Write table content out to map output files.
- WALInputFormat – Simple InputFormat for WAL files.
- WALPlayer – A tool to replay WAL files as a M/R job.
Job Configuration
The following is an example of using HBase as a MapReduce source in a read-only manner:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

Configuration config = HBaseConfiguration.create();
// Speculative execution will decrease performance or damage the data.
config.set("mapred.map.tasks.speculative.execution", "false");
// Job.getInstance replaces the deprecated Job constructor.
Job job = Job.getInstance(config, "ExampleRead");
job.setJarByClass(MyReadJob.class); // class that contains mapper
Scan scan = new Scan();
scan.setCaching(500);       // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
…
TableMapReduceUtil.initTableMapperJob(
    tableName,      // input HBase table name
    scan,           // Scan instance to control CF and attribute selection
    MyMapper.class, // mapper
    null,           // mapper output key
    null,           // mapper output value
    job);
// NullOutputFormat, because we aren't emitting anything from the mapper.
job.setOutputFormatClass(NullOutputFormat.class);
boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
The mapper instance would extend TableMapper, too, like this:
public static class MyMapper extends TableMapper<Text, Text> {
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws InterruptedException, IOException {
        // process data for the row from the Result instance.
    }
}
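The comment on scan.setCaching(500) in the read-only job can be made concrete with a little arithmetic. The sketch below is a rough model only: it assumes each scanner round trip returns at most `caching` rows and ignores size-based limits the client may also apply. The class name ScanCachingEstimate and the row count are hypothetical, chosen purely for illustration.

```java
public class ScanCachingEstimate {

    /**
     * Rough number of scanner round trips needed to read `rows` rows
     * when each round trip returns at most `caching` rows.
     * This is a simplified model, not a call into the HBase client.
     */
    public static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rowsInRegion = 1_000_000L; // hypothetical row count for one map task
        System.out.println("caching=1   -> " + roundTrips(rowsInRegion, 1) + " round trips");
        System.out.println("caching=500 -> " + roundTrips(rowsInRegion, 500) + " round trips");
    }
}
```

Under this model, a map task reading one million rows drops from 1,000,000 round trips to 2,000 when caching goes from 1 to 500, which is why a small caching value hurts full-table MapReduce scans.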
Map Tasks Number
When TableInputFormat is used to read an HBase table as input to a MapReduce job (it is set by default by TableMapReduceUtil.initTableMapperJob(…)), its splitter creates one map task per region of the table. Thus, if the table has 100 regions, the job has 100 map tasks, regardless of how many column families the Scan selects. To implement different behavior (a custom splitter), see the getSplits method in TableInputFormatBase: either override it in a custom splitter class or use it as an example.
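The one-task-per-region rule can be illustrated without a cluster. The following is a plain-Java simulation, not the actual TableInputFormatBase.getSplits code: RegionSplitSketch, KeyRange, and the region boundaries are all hypothetical. It assumes that regions lying entirely outside the Scan's start/stop rows produce no split, and it compares keys as Java strings, whereas HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionSplitSketch {

    /** Key range [start, end); an empty string means unbounded on that side. */
    public static final class KeyRange {
        final String start;
        final String end;

        public KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one split per region that overlaps the scan range [scanStart, scanStop). */
    public static List<KeyRange> splitsFor(List<KeyRange> regions, String scanStart, String scanStop) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange region : regions) {
            boolean startsBeforeScanStops = scanStop.isEmpty() || region.start.compareTo(scanStop) < 0;
            boolean endsAfterScanStarts = region.end.isEmpty() || region.end.compareTo(scanStart) > 0;
            if (startsBeforeScanStops && endsAfterScanStarts) {
                splits.add(region); // one split, and so one map task, for this region
            }
        }
        return splits;
    }

    /** Four hypothetical regions that together cover the whole key space. */
    public static List<KeyRange> exampleRegions() {
        return Arrays.asList(
            new KeyRange("", "g"), new KeyRange("g", "n"),
            new KeyRange("n", "t"), new KeyRange("t", ""));
    }

    public static void main(String[] args) {
        List<KeyRange> regions = exampleRegions();
        System.out.println("full scan: " + splitsFor(regions, "", "").size() + " map tasks");
        System.out.println("rows h..p: " + splitsFor(regions, "h", "p").size() + " map tasks");
    }
}
```

Note that the column families selected by the Scan never enter the calculation; in this sketch only the region boundaries and the scan's row range determine the number of splits.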
Writing to HBase
Job Configuration
The following is an example of using HBase both as a source and as a sink with MapReduce:
// Configuring reading from the HBase table is the same as in the read-only example above.
Configuration config = …;
Job job = …;
Scan scan = …;
TableMapReduceUtil.initTableMapperJob(…);
TableMapReduceUtil.initTableReducerJob(
    targetTable,          // output table
    MyTableReducer.class, // reducer class
    job);
job.setNumReduceTasks(1); // at least one, adjust as required
boolean b = job.waitForCompletion(true);
And the reducer instance would extend TableReducer, as shown here:
public static class MyTableReducer
        extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        …
        Put put = …; // data to be written
        context.write(null, put);
        …
    }
}
