
Pig Installation and Modes

Installation

Pig’s core is written in Java, so it works across operating systems. Pig’s shell, which executes the user’s commands, is a bash script and requires a UNIX system. Pig can also be run on Windows using Cygwin and Perl packages.

Java 1.6 is also required to run Pig. Optionally, the following can be installed on the same machine: Python 2.5, JavaScript 1.7, Ant 1.7, and JUnit 4.5. Python and JavaScript are for writing custom UDFs; Ant and JUnit are for builds and unit testing, respectively. Pig can be run with different versions of Hadoop by setting HADOOP_HOME to the directory where Hadoop is installed. If HADOOP_HOME is not set, Pig runs with its embedded version, which is currently Hadoop 1.0.0.
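As a minimal sketch of the HADOOP_HOME behavior described above, the following shell function reports which Hadoop Pig would be expected to use. The function name describe_hadoop and the path /opt/hadoop are hypothetical, chosen only for illustration; the function inspects the environment variable and does not invoke Pig or Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Hadoop the text says Pig will use.
# It only inspects HADOOP_HOME; it does not run Pig or Hadoop.
describe_hadoop() {
  if [ -n "${HADOOP_HOME:-}" ]; then
    echo "external Hadoop at ${HADOOP_HOME}"
  else
    echo "embedded Hadoop (HADOOP_HOME is not set)"
  fi
}

# Example: point Pig at a separately installed Hadoop.
# /opt/hadoop is an assumed location; substitute your own.
export HADOOP_HOME=/opt/hadoop
describe_hadoop
```

Placing the export line in your shell profile would make the setting persist across sessions.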

Requirements

Mandatory – Unix and Windows users need the following:

Optional

Download Pig

To get a Pig distribution, do the following:

$ export PATH=/<my-path-to-pig>/pig-n.n.n/bin:$PATH
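The PATH step can be sketched a little more defensively, so that sourcing it repeatedly does not add the same directory twice. PIG_BIN, its default value, and the prepend_path function are hypothetical names used only for this illustration; replace the directory with your own pig-n.n.n/bin location.

```shell
#!/usr/bin/env bash
# PIG_BIN is a hypothetical variable; set it to your pig-n.n.n/bin directory.
PIG_BIN="${PIG_BIN:-$HOME/pig/bin}"

# Prepend a directory to PATH only if it is not already listed.
prepend_path() {
  case ":${PATH}:" in
    *":$1:"*) ;;               # already present, nothing to do
    *) PATH="$1:${PATH}" ;;
  esac
  export PATH
}

prepend_path "$PIG_BIN"

# Report whether the shell can now find the pig launcher.
if command -v pig >/dev/null 2>&1; then
  echo "pig found at $(command -v pig)"
else
  echo "pig not found on PATH; check PIG_BIN"
fi
```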

Build Pig

To build Pig, do the following:

Pig Modes

You can run Pig (execute Pig Latin statements and Pig commands) using various modes.

                   Local Mode    Tez Local Mode    Mapreduce Mode    Tez Mode
Interactive Mode   yes           experimental      yes               yes
Batch Mode         yes           experimental      yes               yes

Execution Modes

Pig has the following execution modes, or exectypes, each selected with the -x flag: local, tez_local, mapreduce, and tez.

You can run Pig in any of these modes using the “pig” command (the bin/pig script) or the “java” command (java -cp pig.jar …).

Example

This example shows how to run Pig in local, Tez local, mapreduce, and Tez modes using the pig command.

/* local mode */

$ pig -x local …

/* Tez local mode */

$ pig -x tez_local …

/* mapreduce mode */

$ pig …

or

$ pig -x mapreduce …

/* Tez mode */

$ pig -x tez …
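The mapping from mode to command line shown above can be summarized in a small wrapper. This is only a sketch: pig_command is a hypothetical helper that prints the command instead of running it, and it accepts just the four -x values that appear in this example. When no mode is given it falls back to mapreduce, matching the plain “pig” invocation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the pig command line for a named exectype.
# It does not execute pig; it only assembles the arguments.
pig_command() {
  mode="${1:-mapreduce}"          # plain "pig" means mapreduce mode
  if [ "$#" -gt 0 ]; then shift; fi
  case "$mode" in
    local|tez_local|mapreduce|tez)
      set -- pig -x "$mode" "$@"
      echo "$*"
      ;;
    *)
      echo "unknown exectype: $mode" >&2
      return 1
      ;;
  esac
}

# Example: show the command for running a script in Tez mode.
pig_command tez id.pig
```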

Interactive Mode

You can run Pig in interactive mode using the Grunt shell. Invoke the Grunt shell using the “pig” command (as shown below) and then enter your Pig Latin statements and Pig commands interactively at the command line.

Example

These Pig Latin statements extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, invoke the Grunt shell by typing the “pig” command (in local or mapreduce mode). Then, enter the Pig Latin statements interactively at the grunt prompt (be sure to include the semicolon after each statement). The DUMP operator will display the results to your terminal screen.

grunt> A = load 'passwd' using PigStorage(':');

grunt> B = foreach A generate $0 as id;

grunt> dump B;

Local Mode

$ pig -x local

… – Connecting to …

grunt>

Tez Local Mode

$ pig -x tez_local

… – Connecting to …

grunt>

Mapreduce Mode

$ pig -x mapreduce

… – Connecting to …

grunt>

or

$ pig

… – Connecting to …

grunt>

Tez Mode

$ pig -x tez

… – Connecting to …

grunt>
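As a rough cross-check of what the DUMP statement in the interactive example should list, the same first-field extraction can be done with standard Unix tools. This sketch uses cut rather than Pig, and it runs against a small made-up sample file (the two entries and the extract_ids function are hypothetical) instead of the real /etc/passwd. Pig prints each result as a tuple in parentheses, so the formatting will differ even though the IDs should match.

```shell
#!/usr/bin/env bash
# Create a tiny stand-in for the passwd file (hypothetical entries).
sample_dir="$(mktemp -d)"
cat > "$sample_dir/passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
EOF

# Split each line on ':' and keep field 1. This corresponds to $0 in the
# Pig Latin statement: Pig positions are zero-based, cut fields are one-based.
extract_ids() {
  cut -d: -f1 "$1"
}

extract_ids "$sample_dir/passwd"
```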

Batch Mode

You can run Pig in batch mode using Pig scripts and the “pig” command (in local or mapreduce mode).

Example

The Pig Latin statements in the Pig script (id.pig) extract all user IDs from the /etc/passwd file. First, copy the /etc/passwd file to your local working directory. Next, run the Pig script from the command line (using local or mapreduce mode). The STORE operator will write the results to a file (id.out).

/* id.pig */

A = load 'passwd' using PigStorage(':'); -- load the passwd file

B = foreach A generate $0 as id; -- extract the user IDs

store B into 'id.out'; -- write the results to a file named id.out

Local Mode

$ pig -x local id.pig

Tez Local Mode

$ pig -x tez_local id.pig

Mapreduce Mode

$ pig id.pig

or

$ pig -x mapreduce id.pig

Tez Mode

$ pig -x tez id.pig
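The batch workflow can be put together end to end as a short shell sketch. It writes the id.pig script to a scratch directory, copies the passwd file beside it, and runs Pig in local mode only if a pig command is found on the PATH. The variable batch_dir is a hypothetical name, and the comment about the layout of id.out is an assumption about typical local-mode behavior rather than something stated above.

```shell
#!/usr/bin/env bash
# Scratch directory for the script, its input, and its output.
batch_dir="$(mktemp -d)"

# Write the Pig script; the quoted heredoc keeps $0 from being expanded.
cat > "$batch_dir/id.pig" <<'EOF'
/* id.pig */
A = load 'passwd' using PigStorage(':');  -- load the passwd file
B = foreach A generate $0 as id;          -- extract the user IDs
store B into 'id.out';                    -- write the results to id.out
EOF

# Copy the input next to the script, as the text instructs.
if [ -r /etc/passwd ]; then
  cp /etc/passwd "$batch_dir/passwd"
fi

# Run in local mode only if pig is installed and the input was copied.
if command -v pig >/dev/null 2>&1 && [ -f "$batch_dir/passwd" ]; then
  if ( cd "$batch_dir" && pig -x local id.pig ); then
    # Assumption: the output usually appears as a directory of part files,
    # so list it rather than rely on exact file names.
    ls "$batch_dir/id.out"
  else
    echo "pig run failed; see its log output"
  fi
else
  echo "pig not on PATH; wrote $batch_dir/id.pig without running it"
fi
```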
