9781449302641

Chapter 3. Grunt

Grunt[3] is Pig's interactive shell. It enables users to enter Pig Latin interactively, and it provides a shell for users to interact with HDFS.

To enter Grunt, invoke Pig with no script or command to run. Entering

pig -x local

will result in the prompt:

grunt>

This gives you a Grunt shell that interacts with your local file system. If you omit -x local and have a cluster configuration set in PIG_CLASSPATH, you will instead get a Grunt shell that interacts with HDFS on your cluster.
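The two ways of starting Grunt described above can be sketched as follows. The configuration directory /etc/hadoop/conf is a hypothetical placeholder for wherever your cluster's Hadoop configuration files live. Running pig with no -x option is the same as pig -x mapreduce, the default execution type.

```
# Local mode: Grunt works against the local file system
pig -x local

# MapReduce mode (the default): point PIG_CLASSPATH at the cluster's
# Hadoop configuration directory (hypothetical path) so Grunt talks to HDFS
export PIG_CLASSPATH=/etc/hadoop/conf
pig
```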

As you would expect of a shell, Grunt provides command-line history and editing, as well as tab completion of commands. It does not provide filename completion via the Tab key. That is, if you type kil and press Tab, it will complete the command as kill. But if you have a file foo in your local directory and you type ls fo and press Tab, it will not complete it as ls foo. This is because the round trip to HDFS to check whether the file exists is too slow to be useful.

While Grunt is a useful shell, remember that it is not a full-featured shell. It does not provide a number of features found in standard Unix shells, such as pipes, redirection, and background execution.

To exit Grunt you can type quit or enter Ctrl+D.

Entering Pig Latin Scripts in Grunt

One of the main uses of Grunt is to enter Pig Latin in an interactive session. This can be particularly useful for quick sampling of your data and for prototyping new Pig Latin scripts.

You can enter Pig Latin directly into Grunt. Pig will not start executing the Pig Latin you enter until it sees either a store or a dump. It will, however, do basic syntax and semantic checking to help you catch errors quickly. If you make a mistake while entering a line of Pig Latin in Grunt, you can re-enter the line using the same alias. Pig will use the last instance of the line you enter. For example:

pig -x local
grunt> dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
grunt> symbols = foreach dividends generate symbl;
...Error during parsing. Invalid alias: symbl ...
grunt> symbols = foreach dividends generate symbol;
...
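Because nothing runs until Pig sees a dump or a store, a prototyping session typically builds up aliases and then triggers execution explicitly. Here is a minimal sketch using the dividends data. The alias ten_symbols and the output directory symbols_out are hypothetical names.

```
grunt> dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
grunt> symbols = foreach dividends generate symbol;
grunt> ten_symbols = limit symbols 10;
grunt> dump ten_symbols;
grunt> store symbols into 'symbols_out';
```

The first three lines are only parsed and checked. The dump starts execution and prints ten symbols to the screen, which is a cheap way to sample the data. The store then writes every symbol to symbols_out.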

HDFS Commands in Grunt

Besides entering Pig Latin interactively, the other major use for Grunt is to act as a shell for HDFS. In Pig 0.5 and later, all hadoop fs shell commands are available. They are accessed using the keyword fs, and the dash (-) that precedes each command in hadoop fs is also required:

grunt> fs -ls

You can see a complete guide to the commands available at http://hadoop.apache.org/common/docs/r0.20.2/hdfs_shell.html. A number of the commands come directly from Unix shells and will operate in familiar ways: chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, and stat. A few others look like familiar Unix commands but behave slightly differently, or have no direct Unix counterpart:

cat filename

Print the contents of a file to stdout. You can apply this command to a directory and it will apply itself in turn to each file in the directory.

copyFromLocal localfile hdfsfile

Copy a file from your local disk to HDFS. This is done serially, not in parallel.

copyToLocal hdfsfile localfile

Copy a file from HDFS to your local disk. This is done serially, not in parallel.

rmr filename

Remove files recursively. This is equivalent to rm -r in Unix. Use this with caution.
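A typical round trip through these commands might look like the following. The HDFS directory /user/alice/pigdata and the local file name dividends_copy are hypothetical, and NYSE_dividends is assumed to be in your local working directory.

```
grunt> fs -mkdir /user/alice/pigdata
grunt> fs -copyFromLocal NYSE_dividends /user/alice/pigdata/NYSE_dividends
grunt> fs -ls /user/alice/pigdata
grunt> fs -cat /user/alice/pigdata/NYSE_dividends
grunt> fs -copyToLocal /user/alice/pigdata/NYSE_dividends dividends_copy
grunt> fs -rmr /user/alice/pigdata
```

The last command deletes the directory and everything in it. Check the path before running it.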

In versions of Pig before 0.5, hadoop fs commands were not available. Instead, Grunt had its own implementation of some of these commands: cat, cd, copyFromLocal, copyToLocal, cp, ls, mkdir, mv, pwd, rm (which acted like Hadoop's rmr, not Hadoop's rm), and rmf. As of Pig 0.8, all of these commands are still available. However, with the exception of cd and pwd, these commands are deprecated in favor of hadoop fs and may be removed in the future.
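For comparison, here is a sketch of the older built-in commands next to their fs replacement. The directory /user/alice is a hypothetical example.

```
grunt> cd /user/alice
grunt> pwd
grunt> ls
grunt> fs -ls /user/alice
```

cd and pwd are not deprecated. The plain ls is Grunt's own deprecated implementation, and fs -ls with an explicit path is the hadoop fs form that replaces it.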

Pig 0.8 added a new command to Grunt, sh. This command gives you access to the local shell, just as fs gives you access to HDFS. You can run simple shell commands that do not involve pipes or redirection. It is better to use absolute paths, because sh does not always track the current working directory correctly.
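Here is a sketch of sh in use. The directory /home/alice/pig_scripts and the file daily.pig are hypothetical. Following the advice above, both paths are absolute, and neither command uses a pipe or a redirect.

```
grunt> sh ls -l /home/alice/pig_scripts
grunt> sh cat /home/alice/pig_scripts/daily.pig
```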

Controlling Pig from Grunt

Grunt also provides commands for controlling Pig and MapReduce.

kill jobid

Kill the MapReduce job associated with jobid. You can find jobid in the output of the pig command that spawned the job, which lists the jobid of each job it spawns. You can also find it in Hadoop's JobTracker web UI, which lists all jobs currently running on the cluster. Note that this command kills one particular MapReduce job. If your Pig job contains other MapReduce jobs that do not depend on the killed MapReduce job, those jobs will continue. If you want all the MapReduce jobs associated with a particular Pig job to be killed, it is best to terminate the process running Pig and then use this command to kill any MapReduce jobs that are still running. Make sure to terminate the Pig process with Ctrl+C or a Unix kill, not a Unix kill -9. The latter does not give Pig the chance to clean up the temporary files it is using and can leave garbage on your cluster.
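Killing a single job from Grunt looks like this. The job ID is hypothetical. Hadoop job IDs consist of job_ followed by the JobTracker's start timestamp and a sequence number.

```
grunt> kill job_201108101050_0002
```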

exec [-param param_name = param_value] [-param_file filename] script

Execute the Pig Latin script script. Aliases defined in script are not imported into Grunt. This command is useful for testing your Pig Latin scripts while inside a Grunt session. For information on the -param and -param_file options see the section called “Parameter Substitution”.
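As a sketch, suppose a hypothetical script daily.pig uses a parameter DATE to select one day's dividends and store them:

```
-- daily.pig (hypothetical): keep only the dividends paid on the date
-- supplied through the DATE parameter, then write them out
divs  = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
today = filter divs by date == '$DATE';
store today into 'dividends_$DATE';
```

It could be executed from Grunt with:

```
grunt> exec -param DATE=2009-12-17 daily.pig
```

Because exec runs the script separately from the interactive session, the aliases divs and today are not defined in Grunt afterward.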

run [-param param_name = param_value] [-param_file filename] script

Execute the Pig Latin script script in the current Grunt shell. All aliases defined in script are therefore available in Grunt, and the commands in script are accessible via the shell history. This is another option for testing Pig Latin scripts while inside a Grunt session. For information on the -param and -param_file options, see the section called “Parameter Substitution”.
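Here is a sketch using run with a hypothetical script peek.pig, whose input path is supplied through a hypothetical INPUT parameter:

```
-- peek.pig (hypothetical): load a dividends file and keep ten records
divs = load '$INPUT' as (exchange, symbol, date, dividend);
some_divs = limit divs 10;
```

```
grunt> run -param INPUT=NYSE_dividends peek.pig
grunt> describe some_divs;
grunt> dump some_divs;
```

Because run executes the script in the current shell, some_divs is still defined afterward, so it can be described and dumped. Since peek.pig itself contains no dump or store, nothing should execute until the dump.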



[3] According to Ben Reed, one of the researchers at Yahoo! who helped start Pig, they named the shell “Grunt” because they felt the initial implementation was so limited that it was not worthy even of the name “oink”.

Site last updated on: August 10, 2011 at 10:50:07 AM PDT