Chapter 3. Grunt
Grunt[3] is Pig's interactive shell. It lets users enter Pig Latin interactively and provides a shell for interacting with HDFS.
To enter Grunt, invoke Pig with no script or command to run. Entering
pig -x local
will result in the prompt:
grunt>
This gives you a Grunt shell that interacts with your
local file system. If you omit the -x local and have a
cluster configuration set in PIG_CLASSPATH, this will
put you in a Grunt shell that interacts with HDFS on your
cluster.
As you would expect with a shell, Grunt provides
command-line history and editing, as well as tab completion for
commands. It does not provide filename completion via the Tab key.
That is, if you type kil Tab, it will complete the
command as kill. But if you have a file
foo in your local directory and you type ls
fo Tab, it will not complete it as ls
foo. This is because the response time from HDFS to connect and
find whether the file exists is too slow to be useful.
While Grunt is a useful shell, remember that it is not a full-featured shell. It does not provide a number of features found in standard Unix shells, such as pipes, redirection, and background execution.
To exit Grunt, type quit or press Ctrl+D.
Entering Pig Latin Scripts in Grunt
One of the main uses of Grunt is to enter Pig Latin in an interactive session. This can be particularly useful for quick sampling of your data and for prototyping new Pig Latin scripts.
You can enter Pig Latin directly into Grunt.
Pig will not start executing the Pig Latin you enter until it sees
either a store or dump. It will, however, do
basic syntax and semantic checking to help you catch errors quickly. If
you do make a mistake while entering a line of Pig Latin in Grunt, you
can re-enter the line, using the same alias. Pig will take the last
instance of the line you enter. For example:
pig -x local
grunt> dividends = load 'NYSE_dividends' as (exchange, symbol, date, dividend);
grunt> symbols = foreach dividends generate symbl;
...Error during parsing. Invalid alias: symbl ...
grunt> symbols = foreach dividends generate symbol;
...
HDFS Commands in Grunt
Besides entering Pig Latin interactively, the other major use for Grunt is to act as a shell for HDFS. In Pig 0.5 and later, all hadoop fs shell commands are available. They are accessed using the keyword fs. The dash (-) that precedes each command in hadoop fs is also required.
grunt> fs -ls
You can see a complete guide to the commands available at http://hadoop.apache.org/common/docs/r0.20.2/hdfs_shell.html. A number of the commands come directly from Unix shells and will operate in ways that are familiar: chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, and stat. A few of them either look like Unix commands you are used to but behave slightly differently, or may be unfamiliar:
- cat filename
  Print the contents of a file to stdout. You can apply this command to a directory and it will apply itself in turn to each file in the directory.
- copyFromLocal localfile hdfsfile
  Copy a file from your local disk to HDFS. This is done serially, not in parallel.
- copyToLocal hdfsfile localfile
  Copy a file from HDFS to your local disk. This is done serially, not in parallel.
- rmr filename
  Remove files recursively. This is equivalent to rm -r in Unix. Use this with caution.
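As a rough illustration of these commands, a Grunt session might look like the following sketch. The directory name data and the local filename dividends_copy are hypothetical, NYSE_dividends is assumed to exist in your local working directory, and command output is omitted.

```pig
-- data and dividends_copy are hypothetical names used only for illustration
grunt> fs -mkdir data
grunt> fs -copyFromLocal NYSE_dividends data/NYSE_dividends
grunt> fs -ls data
grunt> fs -cat data/NYSE_dividends
grunt> fs -copyToLocal data/NYSE_dividends dividends_copy
-- removes the directory and everything under it
grunt> fs -rmr data
```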
In versions of Pig before 0.5, hadoop fs commands were not available. Instead, Grunt had its own implementation of some of these commands: cat, cd, copyFromLocal, copyToLocal, cp, ls, mkdir, mv, pwd, rm (which acted like Hadoop's rmr, not Hadoop's rm), and rmf. As of Pig 0.8, all of these commands are still available. However, with the exception of cd and pwd, these commands are deprecated in favor of hadoop fs and may be removed at some point in the future.
In version 0.8 a new command was added to Grunt, sh. This command gives you access to the local shell, just as fs gives you access to HDFS. Simple shell commands that do not involve pipes or redirects can be executed. It is better to work with absolute paths as sh does not always properly track the current working directory.
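For instance, a minimal use of sh with an absolute path might look like this. The path /tmp is only an example, and output is omitted.

```pig
-- /tmp is an example path; an absolute path avoids relying on the working directory
grunt> sh ls /tmp
```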
Controlling Pig from Grunt
Grunt also provides commands for controlling Pig and MapReduce.
- kill jobid
  Kill the MapReduce job associated with jobid. You can find jobid by looking at the output of the pig command that spawned the job. It will list the jobid of each job it spawns. You can also find it by looking at Hadoop's JobTracker GUI, which lists all jobs currently running on the cluster. Note that this command kills a particular MapReduce job. If your Pig job contains other MapReduce jobs that do not depend on the killed MapReduce job, these jobs will still continue. If you want all the MapReduce jobs associated with a particular Pig job to be killed, it is best to terminate the process running Pig, and then use this command to kill any MapReduce jobs that are still running. Make sure to terminate the Pig process with a Ctrl+C or a Unix kill, not a Unix kill -9. The latter does not give Pig the chance to clean up temporary files it is using and can leave garbage in your cluster.
- exec [-param param_name=param_value] [-param_file filename] script
  Execute the Pig Latin script script. Aliases defined in script are not imported into Grunt. This command is useful for testing your Pig Latin scripts while inside a Grunt session. For information on the -param and -param_file options see the section called “Parameter Substitution”.
- run [-param param_name=param_value] [-param_file filename] script
  Execute the Pig Latin script script in the current Grunt shell. Thus all aliases referenced in script are available to Grunt and the commands in script are accessible via the shell history. This is another option for testing Pig Latin scripts while inside a Grunt session. For information on the -param and -param_file options see the section called “Parameter Substitution”.
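The difference between exec and run can be sketched with a short session. This is only an illustration: myscript.pig is a hypothetical script assumed to define an alias named symbols, the parameter name and value are made up, the job ID is a placeholder, and output is omitted.

```pig
-- myscript.pig is hypothetical and is assumed to define the alias symbols
grunt> exec myscript.pig
-- symbols is not known to Grunt here, because exec does not import aliases
grunt> describe symbols;
grunt> run myscript.pig
-- after run, symbols is available in the current session
grunt> describe symbols;
-- passing a (made-up) parameter to the script
grunt> run -param day=2009-11-01 myscript.pig
-- placeholder job ID; use the one reported by Pig or the JobTracker GUI
grunt> kill job_201101011200_0001
```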
[3] According to Ben Reed, one of the researchers at Yahoo! who helped start Pig, they named the shell “Grunt” because they felt the initial implementation was so limited that it was not worthy even of the name “oink”.




