{"id":20974,"date":"2013-05-11T11:08:10","date_gmt":"2013-05-11T05:38:10","guid":{"rendered":"http:\/\/vskills.in\/certification\/tutorial\/?p=20974"},"modified":"2024-04-12T14:16:19","modified_gmt":"2024-04-12T08:46:19","slug":"udf-and-data-processing-operator-3","status":"publish","type":"page","link":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/","title":{"rendered":"Hadoop &#038; Mapreduce Tutorials | UDF and data processing operator"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\">UDF and Data Processing Operator<\/h1>\n\n\n\n<p>Pig Latin also provides three statements, REGISTER, DEFINE, and IMPORT, that make it possible to incorporate macros and user-defined functions into Pig scripts. REGISTER registers a JAR file with the Pig runtime. DEFINE creates an alias for a macro, UDF, streaming script, or command specification. IMPORT imports macros defined in a separate file into a script.<\/p>\n\n\n\n<p>The DEFINE statement assigns an alias to an external executable or a UDF. Use it when you want a short name for a function that has a lengthy package name.<\/p>\n\n\n\n<p>For a STREAM command, DEFINE is also how the executable gets transferred to the task nodes of the Hadoop cluster. This is done with the SHIP clause of the DEFINE statement.<\/p>\n\n\n\n<p>UDFs<\/p>\n\n\n\n<p>Pig provides extensive support for user-defined functions (UDFs) as a way to specify custom processing. 
Pig UDFs can currently be implemented in four languages: Java, Python, JavaScript, and Ruby.<\/p>\n\n\n\n<p><strong>Registering UDFs<\/strong><\/p>\n\n\n\n<p><u>Registering Java UDFs:<\/u><\/p>\n\n\n\n<p>-- register_java_udf.pig<\/p>\n\n\n\n<p>register 'your_path_to_piggybank\/piggybank.jar';<\/p>\n\n\n\n<p>divs = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray,<\/p>\n\n\n\n<p>date:chararray, dividends:float);<\/p>\n\n\n\n<p><u>Registering Python UDFs (the Python script must be in your current directory):<\/u><\/p>\n\n\n\n<p>-- register_python_udf.pig<\/p>\n\n\n\n<p>register 'production.py' using jython as bballudfs;<\/p>\n\n\n\n<p>players = load 'baseball' as (name:chararray, team:chararray,<\/p>\n\n\n\n<p>pos:bag{t:(p:chararray)}, bat:map[]);<\/p>\n\n\n\n<p><strong>Writing UDFs<\/strong><\/p>\n\n\n\n<p><u>Java UDFs:<\/u><\/p>\n\n\n\n<p>package myudfs;<\/p>\n\n\n\n<p>import java.io.IOException;<\/p>\n\n\n\n<p>import org.apache.pig.EvalFunc;<\/p>\n\n\n\n<p>import org.apache.pig.data.Tuple;<\/p>\n\n\n\n<p>\/\/ Returns the first field of the input tuple converted to upper case.<\/p>\n\n\n\n<p>public class UPPER extends EvalFunc&lt;String&gt;<\/p>\n\n\n\n<p>{<\/p>\n\n\n\n<p>public String exec(Tuple input) throws IOException {<\/p>\n\n\n\n<p>if (input == null || input.size() == 0)<\/p>\n\n\n\n<p>return null;<\/p>\n\n\n\n<p>try{<\/p>\n\n\n\n<p>String str = (String)input.get(0);<\/p>\n\n\n\n<p>return str.toUpperCase();<\/p>\n\n\n\n<p>}catch(Exception e){<\/p>\n\n\n\n<p>throw new IOException(&quot;Caught exception processing input row &quot;, e);<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p>}<\/p>\n\n\n\n<p><u>Python UDFs<\/u><\/p>\n\n\n\n<p># Square - square of a number of any data type<\/p>\n\n\n\n<p>@outputSchemaFunction(&quot;squareSchema&quot;) # Names the script delegate function that defines the schema for this function, depending on the input type.<\/p>\n\n\n\n<p>def square(num):<\/p>\n\n\n\n<p>return 
((num)*(num))<\/p>\n\n\n\n<p>@schemaFunction(&quot;squareSchema&quot;) # Marks the delegate function; it is not registered with Pig.<\/p>\n\n\n\n<p>def squareSchema(input):<\/p>\n\n\n\n<p>return input<\/p>\n\n\n\n<p># Percent - percentage<\/p>\n\n\n\n<p>@outputSchema(&quot;percent:double&quot;) # Defines the schema for a script UDF in a format that Pig understands and can parse.<\/p>\n\n\n\n<p>def percent(num, total):<\/p>\n\n\n\n<p>return num * 100 \/ total<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Apply for Big Data and Hadoop Developer Certification<\/strong><\/h3>\n\n\n\n<p><a href=\"https:\/\/www.vskills.in\/certification\/certified-big-data-and-apache-hadoop-developer\">https:\/\/www.vskills.in\/certification\/certified-big-data-and-apache-hadoop-developer<\/a><\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><a href=\"https:\/\/www.vskills.in\/certification\/tutorial\/certified-hadoop-mapreduce\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Back to Tutorials<\/strong><\/a><\/h4>\n","protected":false},"excerpt":{"rendered":"<p>UDF and Data Processing Operator Pig Latin also provides three statements\u2014REGISTER, DEFINE, and IMPORT\u2014that make it &nbsp;possible to incorporate macros and user-defined functions into Pig scripts. REGISTER, Registers a JAR file with the Pig runtime, DEFINE, Creates an alias for a macro, UDF, streaming script, or command specification. 
IMPORT, Imports macros defined in a separate&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"categories":[65],"tags":[],"class_list":["post-20974","page","type-page","status-publish","hentry","category-hadoop-and-mapreduce"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hadoop &amp; Mapreduce Tutorials | UDF and data processing operator<\/title>\n<meta name=\"description\" content=\"Pig provides extensive support for user defined functions (UDF) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, JavaScript and Ruby.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hadoop &amp; Mapreduce Tutorials | UDF and data processing operator\" \/>\n<meta property=\"og:description\" content=\"Pig provides extensive support for user defined functions (UDF) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, JavaScript and Ruby.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/\" \/>\n<meta property=\"og:site_name\" content=\"Tutorial\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vskills.in\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-12T08:46:19+00:00\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/\",\"name\":\"Hadoop & Mapreduce Tutorials | UDF and data processing operator\",\"isPartOf\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\"},\"datePublished\":\"2013-05-11T05:38:10+00:00\",\"dateModified\":\"2024-04-12T08:46:19+00:00\",\"description\":\"Pig provides extensive support for user defined functions (UDF) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, JavaScript and Ruby.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hadoop &#038; Mapreduce Tutorials | UDF and data processing operator\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#website\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"name\":\"Tutorial\",\"description\":\"Vskills - A initiative in elearning and 
certification\",\"publisher\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#organization\",\"name\":\"Vskills\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"contentUrl\":\"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg\",\"width\":73,\"height\":55,\"caption\":\"Vskills\"},\"image\":{\"@id\":\"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vskills.in\/\",\"https:\/\/x.com\/vskills_in\",\"https:\/\/www.linkedin.com\/company-beta\/1371554\/\",\"https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hadoop & Mapreduce Tutorials | UDF and data processing operator","description":"Pig provides extensive support for user defined functions (UDF) as a way to specify custom processing. 
Pig UDFs can currently be implemented in three languages: Java, Python, JavaScript and Ruby.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/","og_locale":"en_US","og_type":"article","og_title":"Hadoop & Mapreduce Tutorials | UDF and data processing operator","og_description":"Pig provides extensive support for user defined functions (UDF) as a way to specify custom processing. Pig UDFs can currently be implemented in three languages: Java, Python, JavaScript and Ruby.","og_url":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/","og_site_name":"Tutorial","article_publisher":"https:\/\/www.facebook.com\/vskills.in\/","article_modified_time":"2024-04-12T08:46:19+00:00","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/","name":"Hadoop & Mapreduce Tutorials | UDF and data processing operator","isPartOf":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website"},"datePublished":"2013-05-11T05:38:10+00:00","dateModified":"2024-04-12T08:46:19+00:00","description":"Pig provides extensive support for user defined functions (UDF) as a way to specify custom processing. 
Pig UDFs can currently be implemented in three languages: Java, Python, JavaScript and Ruby.","breadcrumb":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/udf-and-data-processing-operator-3\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.vskills.in\/certification\/tutorial\/"},{"@type":"ListItem","position":2,"name":"Hadoop &#038; Mapreduce Tutorials | UDF and data processing operator"}]},{"@type":"WebSite","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#website","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","name":"Tutorial","description":"Vskills - A initiative in elearning and certification","publisher":{"@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.vskills.in\/certification\/tutorial\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#organization","name":"Vskills","url":"https:\/\/www.vskills.in\/certification\/tutorial\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.vskills.in\/certification\/tutorial\/#\/schema\/logo\/image\/","url":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","contentUrl":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-content\/uploads\/2017\/07\/vskills-min-logo.jpg","width":73,"height":55,"caption":"Vskills"},"image":{"@id":"https:\/\/www.vskills.in\/
certification\/tutorial\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/vskills.in\/","https:\/\/x.com\/vskills_in","https:\/\/www.linkedin.com\/company-beta\/1371554\/","https:\/\/www.youtube.com\/channel\/UCMWnscxPwRF_PqXo9B7q_Tw"]}]}},"_links":{"self":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20974","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/comments?post=20974"}],"version-history":[{"count":7,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20974\/revisions"}],"predecessor-version":[{"id":127282,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/pages\/20974\/revisions\/127282"}],"wp:attachment":[{"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/media?parent=20974"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/categories?post=20974"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.vskills.in\/certification\/tutorial\/wp-json\/wp\/v2\/tags?post=20974"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}