Book Description:
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop’s data warehouse infrastructure. You’ll quickly learn how to use Hive’s SQL dialect – HiveQL – to summarize, query, and analyze large datasets stored in Hadoop’s distributed filesystem.
Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases). Then unpack the tarball, which creates a subdirectory named hive-x.y.z (where x.y.z is the release number).
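The unpacking step can be scripted. The following is a minimal sketch using Python's standard tarfile module; the function name unpack_hive and its arguments are hypothetical, and it assumes the tarball contains a single top-level directory, as the hive-x.y.z layout described above suggests.

```python
import tarfile
from pathlib import Path


def unpack_hive(tarball: str, dest: str = ".") -> Path:
    """Extract a release tarball and return the directory it creates.

    Hypothetical helper: assumes the archive holds exactly one
    top-level directory (for example hive-x.y.z).
    """
    with tarfile.open(tarball) as archive:  # compression is auto-detected
        # Collect the first path component of every archive member.
        top_level = {name.split("/", 1)[0] for name in archive.getnames()}
        if len(top_level) != 1:
            raise ValueError(
                f"expected one top-level directory, found {sorted(top_level)}"
            )
        archive.extractall(dest)
    return Path(dest) / top_level.pop()
```

Calling unpack_hive with the path of the downloaded tarball returns the path of the new release directory, which is a convenient value to store in an environment variable such as HIVE_HOME.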
This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You’ll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
This section contains details about setting up and using the ODBC Connector for Hive.
- Set up Hive on Mac. This tutorial provides step-by-step instructions for setting up Hive on Mac OS. Start by downloading Apache Hive 3.0.0.
- Apache Hive Database Query Tool and SQL Editor Features. The Apache Hive SQL editor and database query tool provided by RazorSQL includes a custom database browser tailored to Hive, a SQL editor, a table editor, a query builder, and other tools for working with Hive, such as a create table tool and a data searching tool.
Before You Begin
The MapR Hive ODBC Connector is an ODBC driver for Apache Hive 0.7.0 and later that complies with the ODBC 3.52 specification. You can download the Hive ODBC connector from https://package.mapr.com/tools/MapR-ODBC/MapR_Hive/. After downloading the driver, refer to the Hive ODBC Driver documentation to install and configure it.
The Hive ODBC Driver supports the following Advanced Options:
- Enable Auto Reconnect
- Driver Config Take Precedence
- Fast SQL Prepare
- Get Tables With Query
- Invalid Session Auto Recover
- Show System Table
- Socket Timeout
- Default String Column Length
- Rows Fetched Per Block
- Use Native Query
To use the ODBC driver, configure a Data Source Name (DSN), a definition that specifies how to connect to Hive. DSNs are typically managed by the operating system and may be used by multiple applications. Some applications do not use DSNs. You will need to refer to your particular application’s documentation to understand how it connects using ODBC.
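As a rough illustration of how an application might refer to a DSN, the sketch below assembles an ODBC connection string, which is a semicolon-separated list of KEY=value pairs. The function name build_connection_string and the DSN name in the usage note are hypothetical; which attributes beyond DSN, UID, and PWD are accepted depends on the driver, so check its documentation.

```python
from typing import Optional


def build_connection_string(dsn: Optional[str] = None, **attributes: str) -> str:
    """Assemble an ODBC connection string from a DSN and extra attributes.

    Hypothetical helper for illustration; attribute names other than the
    standard DSN, UID, and PWD keywords are driver-specific.
    """
    parts = []
    if dsn is not None:
        parts.append(f"DSN={dsn}")
    for key, value in attributes.items():
        # A value containing a semicolon is wrapped in braces so that it
        # is not mistaken for the pair separator.
        if ";" in value:
            value = "{" + value + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts)
```

For example, build_connection_string(dsn="Sample Hive DSN", UID="analyst") produces a string that an ODBC library such as pyodbc could pass to its connect function, assuming a DSN with that name has been defined on the machine.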
The standard query language for ODBC is SQL. HiveQL, the standard query language for Hive, includes a subset of ANSI SQL-92. Applications that connect to Hive using ODBC may need queries altered if the queries use SQL features that are not present in Hive. Applications that use SQL will recognize HiveQL, but might not provide access to HiveQL-specific features such as multi-table insert.
Please refer to the Hive Language Manual for up-to-date information on HiveQL.
The SQL Connector
The SQL Connector feature translates standard SQL-92 queries into equivalent HiveQL queries. The SQL Connector performs syntactical translations and structural transformations. For example:
- Quoted Identifiers: When quoting identifiers, HiveQL uses back quotes (`), while SQL uses double quotes ("). Even when a driver reports the back quote as the quote character, some applications still generate double-quoted identifiers.
- Table Aliases: HiveQL does not support the AS keyword between a table reference and its alias.
- The JOIN, INNER JOIN, and CROSS JOIN SQL syntaxes are translated to the HiveQL JOIN syntax.
- SQL TOP N queries are transformed to HiveQL LIMIT queries.
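To make these rewrites concrete, here is a toy sketch that applies them with regular expressions. It is not how the SQL Connector is implemented; the function name sql92_to_hiveql is hypothetical, and the patterns ignore string literals, comments, and nested queries, which a real translator would have to parse properly.

```python
import re


def sql92_to_hiveql(sql: str) -> str:
    """Apply a few illustrative SQL-92 to HiveQL rewrites (toy sketch)."""
    # Quoted identifiers: "name" becomes `name`.
    out = re.sub(r'"([^"]+)"', r"`\1`", sql)
    # Table aliases: drop AS between a table reference and its alias.
    out = re.sub(
        r"\b(FROM|JOIN)\s+(\S+)\s+AS\s+(\w+)",
        r"\1 \2 \3",
        out,
        flags=re.IGNORECASE,
    )
    # INNER JOIN and CROSS JOIN both become plain JOIN.
    out = re.sub(r"\b(?:INNER|CROSS)\s+JOIN\b", "JOIN", out, flags=re.IGNORECASE)
    # SELECT TOP N ... becomes SELECT ... LIMIT N.
    match = re.match(r"\s*SELECT\s+TOP\s+(\d+)\s+", out, flags=re.IGNORECASE)
    if match:
        body = out[match.end():].rstrip().rstrip(";")
        out = f"SELECT {body} LIMIT {match.group(1)}"
    return out


if __name__ == "__main__":
    print(sql92_to_hiveql(
        'SELECT TOP 5 "name" FROM users AS u '
        "INNER JOIN orders AS o ON u.id = o.uid"
    ))
```

The table and column names in the sample query are made up for the example. Regular expressions keep the sketch short, but they would mangle a double quote inside a string literal, which is one reason a production translator works on a parsed query instead of raw text.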