Hadoop: Apache Impala 4.0 with extended multithreading

Published by: MRT

The Apache Software Foundation (ASF) has released a new major version of Impala, the query engine tailored to Hadoop, with numerous bug fixes as well as improvements and new features. Apache Impala uses the same metadata and the same SQL syntax as Apache Hive and, with version 4.0, gives users extended options for multithreading. The update also brings some fundamental changes to authentication and authorization, for example the move away from Sentry in favor of Ranger.

For analytical queries on data stored in HDFS (Hadoop Distributed File System), Kudu or the cloud, Impala offers an adjustable degree of parallelism, which is set with the query option MT_DOP and applies to all operations that can benefit from multithreaded execution. Previously, however, this option was limited to queries consisting only of scans and aggregations. As of version 4.0, MT_DOP is available for all queries.
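To illustrate how the option is used, the following Python sketch builds a small script for impala-shell that sets MT_DOP before a query that includes a join. The helper name `with_mt_dop`, the table and column names, and the chosen degree of 4 are assumptions made for this example and do not come from the release itself.

```python
def with_mt_dop(query: str, degree: int) -> str:
    """Return an impala-shell script that sets MT_DOP, then runs the query.

    `with_mt_dop` is a hypothetical helper for illustration only.
    """
    if degree < 0:
        raise ValueError("MT_DOP must be non-negative")
    # Normalize the statement so it ends with exactly one semicolon.
    statement = query.strip().rstrip(";")
    return f"SET MT_DOP={degree};\n{statement};\n"


# Hypothetical tables; before 4.0 a join like this could not use MT_DOP.
script = with_mt_dop(
    "SELECT c.region, COUNT(*) FROM sales s "
    "JOIN customers c ON s.customer_id = c.id GROUP BY c.region",
    4,
)
print(script)
```

The resulting text could be saved to a file and passed to impala-shell, or the two statements could be issued one after the other in an interactive session.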

Cloudera, the original developer of Impala, had already announced during its merger with Hortonworks that it would phase out its own Sentry project in favor of Apache Ranger, which Hortonworks had contributed, for authorization and auditing. Impala 4.0 now takes the final step: support for Sentry has been removed entirely. Although Ranger was not compatible with Impala at the time of that announcement, its broader feature set and its strategically more promising integration with other Hadoop components were decisive.

Going forward, Ranger will not only be the standard tool for authorization, which matters for GDPR-compliant masking of personal data, but will also help integrate Apache Knox. As a gateway, Knox provides a single central authentication and access point for Hadoop services in the cluster by encapsulating Kerberos. The stateless reverse proxy framework intercepts REST/HTTP calls and forwards the requests to Hadoop’s REST APIs.
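The gateway idea can be sketched with a small Python helper that builds the address a client would call on Knox instead of contacting a Hadoop service directly. The layout used here (gateway path, topology name, then the service) follows the pattern described in the Knox user guide as far as we understand it; the host name, the topology name "default" and the helper `knox_webhdfs_url` are assumptions for illustration.

```python
from urllib.parse import urlencode


def knox_webhdfs_url(gateway_host: str, path: str, op: str,
                     port: int = 8443,
                     gateway_path: str = "gateway",
                     topology: str = "default") -> str:
    """Build a WebHDFS URL that is routed through a Knox gateway.

    Hypothetical helper: port, gateway path and topology depend on the
    actual Knox deployment and are only example defaults here.
    """
    base = f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1"
    # The client only sees the gateway address; Knox forwards the call
    # to the WebHDFS REST API inside the cluster.
    return f"{base}/{path.lstrip('/')}?{urlencode({'op': op})}"


print(knox_webhdfs_url("knox.example.com", "/tmp", "LISTSTATUS"))
```

A client would send its credentials to this single endpoint, so it never has to handle Kerberos tickets for the cluster itself.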

Impala 4.0 also meets the compliance requirements of FIPS (Federal Information Processing Standards) and supports the Security Assertion Markup Language (SAML), an XML framework for exchanging authentication and authorization information.

A complete overview of all new features and improvements in Apache Impala 4.0 can be found in the release notes and in the changelog.

