Syncsort Looks To Improve Hadoop for Big Data Performance, Data Integration

Syncsort, a provider of data integration and data protection solutions, wants to contribute an external sort “plug-in” to Hadoop’s open source community to open up Hadoop to more users, a Syncsort exec told IDN. The company also released a commercial product optimized for Hadoop.

Tags: Hadoop, Syncsort, big data, MapReduce, data integration


IDN spoke with Jorge Lopez, Syncsort’s senior manager for data integration, to learn more about the company’s latest Hadoop offerings.

“Syncsort’s plug-in contribution is all about enhancing the Hadoop sort framework. It is intended to benefit anyone that uses Hadoop and is interested in easily extending its sort capabilities,” Lopez told IDN. “Essentially, we are proposing changes to the Apache Hadoop distribution that will modularize the sort framework so that everyone downloading the Apache Hadoop distribution would automatically benefit from the contribution.”

Lopez is clear about Syncsort’s reasons for offering the technology. “Previously, there was no simple way to interface with the Hadoop sort framework and this is what we are trying to change by ‘opening up’ the sort capability.”
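
To make the idea of a modular sort step concrete, here is a minimal Python sketch of a toy map-sort-reduce pipeline in which the sort stage is a swappable function. It is an illustration only and assumes nothing about Hadoop’s actual interfaces or Syncsort’s code; the names `run_job`, `default_sorter` and `descending_sorter` are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, int]
# A "sorter" is any callable that takes key/value records and returns them ordered.
Sorter = Callable[[Iterable[Record]], List[Record]]


def default_sorter(records: Iterable[Record]) -> List[Record]:
    """Built-in sort stage: order records by key, ascending."""
    return sorted(records, key=lambda kv: kv[0])


def descending_sorter(records: Iterable[Record]) -> List[Record]:
    """An alternative sort stage a third party could plug in (hypothetical)."""
    return sorted(records, key=lambda kv: kv[0], reverse=True)


def run_job(lines: Iterable[str], sorter: Sorter = default_sorter) -> List[Record]:
    """Toy word count: map, then a pluggable sort, then reduce."""
    # Map: emit (word, 1) for every word in the input.
    mapped = [(word, 1) for line in lines for word in line.split()]

    # Sort: the only stage that changes when a different sorter is plugged in.
    ordered = sorter(mapped)

    # Reduce: sum the counts of adjacent records that share a key.
    reduced: List[Record] = []
    for key, value in ordered:
        if reduced and reduced[-1][0] == key:
            reduced[-1] = (key, reduced[-1][1] + value)
        else:
            reduced.append((key, value))
    return reduced


if __name__ == "__main__":
    data = ["b a", "a c"]
    print(run_job(data))                            # default sort stage
    print(run_job(data, sorter=descending_sorter))  # swapped-in sort stage
```

The map and reduce code does not change when the sorter is replaced, which is the general benefit of modularizing a sort stage behind a single interface.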

Syncsort’s open source contribution will not provide data federation capabilities, and the contribution does not include Syncsort’s DMExpress intellectual property (engine, algorithms, etc.), which will remain proprietary, Lopez added.

Inside Syncsort’s DMExpress Hadoop Edition
Aside from the Hadoop open source contribution, Syncsort is also shipping an update to its DMExpress technologies, optimized for Hadoop.

DMExpress Hadoop Edition, based on the company’s established data integration acceleration software, includes Hadoop Distributed File System (HDFS) connectivity and the ability to create jobs using the DMExpress GUI and run them in MapReduce. The Hadoop accelerator, which makes use of Syncsort’s proposed “plug-in” contribution to the Hadoop community, will seamlessly improve the performance of MapReduce jobs by accelerating their sort operations. It will also invoke high-performance compression as needed to deliver significant storage savings.

As a result, users can improve MapReduce performance by shifting data transformations to the DMExpress engine using Syncsort’s self-tuning Hadoop accelerator technology. It also simplifies ongoing development and maintenance, Lopez added.

Syncsort’s approach to DMExpress Hadoop Edition followed several customer requests and requirements to help facilitate Hadoop adoption, Lopez told IDN.

“First, there is a need to further reduce the costs of scaling. Second, there is a need to reduce the time and skills needed to code MapReduce jobs and fine-tune the system for optimal performance,” he said. “Third, there is the need to be able to easily add or extend Hadoop’s capabilities. All of this [offers] a faster, more efficient and self-tuning alternative.”

DMExpress Hadoop Edition will leverage Syncsort’s proposed contribution to the open source community to seamlessly accelerate sort operations in all MapReduce jobs, Lopez said. “Additionally, developers can create MapReduce jobs using the DMExpress graphical user interface and shift those jobs to the DMExpress engine for even faster performance,” he added.

Where Syncsort Sees Hadoop’s Early Adopters
We asked Lopez where he is seeing interest in using Hadoop for Big Data or BI projects.

“When people think about ‘Big Data,’ they often picture big corporations with thousands of employees and billions of dollars in revenue. However, we’ve found that ‘Big Data’ is not exclusive to big corporations,” Lopez said. “While Syncsort has a footprint in large telecom, financial and insurance firms, the company is also meeting success with smaller companies looking to leverage large amounts of data in innovative ways.”

One example he mentioned was comScore, a Syncsort client in the business of providing data, analytics and on-demand software solutions for the measurement of online ads and audiences.

Despite the popular notion that business drives BI and Big Data projects, IT professionals are playing a really pivotal role when it comes to Hadoop adoption, Lopez told IDN. “When working with customers, we often find ourselves working on projects with the CTO, VP or director of development as well as data architects.”

Syncsort’s plug-in is based on its DMExpress, an ETL tool that boosts performance by processing data from all major databases, including Oracle, Sybase, DB2, Informix, Microsoft SQL Server and Teradata. That said, the new plug-in is agnostic and does not require that users adopt DMExpress.

DMExpress Hadoop Edition, now in limited beta, will be generally available later this year.

 

