Jun 29, 2017

Our mid-term roadmap (2017H2-2018H1)

We're getting quite a few feature requests recently (which is a great thing!) so it looks like it would be useful to shed some light on our future development plans. We do publish our short term goals on the download page. However, our mid-term roadmap has never been disclosed. So here is what we would like to do in the next 12 months:

Persistent editable datasets in projects

This is supposed to be a very significant change to what EasyMorph is. At this point no data is stored in EasyMorph projects -- they load it from files and databases and output processed data into other files and databases. Once processing is done no data is saved in project. We're going to change that. Similarly to spreadsheets, EasyMorph will be able to have both data AND calculation logic in one file. From a technical standpoint it will be implemented as follows:

  • Sandboxes will be able to keep datasets permanently -- i.e. data in sandboxes will be stored in projects and survive re-opening projects.
  • Sandboxes will be editable -- it will be possible to edit data in sandboxes like in spreadsheets (but without formulas).
  • All import transformations will get the ability to remember (cache) the last loaded dataset, which means that a project can be sent to someone without access to the project's data sources.
While it might be not immediately obvious what the benefits are, they are actually quite significant:
  • Portability -- no need to keep mapping tables and reference tables in external files. They can be stored and edited right in project.
  • Shareability -- EM projects become a means to share data. Like with spreadsheets, it allows sending not just some calculation results to someone, but also the source data and the logic that produced the results.
  • Versatility -- the ability to edit datasets right in EasyMorph means that you can mix manually entered data with calculated data. For instance, edit planned numbers and compare them with actuals in one project. Or update planned numbers based on some actual data. Or perform what-if analysis for different scenarios.
The change would effectively make EasyMorph a hybrid between a spreadsheet and an ETL, or a relational spreadsheet, if you will.

Querybooks

We're going to keep improving queries which will transform into querybooks. A querybook is a set of queries that answer particular questions (one query per tab). Querybooks will be portable, meaning that a querybook can be sent to another user with all necessary data connectors embedded for the queries to work.

Queries themselves will become multi-table, supporting joins and unions (won't be available in the free edition).

Application data sources

So far EasyMorph has been able to import data from databases, files and spreadsheets. However, a new category of data sources will be added -- applications, including cloud applications like Google Analytics, Twitter, Wikipedia, etc.

Application data source won't be available in the free edition.

Financial and statistical transformations

With the release of version 3.6 EasyMorph has 80+ transformations that cover almost all possible basic calculations. But we're not done yet! We will be adding special transformations for financial calculations (payment schedules, compound interest rates, bond yields, etc.) and statistical calculations (e.g. variances, deviations, clustering) with elements of predictive analytics (e.g. linear regressions, extrapolations).

Visual conditions and business rules

Visual conditions that were first introduced for queries in v3.5 will make their way into transformations with conditions that currently use expressions -- Rule, Filter by condition, Trim by condition, Halt and a few more transformations.

PS. Lots of things are planned for EasyMorph Server, but that's a subject for another post.
PPS. All written above is subject to change without notice.

May 21, 2017

Meet EasyMorph Server

Today we're releasing EasyMorph Server -- a new product in EasyMorph family. Here I'd like to talk more about why we've made it and what to expect in the future.

Why

The need for a server edition appeared pretty quickly once our customers started performing more transformations on larger datasets. Basically, it was demanded by the switch of usage pattern from personal to team use:

  • Multiple projects run by multiple users
  • Data volumes required more RAM and CPU power than a typical personal computer has
  • Scheduled long-running transformations
  • Shared source files and data sources

What's this

EasyMorph Server is a Windows service with a web console and the same columnar in-memory data transformation engine which is employed by desktop editions of EasyMorph. It runs on schedule projects created in any desktop edition (including the free edition, yay!). Project parameters can be assigned (overridden) in Server tasks. Project logs can be viewed right in the web console and are updated in real-time.

Task list (click to zoom).


The Server engine performs slightly better than the engine of desktop editions. It runs 10-20% faster and utilizes 10-30% less memory due to server-specific optimizations. For instance, unlike the desktop editions, Server doesn't store some intermediate results since there is no need to display them in UI.

EasyMorph Server is designed for analytical teams with limited or no IT support. Therefore, it's very easy to install and maintain. The installer package also includes EasyMorph Server Monitor -- a simple utility to manage the service: start it, stop it, view the server log, and change HTTP port for that rare case when the default one is not available.

Documentation

While the Server doesn't allow creating or editing projects (projects are designed in desktop editions) it still can be helpful for business users that only run projects (created by someone else) because they can view auto-generated documentation on data transformation logic in projects, and understand what happens under the hood when they trigger a data transformation task. For instance, they can see expressions used to calculate columns, filtering conditions, data sources, parameters, etc.

Since the documentation has a permanent URL it can be accessed (opened) from external applications such as Tableau dashboards or Qlik applications.

Integration with Tableau dashboard through Actions.

API

Right from the beginning EasyMorph Server is built around a public RESTful API that can be used by external applications in order to create, modify and trigger Server tasks programmatically. The API enables tight integration with 3rd party tools.

Customization / White-labeling

The Server's UI is customizable. You can change colors and styles, use your own logo and modify links in the web console's header to match your corporate style. Even more than that -- if you're developing and selling an analytical application you can use EasyMorph Server as native component of your application under an OEM license. Note that desktop EasyMorph (required for creating/editing projects) is not customizable.


What's next

The version 1.0 is, as it's supposed to be for a very first release, somewhat basic :) But it doesn't mean it's going to stay like that -- we have a lot of things on the roadmap:

Soon, we will be introducing events -- a means to arrange task cascading and non-schedule triggers (e.g. running a task when a file appears/changes). Private groups will allow restricting access to particular tasks. Email integration will appear in one of the future releases and will make automated receiving and processing email attachments possible. Impact tracking will allow logging and analyzing what project created, consumed or modified what file or database table.

Licensing
EasyMorph Server is licensed per server with no limit on users which means that any number of users can access the Server, create and run tasks, and view documentation on them. Together with the ability to run projects created in the free desktop edition this makes EasyMorph Server one of the most affordable department-level ETL systems on the market. More on the licensing on our web-site: http://easymorph.com/server.html.

Dmitry Gudkov

Apr 25, 2017

Version 3.5 is out

It's happened, finally we're releasing version 3.5! It took us 2 months longer to release it than it was originally scheduled. The visual condition builder turned out to be a more complex task than I initially assumed. Databases deal with dates, non-integer numbers and Boolean values sometimes in very different ways with lots of nuances. The fact that EasyMorph operates with decimals instead of floats, and uses so called OLE Automation dates (same as in Excel) made things even more complicated. But now all the issues have been ironed out (at least those that we're aware of currently). Here is what's new and exciting in version 3.5:

Visual Condition Builder

This is the headliner feature of this version. Now it is possible to design filtering conditions in queries visually. Just drag a field from the list on the left or the preview datagrid, onto the filtering area.

Using a custom WHERE clauses is still possible.



For those who prefer writing SQL queries manually the Query Editor helps saving time -- design a draft query visually, then switch to Custom SQL and the visual query will be automatically converted to a SQL query which you can further modify and run right in EasyMorph.

In future releases, visual conditions won't be limited to queries. We are planning to introduce them in EasyMorph transformations as well in addition to formula-based conditions, where applicable.

Delete / update database rows

In the previous release (v3.3) we introduced the possibility to export (insert) rows into database tables. With this release we're adding two more transformations that allow deleting particular database rows, thus enabling delete/update workflows:
  • "Delete database rows" transformation deletes rows specified by a filtering condition of query (column selections are ignored). Using a query in this transformation provides extra convenience as it allows previewing rows which will be actually deleted, before deleting them.
  • "Delete matching database rows" transformation deletes database rows where key fields match key fields in specified EasyMorph table. Using this transformation you can calculate a list of particular IDs and then delete only database rows with these IDs.


Transformations for large data volumes

EasyMorph keeps all datasets in memory and this provides a multitude of benefits such as the ability to instantly see results of any transformation, reactive calculation, and re-calculating only transformations affected by a change. At the same time, this approach has a downside – if a dataset doesn't fit RAM entirely it would require using one of two special techniques to process it: partitioning and pre-filtering. Partitioning means that we split the source dataset into smaller parts of the same structure, which are processed one by one (typically using iterations). Pre-filtering means that instead of loading entire dataset and then filtering it, we only load a few fields that are necessary for filtering as well as IDs (primary of foreign keys). Filtering produces a subset of IDs, and for those IDs full records are fetched from the source dataset.
  • "File splitter" transformation allows splitting (partitioning) large text files into smaller ones without loading them into EasyMorph. The smaller files can later be processed in EasyMorph one by one a usual way, typically using iterations.
  • "Select matching database rows" transformation loads only database records which ID exists in an EasyMorph table. This transformation can be used for arranging scenarios with pre-filtering described above.
These two transformations allow EasyMorph users to work with tables larger than RAM -- tens and even hundreds of gigabytes.

Other new transformations and changes to existing ones

  • New transformation: "Filter by search". Allows filtering text values by a substring, or using a regular expression.
  • "Match" transformation now has Fuzzy Matching mode. In this mode text values are matched if their edit distance (as per Damerau-Levenshtein) is within a specified limit. Fuzzy Matching helps match words with typos. For instance it would match "New York" with "New Yrok" or "New Yokr".
  • "Merge" transformation now has Full Join mode which is similar to SQL full join.
  • "Export to Excel" transformation now allows creating new sheets in existing spreadsheets.

New Filter pop-up


Starting from this release double-clicking a column header brings up the new Filter pop-up which allows viewing/searching unique values in the column as well as filtering (according transformation is created automatically). The pop-up has three modes:
  • Selection – explicitly select items that should be kept (removed). 
  • Search – only items that contain particular substring are kept (removed). 
  • Condition – use expression to filter items. Only items where the expression returns TRUE are kept. 
The Unique Values pop-up used in previous versions is discontinued.

New data sources

  • Added support for DB2 (LUW) SQL dialect in ODBC connections.
  • Added support for Vertica SQL dialect in ODBC connections.

What's next

In the future release (3.6) we will continue improving usability and closing remaining gaps in functionality. It will include importing fixed width text, loading multiple files without iterations, and regular expression matching. Version 3.6 is planned for release in June this. However, before that there will be another, no less significant event in May – we're releasing the first ever version of EasyMorph Server!

Feb 11, 2017

Visual Condition Builder

Here is a brief overview of the visual condition builder available starting from version 3.5. Besides making importing data to EasyMorph simpler it can be used for generating SQL for use in another applications.


Jan 3, 2017

How to apply several filters to two tables simultaneously

Sometimes, when performing reconciliation of two tables, it is necessary to apply filters to columns with the same names in both tables simultaneously in order to select identical subset of data. For instance filter exactly the same Year, Region and Product in both tables. Having two identical sets of filters is not so convenient, especially if you need to filter by several columns and select various subsets of data in order to compare them.

Here is how you can filter data in two tables using only one set of filters:

(click to zoom)

The idea is simple:
  1. Append both tables keeping track of data origin.
  2. Apply filters to the united dataset (filtered columns should have the same names in both datasets)
  3. Split the united dataset back into two tables. When filters for the united dataset change, the derived tables will update automatically.
This trick can work with more than two datasets, of course.