Following to the overview on Endeca today we will see how to create Clover ETL graph and populating data which will be used by MDEX engine for reporting (Studio)Endeca Information discovery helps organization to answer quickly on relevant data of both structured and Un structured it helps to search and discover and analysis
Information is loaded from multiple data source systems and stored in a faceted data model that dynamically supports changing data . Information discovery enables an iterative approach
Integration features a new ETL tool, The integrator (Clover ETL) that lets you extract source records from a variety of source types flat files to databases.
The oracle Endeca Server provides support for guided exploration search and analysis on structured,semi - structured/unstructured from inside/external sources
In structured data can be easily and directly loaded into a faceted model
Each row becomes a records
Each column becomes and attribute
Unstructured elements can be stored their own records for side by side analysisStudio is highly interactive ,component-based environment for building analytic application it is built on web-based technology that enable building analytic applications delivered through use of web browsers
Integrator has below components
Graph : A graph is a pipeline of components that processes the data. . The simplest graph has one Reader component to read in the source data and one of the integrator connectors to write the data to the Oracle Endeca Server
Edge : The line that connects components ports, carries data between components
Transformer / Joins : Used to transform the data coming from source data system , and joiner is used to join data between two tables (Dim - Fact)
Once Endeca Information discovery installed you can see below components in Start Menu
Start Endeca Server --> Integrator
LoadData.grf(Graph)In the
CloverETL perspective, there is a
Navigator pane on the left side of the window. In this pane, you can expand the project folder. After that, you will be presented with the folder structure. There are sub-folders for:
Purpose | Standard folder | Standard parameter | Parameter usage |
all connections | conn | CONN_DIR | ${CONN_DIR} |
input data | data-in | DATAIN_DIR | ${DATAIN_DIR} |
output data | data-out | DATAOUT_DIR | ${DATAOUT_DIR} |
temporary data | data-tmp | DATATMP_DIR | ${DATATMP_DIR} |
graphs | graph | GRAPH_DIR | ${GRAPH_DIR} |
lookup tables | lookup | LOOKUP_DIR | ${LOOKUP_DIR} |
metadata | meta | META_DIR | ${META_DIR} |
sequences | seq | SEQ_DIR | ${SEQ_DIR} |
transformation definitions (both source files and classes) | trans | TRANS_DIR | ${TRANS_DIR} |
Creating Project:
Start Endeca Server from start menu and integrator -->File -->New Project
We can Edit the CloverETL project sub directories names while creating the project
You can look at the workspace.prm
file by clicking this item in the Navigator pane, by right-clicking and choosing → from the context menu.
Default project parameters are stored in workspace.prm file located in project root
A new graph automatically links the workspace.prm
Metadata describes record structure and format
Where structure defines what fields form a record
Format defines how to serialize a record from a data set
New database connection
File --> New Project--> Clover ETL -- > Create Connection(Connection.cfg)
Data Store creation :
Copy the data store from Quick Start to our new project and add below entries
# Configuration parameters for running Endeca Data Store
ENDECA_SERVER_HOST=localhost
ENDECA_SERVER_PORT=7770
DATA_STORE_NAME=
quickstart Where quick start would be our project name in below example it is 'Project_01'
2) Copy the InitDatastore graph from QuickStart to newly created project/graph folder and run it once(only Create Data Store last one) remove others
Metadata import :
File --> New Project--> Clover ETL--> Metadata (Extract from database) and navigate to your project related folder under workspace view
After importing the metadata (samp_offices_d) it will create a file with an extension of .fmt you can change the data type by editing .fmt files as shown below
Creating Graph:
Right click on Graph folder --->New-->ETL Graph
Drag the palettes to the surface area and connect them by using
EDGE palette
Once the palette are connected using EDGE now we have to assign METADATA(DB connection) created in above steps as shown below
Once the Metadata is created you can edit the properties of the table column by changing the data type/add any new fields for calculation purpose and it will populate on MDEX engine
Click on finish next on Edge(Data Input) we need to assign db connection and query URL,
Right click on DBInput Table -->Rename -->Products
DBInput Table --> Rename --> Revenue
Now we have to join Revenue and Products tables based on PROD_KEY by using ExtHashJoin and select the join Type
Now we have to load the data coming from both revenue & products to a single table ?
No, MDEX will do in-memory calculations and it will store the data in tmp files(RAM) not on tables in order to achieve it transform the useful columns from SAMP_PRODUCTS_D & SAMP_REVENUE_F to a new table 'WC_REVENUE_PRODUCT' as shown below
Make sure
Transformation is valid select the METADATA for the EDGE from ExtHashJoin to Bulk Add/Replace Records
Edit the Bulk Add/Replace records palette and select the
Spec Attribute -->which should be unique and save the Graph and right click Run AS --> ETL Graph if everything is correct the Graph execution should be successful
Reports creation in another post