April 02, 2021
Use Case: fusilli.IO as a DataOps Solution for Telco Companies
by Fabrizio Rocco | 6 min read
The client, a mobile and fixed telco operator with customers on four continents, developed a low-latency, microservices-based Service Delivery Platform (SDP) to embrace digital transformation. In pursuit of a better customer experience, the data team ingested 24 TB of data daily from legacy systems, struggling with a bottleneck caused by a lack of reusability and automation. fusilli.IO allowed users to swiftly build, schedule, and monitor robust, reusable pipelines with zero coding, regaining agility through streamlined processes and standardization.
The use case setup included populating the metadata catalog, configuring a codeless ingestion pipeline (from source to target), and executing a pipeline instance. Our cloud lab is clustered on Kubernetes 1.14.
The first stage of the use case populates the Data Explorer with metadata from the legacy systems and the SDP. The client's legacy systems, containing data such as Billing, CRM, Finance, and ERP, were connected to fusilli.IO in minutes through a set of ready-to-use connectors. In this case, we used the MongoDB connector for batch data ingestion, and the Kafka and PostgreSQL connectors for streaming data. The ingested data were written to target CSV files.
Once the source and target data cores were connected, a standard fusilli.IO template was used to populate the Data Explorer with understandable, enriched metadata.
Data Consumers can explore the metadata in the Data Explorer, but the actual data is released only after a Data Manager approves their data request, and only under the proper anonymisation policies. fusilli.IO also gives Data Managers and Data Consumers a way to communicate and collaborate.
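The approval-gated release described above can be sketched in a few lines of Python. This is a minimal illustration, not fusilli.IO's actual implementation; the class and function names, and the masking strategy, are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class DataRequest:
    # Hypothetical shape of a Data Consumer's request for a dataset.
    consumer: str
    dataset: str
    approved: bool = False  # set True once a Data Manager approves

def anonymise(record: dict, masked_fields: set) -> dict:
    """Mask the configured sensitive fields before releasing a record."""
    return {k: ("***" if k in masked_fields else v) for k, v in record.items()}

def fulfil(request: DataRequest, records: list, masked_fields: set) -> list:
    """Release data only for approved requests, applying anonymisation."""
    if not request.approved:
        raise PermissionError("Data request has not been approved by a Data Manager")
    return [anonymise(r, masked_fields) for r in records]
```

In this sketch an unapproved request raises an error, while an approved one returns records with the sensitive fields masked.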
The second stage is the configuration of a point-and-click ingestion pipeline. The Pipeline designer holds a sequence of jobs to be configured, allowing for embedded data transformation steps before data is ingested from a source into a target. In this use case, we built a complex pipeline leveraging a Join type of job.
A pipeline can be created in the Pipeline designer either ad hoc or in association with an approved Data Request. In our case, we created a stream pipeline linked to an approved data request. fusilli.IO lets users create, reuse, log, and monitor pipelines in an automated fashion, leaving more time for innovation rather than manual routines.
How it works:
1. Once the data is obtained from the source, a first job joins it with data from MongoDB. All successful results are written to the first CSV output file (the output repository can be changed by configuring the output file).
2. If the primary Join job finds no matching results, a second Join is executed, this time against a legacy database (a SQL database rather than MongoDB). All successful results are written to the second CSV output file (again, the output repository is configurable).
3. Finally, any ingested records that match neither Join are written to a final output file reporting the original, unmatched data.
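The primary-Join / fallback-Join / unmatched routing above can be sketched as plain Python. This is an illustrative stand-in under assumed names (`run_pipeline`, the `customer_id` key, in-memory lookup indexes in place of the MongoDB and legacy SQL sources), not the product's engine.

```python
import csv
import io

def run_pipeline(records, mongo_index, legacy_sql_index, key="customer_id"):
    """Route each ingested record through a primary Join (MongoDB data),
    a fallback Join (legacy SQL data), or the unmatched report."""
    primary, fallback, unmatched = [], [], []
    for rec in records:
        k = rec[key]
        if k in mongo_index:                # primary Join succeeds
            primary.append({**rec, **mongo_index[k]})
        elif k in legacy_sql_index:         # fallback Join succeeds
            fallback.append({**rec, **legacy_sql_index[k]})
        else:                               # no match in either source
            unmatched.append(rec)
    return primary, fallback, unmatched

def to_csv(rows):
    """Serialise one output branch to CSV text (stand-in for an output file)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Each of the three returned branches corresponds to one of the three CSV output files in the use case.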
The final step of the use case is the execution of the previously created pipeline.
Every run of the pipeline, which fusilli.IO calls an 'Instance', is documented in detail to keep track of data operations. In fusilli.IO, pipelines are saved as reusable templates and can be automatically scheduled to run on a specific day or on a regular basis.
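The template/Instance relationship can be modelled roughly as follows. The class names, fields, and cron-style schedule string are assumptions for the sketch; fusilli.IO's internal model is not documented here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class PipelineTemplate:
    # A reusable pipeline definition: an ordered list of job names
    # plus an optional schedule (here, a cron-style expression).
    name: str
    jobs: list
    schedule: str = None  # e.g. "0 2 * * *" for a daily 02:00 run

@dataclass
class Instance:
    # One documented run of a template.
    template: str
    started_at: str
    status: str = "running"

def run(template: PipelineTemplate, log: list) -> Instance:
    """Every run produces a documented Instance appended to the run log."""
    inst = Instance(template.name, datetime.now(timezone.utc).isoformat())
    inst.status = "succeeded"  # a real engine would reflect job outcomes here
    log.append(inst)
    return inst
```

The key idea is that the template is defined once, while each scheduled or manual execution leaves its own auditable Instance record.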
fusilli.IO offers a wide spectrum of features that empower Fast Data teams with Data Governance and data lifecycle management:
fusilli.IO orchestrates secure data management by defining different user roles with specific permissions.
Customized user profiles can also be created or edited through a permission management pane that lists over 60 atomised permissions.
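Role-based access with atomised permissions can be sketched as sets of permission strings. The role names and permission identifiers below are invented for illustration; they are not fusilli.IO's actual permission catalogue.

```python
# Hypothetical role definitions: each role maps to a set of atomised permissions.
ROLES = {
    "data_manager": {"pipeline.create", "pipeline.run", "data_request.approve"},
    "data_consumer": {"catalog.read", "data_request.create"},
}

def has_permission(role: str, permission: str) -> bool:
    """Check whether a role grants a single atomised permission."""
    return permission in ROLES.get(role, set())

def custom_role(base: str, extra=(), revoked=()):
    """Derive a customised profile from a base role by adding or revoking
    individual permissions, as a permission management pane might do."""
    return (ROLES[base] | set(extra)) - set(revoked)
```

Because permissions are fine-grained, a customised profile is just a set difference and union over the base role.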
Check out our article about Team Collaboration on fusilli.IO's blog.
fusilli.IO connects to different systems and environments through a set of prebuilt or ad-hoc connectors designed for applications, databases, file stores, and data warehouses.
The image below lists the various data connection methods available in fusilli.IO.
fusilli.IO lets users swiftly build robust, reusable pipelines with zero coding, leaving more time to focus on innovation rather than manual routines.
Data pipeline design is codeless and requires no technical skills, yet it supports the most common data preparation and transformation tasks, which are called Jobs.
Jobs are the data preparation tasks executed before data is delivered into a target Data Core.
Common Jobs include Match, Join, Split, Concatenate, and many others.
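The named Job types are standard data-transformation operations; they can be sketched as small pure functions. These are generic illustrations of what each Job does to a stream of records, not fusilli.IO's implementations.

```python
def match(rows, predicate):
    """Match: keep only the rows satisfying a condition."""
    return [r for r in rows if predicate(r)]

def split(rows, predicate):
    """Split: route rows into two branches based on a condition."""
    yes = [r for r in rows if predicate(r)]
    no = [r for r in rows if not predicate(r)]
    return yes, no

def concatenate(*branches):
    """Concatenate: merge several branches back into one stream."""
    return [r for branch in branches for r in branch]

def join(left, right, key):
    """Join: enrich left rows with matching right rows on a shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]
```

Chaining such steps in sequence is exactly what the Pipeline designer lets users do visually, without writing the code.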
fusilli.IO keeps all metadata in a single place, so data stakeholders are aligned on what data is available, where it comes from, and what it means.
Users have two alternative ways to explore the metadata contained in the data sources: the Data Catalog, which provides positional exploration, and the Business Glossary, which makes exploration easy for non-technical users.
fusilli.IO delivers DataOps across Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure providing the following benefits:
Depending on the customer's requirements, fusilli.IO can also be deployed taking advantage of other architectural solutions:
fusilli.IO is divided into a manager and a portal.
The manager is the open-source component upon which fusilli.IO is built.
It provides independent orchestration and an engine for both batch and streaming workloads, built with modern technologies such as Scala and Akka and deployed on Kubernetes.
fusilli.IO contextualizes data at scale in real-time, enabling data engineers, data stewards and data analysts to make better decisions and improve team collaboration.