Back to Projects
IT Incident Log Analysis Pipeline
Analysis of IT incident logs generated by the ServiceNow™ platform used by an IT company. This project includes data extraction, transformation, and visualization using Azure Data Factory and Azure Databricks.
Dashboard
Tools Used
- Orchestraction:
- Azure Data Factory
- Azure Data Bricks
- Pyspark
- Spark SQL
- Azure CosmosDB
- Azure Storage Gen2
- Azure SQL
Steps
- Data Extraction from multiple sources.
- Loading Data into an Azure Cosmos Database.
- Data Cleaning and Transformations using PySpark in Azure Databricks.
- Data Analysis using Spark SQL and Databricks.
ADF Pipeline
Extraction Layer
Sources:
- Blob files in Azure Storage Gen2
- HTTP source from GitHub
- Relational data from Azure SQL Database
Transformation Layer
Transformations performed using Azure Databricks with PySpark:
- Data Cleaning and Reformatting:
- Removed columns with more than 30% null values (invalid for analysis).
- Removed duplicate rows.
- Reformatted datetime data into an appropriate format.
- Renamed columns for better understanding.
- Extracted month and year information from datetime fields.
Visualization Layer
- Visualizations performed using Azure SQL in Databricks.
- Analyses focused on SLA adherence and MTTR.