Roles and Responsibilities
- Design and Develop Data Pipelines: Create and maintain robust, scalable data pipelines using Azure Data Factory, Databricks, and other related tools.
- Data Integration: Work on integrating structured and unstructured data from various data sources into a unified data warehouse using Azure SQL Database, Azure Data Lake, and Synapse Analytics.
- Salesforce Integration: Leverage knowledge of Salesforce backend and APIs to integrate Salesforce data with Azure data services.
- Optimization: Optimize data flows for performance, reliability, and scalability, ensuring data quality and consistency.
- Monitoring and Troubleshooting: Monitor data pipelines, troubleshoot issues, and implement solutions to ensure the seamless operation of data processing workflows.
- Collaboration: Collaborate with data analysts and other stakeholders to understand their data requirements and translate them into technical solutions.
- Security and Compliance: Implement security best practices and ensure compliance with organizational policies and industry standards.
- Documentation: Create and maintain documentation for data architectures, processes, and best practices.
Required Skills & Qualifications
Experience: 3 to 5 years of experience in data engineering, with a strong focus on Azure technologies.
Technical Skills
- Proficiency in Azure Data Factory, Azure SQL Database, Azure Data Lake, and Synapse Analytics.
- Experience with big data processing tools such as Azure Databricks or HDInsight.
- Strong SQL and T-SQL skills.
- Experience with data modeling, ETL processes, and data warehousing concepts.
- Familiarity with programming languages like Python, Scala, or Java.
- Knowledge of data security and compliance practices within the Azure ecosystem.
Salesforce: Basic to intermediate knowledge of Salesforce backend and API integration.
Problem-Solving: Strong analytical and problem-solving skills, with the ability to troubleshoot complex data issues.
Communication: Excellent communication skills, both written and verbal, with the ability to collaborate effectively with cross-functional teams.
Preferred Qualifications
- Azure Data Engineer certification (e.g., Microsoft Certified: Azure Data Engineer Associate).
- Experience with CI/CD pipelines for data engineering workflows.
- Familiarity with Power BI for data visualization.
- Salesforce Expertise: Advanced knowledge of, or experience with, Salesforce data models and APIs.
Desired Candidate Profile
1. Data Pipeline Development and Management:
- Design and Build Data Pipelines: Create and manage ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) pipelines in Azure Data Factory or other Azure tools to move data from source systems to Azure storage or databases.
- Data Integration: Integrate data from various sources (e.g., on-premises databases, cloud services, APIs) into Azure. This includes connecting different data types, such as structured, semi-structured, and unstructured data.
- Automation and Orchestration: Automate data workflows and schedules to ensure that data processing tasks run smoothly and on time.
2. Data Storage Management:
- Azure Data Lake Storage: Manage and optimize Azure Data Lake Storage for large-scale storage of structured and unstructured data, ensuring high availability and security.
- Azure Blob Storage: Utilize Azure Blob Storage for storing large amounts of unstructured data, such as images, videos, and logs.
- Azure Synapse Analytics: Design and manage Azure Synapse Analytics (formerly Azure SQL Data Warehouse) for data warehousing solutions that support large-scale analytics.
- NoSQL Databases: Work with Azure Cosmos DB or other NoSQL databases for handling unstructured or semi-structured data at scale.
- Azure SQL Database: Manage relational databases using Azure SQL Database for structured data storage and management.
3. Data Transformation and Cleaning:
- Data Processing: Use Azure Databricks, Azure HDInsight, and Apache Spark to perform data transformation, cleaning, and enrichment at scale.
- Data Quality: Implement data validation rules and processes to ensure that the data is accurate, clean, and consistent before it is loaded into storage systems or databases.
4. Performance Optimization:
- Query Optimization: Optimize query performance, especially on large datasets, using Azure Synapse Analytics or Azure SQL Database optimization techniques.
- Scalability and Efficiency: Scale data storage and processing pipelines based on workload requirements and optimize data access for faster retrieval times.
5. Data Security and Compliance:
- Data Encryption: Implement encryption for data at rest and in transit, using Azure Key Vault to manage encryption keys and Microsoft Defender for Cloud (formerly Azure Security Center) to monitor security posture.
- Data Governance: Ensure that data is handled in compliance with organizational standards and external regulations, such as GDPR, HIPAA, or CCPA.
- Access Control: Manage access controls and permissions for data and storage accounts using Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC).
6. Monitoring and Maintenance:
- Monitoring and Alerts: Set up monitoring and alerts to track the health and performance of data pipelines, storage accounts, and other Azure resources using Azure Monitor and Azure Log Analytics.
- Data Pipeline Troubleshooting: Identify and resolve issues related to data flow, performance, or system failures in real time.