Roles and responsibilities
- 5-7 years of overall experience in data engineering and data transformation on the cloud
- 3+ years of strong, hands-on experience in Azure data engineering and Databricks
- Expertise in developing and supporting lakehouse workloads at enterprise scale
- Experience in PySpark is required: developing and deploying workloads to run on Spark's distributed computing engine (a minimal sketch follows this list)
- Candidates must hold at least a graduate or bachelor's degree in Computer Science, Information Technology, Engineering (Computer/Telecommunication), or equivalent
- Cloud deployment: preferably Microsoft Azure
- Experience implementing platform and application monitoring using cloud-native tools
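The PySpark requirement above boils down to writing jobs like the following minimal sketch, which reads raw files, runs a distributed aggregation, and writes the result back out. All paths and column names are hypothetical.

```python
# Minimal PySpark sketch: read raw CSV files, aggregate them in a
# distributed job, and write the result out. Paths and column names
# are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-sales-aggregation").getOrCreate()

# Read raw data; Spark parallelizes the scan across the cluster.
raw = spark.read.csv("/mnt/raw/sales/", header=True, inferSchema=True)

# Aggregate sales per day and region.
daily_totals = (
    raw.withColumn("sale_date", F.to_date("sale_timestamp"))
       .groupBy("sale_date", "region")
       .agg(F.sum("amount").alias("total_amount"))
)

# Write the curated output as Parquet.
daily_totals.write.mode("overwrite").parquet("/mnt/curated/daily_sales/")
```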
Desired candidate profile
1. Data Architecture & Design
- Strong understanding of database architectures, including relational databases, NoSQL stores, and data warehouses.
- Ability to design and implement data pipelines, data lakes, and data warehouses.
- Knowledge of ETL (Extract, Transform, Load) processes.
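To make the ETL point above concrete, here is a bare-bones extract-transform-load sketch in Python; sqlite3 stands in for the real source system and warehouse, and every table and column name is invented.

```python
import sqlite3

# sqlite3 stands in for a real source system and warehouse; all table
# and column names are illustrative.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id TEXT, amount REAL, currency TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [("A1", 19.99, "usd"), ("A2", None, "eur"), ("A3", 5.00, "gbp")])

target = sqlite3.connect(":memory:")

# Extract: pull raw order rows from the source.
rows = source.execute("SELECT order_id, amount, currency FROM orders").fetchall()

# Transform: normalize amounts to cents and currencies to upper case,
# dropping rows with missing amounts.
cleaned = [
    (order_id, int(round(amount * 100)), currency.upper())
    for order_id, amount, currency in rows
    if amount is not None
]

# Load: write the cleaned rows into the warehouse table.
target.execute(
    "CREATE TABLE fact_orders (order_id TEXT, amount_cents INTEGER, currency TEXT)"
)
target.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", cleaned)
target.commit()
```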
2. Programming & Scripting
- Languages: Expertise in Python, Java, Scala, and SQL.
- Familiarity with Python data manipulation libraries such as Pandas, Dask, and NumPy.
- Ability to write efficient queries and scripts for data processing and analysis.
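As a small illustration of the data-manipulation work described above, a Pandas sketch; the sample data and column names are invented.

```python
import pandas as pd

# Inline sample data stands in for a real extract; columns are invented.
df = pd.DataFrame({
    "user_id": [1, 1, None, 2],
    "event_time": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-01 11:30", "2024-01-02 09:00", "2024-01-02 14:15"]
    ),
})

# Drop rows with missing user IDs and derive an event date.
df = df.dropna(subset=["user_id"])
df["event_date"] = df["event_time"].dt.date

# Count events per user per day.
summary = (
    df.groupby(["user_id", "event_date"])
      .size()
      .reset_index(name="event_count")
)
print(summary)
```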
3. Big Data Technologies
- Proficiency in big data frameworks like Hadoop, Spark, and Flink.
- Experience with distributed computing systems and handling large datasets.
4. Cloud Platforms
- Knowledge of cloud platforms such as AWS, Google Cloud Platform (GCP), and Microsoft Azure.
- Experience with cloud-based data solutions like Amazon Redshift, Google BigQuery, or Azure Data Lake.
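For the cloud side, a hedged sketch of reading from Azure Data Lake Storage Gen2 with PySpark; the storage account, container, and path are placeholders, and authentication setup (service principal, managed identity, or account key) is omitted.

```python
# Sketch: read Parquet data from Azure Data Lake Storage Gen2 with PySpark.
# Storage account, container, and path are placeholders; authentication
# must be configured separately.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-read").getOrCreate()

path = "abfss://curated@examplestorageacct.dfs.core.windows.net/sales/2024/"
sales = spark.read.parquet(path)
sales.printSchema()
```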
5. Data Modeling
- Expertise in data modeling techniques, such as dimensional modeling, ER modeling, and star/snowflake schemas.
- Ability to design schema structures that support business intelligence and analytical needs.
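A minimal star schema, to make the modeling point concrete: one fact table referencing two dimension tables, expressed as SQL DDL and issued through Python's sqlite3 so the sketch stays self-contained. All table and column names are illustrative.

```python
# Minimal star schema sketch: one fact table referencing two dimensions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    full_date TEXT,
    month INTEGER,
    year INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
""")
```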
6. Data Pipeline Management
- Experience in creating and maintaining data pipelines for large-scale data processing.
- Familiarity with tools like Apache Airflow, Luigi, or dbt for orchestrating workflows.
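Of the orchestrators named above, Apache Airflow is a common choice; below is a skeleton DAG (Airflow 2.x style) with two Python tasks running in sequence. The DAG id, schedule, and task bodies are placeholders.

```python
# Skeleton Airflow DAG: two Python tasks run in sequence on a daily
# schedule. DAG id, schedule, and task bodies are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extracting from source...")

def load():
    print("loading into warehouse...")

with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task
```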
7. Data Warehousing & SQL Optimization
- Expertise in designing and optimizing data warehousing solutions.
- Strong skills in SQL tuning, optimization, and performance improvement for handling large datasets.
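A small, self-contained illustration of query tuning: inspecting the plan for the same query before and after adding an index. sqlite3 is used so the sketch runs anywhere; the table is invented.

```python
# Query-tuning sketch: compare the query plan before and after adding
# an index. sqlite3 keeps it self-contained; the table is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")

query = "SELECT SUM(amount) FROM orders WHERE customer_id = 42"

# Without an index the plan is a full table scan.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# A covering index lets the engine seek instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id, amount)")
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```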
8. Data Governance & Security
- Understanding of data governance practices, such as data privacy, metadata management, and data quality assurance.
- Ability to enforce data security standards and practices for data protection.
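One concrete form data protection takes in pipelines is masking PII before data leaves the trusted zone; below, a hedged PySpark sketch that replaces an email column with its SHA-256 hash. Paths and column names are hypothetical.

```python
# Data-protection sketch: irreversibly hash a PII column before the
# data flows downstream. Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pii-masking").getOrCreate()

customers = spark.read.parquet("/mnt/raw/customers/")

masked = customers.withColumn("email_hash", F.sha2(F.col("email"), 256)).drop("email")

masked.write.mode("overwrite").parquet("/mnt/curated/customers_masked/")
```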
9. Team Leadership & Project Management
- Ability to lead a team of data engineers, collaborate with cross-functional teams, and manage data engineering projects.
- Familiarity with project management frameworks like Agile and Scrum.
10. Automation & CI/CD
- Experience in automating data workflows and deployment pipelines.
- Knowledge of CI/CD tools (e.g., Jenkins, GitLab) for version control and automation.
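Since GitLab is named above, a minimal `.gitlab-ci.yml` sketch: run tests on every push, deploy only from main. The image, commands, and the deploy.sh script are placeholders.

```yaml
# Minimal GitLab CI sketch: test on every push, deploy from main.
# Image, commands, and deploy.sh are placeholders.
image: python:3.11

stages:
  - test
  - deploy

run_tests:
  stage: test
  script:
    - pip install -r requirements.txt
    - pytest tests/

deploy_pipelines:
  stage: deploy
  script:
    - ./deploy.sh   # hypothetical deployment script
  only:
    - main
```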
11. Machine Learning Integration (Optional)
- While not strictly necessary for all data engineering roles, experience integrating data pipelines with machine learning models or supporting the deployment of models can be valuable.
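And for the optional ML integration, a sketch of what batch scoring from a pipeline often looks like: a toy scikit-learn model stands in for one trained and persisted elsewhere, and the feature names and data are invented.

```python
# Batch-scoring sketch: a toy model stands in for one trained elsewhere;
# feature names and data are invented for the example.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy training data in place of a real, persisted model.
train = pd.DataFrame({
    "tenure_months": [1, 24, 3, 36, 6, 48],
    "monthly_spend": [20, 80, 25, 90, 30, 100],
    "churned": [1, 0, 1, 0, 1, 0],
})
model = LogisticRegression().fit(
    train[["tenure_months", "monthly_spend"]], train["churned"]
)

# Score a batch produced by the data pipeline.
batch = pd.DataFrame({"tenure_months": [2, 40], "monthly_spend": [22, 95]})
batch["churn_score"] = model.predict_proba(
    batch[["tenure_months", "monthly_spend"]]
)[:, 1]
print(batch)
```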