Roles and Responsibilities
Experience: 5-8 Years
Summary: We are looking for an experienced Informatica PowerCenter (PC) and IDMC developer with 5-8 years in data engineering, skilled in designing end-to-end data pipelines with PC/IDMC for reliable data migration and transformation across cloud platforms. The ideal candidate will have in-depth knowledge of Informatica PowerCenter and IDMC, a strong foundation in data warehousing concepts, proficiency in Snowflake and SQL, and agile experience, and will be a strong team player who delivers timely, high-impact data solutions.
Technical Skills
- Tools: Informatica Cloud Data Integration, Informatica PowerCenter
- Data Warehousing: Snowflake, Data Lake
- Programming: SQL, Python, Shell Scripting
- Data Management: Storage management, quality monitoring, governance
- Modeling: Dimensional modeling, star/snowflake schema
Core Competencies
- Design, develop, and optimize ETL workflows using Informatica PowerCenter (PC) and Informatica IDMC.
- Manage data ingestion processes from diverse data sources such as Salesforce, Oracle databases, PostgreSQL, and MySQL.
- Implement and maintain ETL processes and data pipelines to ensure efficient data extraction, transformation, and loading.
- Utilize Snowflake as the data warehouse solution for managing large volumes of structured and unstructured data.
- Maintain and optimize ETL jobs for performance and reliability, ensuring timely data availability for business users.
- Support data migration, data integration, and data consolidation efforts.
- Write and maintain basic Python scripts for data processing and automation tasks (see the sketch after this list).
- Utilize Unix shell commands for data-related tasks and system management.
- Troubleshoot and resolve ETL-related issues, ensuring data integrity and availability.
- Ensure adherence to best practices for data governance and security.
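As a rough illustration of the Python and shell-scripting competencies above, here is a minimal sketch of a pre-processing and archiving script. The file paths and column names (incoming/orders.csv, order_date, amount) are hypothetical; in practice such a script would simply prepare files for a PC/IDMC workflow.

```python
# Minimal data-preparation sketch; paths and columns are hypothetical.
import csv
import shutil
from datetime import datetime
from pathlib import Path

SOURCE = Path("incoming/orders.csv")        # hypothetical landing file
TARGET = Path("processed/orders_clean.csv")
ARCHIVE = Path("archive")

def clean_row(row: dict) -> dict:
    """Standardize formats before the record is handed to the ETL tool."""
    row["order_date"] = datetime.strptime(row["order_date"], "%d/%m/%Y").strftime("%Y-%m-%d")
    row["amount"] = f'{float(row["amount"]):.2f}'
    return row

def main() -> None:
    TARGET.parent.mkdir(parents=True, exist_ok=True)
    ARCHIVE.mkdir(parents=True, exist_ok=True)

    with SOURCE.open(newline="") as src, TARGET.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow(clean_row(row))

    # Archive the processed source file, a step often done with Unix shell commands.
    shutil.move(str(SOURCE), ARCHIVE / f"orders_{datetime.now():%Y%m%d%H%M%S}.csv")

if __name__ == "__main__":
    main()
```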
Professional Experience
- Informatica Developer
- Developed ELT processes using PC/IDMC to integrate data into Snowflake (a sketch of this pattern follows this list).
- Implemented storage management for Azure Blob and Snowflake, enhancing data security.
- Worked with basic Python and shell scripting for data processing and automation tasks.
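A minimal sketch of the ELT pattern referenced above, assuming the snowflake-connector-python package; the stage, table, and connection values (AZURE_BLOB_STAGE, ORDERS_RAW, FACT_ORDERS) are placeholders. In practice the same load-then-transform flow is usually orchestrated through PC/IDMC mappings and tasks rather than hand-written code.

```python
# Hedged ELT sketch: load raw data into Snowflake first, then transform inside
# the warehouse. All stage, table, and connection values are placeholders.
import snowflake.connector  # requires the snowflake-connector-python package

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="ETL_WH", database="SALES_DB", schema="STAGING",
)
try:
    cur = conn.cursor()
    # Load: copy raw files from an external (e.g., Azure Blob) stage.
    cur.execute("""
        COPY INTO STAGING.ORDERS_RAW
        FROM @AZURE_BLOB_STAGE/orders/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    # Transform: push the work down to Snowflake after the data has landed.
    cur.execute("""
        INSERT INTO CORE.FACT_ORDERS (ORDER_ID, ORDER_DATE, AMOUNT)
        SELECT ORDER_ID, TO_DATE(ORDER_DATE), AMOUNT
        FROM STAGING.ORDERS_RAW
        WHERE AMOUNT IS NOT NULL
    """)
finally:
    conn.close()
```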
Desired Candidate Profile
ETL Process Design and Development:
- ETL Pipeline Creation: Designing and developing ETL (Extract, Transform, Load) workflows using Informatica PowerCenter, Informatica Cloud Data Integration, or Informatica Intelligent Cloud Services (IICS).
- Data Transformation: Creating transformations to cleanse, aggregate, and manipulate data as it moves through the ETL pipeline to ensure data quality and consistency.
- Data Loading: Ensuring that the transformed data is loaded into appropriate destinations such as data warehouses, data lakes, or databases.
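The extract-transform-load flow described above, reduced to a minimal Python sketch. SQLite stands in for the real source and warehouse, and the table and column names are illustrative; a PowerCenter mapping or IDMC mapping task would express the same steps visually.

```python
# Minimal extract-transform-load sketch with SQLite stand-ins for the real systems.
import sqlite3

def extract(source: sqlite3.Connection) -> list[tuple]:
    return source.execute("SELECT customer_id, email, country FROM customers").fetchall()

def transform(rows: list[tuple]) -> list[tuple]:
    # Cleanse and standardize: trim whitespace, lower-case emails, upper-case country codes.
    return [(cid, email.strip().lower(), country.strip().upper()) for cid, email, country in rows]

def load(target: sqlite3.Connection, rows: list[tuple]) -> None:
    target.executemany(
        "INSERT INTO dim_customer (customer_id, email, country) VALUES (?, ?, ?)", rows
    )
    target.commit()

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT, country TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, " Amy@Example.com ", "us"), (2, "bob@example.com", "De")])

    tgt = sqlite3.connect(":memory:")
    tgt.execute("CREATE TABLE dim_customer (customer_id INTEGER, email TEXT, country TEXT)")

    load(tgt, transform(extract(src)))
    print(tgt.execute("SELECT * FROM dim_customer").fetchall())
```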
Data Integration:
- Source System Integration: Integrating various data sources, such as relational databases, flat files, APIs, cloud platforms, and on-premise systems.
- Data Migration: Moving data from legacy systems to modern data architectures using Informatica’s integration capabilities.
- Real-time Data Integration: Implementing real-time data integration using Informatica’s features such as Change Data Capture (CDC) or Informatica Data Streaming for near-instantaneous updates.
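As a rough illustration of the incremental-extraction idea behind CDC, here is a sketch that uses a persisted high-watermark timestamp. The state file and the orders table are hypothetical, and Informatica's own CDC features replace this hand-rolled state tracking in real deployments.

```python
# Hedged sketch of watermark-based incremental extraction (a simple CDC substitute).
import sqlite3
from pathlib import Path

WATERMARK_FILE = Path("orders_watermark.txt")  # hypothetical state file

def read_watermark() -> str:
    return WATERMARK_FILE.read_text().strip() if WATERMARK_FILE.exists() else "1970-01-01 00:00:00"

def extract_changes(conn: sqlite3.Connection) -> list[tuple]:
    """Return only rows changed since the last successful run."""
    last = read_watermark()
    rows = conn.execute(
        "SELECT order_id, status, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last,),
    ).fetchall()
    if rows:
        WATERMARK_FILE.write_text(rows[-1][2])  # persist the newest timestamp seen
    return rows
```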
Data Quality and Cleansing:
- Data Profiling: Analyzing and profiling source data to identify quality issues, missing values, and inconsistencies that need to be addressed before loading into the destination system.
- Data Transformation and Cleansing: Ensuring that data is transformed and cleaned according to business rules, such as removing duplicates, standardizing formats, and applying business logic.
- Data Validation: Validating that data in the destination is accurate and consistent with the source data and business requirements.
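The profiling and validation checks above can be prototyped in a few lines of Python before being formalized as data quality rules in the Informatica toolset. A minimal sketch with illustrative column names and sample data follows.

```python
# Minimal profiling/validation sketch; column names and sample data are illustrative.
from collections import Counter

def profile(rows: list[dict], columns: list[str]) -> dict:
    """Count nulls/blanks per column and fully duplicated records."""
    nulls = {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}
    dupes = sum(n - 1 for n in Counter(tuple(sorted(r.items())) for r in rows).values() if n > 1)
    return {"row_count": len(rows), "null_counts": nulls, "duplicate_rows": dupes}

def validate_counts(source_count: int, target_count: int) -> None:
    """Simple reconciliation check between source and loaded target."""
    if source_count != target_count:
        raise ValueError(f"Row count mismatch: source={source_count}, target={target_count}")

if __name__ == "__main__":
    sample = [
        {"customer_id": 1, "email": "amy@example.com"},
        {"customer_id": 2, "email": ""},
        {"customer_id": 1, "email": "amy@example.com"},
    ]
    print(profile(sample, ["customer_id", "email"]))
    validate_counts(source_count=3, target_count=3)
```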
Performance Optimization:
- Query Optimization: Optimizing ETL jobs for performance, ensuring that the processes are running efficiently and within required time limits.
- Parallel Processing: Implementing parallel processing in Informatica to improve performance during large-scale data transfers and transformations.
- Memory Management: Ensuring proper memory usage and optimizing workflows to minimize bottlenecks in the ETL process.
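Partition-level parallelism is the idea behind partitioned sessions in PowerCenter and parallel task execution in IDMC. The sketch below shows the same concept in plain Python; the partition keys and the work function are illustrative placeholders.

```python
# Sketch of partition-level parallelism with illustrative partitions and work.
from concurrent.futures import ThreadPoolExecutor, as_completed

PARTITIONS = ["2024-01", "2024-02", "2024-03", "2024-04"]  # hypothetical monthly partitions

def process_partition(partition: str) -> int:
    """Stand-in for extracting/transforming/loading one partition; returns rows processed."""
    rows = 0
    for _ in range(10_000):  # placeholder work
        rows += 1
    return rows

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(process_partition, p): p for p in PARTITIONS}
        for future in as_completed(futures):
            print(f"partition {futures[future]} -> {future.result()} rows")
```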
Data Architecture and Design:
- Data Modeling: Designing and developing logical and physical data models for efficient data storage, ensuring optimal schema design in data warehouses or data lakes.
- Data Lineage: Implementing data lineage tracking in the ETL pipelines to ensure transparency and traceability of data flow from sources to destinations.
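As a rough illustration of the lineage idea, here is a sketch that records source, transformation, and target for each pipeline step. The object names are hypothetical, and IDMC's metadata and lineage services normally capture this automatically rather than through custom code.

```python
# Minimal sketch of recording lineage metadata per pipeline step.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    source: str
    transformation: str
    target: str
    run_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

lineage_log: list[LineageRecord] = []

def record_step(source: str, transformation: str, target: str) -> None:
    lineage_log.append(LineageRecord(source, transformation, target))

if __name__ == "__main__":
    record_step("salesforce.Account", "dedupe + standardize country", "SNOWFLAKE.CORE.DIM_CUSTOMER")
    record_step("oracle.ORDERS", "currency conversion", "SNOWFLAKE.CORE.FACT_ORDERS")
    for rec in lineage_log:
        print(f"{rec.source} -> [{rec.transformation}] -> {rec.target} at {rec.run_at:%Y-%m-%d %H:%M}")
```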
Collaboration with Stakeholders:
- Working with Data Scientists and Analysts: Collaborating with data scientists, business analysts, and other stakeholders to ensure that the right data is extracted and transformed to support analytical use cases.
- Requirements Gathering: Gathering requirements from business users and technical teams to ensure the ETL process meets their needs and supports various reporting or analytics initiatives.
Troubleshooting and Support:
- Debugging ETL Jobs: Identifying and resolving issues with ETL jobs, such as data mismatches, failures, or performance problems.
- Error Handling and Logging: Implementing robust error handling mechanisms in ETL processes to ensure smooth operations and quick recovery from failures.
- System Monitoring: Monitoring the performance and health of ETL jobs and making adjustments as necessary.
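A minimal sketch of the error-handling, retry, and logging pattern behind these support tasks. This is a generic wrapper, not an Informatica API; in practice equivalent recovery behavior is configured at the workflow or task level in PC/IDMC, with scripts like this used around the edges.

```python
# Generic retry-with-logging wrapper for an ETL-adjacent job; counts are illustrative.
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl_monitor")

def run_with_retries(job, max_attempts: int = 3, delay_seconds: int = 60) -> None:
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            log.info("Job succeeded on attempt %d", attempt)
            return
        except Exception:
            log.exception("Job failed on attempt %d of %d", attempt, max_attempts)
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise RuntimeError("Job failed after all retry attempts")
```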