Position Type:
Permanent

Experience Required:
Minimum 5 years of hands-on experience with Databricks in a data engineering, data science, or similar role.

Job Category:
IT

Job Location:
Bangalore (BLR), India

Education Required:
Bachelor's degree in computer science, information technology, data engineering, or a related field; equivalent work experience or relevant certifications may be considered.

Databricks Engineer

Position Overview

A Databricks Engineer ensures that the organization's data is processed efficiently, securely, and accurately on the Databricks platform. The role supports data analytics, provides technical expertise, manages platform resources, and optimizes data workflows so the organization can derive valuable insights and make informed decisions.

General Responsibilities

  • Administer and manage Databricks clusters, workspaces, and resources.
  • Monitor platform health, availability, and performance.
  • Develop and maintain data pipelines for ingesting, transforming, and loading data into Databricks.
  • Optimize ETL processes for efficiency and scalability.
  • Collaborate with data scientists and analysts to build and optimize data processing and analytics workflows using Databricks notebooks.
  • Develop and implement data transformation and data analysis solutions.
  • Implement and manage data lakes within Databricks for structured and unstructured data storage.
  • Ensure data lake security and access controls.
  • Identify and address performance bottlenecks in Databricks workloads.
  • Optimize queries, data pipelines, and resource allocation for efficient data processing.
  • Implement security measures to protect data within Databricks, including access controls, encryption, and auditing.
  • Ensure compliance with data privacy regulations and industry standards.
  • Integrate Databricks with various data sources, databases, and external systems to enable seamless data flow.
  • Ensure data source connectivity and data consistency.
  • Manage cluster scaling and resource allocation to meet performance and cost requirements.
  • Optimize cluster configurations for specific workloads.
  • Identify and resolve technical issues, errors, and anomalies in Databricks workflows.
  • Provide support to users encountering problems.
  • Collaborate with data engineers, data scientists, and analysts to understand data requirements and provide technical guidance.
  • Conduct training sessions and knowledge sharing to empower users with Databricks capabilities.
  • Create and maintain documentation for Databricks workflows, configurations, and best practices.
  • Promote and enforce coding and data engineering standards.
  • Monitor and optimize Databricks costs, including cluster utilization and resource allocation.
  • Ensure cost-efficient data processing.
  • Stay current with Databricks updates and enhancements.
  • Continuously improve Databricks workflows, processes, and infrastructure.
  • Establish data governance practices within Databricks, including metadata management and data cataloging.
  • Maintain data lineage and documentation for data assets.
  • Implement data quality checks and validation processes to ensure data accuracy and reliability.
  • Develop and enforce data quality standards.
  • Plan for scalability and resilience in Databricks infrastructure to accommodate growing data volumes and ensure high availability.
  • Additional duties as assigned.
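As an illustrative sketch of the data quality checks and validation processes listed above (column names and rules here are hypothetical, not taken from this posting), a minimal row-level validation pass of the kind that might run in a Databricks notebook before loading data downstream could look like this in plain Python:

```python
# Hypothetical data quality check splitting rows into valid and rejected sets.
# Required columns and non-negative constraints are illustrative assumptions.

def validate_rows(rows, required=("id", "event_ts"), non_negative=("amount",)):
    """Split rows into (valid, rejected) based on simple quality rules."""
    valid, rejected = [], []
    for row in rows:
        # A required column fails the check if it is absent or empty.
        missing = [c for c in required if row.get(c) in (None, "")]
        # A numeric column fails the check if its value is negative.
        negative = [c for c in non_negative
                    if isinstance(row.get(c), (int, float)) and row[c] < 0]
        if missing or negative:
            rejected.append({"row": row, "missing": missing, "negative": negative})
        else:
            valid.append(row)
    return valid, rejected

sample = [
    {"id": 1, "event_ts": "2024-01-01T00:00:00Z", "amount": 10.0},
    {"id": None, "event_ts": "2024-01-01T00:05:00Z", "amount": 5.0},
    {"id": 3, "event_ts": "2024-01-01T00:10:00Z", "amount": -2.0},
]
valid, rejected = validate_rows(sample)
```

In practice such rules would typically be expressed as Spark DataFrame filters or Delta Lake constraints so they scale with the data; this sketch only shows the shape of the logic.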

Qualifications

  • Bachelor’s degree in computer science, information technology, data engineering, or a related field; equivalent work experience or relevant certifications may be considered.
  • A minimum of 5 years of hands-on experience with Databricks in a data engineering, data science, or similar role.
  • Strong experience in data engineering, including designing and developing data pipelines, ETL processes, and data transformations.
  • Familiarity with big data technologies and frameworks such as Apache Spark, Hadoop, or similar platforms commonly used in conjunction with Databricks.
  • Proficiency in languages commonly used on Databricks, such as Python, SQL, and Scala; Scala knowledge is often preferred.
  • Proven ability to identify and address performance bottlenecks in Databricks workloads, optimizing query performance and resource allocation.
  • A commitment to staying updated with the latest Databricks updates, enhancements, and emerging technologies in the data engineering and analytics field.