Build, implement and optimize highly scalable batch and streaming data ingress/egress pipelines
Collaborate with the company's Analytics / Machine Learning teams to build and implement data pipelines to feed machine learning algorithms within the company's Hadoop platform
Develop tools and automate processes to aid in data collection, analysis and monitoring
Collaborate with Product Engineering and Platform teams to make architecture design and implementation decisions atop the company's Hadoop platform
Provide engineering, installation, configuration, maintenance and support in a highly transactional 24x7 environment
Perform a mix of incident management and project work focused on automation, increasing scale and optimization of processes/system performance
Monitor platform and applications and take corrective action to prevent or minimize system downtime on Hadoop platform
Recommend best practices and implementation strategies using Hadoop, Java, and ETL tools
Assist Hadoop admins with global incident management/resolution (light on-call rotation)
Work with the global Engineering team to assess the current platform configuration and make recommendations to achieve optimal performance and horizontal scalability within the Hadoop ecosystem
Lead periodic provisioning of environments, code deployments and maintenance patching in Hadoop environments globally
Collaborate with the Product Engineering team to develop proof-of-concept solutions as well as custom solutions for customers that leverage the company's cloud-hosted Engagement Data platform
Requirements
Bachelor's Degree in Computer Science or a related field
2-4 years of hands-on experience working with data at scale
Self-starter who collaborates well with others and takes ownership of their projects
Experience and desire to work in a global delivery environment
Hands-on experience working within the Hadoop ecosystem (Spark, Hive, HBase, Storm), ideally in a cloud environment (AWS, GCP, or Azure)
Experience with optimizing SQL/Hive queries for maximum throughput
Experience with SQL/NoSQL technologies; familiarity with databases such as Oracle, SQL Server, MySQL, MongoDB, and Redis, ideally in a cloud setting (e.g., AWS RDS)
Experience operating web-scale deployments of distributed systems, e.g., Kafka, Flink, Storm, Cassandra, Kubernetes, or Elasticsearch
Experience with data warehouses and building ETL workflows / data pipelines
Data application/platform instrumentation, measurement, log data processing, and monitoring.
Fluency in Python, Java, Scala, or a similar language; familiarity with more than one is a plus
Mastery of Unix/Linux systems and shell scripting
Experience with ORC, Parquet, Avro, and other data formats
Excellent communication skills, both verbal and written
Nice to have
Strong DevOps mindset and skill set
Experience working with HDP Hive Interactive (LLAP)
Experience in performance tuning for Tez and Spark
Experience building data pipelines (e.g., Apache Airflow, Spark data pipelines)
Experience doing light data science in a Hadoop/cloud setting (visualizations, clustering, classification, regression) to help predict and proactively optimize performance
Experience creating dashboards with analytics tools such as Looker or Tableau
Experience with graph databases (e.g., Neo4j, AWS Neptune, Cayley, Gremlin)
Experience with Microsoft's SQL Server Integration Services (SSIS) or similar tools
Experience working with AWS services (e.g., S3, EC2, Kinesis, Lambda)
Amazon Web Services, Google Cloud Platform or Microsoft Azure certifications
This is a remote position.