Walmart

Data Engineer III

Dallas, TX US
API Azure Spark Scala GCP Kafka SQL Streaming
This job is closed.
Description

What you'll do...

Position: Data Engineer III

Job Location: 603 Munger Street, Dallas, TX 75202

Duties: Identifies possible options to address business problems through relevant analytical methodologies. Demonstrates understanding of use cases and desired outcomes. Supports the development of business cases and recommendations. Drives delivery of project activities and tasks assigned by others. Supports process updates and changes. Supports, under guidance, the resolution of business issues.

Utilizes knowledge of data value chains (identification, ingestion, processing, storage, analysis, and utilization); data processes and practices; data modeling, storage, integration, and warehousing; data quality frameworks and metrics; regulatory and ethical requirements around data privacy, security, storage, retention, and documentation; business implications of data usage; data strategy; and enterprise regulatory and ethical policies and strategies. Supports the documentation of data governance processes and the implementation of data governance practices.

Utilizes understanding of the business value and relevance of data and data-enabled insights and decisions; appropriate application and understanding of the data ecosystem, including data management, data quality standards, data governance, accessibility, storage, and scalability; and the methods and applications that unlock the monetary value of data assets. Understands, articulates, and applies principles of the defined strategy to routine business problems that involve a single function.

Utilizes knowledge of the functional business domain and scenarios; categories of data and where they are held; business data requirements; database technologies and distributed datastores (e.g., SQL, NoSQL); data quality; and existing business systems and processes, including the key drivers and measures of success. Supports the understanding of the priority order of requirements and service level agreements. Helps identify the most suitable source of data that is fit for purpose and performs initial data quality checks on extracted data.

Utilizes data transformation and integration knowledge, including: internal and external data sources, including how they are collected, where and how they are stored, and their interrelationships, both within and external to the organization; techniques such as ETL batch processing, streaming ingestion, scrapers, APIs, and crawlers; data warehousing services for structured and semi-structured data, and MPP databases such as Snowflake, Microsoft Azure, Presto, or Google BigQuery; pre-processing techniques such as transformation, integration, normalization, and feature extraction to identify and apply appropriate methods; techniques such as decision trees, advanced regression techniques such as LASSO methods, random forests, etc.; and cloud and big data environments such as EDO2 systems. Extracts data from identified databases. Creates data pipelines and transforms data into a structure relevant to the problem by selecting appropriate techniques. Develops knowledge of current data science and analytics trends.

Utilizes data modeling knowledge, including: cloud data strategy, data warehouses, data lakes, and enterprise big data platforms; data modeling techniques and tools (for example, dimensional design and scalability, entity relationship diagrams, Erwin, etc.); query languages (SQL/NoSQL); data flows through the different systems; tools supporting automated data loads; and artificial intelligence-enabled metadata management tools and techniques.
Analyzes complex data elements, systems, data flows, dependencies, and relationships to contribute to conceptual, physical, and logical data models.
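
For illustration only (not part of the posting), here is a minimal PySpark sketch of the kind of extract-transform-load pipeline described above, assuming a Dataproc cluster with Hive support and the spark-bigquery connector available; all table, dataset, and bucket names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative only: read a raw Hive table, apply basic cleansing, and load the
# result into BigQuery. Table, dataset, and bucket names are hypothetical.
spark = (
    SparkSession.builder
    .appName("orders-daily-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Extract: source table registered in the Hive metastore.
raw = spark.table("raw_db.orders")

# Transform: deduplicate, derive a date column, drop invalid rows.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("order_total") >= 0)
)

# Load: write to BigQuery via the spark-bigquery connector (assumed to be on the cluster).
(
    orders.write.format("bigquery")
    .option("table", "analytics.orders_daily")
    .option("temporaryGcsBucket", "tmp-staging-bucket")
    .mode("overwrite")
    .save()
)
```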

Minimum education and experience required: Bachelor’s degree or the equivalent in Computer Science or a related field plus 2 years of experience in software engineering or a related field; OR Master’s degree or the equivalent in Computer Science or a related field.

Skills required: Must have experience with: utilizing Google Dataproc clusters to pull data from on-prem redundant data stores; developing Spark code using PySpark or Spark Scala based on business requirements; building CI/CD pipelines for deployments; assembling large, complex data sets that meet functional and non-functional business requirements; scheduling data pipeline jobs using Airflow and crontabs; developing data analytical processes using Hive; developing automated scripts to run jobs and alert mechanisms after each failed data load; validating data after loading into Hive tables/GCP BigQuery to ensure data quality and correctness; and utilizing Kafka topics to produce/consume data into GCP buckets. Employer will accept any amount of experience with the required skills.
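
For illustration only (not a requirement of the posting), a minimal Airflow 2.x sketch of the kind of scheduling and failure alerting described above; the DAG name, schedule, alert address, and script paths are hypothetical, and spark-submit stands in for however the cluster actually receives jobs:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# Retry once, then e-mail on failure (hypothetical alert address).
default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=10),
    "email": ["data-eng-alerts@example.com"],
    "email_on_failure": True,
}

with DAG(
    dag_id="orders_daily_pipeline",
    schedule_interval="0 6 * * *",   # crontab-style: daily at 06:00
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit the PySpark load job (hypothetical GCS path to the script).
    load_orders = BashOperator(
        task_id="load_orders",
        bash_command="spark-submit gs://example-bucket/jobs/orders_daily_load.py",
    )

    # Post-load validation, e.g. row counts against BigQuery (hypothetical script).
    validate_orders = BashOperator(
        task_id="validate_orders",
        bash_command="python /opt/jobs/validate_orders.py",
    )

    load_orders >> validate_orders
```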

#LI-DNP #LI-DNI

Wal-Mart is an Equal Opportunity Employer.
