Duties & Responsibilities
Data engineers transform data into a format that can be easily analyzed. They do this by developing, maintaining, and testing the infrastructure for data generation. Data engineers work closely with data scientists and are largely responsible for architecting the solutions that enable data scientists to do their jobs.
- Operate and maintain the data platform and data pipelines
- Gather data requirements and design solutions
- Develop new data pipelines, set up connections to data sources, and ingest data into the data platform
- Transform raw data and curate it into a consistent format
- Perform unit, integration, and user acceptance tests
- Deploy new data pipelines, dashboards, models, and bug fixes
- Advanced working knowledge of SQL, including query authoring, and experience with relational databases, as well as working familiarity with a variety of databases.
- Knowledge of building and optimizing ‘big data’ pipelines, architectures, and data sets.
- Knowledge of performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
- Strong analytic skills related to working with unstructured datasets.
- Experience building processes that support data transformation, data structures, metadata, dependency management, and workload management.
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
- Knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
- Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
- Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
- Experience with stream-processing systems: Storm, Spark Streaming, etc.
- Experience with object-oriented and functional programming or scripting languages: Python, Java, C++, Scala, etc.