Spark, Kafka, Airflow, ETL pipelines, and data infrastructure
21 courses available
DataTalks.Club
Free Data Engineering Zoomcamp is an intermediate-level course offered by DataTalks.Club, focused on building practical skills in data science and analytics. It is aimed at learners who already have some grounding in the subject, whether they are starting a new career or upgrading existing skills.
Format: A structured online course in which each module builds on the previous one, progressing from fundamentals to advanced topics. The course typically includes video lectures, reading materials, hands-on exercises, quizzes, and sometimes peer-reviewed assignments.
Topics: Python, SQL, Pandas, NumPy, data visualization, statistics, and machine learning basics, starting with foundational concepts and advancing to real-world applications.
By the end, you should be able to:
• Understand the core concepts and theoretical foundations
• Apply your knowledge through hands-on exercises and small projects
• Build the practical skills employers actually screen for
• Develop the problem-solving approach used by working professionals
Duration: About 50 hours of content, designed to be completed in 5-10 weeks at a comfortable pace.
Prerequisites: Basic familiarity with the subject area is recommended. You should have completed a beginner-level course or have equivalent self-taught knowledge. Comfort with using a computer and basic problem-solving skills will help.
This resource is designed for a wide audience:
• Students (B.Tech, BCA, MCA, BSc) looking to complement their academic learning with practical, industry-relevant skills
• Fresh graduates preparing for campus placements or off-campus interviews
• Working professionals looking to upskill, switch domains, or advance their careers
• Career changers transitioning from non-tech backgrounds into data science and analytics
• Freelancers wanting to add new services to their portfolio
• Self-learners passionate about data science and analytics and wanting structured guidance
Pricing: This resource is completely free with no hidden charges.
Career outcomes: Completing this resource and building related skills can prepare you for roles such as Data Analyst, Business Analyst, Data Scientist, and Analytics Engineer.
Realistic salary bands in India (2025-2026), based on Naukri/AmbitionBox data:
• Freshers / 0-2 years: Rs 4-8 LPA
• Mid-level / 2-5 years: Rs 10-22 LPA
• Senior / 5+ years: Rs 25-50 LPA
Actual offers vary heavily by city, company tier, and how strong your portfolio or interview performance is. Companies actively hiring in this space include TCS, Infosys, Flipkart, Amazon, Swiggy, Zomato, and PhonePe.
The data science industry in India is projected to grow at 27% CAGR through 2028. Companies across all sectors, from banking (HDFC, ICICI) to e-commerce (Flipkart, Amazon) to healthcare (Practo, PharmEasy), are building data teams. India currently has a shortage of 200,000+ data professionals, making this one of the best fields to enter right now. Cities like Bangalore, Hyderabad, Pune, and Gurgaon have the highest concentration of data science jobs.
DataTalks.Club is a well-established platform trusted by millions of learners worldwide.
This particular resource has been selected by our editorial team based on:
• Content quality: comprehensive coverage with clear explanations
• Practical focus: emphasis on hands-on skills over pure theory
• Student outcomes: positive reviews and career success stories
• Indian relevance: content applicable to the Indian job market and interview patterns
• Updated curriculum: material reflects current industry practices and tools
We regularly review and update our recommendations to ensure they remain relevant and high-quality.
dbt Labs
dbt Fundamentals (Free Course) is an intermediate-level course offered by dbt Labs, focused on building practical skills in data science and analytics. It is aimed at learners who already have some grounding in the subject, whether they are starting a new career or upgrading existing skills.
Format: A structured online course in which each module builds on the previous one, progressing from fundamentals to advanced topics. The course typically includes video lectures, reading materials, hands-on exercises, quizzes, and sometimes peer-reviewed assignments.
Topics: Python, SQL, Pandas, NumPy, data visualization, statistics, and machine learning basics, starting with foundational concepts and advancing to real-world applications.
By the end, you should be able to:
• Understand the core concepts and theoretical foundations
• Apply your knowledge through hands-on exercises and small projects
• Build the practical skills employers actually screen for
• Develop the problem-solving approach used by working professionals
Duration: About 5 hours of content, designed to be completed in about 1 week at a comfortable pace.
Prerequisites: Basic familiarity with the subject area is recommended. You should have completed a beginner-level course or have equivalent self-taught knowledge. Comfort with using a computer and basic problem-solving skills will help.
Pricing: This resource is completely free with no hidden charges. dbt Labs is a well-established platform trusted by millions of learners worldwide.
Udemy
Learn Analytics Engineering with this dbt™ course covering theory & practice through a real-world Airbnb use case. Beginner-friendly Data Science & Analytics course on Udemy with 5 hours of content. Rated 4.6/5 by 617 learners. Price: $49.99.
Fivetran
Data Engineering Glossary is a beginner-level resource offered by Fivetran, focused on building practical skills in data science and analytics. It suits complete beginners starting a new career as well as professionals upgrading their skills.
Format: A text-based learning resource, ideal for learners who prefer reading and reference-style learning over videos. You can easily search for specific topics, bookmark important sections, copy code snippets, and revisit concepts quickly without scrubbing through video timelines. Many working professionals prefer this format because it is easier to learn in short bursts during breaks.
Topics: Python, SQL, Pandas, NumPy, data visualization, statistics, and machine learning basics, starting with foundational concepts and advancing to real-world applications.
By the end, you should be able to:
• Understand the core concepts and theoretical foundations
• Apply your knowledge through hands-on exercises and small projects
• Build the practical skills employers actually screen for
• Develop the problem-solving approach used by working professionals
Duration: About 3 hours of content, designed to be completed in about 1 week at a comfortable pace.
Prerequisites: No prior experience is required. This resource starts from the absolute basics and gradually builds up complexity. A computer with internet access is all you need to get started.
Pricing: This resource is completely free with no hidden charges. Fivetran is a well-established platform trusted by millions of learners worldwide.
Udemy
Learning dbt for Data Engineers. Beginner-friendly Data Science & Analytics course on Udemy with 4 hours of content. Rated 4.2/5 by 5 learners. Price: $54.99.
Udemy
Learn data build tool (dbt) in Cloud from basic to advanced: scenarios, test cases, deployment, document generation, etc. Beginner-friendly Data Science & Analytics course on Udemy with 2 hours of content. Rated 3.5/5 by 39 learners. Price: $3.
Coursera
Modern analytics demands more than just storing data: it requires intelligent design that powers lightning-fast queries and consistent business insights. This course transforms you into a dimensional modeling expert who can architect data warehouses that scale with enterprise needs. This Short Course was created to help data management and engineering professionals accomplish robust, high-performance analytics infrastructure design. By completing this course, you'll be able to construct star-schema fact and dimension tables that eliminate query bottlenecks, identify and resolve redundant lookup paths that slow down analytics, and build semantic metrics layers that standardize business logic across your entire organization.
By the end of this course, you will be able to:
• Apply star-schema principles to create dimension and fact tables with surrogate keys
• Analyze snowflake schema structures to identify and eliminate redundant lookups
• Create semantic metrics layers that standardize business definitions and calculations
This course is unique because it combines hands-on dimensional modeling with modern semantic layer architecture, bridging traditional data warehousing with contemporary analytics engineering practices. To be successful in this project, you should have a background in SQL, database design fundamentals, and experience with analytics workflows.
SkillUp
Gain practical, real-world experience in data architecture through this hands-on capstone project course, developing skills highly valued by employers. During this course, you’ll apply all that you’ve learned throughout the Data Architecture Professional Certificate. As you work through the course, you’ll evaluate, design, migrate, and integrate enterprise data systems through a case study. In the capstone project, you will assess the current data architectures of two organizations, highlighting their strengths and identifying areas for improvement. Based on this analysis, you will design and implement a unified and efficient architecture for the newly merged entity, aligning with business goals. The project includes working with both RDBMS and NoSQL databases and developing ETL pipelines to ensure smooth data integration and flow. Additionally, you will create a data governance plan that addresses regulatory compliance and outlines strategies for data protection. Overall, this real-world inspired scenario will give you plenty to talk about in interviews when asked about implementing an architecture and managing a system transition. If you’re keen to add the practical experience that employers look for to your portfolio, enroll today!
EDUCBA
By the end of this course, learners will be able to apply nested queries, implement parent-child mappings, and execute relational-style joins in Elasticsearch for advanced data modeling. You will gain the skills to index event-driven data, use inner hits for detailed matches, and optimize query performance with global ordinals. This course is designed to bridge the gap between traditional relational databases and Elasticsearch’s document-oriented model. Through practical workshops and structured lessons, learners will explore application-side joins, nested queries, and indexing best practices. The course then advances into modeling complex one-to-many relationships using parent-child queries, with hands-on exercises for has_child and has_parent functionalities. What makes this course unique is its step-by-step alignment of relational concepts with Elasticsearch capabilities, ensuring you not only understand the theory but also practice real-world applications. Whether you are a data engineer, developer, or search specialist, this course equips you with essential strategies to analyze, structure, and query large-scale datasets efficiently in Elasticsearch.
Universidad Nacional Autónoma de México (via Coursera)
Welcome to the specialization course Designing data-intensive applications. The course runs for four weeks and is supported by videos and exercises. By the end of this specialization, learners will be able to propose, design, justify, and develop highly reliable information systems according to the type of data and volume of information, response time, and type of processing and queries, in order to support scalability, maintainability, security, and reliability using the latest information technologies. Software to download: MySQL Workbench, RapidMiner, Hadoop framework Hortonworks, MongoDB. If you have a Mac operating system, you will need to use VirtualBox.
Coursera
Master the critical skills for ensuring data reliability and building self-healing data systems. This course transforms your approach to data quality from reactive firefighting to proactive, engineering-driven reliability. This Short Course was created to help data management and engineering professionals accomplish systematic data quality assurance and error automation at enterprise scale. By completing this course, you'll be able to implement quantitative data quality measurements, establish monitoring systems that catch degradation trends before they impact business operations, and build intelligent SQL routines that automatically recover from data pipeline failures.
By the end of this course, you will be able to:
• Apply calculations to measure key data quality dimensions
• Evaluate quality key performance indicators over time and recommend remediation
• Create an automated SQL routine to handle and reprocess data errors
This course is unique because it blends quantitative data quality methods with practical automation engineering, enabling you to build self-healing data systems that maintain measurable quality standards at scale. To be successful in this course, you should have a background in SQL, data pipeline concepts, and basic data engineering principles.
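As a rough illustration of what measuring a data quality dimension can look like, the sketch below computes two commonly used metrics, completeness and uniqueness, over a list of records. The listing does not say which dimensions or tools the course uses, so the metric choice, the function names, and the sample records are all assumptions made for this example.

```python
def completeness(rows, column):
    """Share of rows where `column` is present and not None (1.0 for no rows)."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def uniqueness(rows, column):
    """Share of non-null values in `column` that are distinct (1.0 if none)."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]

email_completeness = completeness(records, "email")  # 3 of 4 rows filled
id_uniqueness = uniqueness(records, "id")            # 3 distinct ids out of 4
```

Tracking values like these per load makes it possible to chart them over time, which is the kind of trend monitoring the description refers to.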
Coursera
Analyze Agent Performance: Build and Test is an intermediate course for data analysts, ML engineers, and developers tasked with optimizing AI systems. In a world where agentic AI is increasingly common, it is not enough to build an agent—you must prove its effectiveness. This course equips you with the data-driven skills to measure, monitor, and improve AI agents built with frameworks like LangChain, Autogen, and CrewAI. You will learn to transform raw, noisy logs into actionable KPIs by applying data aggregation techniques with SQL and dbt. Through hands-on labs, you will design and execute controlled A/B experiments, comparing agent versions to identify meaningful improvements. You will master core statistical methods, including the Chi-square test, to determine whether your results are statistically significant or just random chance. You will be able to move beyond correlation to causation, making objective, evidence-based recommendations on deploying agent enhancements.
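For readers unfamiliar with the Chi-square test mentioned above, here is a minimal sketch of how it applies to an A/B comparison of two agent versions. It uses only the Python standard library and the plain Pearson statistic without continuity correction; the function name and the success counts are hypothetical and not taken from the course.

```python
from math import erfc, sqrt


def chi_square_2x2(success_a, total_a, success_b, total_b):
    """Pearson chi-square test of independence on a 2x2 table.

    Returns (statistic, p_value). Assumes every expected cell count is
    positive, i.e. both outcomes occur at least once overall.
    """
    observed = [
        [success_a, total_a - success_a],
        [success_b, total_b - success_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical experiment: version A resolves 60 of 100 tasks, version B 40 of 100.
statistic, p_value = chi_square_2x2(60, 100, 40, 100)
significant = p_value < 0.05
```

A small p-value only says the difference in success rates is unlikely to be random noise; it does not by itself say the difference is large enough to matter.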
Packt
Snowpark has become an essential framework for modern data engineering, analytics, and machine learning workflows. This course equips learners with the skills to leverage Snowpark effectively and apply it across real-world data challenges. You’ll progress from foundational Snowpark concepts to efficiently processing data, building end-to-end pipelines, and developing data science solutions. By the end, you’ll be able to create scalable applications and deploy models directly within the Snowflake ecosystem. What sets this course apart is its blend of hands-on exercises and practical demonstrations, reinforcing core concepts with real Snowpark implementations. You’ll not only understand the framework but learn how to apply it to complex business needs. This course is ideal for data engineers, data scientists, and developers familiar with Python, SQL, or Snowflake who want to deepen their Snowpark capabilities.
University of California, Santa Cruz (via Coursera)
Millions of people get psychotherapy at some point in their lives, and even more are curious about what it might entail. In this course, we answer questions that many people wonder: What tools do clinicians use to diagnose mental illness? How do they decide how to treat it? What are the most evidence-based treatments out there? And how does the field figure out which ones are “evidence-based” in the first place? As more and more people seek psychotherapy around the globe, an understanding of it is increasingly important. This course enables the public to be good consumers of psychotherapy (e.g., to select evidence-based approaches that match their own difficulties), provides useful information for people considering a career in mental healthcare, and offers the chance to gain insight into one's own psychological distress. Students will learn the principles and theory behind several psychotherapeutic approaches, with a focus on evidence-based interventions including cognitive behavior therapy (CBT) and dialectical behavior therapy (DBT). Assignments include case videos and completion of evidence-based therapy-style worksheets. The course offers an introduction to the real-life practice of psychotherapy and a “behind the scenes” glimpse into how therapists develop their treatments.
Coursera
Did you know that organizations can reduce their data warehousing costs by up to 60% while improving performance through strategic architecture decisions and lifecycle management? This Short Course was created to help data engineers and architects accomplish cost-effective scaling of enterprise data warehouses. By completing this course, you'll be able to build automated SCD pipelines that preserve critical historical data, conduct sophisticated cost analysis to optimize storage strategies, and design multi-cluster architectures that eliminate resource contention while controlling expenses.
By the end of this course, you will be able to:
• Apply techniques to implement data pipelines for managing historical data changes
• Analyze storage and compute cost trends to propose data archiving strategies
• Create a multi-cluster data warehouse architecture to isolate distinct workloads
This course is unique because it combines hands-on technical implementation with financial optimization strategies, giving you both the SQL expertise and business acumen to scale warehouses intelligently. To be successful in this project, you should have a background in SQL, data warehousing concepts, and cloud computing fundamentals.
Fractal Analytics
This course equips business leaders with essential knowledge to strategically integrate Artificial Intelligence (AI) into their organizations. It emphasizes defining success, setting clear objectives, and translating vision into reality for effective AI implementation. The course is structured around three foundational equations:
1. Objective + Vision = AI Adoption Success: This equation underscores the importance of clear objectives and visionary approaches through historical case studies, guiding participants to formulate coherent AI strategies.
2. Data Engineering + Design Thinking = Optimal AI Results: Highlighting the synergy between data engineering and design thinking, participants explore robust data pipeline engineering and user-centric AI implementations.
3. Accuracy + Ethics + Governance = Trustworthy AI Implementation: Emphasizing the significance of accuracy, ethical considerations, and governance, this equation stresses building societal trust in AI technologies.
Additionally, the course covers talent acquisition, fostering an experimental culture, and ethical practices. It offers insights into focused AI strategies, user-centered design, and decision-making aligned with organizational goals. Whether you lead a startup, an SME, or a large corporation, this course is a guide to leveraging AI technologies to enhance productivity, drive growth, and gain a competitive edge in today's business environment.
Coursera
Did you know that hidden data anomalies can cascade through pipelines and corrupt entire dashboards, models, and business decisions? Finding the source of a data issue quickly is essential for maintaining trustworthy analytics and automated workflows. This Short Course was created to help professionals in this field build reliable data quality monitoring and debugging capabilities for maintaining trustworthy automated data workflows. By completing this course, you will be able to trace data anomalies back to their origin, inspect upstream and downstream dependencies, and diagnose quality failures inside complex pipelines, skills that dramatically reduce downtime and improve overall data reliability.
By the end of this course, you will be able to:
• Investigate data quality issues by tracing anomalies to their source within a data pipeline
This course is unique because it connects data engineering principles with hands-on debugging techniques, giving you the practical skills needed to keep pipelines accurate, resilient, and ready for production demands.
To be successful in this project, you should have:
• Basic SQL knowledge
• Understanding of data pipeline concepts
• Familiarity with ETL and ELT workflows
Coursera
Transform your data engineering expertise with advanced validation and historization techniques that ensure bulletproof data integrity. This course equips you with the critical skills to programmatically verify transformation accuracy through automated checksum validation and build enterprise-grade reusable logic for tracking historical changes in dimensional data. This Short Course was created to help data management and engineering professionals accomplish reliable, auditable data transformations that maintain complete historical accuracy. By completing this course, you'll be able to implement automated data validation workflows that catch discrepancies before they impact downstream systems, and architect modular SCD2 logic that can be deployed across multiple dimensional tables with confidence.
By the end of this course, you will be able to:
• Evaluate data transformation accuracy by comparing aggregate checksums and flagging discrepancies
• Create reusable transformation logic to track historical changes in dimensional data
This course is unique because it combines practical validation techniques with enterprise-scalable historical tracking patterns, focusing on real-world implementation challenges that data engineers face daily. To be successful in this project, you should have a background in advanced SQL, data warehousing concepts, ETL/ELT processes, and experience with dimensional modeling.
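To make the idea of comparing aggregate checksums concrete, the sketch below hashes each row, combines the hashes per group in an order-independent way, and reports the groups where source and target disagree. The course itself appears to work in SQL; this Python version, its function names, the grouping by day, and the sample rows are assumptions chosen purely for illustration.

```python
import hashlib
from collections import defaultdict


def aggregate_checksum(rows, columns):
    """Return (row_count, combined_hash) for the given columns.

    Row hashes are summed modulo 2**256, so row order does not matter
    and duplicate rows still change the result.
    """
    combined = 0
    for row in rows:
        canonical = "|".join(repr(row[col]) for col in columns)
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
    return len(rows), combined


def checksums_by_group(rows, group_col, columns):
    """Compute one aggregate checksum per distinct value of `group_col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    return {key: aggregate_checksum(members, columns) for key, members in groups.items()}


def flag_discrepancies(source_rows, target_rows, group_col, columns):
    """List the groups whose checksum differs or that exist on one side only."""
    source = checksums_by_group(source_rows, group_col, columns)
    target = checksums_by_group(target_rows, group_col, columns)
    return sorted(key for key in source.keys() | target.keys() if source.get(key) != target.get(key))


# Hypothetical source and target extracts, invented for this example.
source_rows = [
    {"day": "2024-01-01", "amount": 10},
    {"day": "2024-01-01", "amount": 5},
    {"day": "2024-01-02", "amount": 7},
]
target_rows = [
    {"day": "2024-01-01", "amount": 5},
    {"day": "2024-01-01", "amount": 10},
    {"day": "2024-01-02", "amount": 70},
]

mismatched_days = flag_discrepancies(source_rows, target_rows, "day", ["amount"])
```

Comparing per group rather than per table narrows a mismatch down to a partition that can then be inspected row by row.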
Yonsei University (via Coursera)
Spatial (map) data is considered a core infrastructure of the modern IT world, as shown by the business transactions of major IT companies such as Apple, Google, Microsoft, Amazon, Intel, and Uber, and even motor companies such as Audi, BMW, and Mercedes. Consequently, they are bound to hire more and more spatial data scientists. Based on this business trend, the course is designed to give learners who have a basic knowledge of data science and data analysis a firm understanding of spatial data science, and eventually to differentiate their expertise from that of other data scientists and data analysts. Additionally, this course could help learners realize the value of spatial big data and the power of open source software for dealing with spatial data science problems. In the first week, the course starts by defining spatial data science and answering why spatial is special from three different perspectives: business, technology, and data. In the second week, four disciplines related to spatial data science (GIS, DBMS, Data Analytics, and Big Data Systems) and the related open source software (QGIS, PostgreSQL, PostGIS, R, and Hadoop tools) are introduced together. During the third, fourth, and fifth weeks, you will learn the four disciplines one by one, from principles to applications. In the final week, five real-world problems and the corresponding solutions are presented with step-by-step procedures in an open source software environment.
University of Washington (via Coursera)
Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making: we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales. In this course, you will learn the landscape of relevant systems, the principles on which they rely, their tradeoffs, and how to evaluate their utility against your requirements. You will learn how practical systems were derived from the frontier of research in computer science and what systems are coming on the horizon. Cloud computing, SQL and NoSQL databases, MapReduce and the ecosystem it spawned, Spark and its contemporaries, and specialized systems for graphs and arrays will be covered. You will also learn the history and context of data science, the skills, challenges, and methodologies the term implies, and how to structure a data science project.
Learning Goals: At the end of this course, you will be able to:
1. Describe common patterns, challenges, and approaches associated with data science projects, and what makes them different from projects in related fields.
2. Identify and use the programming models associated with scalable data manipulation, including relational algebra, MapReduce, and other data flow models.
3. Use database technology adapted for large-scale analytics, including the concepts driving parallel databases, parallel query processing, and in-database analytics.
4. Evaluate key-value stores and NoSQL systems, describe their tradeoffs with comparable systems, the details of important examples in the space, and future trends.
5. “Think” in MapReduce to effectively write algorithms for systems...
Coursera
Data quality failures cost organizations millions in bad decisions and lost trust. This advanced course transforms you into a data quality architect who can prevent these failures before they happen. This Short Course was created to help data engineers and analysts accomplish bulletproof data validation automation that catches issues before they impact business decisions. By completing this course, you'll be able to embed automated quality checks directly into your data pipelines, systematically diagnose validation failures to their root cause, and build reusable SQL frameworks that scale across your entire data ecosystem.
By the end of this course, you will be able to:
• Apply automated data quality tests to data models
• Analyze validation failures to pinpoint the root cause
• Create a reusable SQL validation framework based on table statistics
This course is unique because it focuses on building systematic, code-based validation solutions rather than manual testing approaches, giving you the skills to automate data governance at enterprise scale. To be successful in this project, you should have a background in SQL, data pipeline concepts, and database system fundamentals.