ETL (Extract, Transform, Load)
A process where data is extracted from a source, transformed (e.g. cleaned or aggregated), and then loaded into a database or data warehouse.
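
A minimal ETL sketch in Python. It assumes an in-memory SQLite database as a stand-in for the warehouse, and the table name `daily_sales` and the sample records are illustrative. The key point is that cleaning and aggregation happen in Python, before anything is written to the destination.

```python
import sqlite3

# Extract: raw records as they might arrive from a source system (illustrative data)
def extract():
    return [
        {"day": "2024-01-01", "amount": "10.50"},
        {"day": "2024-01-01", "amount": " 4.50 "},
        {"day": "2024-01-02", "amount": ""},  # missing value
        {"day": "2024-01-02", "amount": "7.00"},
    ]

# Transform: clean (skip empty amounts, parse numbers) and aggregate per day, in Python
def transform(rows):
    totals = {}
    for row in rows:
        raw = row["amount"].strip()
        if not raw:
            continue
        totals[row["day"]] = totals.get(row["day"], 0.0) + float(raw)
    return sorted(totals.items())

# Load: write the already-transformed result into the destination table
def load(conn, rows):
    conn.execute("CREATE TABLE daily_sales (day TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall())
```
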
ELT (Extract, Load, Transform)
Raw data is first loaded into the destination (like BigQuery), and then transformed using SQL or other tools inside the warehouse.
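
For contrast with ETL, here is a small ELT sketch, again assuming SQLite as a stand-in for a warehouse such as BigQuery (table names `raw_sales` and `daily_sales` are hypothetical). The raw rows are loaded unchanged, and the cleaning and aggregation are done afterwards in SQL inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw data goes into the destination as-is (still text, still messy)
conn.execute("CREATE TABLE raw_sales (day TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [("2024-01-01", "10.50"), ("2024-01-01", " 4.50 "),
     ("2024-01-02", ""), ("2024-01-02", "7.00")],
)

# Transform: done afterwards with SQL, inside the "warehouse"
conn.execute("""
    CREATE TABLE daily_sales AS
    SELECT day, SUM(CAST(TRIM(amount) AS REAL)) AS total
    FROM raw_sales
    WHERE TRIM(amount) <> ''
    GROUP BY day
""")
print(conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall())
```

Because `raw_sales` is kept, the transformation can be changed and re-run later without extracting the data again.
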
DAG (Directed Acyclic Graph – Airflow)
A structure used in Airflow to define workflows. It represents a set of tasks and the dependencies between them; because the graph contains no cycles, the tasks can always be run in a valid order.
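
The "acyclic" property is what guarantees a valid run order exists. This sketch uses Python's standard `graphlib` module (not Airflow) to show the idea; the task names are hypothetical. Each task maps to the tasks it depends on, and a topological sort yields an order that respects every dependency, while a cycle makes such an order impossible.

```python
from graphlib import TopologicalSorter, CycleError

# Each task maps to the set of tasks it depends on (hypothetical task names)
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform"},
    "report": {"validate", "load"},
}

# One valid execution order; "validate" and "load" may appear in either order
order = list(TopologicalSorter(dag).static_order())
print(order)

# A cycle makes the graph invalid: no run order exists
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
except CycleError:
    print("cycle detected")
```

In Airflow itself, the same kind of dependencies are declared between tasks (for example with the `>>` operator), and the scheduler runs tasks only after their upstream tasks have finished.
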
Partitioning (BigQuery)
Dividing a large table into parts (usually by date) to make queries faster and cheaper by scanning only relevant partitions.
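
Why partitioning saves cost can be shown with a toy simulation in plain Python; this is not BigQuery's API, just the idea of partition pruning. Rows are grouped by date, and a date-range query only looks inside the partitions whose key falls in the range.

```python
from collections import defaultdict
from datetime import date

# Toy "table": rows grouped into partitions keyed by event date (illustrative data)
partitions = defaultdict(list)
rows = [
    (date(2024, 1, 1), "click"), (date(2024, 1, 1), "view"),
    (date(2024, 1, 2), "click"), (date(2024, 1, 3), "view"),
]
for event_date, event in rows:
    partitions[event_date].append((event_date, event))

def query(start, end):
    """Count events in [start, end], scanning only the matching partitions."""
    scanned = [d for d in partitions if start <= d <= end]  # partition pruning
    count = sum(len(partitions[d]) for d in scanned)
    return count, len(scanned)

count, partitions_scanned = query(date(2024, 1, 2), date(2024, 1, 3))
print(count, partitions_scanned)  # 2 events found, 2 of 3 partitions scanned
```

In BigQuery the equivalent effect comes from filtering on the partitioning column in the WHERE clause, so that partitions outside the range are not read.
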
JOIN (SQL)
A way to combine data from two or more tables based on a related column (e.g. user_id).
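
A small sketch of joining on `user_id`, run against an in-memory SQLite database (the `users` and `orders` tables are hypothetical). It also shows the difference between an INNER JOIN, which keeps only matching rows, and a LEFT JOIN, which keeps every row from the left table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# INNER JOIN: only users that have at least one matching order
inner = conn.execute("""
    SELECT u.name, o.amount
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every user, with NULL (None in Python) where no order matches
left = conn.execute("""
    SELECT u.name, o.amount
    FROM users AS u
    LEFT JOIN orders AS o ON o.user_id = u.user_id
    ORDER BY u.user_id, o.order_id
""").fetchall()

print(inner)
print(left)
```
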
HAVING vs WHERE (SQL)
WHERE filters individual rows before aggregation; HAVING filters the grouped results after aggregation. Example: HAVING COUNT(*) > 100.
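
Both filters in one query, using SQLite and a hypothetical `events` table: WHERE first keeps only click rows, then HAVING keeps only the users whose click count reaches the threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, kind TEXT);
    INSERT INTO events VALUES
        (1, 'click'), (1, 'click'), (1, 'view'),
        (2, 'click'), (2, 'view'),  (2, 'view'),
        (3, 'click'), (3, 'click'), (3, 'click');
""")

rows = conn.execute("""
    SELECT user_id, COUNT(*) AS clicks
    FROM events
    WHERE kind = 'click'      -- row filter, applied before grouping
    GROUP BY user_id
    HAVING COUNT(*) >= 2      -- group filter, applied after aggregation
    ORDER BY user_id
""").fetchall()
print(rows)  # [(1, 2), (3, 3)]
```

Moving `COUNT(*) >= 2` into the WHERE clause is rejected, because the aggregate does not exist yet at the point where individual rows are filtered.
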
PySpark
Python API for Apache Spark. It’s used to process very large datasets in a distributed, parallelized way.
BigQuery
A serverless cloud data warehouse from Google, designed for running fast SQL queries on large datasets.
Data Lake
A storage system for raw, unstructured, or semi-structured data, often used for flexible analytics or as a staging area.
Data Warehouse
A structured database optimized for analysis and reporting, typically holding cleaned and transformed data.
Airflow Operator
A template for a unit of work in an Airflow DAG; it defines what a task does (e.g. PythonOperator, BashOperator). Each task in a DAG is an instance of an operator.
Kafka Topic
A named data stream in Apache Kafka where producers send and consumers receive messages.
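
A conceptual model of a topic in plain Python, not the Kafka client API. It simplifies a topic to a single partition: producers append messages to a log, and each consumer group keeps its own read position (offset), so different groups read the same messages independently. `ToyTopic` and its methods are hypothetical names.

```python
class ToyTopic:
    """Append-only log standing in for a single-partition topic (conceptual model only)."""

    def __init__(self, name):
        self.name = name
        self.log = []      # messages are appended, never modified
        self.offsets = {}  # next offset to read, per consumer group

    def produce(self, message):
        self.log.append(message)

    def consume(self, group):
        start = self.offsets.get(group, 0)
        batch = self.log[start:]
        self.offsets[group] = len(self.log)  # "commit" the new position
        return batch

topic = ToyTopic("orders")
topic.produce({"order_id": 1})
topic.produce({"order_id": 2})
print(topic.consume("billing"))    # both messages
topic.produce({"order_id": 3})
print(topic.consume("billing"))    # only the new message
print(topic.consume("analytics"))  # a second group reads the full log independently
```

Real Kafka topics are usually split into several partitions, each with its own offsets; the single list here only illustrates the produce/consume relationship.
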
IAM (Identity and Access Management – GCP)
A system for managing permissions and access to resources in Google Cloud – defines who can do what.
KPI (Key Performance Indicator)
A measurable value that shows how effectively a process or business is performing (e.g. conversion rate, average delay).
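
The two example KPIs from the definition, computed from made-up daily figures (the numbers are purely illustrative):

```python
# Hypothetical daily figures for illustration
visits = 2_000
purchases = 50
delays_min = [0, 5, 12, 3]  # delivery delays in minutes

conversion_rate = purchases / visits  # share of visits that ended in a purchase
average_delay = sum(delays_min) / len(delays_min)

print(f"conversion rate: {conversion_rate:.1%}")  # 2.5%
print(f"average delay: {average_delay:.1f} min")  # 5.0 min
```
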
Lazy Evaluation (Spark)
Transformations are not executed until an action (such as .count() or .collect()) is called. Delaying execution lets Spark see the whole chain of operations and optimize it before running it.
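
The core idea can be seen without Spark: Python generators are also lazy. This is an analogy, not Spark's API. Building the pipeline (the "transformations") does no work; only consuming it (the "action") makes the data flow.

```python
log = []  # records when work actually happens

def read_numbers(n):
    for i in range(n):
        log.append(f"read {i}")
        yield i

# "Transformations": build the pipeline, nothing runs yet
numbers = read_numbers(5)
evens = (x for x in numbers if x % 2 == 0)
squared = (x * x for x in evens)
print(len(log))  # 0: no element has been read so far

# "Action": forces the whole pipeline to execute
result = list(squared)
print(result, len(log))  # [0, 4, 16] 5
```

Spark goes further than this analogy: because it sees the full chain before running it, it can also rewrite and optimize the plan, which plain generators do not do.
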
Retry (Airflow)
A task setting (e.g. the retries and retry_delay parameters) that makes Airflow automatically re-run a task after it fails; useful for operations that fail intermittently, such as network or API calls.
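
A minimal sketch of the retry behaviour in plain Python, not Airflow's implementation. The helper `run_with_retries` is hypothetical; its parameters are named after Airflow's `retries` and `retry_delay`, but here the delay is a number of seconds. A task gets one initial attempt plus up to `retries` further attempts before the failure is raised.

```python
import time

def run_with_retries(task, retries=3, retry_delay=0.0):
    """Call task(); on exception, retry up to `retries` more times (hypothetical helper)."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: the task fails for good
            attempt += 1
            time.sleep(retry_delay)

# A flaky operation that fails twice, then succeeds (simulated)
calls = {"n": 0}

def flaky_api_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network error")
    return "ok"

print(run_with_retries(flaky_api_call, retries=3), calls["n"])  # ok 3
```
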
Data Validation
The process of ensuring that data is accurate and consistent – includes checking for missing values, duplicates, or wrong formats.
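
A small sketch of the three checks named above (missing values, duplicates, wrong formats) on a list of records. The field names `id`, `date`, and `amount` and the expected `YYYY-MM-DD` date format are assumptions for illustration.

```python
from datetime import datetime

def validate(rows):
    """Return (row_index, problem) pairs for missing values, duplicate ids, and bad dates."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if any(value in (None, "") for value in row.values()):
            problems.append((i, "missing value"))
        if row["id"] in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(row["id"])
        date_value = row.get("date")
        if date_value:  # empty dates are already reported as missing
            try:
                datetime.strptime(date_value, "%Y-%m-%d")
            except ValueError:
                problems.append((i, "wrong date format"))
    return problems

rows = [
    {"id": 1, "date": "2024-01-05", "amount": 10},
    {"id": 2, "date": "05.01.2024", "amount": 7},    # wrong date format
    {"id": 2, "date": "2024-01-06", "amount": None},  # missing value and duplicate id
]
print(validate(rows))
```
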
Window Function (SQL)
A function that performs calculations across a "window" of rows related to the current row, without collapsing them into a single result (e.g. ROW_NUMBER(), AVG(...) OVER(...)).
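
Both example functions in one query, run in SQLite (which supports window functions from version 3.25) against a hypothetical `orders` table. Every input row is still present in the output; the window functions only add a per-user order number and the per-user average alongside each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, order_day TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-01', 10.0), (1, '2024-01-03', 30.0),
        (2, '2024-01-02', 5.0),  (2, '2024-01-04', 15.0), (2, '2024-01-05', 25.0);
""")

rows = conn.execute("""
    SELECT user_id,
           order_day,
           amount,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_day) AS order_no,
           AVG(amount)  OVER (PARTITION BY user_id)                    AS user_avg
    FROM orders
    ORDER BY user_id, order_day
""").fetchall()
for row in rows:
    print(row)
```

A GROUP BY user_id query with AVG(amount) would return two rows here; the window version keeps all five.
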
