Data Engineering on GCP
Key job skills: designing, building, and running data processing systems, and operationalizing machine learning models.
Big Data & ML Fundamentals
Big Data Challenges
- Migrating existing data workloads
- Analyzing large datasets at scale
- Building streaming data pipelines (so that the business can make data-driven decisions more quickly)
- Applying machine learning to your data (not just reacting to data, but also predicting)
Examples
- Compute Power: Creating a VM on Compute Engine (see the sketch after this list)
- Storage: Elastic Storage with Google Cloud Storage
- Networking: Google’s data center network speed enables the separation of compute and storage
- Security: Cloud IAM
- Big data and ML
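As a small illustration of the compute pillar, the sketch below creates a VM with the google-cloud-compute Python client. The project ID, zone, instance name, machine type, and image family are illustrative placeholders, not values prescribed by the course.

```python
from google.cloud import compute_v1

def create_vm(project_id: str, zone: str, name: str) -> None:
    """Create a small Debian VM on Compute Engine (illustrative defaults)."""
    boot_disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        type_="PERSISTENT",
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-12",
            disk_size_gb=10,
        ),
    )
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/e2-micro",
        disks=[boot_disk],
        network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
    )
    client = compute_v1.InstancesClient()
    # insert() returns a long-running operation; result() blocks until the VM exists.
    client.insert(project=project_id, zone=zone, instance_resource=instance).result()

create_vm("my-project", "us-central1-a", "demo-vm")  # placeholder values
```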
Choosing the right approach
Compute:
- Compute Engine: individual machines running native code; Infrastructure as a Service (IaaS)
- Google Kubernetes Engine (GKE): clusters of machines running containers
- App Engine: Platform as a Service (PaaS)
- Cloud Functions: a completely serverless execution environment; Function as a Service (FaaS), shown in the sketch after this list
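To make the FaaS option concrete, here is a minimal sketch of an HTTP-triggered Cloud Function using the Functions Framework for Python; the function name `hello` and its response are placeholders (deployment itself is done with `gcloud functions deploy`).

```python
import functions_framework

@functions_framework.http
def hello(request):
    """HTTP Cloud Function: returns a greeting for the caller-supplied name."""
    name = request.args.get("name", "world")
    return f"Hello, {name}!"
```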
Storage:
- Cloud Bigtable: wide-column NoSQL for high-throughput, low-latency workloads
- Cloud Storage: object storage for any kind of data (see the upload sketch after this list)
- Cloud SQL: managed relational databases (MySQL, PostgreSQL, SQL Server)
- Cloud Spanner: horizontally scalable, strongly consistent relational database
- Cloud Datastore: NoSQL document database (now Firestore in Datastore mode)
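As a quick example of working with the storage layer, the sketch below uploads a local file to a Cloud Storage bucket with the google-cloud-storage client; the bucket and object names are placeholders.

```python
from google.cloud import storage

def upload_file(bucket_name: str, source_path: str, destination_name: str) -> None:
    """Upload a local file as an object in a Cloud Storage bucket."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(destination_name)
    blob.upload_from_filename(source_path)

upload_file("my-bucket", "sales.csv", "raw/sales.csv")  # placeholder values
```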
Modernizing Data Lakes and Data Warehouses
Building Batch Data Pipelines
Batch pipelines typically follow the Extract-Load (EL), Extract-Load-Transform (ELT), or Extract-Transform-Load (ETL) paradigm.
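To ground the ETL variant, here is a minimal Apache Beam sketch (runnable locally or on Dataflow) that extracts CSV lines from Cloud Storage, transforms them, and loads the rows into BigQuery; the bucket, table, and schema are illustrative assumptions.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_line(line: str) -> dict:
    """Transform step: turn a CSV line into a BigQuery row (schema is illustrative)."""
    user_id, amount = line.split(",")
    return {"user_id": user_id, "amount": float(amount)}

def run() -> None:
    # Pass --runner=DataflowRunner plus project/region options to run on Dataflow.
    with beam.Pipeline(options=PipelineOptions()) as p:
        (
            p
            | "Extract" >> beam.io.ReadFromText("gs://my-bucket/sales/*.csv")
            | "Transform" >> beam.Map(parse_line)
            | "Load" >> beam.io.WriteToBigQuery(
                "my-project:sales.transactions",
                schema="user_id:STRING,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )

if __name__ == "__main__":
    run()
```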
Building Resilient Streaming Analytics Systems
Processing streaming data is becoming increasingly popular as streaming enables businesses to get real-time metrics on business operations.
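As a sketch of what such a pipeline can look like, the streaming Beam example below reads events from a Pub/Sub topic and counts them per one-minute window; the topic name is a placeholder, and the `print` sink would normally be replaced by BigQuery or Bigtable.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms import window

def run() -> None:
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True  # unbounded (streaming) mode
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/orders")
            | "Window" >> beam.WindowInto(window.FixedWindows(60))  # 1-minute windows
            | "CountPerWindow" >> beam.combiners.Count.Globally().without_defaults()
            | "Emit" >> beam.Map(print)  # swap in a BigQuery/Bigtable sink in practice
        )

if __name__ == "__main__":
    run()
```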
Smart Analytics, Machine Learning, and AI
Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data.
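One way to do this without leaving the warehouse is BigQuery ML; the sketch below trains a logistic regression model and then scores new rows with ML.PREDICT. The dataset, table, and column names are illustrative assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a logistic regression model directly in BigQuery (dataset/columns are placeholders).
client.query("""
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `my_dataset.customers`
""").result()

# Score new rows with the trained model.
rows = client.query("""
SELECT *
FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                (SELECT tenure_months, monthly_spend FROM `my_dataset.new_customers`))
""").result()
for row in rows:
    print(dict(row.items()))
```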