Azure
Note: StorageAccount is not the same as Azure Account. A StorageAccount is specific to storage services and has its own settings like redundancy and access privileges.
StorageAccount
- Redundancy policy: Determines how and where the data is replicated to ensure high availability and fault tolerance.
- Access privileges: Defines the permission levels (read, write, delete) for accessing and managing the stored data.
Storage Types
- Blob: For unstructured or binary data.
  - Tiers:
    - hot: For frequently accessed data.
    - cool: For infrequently accessed data with rapid access needs.
    - cold: For rarely accessed data.
    - archive: For long-term archival with the lowest access frequency.
- Files: Shared file storage with support for SMB and NFS protocols; ideal for lift and shift scenarios.
- Tables: NoSQL storage for structured data, addressed by partition key and row key, with a system-maintained timestamp on every entity.
- Queue: Messaging service similar to RabbitMQ for asynchronous communication (not intended to replace Kafka).
- DataLake: A specialized Blob storage with hierarchical namespace support for big data analytics and file system-like operations.
Table storage
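Every entity always has these columns:
- PartitionKey: determines the partition the entity is stored in.
- RowKey: unique key within the partition.
- Timestamp: last modification time.
=> (PartitionKey, RowKey) is globally unique within the table.
The rest of the columns are dynamic; they come into existence as you use them.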
Data types: string, int32, int64, guid (128-bit), double, datetime, bool, byte[]
Filtering with OData:
- operators: eq, ne, gt, ge, lt, le, and, or, not
- string: 'My string value'
- bool: true, false
- timestamp: datetime'YYYY-MM-DDTHH:MM:SSZ'
Filtering examples:
- Age gt 18 and Age lt 30
- not (Age gt 18)
- Timestamp eq datetime'2023-10-23T10:35:43.3385326Z'
Supported query options: select, top
Not supported: count, orderby
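The literal formats listed above can be sketched as a small filter-string builder. This is a minimal illustration using only the standard library, not an Azure SDK API; the helper names `odata_literal` and `condition` are hypothetical, and integers are formatted without any int64 suffix handling.

```python
from datetime import datetime, timezone


def odata_literal(value):
    """Format a Python value as an OData literal (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, datetime):
        # Expects a timezone-aware datetime; it is converted to UTC.
        utc = value.astimezone(timezone.utc)
        return "datetime'" + utc.strftime("%Y-%m-%dT%H:%M:%SZ") + "'"
    if isinstance(value, str):
        # A single quote inside a string literal is escaped by doubling it.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def condition(column, operator, value):
    """Build one comparison such as: Age gt 18."""
    return f"{column} {operator} {odata_literal(value)}"


age_filter = condition("Age", "gt", 18) + " and " + condition("Age", "lt", 30)
name_filter = condition("Name", "eq", "O'Brien")
time_filter = condition(
    "Timestamp", "eq", datetime(2023, 10, 23, 10, 35, 43, tzinfo=timezone.utc)
)
```

Building the string through one function keeps quoting rules in a single place, which matters most for user-supplied strings containing quotes.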
Concurrency in Blob Storage
- Last writer wins: The default mode where the last write operation takes precedence.
- Optimistic concurrency: Uses the If-Match header with the blob's ETag (entity tag, which identifies the current version of the blob) to ensure data integrity; returns error 412 (Precondition Failed) if there is a version mismatch.
- Pessimistic concurrency: Implements a lease mechanism on a blob (from 15 to 60 seconds, or infinite) to prevent concurrent write conflicts.
- Snapshot isolation for reads: Allows the creation of snapshots, ensuring a consistent view of the data during reads.
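The difference between last-writer-wins and optimistic concurrency can be sketched with an in-memory stand-in for a blob. This is a simulation only: `InMemoryBlob` and `PreconditionFailed` are hypothetical names, no Azure call is made, and the random ETag merely mimics the idea that every write produces a new version identifier.

```python
import uuid


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 (Precondition Failed) response."""


class InMemoryBlob:
    """Hypothetical in-memory blob that changes its ETag on every write."""

    def __init__(self, data):
        self.data = data
        self.etag = uuid.uuid4().hex

    def read(self):
        return self.data, self.etag

    def write(self, data, if_match=None):
        # With if_match set, the write succeeds only if nobody else
        # has modified the blob since the caller read it.
        if if_match is not None and if_match != self.etag:
            raise PreconditionFailed("412: ETag mismatch")
        self.data = data
        self.etag = uuid.uuid4().hex
        return self.etag


blob = InMemoryBlob(b"v1")
_, etag_a = blob.read()  # client A reads
_, etag_b = blob.read()  # client B reads the same version

blob.write(b"v2 from A", if_match=etag_a)  # A wins, ETag changes

try:
    blob.write(b"v2 from B", if_match=etag_b)  # B holds a stale ETag
    conflict = False
except PreconditionFailed:
    conflict = True

data_after_conflict = blob.data

# Without if_match the write is unconditional: last writer wins.
blob.write(b"v3 unconditional")
```

On a conflict, the usual pattern is for the losing client to re-read the blob, reapply its change, and retry with the fresh ETag.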
Data Redundancy in Storage
- LRS (Locally Redundant Storage): Keeps 3 copies of data within a single data center in the primary region.
- ZRS (Zone Redundant Storage): Distributes 3 copies across separate availability zones within a region.
- GRS (Geo Redundant Storage): Maintains 3 copies in the primary region (as in LRS) and asynchronously replicates the data to a secondary region, where another 3 copies are kept.
- GZRS (Geo Zone Redundant Storage): Combines ZRS in the primary region with asynchronous geo-replication to a secondary region.
Note: Data redundancy is not the same as CDN.
RPO and RTO
- RPO (Recovery Point Objective): Maximum acceptable data loss, expressed as a time window before the outage; for geo-replicated storage it is typically less than 15 minutes.
- RTO (Recovery Time Objective): Target time to restore services after an outage.
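As a worked example of the two definitions above, the sketch below measures an incident against both objectives. The timestamps and the 1-hour RTO target are made-up values for illustration, and `recovery_metrics` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_replicated, outage_start, service_restored):
    """Return (data loss window, downtime) for one incident."""
    # Writes made after the last successful replication are lost.
    data_loss_window = outage_start - last_replicated
    # Downtime runs from the outage until service is available again.
    downtime = service_restored - outage_start
    return data_loss_window, downtime


rpo_target = timedelta(minutes=15)
rto_target = timedelta(hours=1)  # assumed target, not from the notes

loss, downtime = recovery_metrics(
    last_replicated=datetime(2024, 1, 1, 11, 52),
    outage_start=datetime(2024, 1, 1, 12, 0),
    service_restored=datetime(2024, 1, 1, 12, 45),
)

meets_rpo = loss <= rpo_target      # 8 minutes of writes lost
meets_rto = downtime <= rto_target  # 45 minutes until restored
```

RPO looks backward from the outage (how much data is gone), while RTO looks forward (how long until service is back).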
Azure Services
- CDN (Content Delivery Network): Caches content at globally distributed edge locations for faster content delivery.
- Functions: Serverless compute platform for event-driven applications.
- Container: Services like Azure Container Instances for deploying containerized applications.
- Kubernetes Service (AKS): Managed Kubernetes for container orchestration.
- Load Balancer: Distributes incoming traffic to optimize resource utilization and availability.
- Databricks: Managed Apache Spark-based analytics platform for big data processing and machine learning.
- SQL: Managed relational database service for both operational and analytical workloads.
Key Terms
- ACID: Atomicity, Consistency, Isolation, Durability – core properties of database transactions.
- OLTP (Online Transaction Processing): Real-time transactional processing.
- OLAP (Online Analytical Processing): Complex analytical queries for business intelligence.
- RBAC (Role-Based Access Control): Access control based on user roles.
- ACL (Access Control List): List defining specific permissions for users or groups.
- SAS (Shared Access Signature): Tokens that delegate restricted access to storage resources without sharing primary keys.
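The idea behind a SAS, delegating limited access without handing out the key itself, can be sketched with a generic HMAC-signed token. This is not Azure's actual string-to-sign or token format; the field layout, the `sign` and `is_valid` helpers, and the key are all hypothetical.

```python
import base64
import hashlib
import hmac


def sign(account_key, resource, permissions, expires_at):
    """Sign the granted scope with the secret key (illustrative layout)."""
    string_to_sign = f"{resource}\n{permissions}\n{expires_at}"
    digest = hmac.new(
        account_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid(account_key, resource, permissions, expires_at, signature, now):
    """Server-side check: not expired, and the scope was not altered."""
    if now >= expires_at:
        return False
    expected = sign(account_key, resource, permissions, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-account-key"
resource = "/container/report.csv"
expiry = 1_700_000_000  # Unix timestamp

token = sign(key, resource, "r", expiry)

ok = is_valid(key, resource, "r", expiry, token, now=1_699_999_000)
tampered = is_valid(key, resource, "rw", expiry, token, now=1_699_999_000)
expired = is_valid(key, resource, "r", expiry, token, now=1_700_000_001)
```

The client only ever sees the signature, so it cannot widen its permissions or extend the expiry without invalidating the token.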
Protocols
- REST: Architectural style using HTTP methods for communication.
- GraphQL: Query language that allows clients to request only the data they need.
- gRPC: A high-performance remote procedure call (RPC) framework based on HTTP/2 for low latency and efficient data transfer.