Bangkok, Bangkok, Thailand
Shaun can introduce you to more than 10 people at Agoda
2K followers
500+ connections
See the connections you have in common with Shaun
Activity
-
Shaun Sit shared this: Do you want to work with me at Agoda? We’re hiring! We’re looking for experienced Backend/Frontend/Fullstack Engineers at Staff level and above and Engineering Managers who want to solve complex technical challenges on a global scale. Agoda is a tech travel company based in Asia, part of the Booking Holdings group. We have open roles in Bangkok, India, and Singapore to further support the next stage of our growth. For Bangkok-based roles, we offer full relocation support. To fast-track your application: 👉 Apply here: https://lnkd.in/g4An3ntH 🕒 Apply by: Monday, 23 March Our Talent Acquisition team (Clinton Guok) will reach out to those who are selected within a week of our application deadline to organize an interview. I’m often asked what we look for on my team. Two things stand out: Curiosity (especially because operating on-prem means understanding systems end to end—no black boxes) A genuine desire to help others win (platform work is enablement by design) If that sounds like you, Agoda might be the right place. I hope to see your application soon.
-
Shaun Sit shared this: Due to the impact of COVID-19 on our business and the travel industry, we've had to say goodbye to great people at Agoda recently; talented individuals who we believe other companies would be lucky to have. In support of our teammates departing Agoda, we've launched the Agoda Talent Directory to help connect these talented individuals with new opportunities. If you’re currently hiring and looking for energetic, intelligent and caring people to join your team, please visit ago-da.co/talentdirectory
-
Shaun Sit liked this: This week, the Booking Holdings (NASDAQ: BKNG) Town Hall was streamed live from Bangkok to colleagues from all the Booking Holdings brands around the globe. It's so good to have Glenn Fogel and Ewout Steenbergen visiting us in Bangkok. What was special this time is that we didn't only get a view into Glenn the leader (just open CNN casually and you'll get a feel for that), we also got a view of Glenn the person, who has such strong connections with so many employees and entrepreneurs throughout the world. I could feel the deep impact on the Agoda audience, and he promised he'll come back more frequently and for a longer time! #lifeatagoda #traveltech #bookingholdings #agoda #townhall #bangkok #kayak #opentable #bookingcom #priceline
-
Shaun Sit liked this: me.setState(State.Humbled | State.Excited | State.Thankful) me.setPosition(Position.CTO)
-
Shaun Sit liked this: Did you know that Agoda also builds websites for other travel platforms? Do read this blog 👇 by Thammarith Likittheerameth to find out how and why we reuse our code and overcome the challenges of white-labelling. #techatagoda #agoda #lifeatagoda #whitelabelling Building and scaling different travel websites with one codebase
-
Shaun Sit liked this: CDO Magazine congratulates Yaron Zeidman, Chief Technology Officer at Agoda, for being named as one of our 2021 Top Data Executives from the Leisure and Travel Industry List. Click here to view the full list - https://lnkd.in/g_4sjfSr
-
Shaun Sit liked this: Announcement for the upcoming .NET Conf Thailand: Breaking the Monolith. I plan to tell the story of the ongoing modularization of the Agoda website. You will learn details about the development process and the transition from the old to the new architecture. Join the live stream or watch the recording. #dotnet
-
Shaun Sit liked this: #Tencent #Singapore Tencent Insights at Singapore 10-11AM, Nov 12
-
Shaun Sit liked this: Robin Moffatt from Confluent presenting how to use #kafka and #ksql to filter and join streams of events for fraud detection during the meetup at the Booking.com office in Amsterdam. Looking back, in the traditional message broker world, #ksql is the next generation of topic filter subscription, with rich query features over the content of your data streams. Powerful with #kafkaconnect adapters and the #cdc pattern.
Experience & Education
-
Agoda
******** ** ***********
-
******** ********** ** *********
********** ******
-
View all of Shaun's experience
Recommendations received
1 person has recommended Shaun
Other similar profiles
Explore more posts
-
Christopher Garzon
Data Engineer Academy • 19K followers
2025 Data Engineer Interview Questions for Meta (Big Data + Presto Focus) 1. Compare Presto and Hive for large-scale analytics. 2. How do you manage schema evolution in Hive Metastore? 3. Describe how Meta’s real-time analytics stack might use Scuba or Presto. 4. How would you optimize queries in a Presto cluster? 5. What are best practices for partitioning and bucketing Hive tables? 6. Explain data freshness challenges in distributed Presto clusters. 7. How would you debug skewed queries in Presto? 8. Describe how to implement data quality checks at Meta scale. 9. Compare HDFS and S3-compatible storage for Meta’s workloads. 10. How do you ensure idempotency across asynchronous ETL jobs? 11. Explain the role of Airflow in orchestrating Meta-style DAGs. 12. What’s your approach to data lineage and governance in large data lakes? 13. How do you implement row-level security in analytical systems? 14. Explain the pros and cons of columnar formats like ORC vs Parquet. 15. How do you handle long-tail queries that degrade cluster performance? 16. Describe how to optimize joins across terabyte-scale tables. 17. How would you manage SLA and priority scheduling in shared clusters? 18. What metrics would you monitor for pipeline reliability? 19. Explain how caching layers improve query performance in Presto. 20. Walk through designing a Facebook-style engagement analytics pipeline.
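Question 7 in the list above (debugging skewed queries) usually starts with finding hot partition keys. A minimal, illustrative Python sketch of the kind of check you might run on per-partition row counts; the function name and the 3x-median threshold are assumptions, not Presto internals:

```python
from statistics import median

def find_skewed_partitions(partition_rows, threshold=3.0):
    """Flag partitions whose row count exceeds `threshold` times the median.

    `partition_rows` maps a partition key to its row count, e.g. the output
    of a `SELECT partition_key, count(*) ... GROUP BY 1` probe query.
    """
    if not partition_rows:
        return []
    med = median(partition_rows.values())
    return sorted(k for k, n in partition_rows.items() if n > threshold * med)

# Hypothetical per-country counts: one partition dominates the table.
counts = {"us": 9_000_000, "de": 110_000, "fr": 95_000, "jp": 100_000}
print(find_skewed_partitions(counts))  # ['us']
```

Once the hot keys are known, typical remedies are salting the join key, pre-aggregating the heavy partition, or repartitioning on a more uniform column.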
19
-
Delebayo F.
TrailDa • 2K followers
[BigQuery (Cloud Layer)] I extended the pipeline to GCP using BigQuery. Goal: prove cloud reproducibility, not just local execution. Steps: - provisioned dataset - synced PostgreSQL warehouse to BigQuery - validated all tables in cloud This separates storage from compute and prepares the system for scale. Local → Cloud parity was a key design goal. #BigQuery #GCP #DataEngineering
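The "validated all tables in cloud" step above amounts to a parity check between the local warehouse and BigQuery. A minimal sketch; function and table names are hypothetical, and the counts would come from a query against each system:

```python
def check_parity(local_counts, cloud_counts):
    """Compare per-table row counts between the local warehouse and the cloud.

    Returns the names of tables whose counts differ, or that exist on one
    side only (missing tables compare as None and therefore mismatch).
    """
    tables = set(local_counts) | set(cloud_counts)
    return sorted(t for t in tables if local_counts.get(t) != cloud_counts.get(t))

local = {"orders": 1200, "users": 340}
cloud = {"orders": 1200, "users": 339}
print(check_parity(local, cloud))  # ['users']
```

Row counts are a cheap first pass; checksums or sampled row comparisons catch drift that counts alone miss.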
-
Mohsin Shaikh (L-I-O-N) - He/Him
EDB • 21K followers
PostgreSQL 18 adds native OAuth2 authentication, and EDB Principal Software Engineer Guang Yi Xu recently dropped a three-part deep dive that previews how it works. 🔹 How the SASL OAUTHBEARER flow works in PG18 🔹 Building a custom validator in Rust with pgrx 🔹 Extending a PostgreSQL client library (Go/pgx) to send bearer tokens Excellent read if you’re curious about token-based SSO or planning a PG18 upgrade. Start with part 1 here: https://bit.ly/3IGTwFD #PostgreSQL18 #OAuth2 #SSO #PG18 #EDBPostgresAI #Security #EnterpriseIT
6
-
Mike Sukmanowsky
Elvex • 2K followers
Back on January 26, Model Context Protocol (MCP) added support for MCP Apps https://lnkd.in/gHQvHpZV. With MCP Apps, a service like Notion or Coda can author an MCP server and, instead of just offering an API-like tool to let a provider like Elvex, Inc. "read a page", they could also offer a UI that presents a fully embedded version of Notion in your favorite AI tool (like Elvex, Inc.). It opens up a ton of cool possibilities for rich and interactive experiences with integrations! Since MCP is an open spec, anyone can author MCP Apps, which continues Anthropic's strategy of supporting a more open ecosystem of standards/protocols compared to OpenAI's approach of a more traditional closed app ecosystem (see https://lnkd.in/gFJgfeXx). I personally am a fan of the open-ecosystem approach 😉. We've been slow at Elvex, Inc. to jump on the MCP train, mostly because not many services offer officially authored and/or hosted MCP servers. This means: 1. Customers have to find an MCP server for a service somewhere on the internet (e.g. Coda has no supported MCP server, but someone has authored one https://lnkd.in/gKu8ubMg). 2. Customers have to host said MCP server (some providers also support hosting servers for you). The issue is that the code in the MCP server is effectively fully untrusted if you or the service provider didn't author it. Nothing stops an MCP server from performing malicious activities like sending your personal credentials/PII to a bad actor. For personal use, you might feel comfortable with the risks. For the enterprises we work with, this is a non-starter. MCP Apps still suffer from the same issue of trust. While they run in a "sandboxed iframe", that doesn't stop someone from authoring an MCP App with a UI that captures your SSN/credit card/etc and sends it to a bad actor. We're about to launch support for MCP servers in Elvex, Inc. with full governance controls built in.
Admins add trusted MCP servers and can even restrict the specific tools from a server (so that, for example, you can allow read tools but not write). Curious what others think here. MCP apps yay or nay? Also, if you want to try out our approach to MCP servers with governance built in, let me know in the comments or send me a DM!
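The admin-side restriction described above (allow read tools but not write) can be pictured as a simple allow-list filter. A toy sketch, assuming a naming heuristic; a real governance layer would classify tools from explicit metadata, not their names:

```python
# Hypothetical convention: read-only tools carry one of these prefixes.
READ_PREFIXES = ("get_", "list_", "search_", "read_")

def filter_tools(tools, allow_writes=False):
    """Keep only the tools a governance policy permits.

    Tools whose names start with a read-like prefix are treated as "read"
    tools; everything else is treated as a write tool and dropped unless
    the admin has explicitly allowed writes.
    """
    if allow_writes:
        return list(tools)
    return [t for t in tools if t.startswith(READ_PREFIXES)]

tools = ["get_page", "search_docs", "create_page", "delete_page"]
print(filter_tools(tools))  # ['get_page', 'search_docs']
```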
15
1 comment -
Bheem Reddy Gopanpally
OrbisIQ Inc • 2K followers
🚀 System Design Mastery: Day 3 – Horizontal vs Vertical Scaling Continuing the series with another core system design concept that every backend or cloud engineer must understand: 📌 Scaling Systems: Horizontal vs Vertical When traffic grows, how do we scale our systems to keep up? 🔸 Vertical Scaling (Scale Up) Add more power (CPU, RAM, SSD) to your existing server. 🧪 Example: Upgrading a 4-core machine to an 8-core machine. ✅ Simple to implement ⚠️ Limited by hardware limits; single point of failure 🔸 Horizontal Scaling (Scale Out) Add more machines to your system and distribute the load. 🧪 Example: Going from 1 server to 10 servers behind a load balancer. ✅ Highly scalable and fault-tolerant ⚠️ Requires smart architecture (e.g., stateless services, distributed storage) 💡 Analogy: Vertical scaling is like upgrading your restaurant’s kitchen. Horizontal scaling is like opening more branches of the same restaurant. Both help you serve more customers, but one has a ceiling — the other has limitless potential. 🎯 In real-world architecture, we often start with vertical scaling for simplicity, then shift to horizontal scaling for scalability and resilience. 📅 Up next tomorrow: Proxy vs Reverse Proxy #SystemDesign #HorizontalScaling #VerticalScaling #BackendEngineering #SoftwareArchitecture #DistributedSystems #TechLearning #EngineeringGrowth #CloudComputing #SystemDesignMastery #InterviewPrep
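The horizontal-scaling idea above (many servers behind a load balancer) can be sketched with a toy round-robin dispatcher. Names are illustrative; a production balancer would add health checks, weighting, and connection draining:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy horizontal-scaling dispatcher: spread requests across servers
    in a fixed rotation, the simplest load-balancing policy."""

    def __init__(self, servers):
        self._servers = cycle(servers)  # endless rotation over the pool

    def route(self, request):
        """Return (server, request): which backend handles this request."""
        return (next(self._servers), request)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.route(f"req-{i}")[0] for i in range(5)])
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Note this only works cleanly when the backends are stateless, which is exactly the "requires smart architecture" caveat in the post.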
13
1 comment -
Amal Poulose
Lumi | لومي • 3K followers
This before vs after highlights how database changes can scale safely. Moving from ad-hoc production updates to Database CI/CD introduces: • Review and approval workflows • Automated rollback • Clear auditability • GitOps integration from non-prod to prod 𝗕𝘆𝘁𝗲𝗯𝗮𝘀𝗲 shows how database changes can follow the same discipline as application code with governance, security, and collaboration built in. The key shift isn’t tooling. It’s treating database changes as a first-class part of the platform. #PlatformEngineering #DatabaseCI #DevOps #GitOps #KnowledgeSharing
10
2 comments -
Confluent
686K followers
Schema drift is one of the most underestimated cost drivers in Kafka. A small change, renamed field, updated data type, can ripple across consumers and break downstream systems. The result? • Reprocessing terabytes of data • Debugging marathons • SLA penalties Here’s how to prevent it before it scales 👉 https://cnfl.io/3OtCcXm
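The drift scenario described (a renamed field or updated data type breaking consumers) is what schema-compatibility checks catch before deployment. A deliberately naive Python sketch; real registries such as Confluent Schema Registry apply much richer rules (defaults, aliases, promotion between numeric types):

```python
def is_backward_compatible(old_schema, new_schema):
    """Naive backward-compatibility check for an event schema.

    Schemas are dicts of field name -> type. The new schema is considered
    compatible if every old field survives with the same type; brand-new
    fields are fine because existing consumers simply ignore them.
    """
    return all(new_schema.get(f) == t for f, t in old_schema.items())

old = {"order_id": "long", "amount": "double"}
print(is_backward_compatible(old, {"order_id": "long", "amount": "double", "currency": "string"}))  # True
print(is_backward_compatible(old, {"order_id": "long", "amount": "string"}))  # False: retyped field
```

Running a check like this in CI, before a producer ships a new schema, is exactly the "prevent it before it scales" step the post argues for.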
39
-
John Maeda
Microsoft • 471K followers
SQL <3 AI: AI Chef 🧑🍳 Davide Mauri shares his recipe for combining AI with SQL Server 2025 in the https://lnkd.in/gciRjSWM Cozy AI Kitchen (🎂CAIK) that we set up at my house. Chef 🧑🍳 Mauri explains how to leverage SQL Server 2025 to integrate AI capabilities, focusing on generating and using embeddings for SQL queries and data retrieval. HT 🧑🍳 Ross Heise 🧑🍳 Matt Scholz What you'll learn: - Storing and generating embeddings in SQL Server 2025. - Utilizing Azure OpenAI models for AI capabilities directly within SQL queries. - Implementing vector similarity search to improve data retrieval. - Exploring chunking strategies for handling large text inputs effectively. - Exposing SQL functionalities as REST endpoints for easy integration. --- Full 🍰 CAIK episode: https://lnkd.in/gciRjSWM All 50 CAIK 📺 episodes: https://lnkd.in/g6upvbGX
62
7 comments -
Andrew Anokhin
10K followers
Amazon Web Services (AWS) just launched S3 Vectors (Preview): native vector storage inside Amazon S3. 🚀
🔎 What it is
S3 Vectors introduces a new vector bucket type plus APIs to upsert embeddings and run k-NN similarity search directly in S3. No servers to manage, S3's durability, and sub-second queries at multimillion-vector scale. It's the first mainstream object store with vectors baked in.
💰 Why it matters
Vector infrastructure is often the silent budget-eater in RAG and AI-agent workloads. AWS says S3 Vectors can slash upload, storage and query costs by up to 90%, turning your existing data lake into an ultra-affordable "vector lake."
🔗 Where it fits in your stack
• Bedrock Knowledge Bases & SageMaker Studio can point straight at a vector bucket, no external vector DB required.
• OpenSearch now offers an "S3 Vectors" engine: keep "cold" embeddings inexpensive in S3, then export hot subsets to OpenSearch Serverless for ~10 ms latency when QPS spikes. Best of both worlds.
🎯 Ideal use cases
• Retrieval-Augmented Generation over large, mostly idle corpora (policies, manuals, media archives).
• AI agents that need long-term, ever-growing memory but only moderate query rates.
• Periodic or bursty semantic search across S3-resident data lakes.
⚙️ How it differs from stand-alone vector DBs
You manage vectors like objects: create a vector bucket, add vector indexes (up to 10k per bucket, each holding tens of millions of vectors), then query. All the familiar S3 goodies (IAM, KMS encryption, lifecycle policies, cross-region replication) still apply. When latency SLOs tighten, simply hydrate OpenSearch.
🛠️ Getting started (Preview announced 15 Jul 2025)
1️⃣ Create a vector bucket
2️⃣ Define an index & embedding schema
3️⃣ Upsert ▲ (PUT Vector)
4️⃣ Query ▼ (POST SimilaritySearch) or plug the bucket into Bedrock KB.
⚠️ Reality check
The preview is tuned for cost-optimized storage and sub-second responses at moderate QPS. If you need single-digit-millisecond latency at thousands of requests per second, layer OpenSearch or another real-time store on top.
Bottom line 👉 S3 Vectors turns the world's most popular object store into a durable, low-cost vector lake, while letting you dial up performance only when required. This could be the tipping point for making enterprise-scale RAG, search, and agent memory both simple and affordable. #AWS #S3Vectors #GenerativeAI #RAG #VectorSearch #Bedrock #OpenSearch #DataEngineering #MLOps #AIInfrastructure #AgentiAI #AI
29
1 comment -
Vishal Priyadarshi
UKG • 5K followers
A Kafka question that seems straightforward but often surprises candidates is: “If Kafka is so fast, why can your consumer still lag behind?” Many respond with, “Because the data volume is high.” However, the underlying reasons are usually more mundane: 🔹 Consumer processing is slower than message production 🔹 Wrong partition strategy 🔹 Too few consumer instances 🔹 Heavy I/O inside the consumer logic This seemingly simple question highlights the depth of understanding required to work effectively with Kafka in real-world scenarios. #Kafka #PerformanceEngineering
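The lag the question refers to is measured per partition as the latest log-end offset minus the consumer group's last committed offset. A minimal sketch with made-up offsets (in practice these come from the broker and the group's committed offsets):

```python
def consumer_lag(end_offsets, committed_offsets):
    """Lag per partition: latest log-end offset minus last committed offset.

    A partition with no committed offset is treated as fully unconsumed
    (committed offset 0), a simplifying assumption for this sketch.
    """
    return {p: end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets}

end = {0: 1500, 1: 1480, 2: 3100}       # broker's log-end offsets
committed = {0: 1500, 1: 1470, 2: 900}  # consumer group's committed offsets
lag = consumer_lag(end, committed)
print(lag)                # {0: 0, 1: 10, 2: 2200}  <- partition 2 is the hot spot
print(sum(lag.values()))  # 2210
```

A single hot partition, as here, points to the "wrong partition strategy" bullet rather than raw volume.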
89
8 comments -
MOSS TECH SERVICES
151 followers
❄️Stop treating Snowflake like a legacy database. ❄️ When I first moved to Snowflake, I kept trying to optimize it like I was still on-prem. I was worried about storage space, I was over-indexing, and I was terrified of JSON. Then I realized the architecture is fundamentally different. Separating storage from compute changes the game. You don't optimize for space anymore; you optimize for compute efficiency. I put together a "Snowflake Cheat Sheet" covering the features that actually matter in daily engineering: 🔹 Time Travel: How to fix mistakes without backups. 🔹 Zero-Copy Cloning: Creating dev environments in seconds. 🔹 Variant Type: Querying JSON without complex parsing. 🔹 Warehouse Sizing: When to Scale Up vs. Scale Out. If you are migrating to the cloud or just want to optimize your credit usage, this one is for you. The most common complaint I hear about cloud data platforms? "It gets expensive fast." But usually, that's not the platform's fault—it's the configuration. Snowflake is powerful, but you have to know which levers to pull. I created a quick cheat sheet on the essentials of Snowflake architecture and commands. Key takeaways for efficiency: 🔹 Auto-Suspend: If your warehouse is running while no one is querying, you are burning money. 🔹 Caching: Understand the Result Cache vs. Local Disk Cache to write cheaper queries. 🔹 Cloning: Don't duplicate data (and pay for storage) if you don't have to. Use CLONE. 1. The Core Architecture 🔹 Decoupled Storage & Compute: ➕Storage Layer: Holds data (micro-partitions). Cheap & scalable. ➕Compute Layer (Virtual Warehouses): Processes queries. Expensive & ephemeral. ➕Cloud Services: The "Brain" (Security, metadata, optimization). 2. Virtual Warehouses (Compute) 🔹 Scale UP (Resizing): Moving from XS to XL. Makes complex queries run faster. 🔹 Scale OUT (Multi-Cluster): Adding more clusters. Handles high concurrency (many users). 🔹 Auto-Suspend/Resume: Crucial for cost. 
set AUTO_SUSPEND = 60 (seconds) to stop billing when idle. 3. The "Superpowers" (Unique SQL) 🔹 Time Travel: Query data as it looked in the past. ➕SELECT * FROM table AT(OFFSET => -60*5); (Data 5 mins ago) 🔹 Zero-Copy Cloning: Create instant copies of DBs/Tables without duplicating storage. ➕CREATE TABLE dev_table CLONE prod_table; 🔹 Undrop: Accidentally deleted a table? ➕UNDROP TABLE my_table; 4. Semi-Structured Data (JSON) Stop struggling with NoSQL. 🔹 VARIANT: The data type to store JSON/XML/Avro. 🔹 Querying: Use dot notation. SELECT src:customer.name::string FROM json_table; 🔹 FLATTEN: Turn nested JSON arrays into SQL rows. 5. Performance & Caching 🔹 Metadata Cache: Instant counts (SELECT COUNT(*)) come from metadata, not a table scan. 🔹 Result Cache: If you run the exact same query within 24 hours (and data hasn't changed), result is instant and free. 🔹 Local Disk Cache: Warehouses cache data on SSDs. Warming the cache speeds up subsequent queries.
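The result-cache behavior described above (the exact same query text over unchanged data returns instantly and free) can be modeled with a toy cache keyed on query text plus data version. This is an illustration of the idea, not Snowflake's actual mechanism:

```python
class ResultCache:
    """Toy model of a result cache: re-running identical query text while
    the underlying data is unchanged returns the stored result instead of
    spinning up compute again."""

    def __init__(self):
        self._cache = {}

    def run(self, sql, data_version, compute):
        key = (sql, data_version)          # cache key: query text + data state
        if key not in self._cache:
            self._cache[key] = compute()   # cache miss: pay for compute
        return self._cache[key]            # cache hit: free and instant

cache = ResultCache()
first = cache.run("SELECT COUNT(*) FROM t", "v1", lambda: 1_000)
again = cache.run("SELECT COUNT(*) FROM t", "v1", lambda: 1_000)
print(first, again)  # 1000 1000 (second call served from cache)
```

The key insight the model captures: any change to the data version invalidates the entry, which is why the real cache only helps when the table has not changed.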
2
-
Jesper Moselund Christensen
Google • 3K followers
Moving Apache Iceberg tables from Hive Metastore (HMS) to the BigLake REST Catalog on GCP? 🧊 The BigLake Iceberg REST Catalog is an alternative to managing your own HMS or using the basic Project-level catalog. It follows the Iceberg REST OpenAPI spec, making it easier to share metadata across BigQuery, Spark, and Trino. The migration is a metadata-only operation; you don't need to rewrite your Parquet files 🎉 🛠 The Migration Process: 1️⃣ Setup Spark Session: Configure your environment with both the source (HMS) and destination (BigLake REST) catalogs 2️⃣ Identify Metadata: Locate the current metadata_json_location for your Iceberg tables in HMS 3️⃣ Register Tables: Use the Spark register_table procedure to point the BigLake REST Catalog to the existing metadata path 4️⃣ Validation: Verify the table schema and snapshot history are intact in the new catalog Full technical walkthrough and code snippets here: https://lnkd.in/gf9nDGJQ
3
-
Danish Ahmed Khan
Unilever • 370 followers
🚀 Rethinking Data Backups in Databricks: Why Declarative LakeFlow Pipelines Are Becoming the New Default As data engineers, many of us start backups the “classic” way: Spark Structured Streaming readStream + writeStream manual checkpointing, custom triggers, and a lot of glue code. “It works, but the operational overhead grows quickly, especially when you need long‑term data retention or continuous history for your tables.” Over the last few months, I’ve been exploring simpler and more scalable approaches for backing up data in Databricks. During this process, I revisited a pattern many of us rely on: 👉 Spark Structured Streaming with manual checkpointing It’s powerful — but not always the simplest or most maintainable. While researching, I discovered something interesting-> ➡️ Spark Declarative Lakeflow + Streaming Tables, there’s now a much cleaner option for a lot of simple append‑only backup scenarios – not just for system tables, but for any tables where you want to maintain continuous history. Key benefits I found: 💡 Automatic checkpointing & state management No need to maintain checkpoint paths, failure recovery logic, or job orchestration manually. 💡 Cleaner scheduling Simply declare the refresh interval — no clusters, triggers, or job wiring required. 💡 Stronger governance Everything is tracked in Unity Catalog with built‑in lineage. 💡 Less code, less failure surface Perfect for straightforward, long‑term replication. And yes — Structured Streaming is still useful for advanced scenarios (custom triggers, multi-sink writes, foreachBatch, etc.) that are not yet supported in Spark Declarative LakeFlow. 
But for most backup pipelines, LakeFlow Streaming Tables offer a simpler approach that “just works.” Where this approach fits best ✔️ Append‑only datasets ✔️ Tables where retention limits are too short ✔️ Teams who want durable, low‑ops backup pipelines ✔️ Any data source where simplicity > tuning micro‑batch internals If you're still using manual Spark Structured Streaming for simple ingestion/backups, it’s worth looking at Streaming Tables in LakeFlow. The operational simplicity is a game changer. Happy to share more details or code patterns if anyone is exploring similar use cases. #Databricks #DeltaLake #Lakehouse #DataEngineering #Spark #LakeFlow #StreamingTables
16
-
Gopi Kiran Gogineni
Podimo • 717 followers
Stop Failing Your Queries: BigQuery’s LAX JSON Functions 🚀 Tired of "Type Mismatch" errors crashing your BigQuery pipelines? If one JSON row has a string where a number should be, the whole query fails. The LAX family—LAX_BOOL, LAX_INT64, LAX_FLOAT64, and LAX_STRING—is the "forgiving" solution every Data Engineer needs. The Breakdown LAX_INT64 / FLOAT64: Converts JSON numbers and numeric strings (e.g., "123") to SQL numbers. LAX_BOOL: Handles JSON booleans and strings like "true". LAX_STRING: Converts any scalar (bool, number, or string) into a SQL String. ✅ The Advantages Resilience: They return NULL instead of erroring out when conversion fails. Automatic Coercion: No more SAFE_CAST(JSON_VALUE(...)) chains; they handle quoted numbers automatically. Cleaner SQL: Shorter, more readable code for extraction. ❌ The Disadvantages Silent Errors: Because they don’t fail, data quality issues might stay hidden as NULLs. Scalar Only: These don't work on complex Arrays or Objects—just simple values. Ambiguity: Hard to tell if a NULL was a missing value or a failed conversion. 💡 Pro-Tip Use LAX functions in your Bronze-to-Silver ELT layers. It keeps the data flowing even when upstream APIs send messy types. You can audit the NULL rates later! Still using strict casting? Give LAX a try. #BigQuery #GoogleCloud #DataEngineering #SQL #JSON
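The "forgiving" behavior described above can be approximated in Python to make the semantics concrete. This is a rough emulation with simplified edge-case rules, not BigQuery's exact specification:

```python
def lax_int64(value):
    """Rough emulation of a forgiving cast like BigQuery's LAX_INT64:
    return an int when the JSON scalar is convertible, else None instead
    of raising. (Simplified; the real function's edge cases differ.)"""
    try:
        if isinstance(value, bool):          # JSON true/false -> 1/0
            return int(value)
        if isinstance(value, (int, float)):  # only lossless conversions
            return int(value) if float(value).is_integer() else None
        if isinstance(value, str):           # numeric strings like "123"
            return int(value.strip())
    except (ValueError, OverflowError):
        return None                          # failed conversion, no error
    return None                              # non-scalar or null input

print([lax_int64(v) for v in [42, "123", "12.5", True, "oops", None]])
# [42, 123, None, 1, None, None]
```

The last three cases show the trade-off the post flags: a bad value and a missing value both surface as None, so NULL-rate audits downstream are essential.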
7
-
Federico Bongiovanni
Google • 3K followers
Kubernetes is continuously evolving to stay the #1 container orchestration platform for AI/ML workloads. The AI Conformance certification is a strong signal for any vendor that offers Kubernetes services. Take a look at the program, especially heading into KubeCon NA 2025!
28
-
Haziq Hakimi Mazlisham
PayNet (Payments Network… • 1K followers
Building a Near Real-Time Bus Tracking App with Aiven Kafka Tan Ze Han Dylan UPM and I built a near real-time pipeline for tracking Rapid KL buses, powered by Kafka on Aiven #FreeKafka and a Lambda Architecture pipeline. 📍 Kafka (Aiven) for real-time ingestion 🤖 Spark Streaming + Kafka Connector for processing 🚦 AWS S3 + Iceberg for the lakehouse ⚡ Cache & serve near real-time state with Redis ⚙️ Kubernetes + Prefect for orchestration 🗺️ LeafletJS + Next.js for live map visualization This will be our submission for the Kafka competition hosted by Aiven. Enjoy the demo!
225
12 comments