Fundamentals of Data Engineering (Reprint Edition)
Joe Reis, Matt Housley
Publication date: March 2023
Pages: 422
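“The world of data has been evolving for some time. First came the designers, then the database administrators, followed by the CIOs and the data architects. This book marks the next step in the evolution and maturity of the industry. It is a must-read for anyone who is true to their profession and career.”
Bill Inmon
Creator of the data warehouse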
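“This book is a great introduction to moving, processing, and handling data. I highly recommend it to anyone who wants to get up to speed quickly on data engineering or analytics, and to existing practitioners who want to fill gaps in their understanding.”
Jordan Tigani
Founder and CEO of MotherDuck,
founding engineer and cofounder of BigQuery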

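Data engineering has grown rapidly over the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of the practice. With this practical book, you will learn how to plan and build systems that serve the needs of your organization and customers by evaluating the best technologies available within the framework of the data engineering lifecycle.
Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to combine a variety of cloud technologies to serve the needs of downstream data consumers. You will understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance, which are critical in any data environment regardless of the underlying technology.
This book will help you:
● Get a concise overview of the entire data engineering landscape
● Assess data engineering problems using an end-to-end framework of best practices
● Cut through marketing hype when choosing data technologies, architecture, and processes
● Use the data engineering lifecycle to design and build a robust architecture
● Incorporate data governance and security across the data engineering lifecycle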
Preface
Part I. Foundation and Building Blocks
  1. Data Engineering Described
    What Is Data Engineering?
    Data Engineering Skills and Activities
    Data Engineers Inside an Organization
    Conclusion
    Additional Resources
  2. The Data Engineering Lifecycle
    What Is the Data Engineering Lifecycle?
    Major Undercurrents Across the Data Engineering Lifecycle
    Conclusion
    Additional Resources
  3. Designing Good Data Architecture
    What Is Data Architecture?
    Principles of Good Data Architecture
    Major Architecture Concepts
    Examples and Types of Data Architecture
    Who’s Involved with Designing a Data Architecture?
    Conclusion
    Additional Resources
  4. Choosing Technologies Across the Data Engineering Lifecycle
    Team Size and Capabilities
    Speed to Market
    Interoperability
    Cost Optimization and Business Value
    Today Versus the Future: Immutable Versus Transitory Technologies
    Location
    Build Versus Buy
    Monolith Versus Modular
    Serverless Versus Servers
    Optimization, Performance, and the Benchmark Wars
    Undercurrents and Their Impacts on Choosing Technologies
    Conclusion
    Additional Resources
Part II. The Data Engineering Lifecycle in Depth
  5. Data Generation in Source Systems
    Sources of Data: How Is Data Created?
    Source Systems: Main Ideas
    Source System Practical Details
    Whom You’ll Work With
    Undercurrents and Their Impact on Source Systems
    Conclusion
    Additional Resources
  6. Storage
    Raw Ingredients of Data Storage
    Data Storage Systems
    Data Engineering Storage Abstractions
    Big Ideas and Trends in Storage
    Whom You’ll Work With
    Undercurrents
    Conclusion
    Additional Resources
  7. Ingestion
    What Is Data Ingestion?
    Key Engineering Considerations for the Ingestion Phase
    Batch Ingestion Considerations
    Message and Stream Ingestion Considerations
    Ways to Ingest Data
    Whom You’ll Work With
    Undercurrents
    Conclusion
    Additional Resources
  8. Queries, Modeling, and Transformation
    Queries
    Data Modeling
    Transformations
    Whom You’ll Work With
    Undercurrents
    Conclusion
    Additional Resources
  9. Serving Data for Analytics, Machine Learning, and Reverse ETL
    General Considerations for Serving Data
    Analytics
    Machine Learning
    What a Data Engineer Should Know About ML
    Ways to Serve Data for Analytics and ML
    Reverse ETL
    Whom You’ll Work With
    Undercurrents
    Conclusion
    Additional Resources
Part III. Security, Privacy, and the Future of Data Engineering
  10. Security and Privacy
    People
    Processes
    Technology
    Conclusion
    Additional Resources
  11. The Future of Data Engineering
    The Data Engineering Lifecycle Isn’t Going Away
    The Decline of Complexity and the Rise of Easy-to-Use Data Tools
    The Cloud-Scale Data OS and Improved Interoperability
    “Enterprisey” Data Engineering
    Titles and Responsibilities Will Morph...
    Moving Beyond the Modern Data Stack, Toward the Live Data Stack
    Conclusion
A. Serialization and Compression Technical Details
B. Cloud Networking
Index
Publisher (China): Southeast University Press
ISBN: 978-7-5766-0551-8
Original title: Fundamentals of Data Engineering
Original publisher: O'Reilly Media
Joe Reis
Joe Reis is a “recovering data scientist” as well as a data engineer and architect.

Matt Housley
Matt Housley is a data engineering consultant and cloud specialist.
 
The animal on the cover of Fundamentals of Data Engineering is the white-eared puffbird (Nystalus chacuru).
So named for the conspicuous patch of white at their ears, as well as for their fluffy plumage, these small, rotund birds are found across a wide swath of central South America, where they inhabit forest edges and savanna.
White-eared puffbirds are sit-and-wait hunters, perching in open spaces for long periods and feeding opportunistically on insects, lizards, and even small mammals that happen to come near. They are most often found alone or in pairs and are relatively quiet birds, vocalizing only rarely.
The International Union for Conservation of Nature has listed the white-eared puffbird as being of least concern, due, in part, to their extensive range and stable population.
List price: 136.00 yuan