Hadoop Application Architectures (Reprint Edition)

¥25.6 (29% of list price)
1-star price: ¥36.5
2-star price: ¥36.5; list price: ¥89.0

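Note: books discounted to below half price are mainly publishers' remainder stock. Most are brand new (with or without shrink-wrap); a few are in 80-90% condition, have remainder marks on the page edges, or are missing accessories such as CDs.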

  • ISBN: 9787564170011
  • Binding: not available
  • Volumes: not available
  • Weight: not available
  • Format: 32mo
  • Pages: 371
  • Publication date: 2017-02-01
  • Barcode: 9787564170011; 978-7-5641-7001-1

Book Highlights

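Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many other sources stop at explaining how to use the various components of the Hadoop ecosystem, this practical book takes you through the architectural considerations needed for your particular use case, tying all the components together into a complete, tailored application.
To reinforce those lessons, the second part of the book provides detailed architecture case studies covering some of the most common Hadoop use cases.
Whether you are designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures (Reprint Edition, in English), by Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira, will guide you skillfully through the whole process.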
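  • Factors to consider when using Hadoop to store and model data
  • Best-practice guidance for moving data into and out of the system
  • Data processing frameworks, including MapReduce, Spark, and Hive
  • Common Hadoop processing patterns, such as removing duplicate records and windowing analysis
  • Giraph, GraphX, and other tools for large graph processing on Hadoop
  • Workflow orchestration and scheduling tools such as Apache Oozie
  • Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
  • Architecture examples for clickstream analysis, fraud detection, and data warehousing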

Description

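Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many other sources stop at explaining how to use the various components of the Hadoop ecosystem, this practical book has you think from the perspective of the overall architecture, a perspective that is essential for your particular use case. It ties all the components together into a complete, tailored application. To reinforce those lessons, the second part of the book provides detailed architecture case studies covering some of the most common Hadoop use cases. Whether you are designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, this book will guide you skillfully through the whole process.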

Table of Contents

Foreword
Preface

Part Ⅰ. Architectural Considerations for Hadoop Applications

1. Data Modeling in Hadoop
  • Data Storage Options
  • Standard File Formats
  • Hadoop File Types
  • Serialization Formats
  • Columnar Formats
  • Compression
  • HDFS Schema Design
  • Location of HDFS Files
  • Advanced HDFS Schema Design
  • HDFS Schema Design Summary
  • HBase Schema Design
  • Row Key
  • Timestamp
  • Hops
  • Tables and Regions
  • Using Columns
  • Using Column Families
  • Time-to-Live
  • Managing Metadata
  • What Is Metadata?
  • Why Care About Metadata?
  • Where to Store Metadata?
  • Examples of Managing Metadata
  • Limitations of the Hive Metastore and HCatalog
  • Other Ways of Storing Metadata
  • Conclusion

2. Data Movement
  • Data Ingestion Considerations
  • Timeliness of Data Ingestion
  • Incremental Updates
  • Access Patterns
  • Original Source System and Data Structure
  • Transformations
  • Network Bottlenecks
  • Network Security
  • Push or Pull
  • Failure Handling
  • Level of Complexity
  • Data Ingestion Options
  • File Transfers
  • Considerations for File Transfers versus Other Ingest Methods
  • Sqoop: Batch Transfer Between Hadoop and Relational Databases
  • Flume: Event-Based Data Collection and Processing
  • Kafka
  • Data Extraction
  • Conclusion

3. Processing Data in Hadoop
  • MapReduce
  • MapReduce Overview
  • Example for MapReduce
  • When to Use MapReduce
  • Spark
  • Spark Overview
  • Overview of Spark Components
  • Basic Spark Concepts
  • Benefits of Using Spark
  • Spark Example
  • When to Use Spark
  • Abstractions
  • Pig
  • Pig Example
  • When to Use Pig
  • Crunch
  • Crunch Example
  • When to Use Crunch
  • Cascading
  • Cascading Example
  • When to Use Cascading
  • Hive
  • Hive Overview
  • Example of Hive Code
  • When to Use Hive
  • Impala
  • Impala Overview
  • Speed-Oriented Design
  • Impala Example
  • When to Use Impala
  • Conclusion

4. Common Hadoop Processing Patterns
  • Pattern: Removing Duplicate Records by Primary Key
  • Data Generation for Deduplication Example
  • Code Example: Spark Deduplication in Scala
  • Code Example: Deduplication in SQL
  • Pattern: Windowing Analysis
  • Data Generation for Windowing Analysis Example
  • Code Example: Peaks and Valleys in Spark
  • Code Example: Peaks and Valleys in SQL
  • Pattern: Time Series Modifications
  • Use HBase and Versioning
  • Use HBase with a RowKey of RecordKey and StartTime
  • Use HDFS and Rewrite the Whole Table
  • Use Partitions on HDFS for Current and Historical Records
  • Data Generation for Time Series Example
  • Code Example: Time Series in Spark
  • Code Example: Time Series in SQL
  • Conclusion

5. Graph Processing on Hadoop
  • What Is a Graph?
  • What Is Graph Processing?
  • How Do You Process a Graph in a Distributed System?
  • The Bulk Synchronous Parallel Model
  • BSP by Example
  • Giraph
  • Read and Partition the Data
  • Batch Process the Graph with BSP
  • Write the Graph Back to Disk
  • Putting It All Together
  • When Should You Use Giraph?
  • GraphX
  • Just Another RDD
  • GraphX Pregel Interface
  • vprog()
  • sendMessage()
  • mergeMessage()
  • Which Tool to Use?
  • Conclusion

6. Orchestration
  • Why We Need Workflow Orchestration
  • The Limits of Scripting
  • The Enterprise Job Scheduler and Hadoop
  • Orchestration Frameworks in the Hadoop Ecosystem
  • Oozie Terminology
  • Oozie Overview
  • Oozie Workflow
  • Workflow Patterns
  • Point-to-Point Workflow
  • Fan-Out Workflow
  • Capture-and-Decide Workflow
  • Parameterizing Workflows
  • Classpath Definition
  • Scheduling Patterns
  • Frequency Scheduling
  • Time and Data Triggers
  • Executing Workflows
  • Conclusion

7. Near-Real-Time Processing with Hadoop
  • Stream Processing
  • Apache Storm
  • Storm High-Level Architecture
  • Storm Topologies
  • Tuples and Streams
  • Spouts and Bolts
  • Stream Groupings
  • Reliability of Storm Applications
  • Exactly-Once Processing
  • Fault Tolerance
  • Integrating Storm with HDFS
  • Integrating Storm with HBase
  • Storm Example: Simple Moving Average
  • Evaluating Storm
  • Trident
  • Trident Example: Simple Moving Average
  • Evaluating Trident
  • Spark Streaming
  • Overview of Spark Streaming
  • Spark Streaming Example: Simple Count
  • Spark Streaming Example: Multiple Inputs
  • Spark Streaming Example: Maintaining State
  • Spark Streaming Example: Windowing
  • Spark Streaming Example: Streaming versus ETL Code
  • Evaluating Spark Streaming
  • Flume Interceptors
  • Which Tool to Use?
  • Low-Latency Enrichment, Validation, Alerting, and Ingestion
  • NRT Counting, Rolling Averages, and Iterative Processing
  • Complex Data Pipelines
  • Conclusion

Part Ⅱ. Case Studies

8. Clickstream Analysis
  • Defining the Use Case
  • Using Hadoop for Clickstream Analysis
  • Design Overview
  • Storage
  • Ingestion
  • The Client Tier
  • The Collector Tier
  • Processing
  • Data Deduplication
  • Sessionization
  • Analyzing
  • Orchestration
  • Conclusion

9. Fraud Detection
  • Continuous Improvement
  • Taking Action
  • Architectural Requirements of Fraud Detection Systems
  • Introducing Our Use Case
  • High-Level Design
  • Client Architecture
  • Profile Storage and Retrieval
  • Caching
  • HBase Data Definition
  • Delivering Transaction Status: Approved or Denied?
  • Ingest
  • Path Between the Client and Flume
  • Near-Real-Time and Exploratory Analytics
  • Near-Real-Time Processing
  • Exploratory Analytics
  • What About Other Architectures?
  • Flume Interceptors
  • Kafka to Storm or Spark Streaming
  • External Business Rules Engine
  • Conclusion

10. Data Warehouse
  • Using Hadoop for Data Warehousing
  • Defining the Use Case
  • OLTP Schema
  • Data Warehouse: Introduction and Terminology
  • Data Warehousing with Hadoop
  • High-Level Design
  • Data Modeling and Storage
  • Ingestion
  • Data Processing and Access
  • Aggregations
  • Data Export
  • Orchestration
  • Conclusion

A. Joins in Impala
Index