Publish to my blog (weekly)
-
Streaming in Spark, Flink, and Kafka - DZone Big Data
- a guarantee that every record will be processed exactly once, thereby eliminating any duplicates that might otherwise occur.
- Kafka
- Building real-time streaming data pipelines that reliably get data between systems or applications.
- Building real-time streaming applications that transform or react to the streams of data.
- the interchange of signals between cellphone towers
- financial transactions
- cars in motion emitting GPS signals
- web traffic including things like session tracking and understanding user behavior on websites
- measurements from industrial sensors.
- bogged down
- mired in a swamp; stuck
- While Spark has adopted micro-batches,
- Flink has adopted a continuous-flow, operator-based streaming model.
- Flink supports record-count-based or any custom user-defined window criteria,
- whereas Spark only has time-based window criteria (see the sketch after this list).
- are usable for dozens of Big Data scenarios
- are using them on top of Hadoop (YARN, HDFS)
- standalone mode
- Spark is an open-source cluster computing framework with a large global user base
- its cluster manager (YARN)
- storage (HDFS, HBase, etc.)
- storage platforms (the likes of Cassandra and Amazon S3)
- Apache Flink is an open-source platform for distributed stream and batch data processing.
- data distribution
- communication
- fault tolerance for distributed computations over data streams
- Apache Spark is considered a replacement for the batch-oriented Hadoop system, but it also includes a component called Spark Streaming. Apache Flink, by contrast, is a Big Data processing tool known for processing big data quickly, with low latency and high fault tolerance, on large-scale distributed systems.
- Apache Kafka is a distributed streaming platform
- Spark processes chunks of data, known as RDDs, while Flink can process data row after row in real time.
- Spark processes data in batch mode while Flink processes streaming data in real time.
- Spark jobs need to be optimized and adjusted manually for individual datasets
- Flink can automatically adapt to varied datasets
- Spark does manual partitioning and caching
- Flink pages data out to disk when memory is full, much as Windows and Linux do
- Flink is able to provide intermediate results on its data processing whenever required
- Flink follows a distributed data flow approach
- Spark follows a procedural programming model
- Flink is a cluster framework, which means that the framework takes care of deploying the application, either in standalone Flink clusters, or using YARN, Mesos, or containers (Docker, Kubernetes).
- The Streams API is a library that any standard Java application can embed and hence does not attempt to dictate a deployment method; you can thus deploy applications with essentially any deployment technology.
- Flink has a dedicated master node for coordination
- the Streams API relies on the Kafka broker for distributed coordination and fault tolerance.
- In Apache Flink, fault tolerance, scaling, and even distribution of state are globally coordinated by the dedicated master node.
- easier to use
- much faster
- has a more flexible windowing system
- about twice as fast as Apache Spark with NAS.
- no latency in processing elements from a stream
- The Streams API makes stream processing accessible as an application programming model that applications built as microservices can take advantage of.
- Flink, on the other hand, is a great fit for applications that are deployed in existing clusters and benefit from its throughput, latency, and batch-processing capabilities.
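A minimal sketch of the windowing difference noted above, assuming Flink's DataStream Scala API and Spark Streaming's classic DStream API (the host, port, and window sizes are made up; in practice the two snippets would live in separate projects):

```scala
// Flink: a count-based window per key, something the DStream API does not offer directly.
import org.apache.flink.streaming.api.scala._

object FlinkCountWindow {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.socketTextStream("localhost", 9999)                   // hypothetical source
      .map(word => (word, 1))
      .keyBy(_._1)
      .countWindow(100)                                       // fire after 100 records per key
      .sum(1)
      .print()
    env.execute("count-window-sketch")
  }
}

// Spark Streaming: windows are expressed purely in terms of time.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkTimeWindow {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("time-window-sketch")
    val ssc  = new StreamingContext(conf, Seconds(1))
    ssc.socketTextStream("localhost", 9999)                   // hypothetical source
      .map(word => (word, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))  // 30s window, sliding every 10s
      .print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```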
-
-
- Loaders can transform files from a different language (like CoffeeScript) to JavaScript, or inline images as data URLs.
- Loaders even allow you to do things like require() CSS files right in your JavaScript!
-
-
http://www.reactive-streams.org/
- Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.
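A minimal sketch of the back-pressure handshake this standard defines, using the org.reactivestreams interfaces (the class name and the one-element demand policy are just for illustration):

```scala
import org.reactivestreams.{Subscriber, Subscription}

// A Subscriber that signals demand one element at a time, so a fast producer
// can never overwhelm it: that demand signalling is the non-blocking back pressure.
class OneAtATimeSubscriber[T] extends Subscriber[T] {
  private var subscription: Subscription = _

  override def onSubscribe(s: Subscription): Unit = {
    subscription = s
    s.request(1)                 // the producer may now emit at most one element
  }

  override def onNext(element: T): Unit = {
    println(s"got $element")
    subscription.request(1)      // ask for the next element only once this one is handled
  }

  override def onError(t: Throwable): Unit = t.printStackTrace()
  override def onComplete(): Unit = println("done")
}
```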
-
-
Using GraphQL with MongoDB - Compose Articles
- GraphQL is an alternative to REST endpoints for handling queries and database updates
- the link between front-end applications and back-end servers can become pretty rigid
- with GraphQL, the UI gets the data it needs in a form that is useful for the UI.
- With GraphQL as an abstraction layer, we hide the complications of web services and let back-end developers create a menu of both available data structures and items available to retrieve (and how to retrieve them).
- Doing so allows the front-end application, and its developers, to select from the items on the menu as they need.
- The front-end application doesn't need to worry about new items being added to the menu or where the data for that menu is coming from; that's a task for the GraphQL server.
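To make the "menu" idea concrete, a query might look like the sketch below; the field names are hypothetical, and the point is only that the front end declares the exact shape of data it wants while the GraphQL server decides how to fetch it (from MongoDB or anywhere else):

```scala
// Purely illustrative: customer, name, orders and total are made-up schema fields.
val customerQuery: String =
  """
    |query {
    |  customer(id: "42") {
    |    name
    |    orders(last: 3) {
    |      total
    |    }
    |  }
    |}
  """.stripMargin
```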
-
-
Basics and working with Flows — Akka Documentation
- Akka Streams is a library to process and transfer a sequence of elements using bounded buffer space.
- An active process that involves moving and transforming data.
- An element is the processing unit of streams. All operations transform and transfer elements from upstream to downstream. Buffer sizes are always expressed as a number of elements, independently from the actual size of the elements.
- A means of flow-control, a way for consumers of data to notify a producer about their current availability, effectively slowing down the upstream producer to match their consumption speeds. In the context of Akka Streams back-pressure is always understood as non-blocking and asynchronous.
- Means that a certain operation does not hinder the progress of the calling thread, even if it takes a long time to finish the requested operation.
- The common name for all building blocks that build up a Flow or FlowGraph. Examples of a processing stage would be operations like map(), filter(), stages added by transform() like PushStage, PushPullStage, StatefulStage and graph junctions like Merge or Broadcast. For the full list of built-in processing stages see Overview of built-in stages and their semantics.
- A processing stage with exactly one output, emitting data elements whenever downstream processing stages are ready to receive them.
- A processing stage with exactly one input, requesting and accepting data elements, possibly slowing down the upstream producer of elements.
- A processing stage which has exactly one input and output, which connects its up- and downstreams by transforming the data elements flowing through it.
- A Flow that has both ends "attached" to a Source and Sink respectively, and is ready to be run().
- Materialization is the process of allocating all resources needed to run the computation described by a Flow (in Akka Streams this will often involve starting up Actors).
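A minimal sketch tying the glossary terms together, assuming a pre-2.6 Akka Streams Scala setup (names are illustrative): a Source and a Sink are attached to a Flow, and running the result materializes it by starting Actors.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object FlowSketch extends App {
  implicit val system: ActorSystem = ActorSystem("flow-sketch")
  implicit val materializer: ActorMaterializer = ActorMaterializer() // allocates the resources (Actors) that run the stream

  val source = Source(1 to 100)            // a processing stage with exactly one output
  val double = Flow[Int].map(_ * 2)        // exactly one input and output, transforming elements
  val sink   = Sink.foreach[Int](println)  // exactly one input, signalling demand upstream (back-pressure)

  // Attaching both ends gives a runnable graph; run() triggers materialization.
  source.via(double).to(sink).run()
}
```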
-
-
Concurrency: Java Futures and Kotlin Coroutines - DZone Java
-
- It was designed to support multi-project builds that can grow quite large.
- It also allows for incrementally adding to your build, because it knows which parts of your project are updated.
- Tasks that depend only on parts that have not changed are not re-executed.
- Gradle is based on a graph of task dependencies
- Maven is based on a fixed and linear model of phases.
- Gradle allows for incremental builds because it checks which tasks are updated or not.
- both allow for multi-module builds to run in parallel
- Incremental compilation for Java classes
- Compile avoidance for Java
- The use of APIs for incremental subtasks
- A compiler daemon that also makes compiling a lot faster
- Maven has Maven Central
- Gradle has JCenter
- and you can use your own private company repository as well
- implementation dependencies
- api dependencies
- keeps repository metadata along with cached dependencies
- concurrent safe caches
- two or more projects using the same cache will not overwrite each other
- synchronize cache with the repository
- it has a checksum-based cache
- define custom rules
- Gradle is compatible with IVY Metadata
- The use of substitution rules for compatible libraries
- The use of ReplacedBy rule
- Better metadata resolution
- The ability to dynamically replace project dependencies with external ones, and vice versa
- a fully configurable DAG,
- transitive exclusions
- allows task exclusions
- task dependency inference
- Administering build infrastructure
- accept auto provisioning
- configure version-based build environments without having to set these up manually
- custom distributions.
-
-
- These language tags will be validated and used to create Lang instances. To access the languages supported by your application, you can inject a Langs instance into your component.
-
-
-
- Seeing if the Context's lang field has been set explicitly.
- Looking for a PLAY_LANG cookie in the request.
- Looking at the Accept-Language headers of the request.
- Using the application's default language.
- changeLang or setTransientLang.
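A minimal sketch of the above with the Scala API, assuming Play 2.6+ (the controller name is illustrative; the changeLang and setTransientLang calls mentioned above belong to Play's Java Http.Context API):

```scala
import javax.inject.Inject
import play.api.i18n.{Lang, Langs, MessagesApi}
import play.api.mvc.{AbstractController, ControllerComponents}

class I18nController @Inject()(langs: Langs, messagesApi: MessagesApi, cc: ControllerComponents)
    extends AbstractController(cc) {

  def index = Action { request =>
    // Languages configured in application.conf (play.i18n.langs), validated into Lang instances.
    val supported: Seq[Lang] = langs.availables
    // Resolved in order: explicit lang, PLAY_LANG cookie, Accept-Language header, application default.
    val preferred: Lang = messagesApi.preferred(request).lang
    Ok(s"supported=${supported.map(_.code).mkString(",")} preferred=${preferred.code}")
  }
}
```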
-
-
Ad hoc polymorphism in Scala for the mere mortals - codecentric AG Blog : codecentric AG Blog
- arbitrary
- arbitrary; random
- Although this approach works, it has some serious drawbacks. First, we must use implicit conversion, even if we just want to perform a single operation
- Second, it is not natural to define a binary operation on a single value.
- view bounds are deprecated in Scala since version 2.11.
- Type classes to the rescue (a consolidated sketch of the full walkthrough appears after this list)
- This is just simple dependency injection (parameter injection, to be precise).
- We create Adder implementations for Integers and Strings:
- we can invoke our function, passing the right Adder to it:
- Next, we are going to make our Adder dependency implicit and define Adder implementations as implicit objects, which we will import into the scope when we need them.
- This would allow us to omit the second argument in the function invocation, since the compiler will pick up the right implementation and pass it through automatically:
- summon
- to summon; to convene; to urge
- (implicit adder: Adder[A])
- implicitly[Adder[A]].add(x, y)
- Scala provides syntactic sugar for declaring implicit parameters like this, which is called a context bound:
- [A: Adder]
- implicitly[Adder[A]].add(x, y)
- The compiler translates the clause A: Adder into an additional implicit parameter list containing a parameter of type Adder[A], exactly as we defined previously by hand.
- the clause A: Adder conveys the idea that the type A must "belong" to the type class Adder, much like the requirement that parameters x and y must belong to the type A. We are going to explore this idea further down.
- a companion object of Adder:
- the companion object of the Adder trait
- succinctly
- concisely
- [A: Adder]
- Adder[A].add(x, y)
- Semigroup.
- A Semigroup is just a collection of objects – for example integers or strings – with a defined binary operation on them producing another object of the same type.
- two integers can be added producing another integer, and two strings can be concatenated producing another string.
- This property is called associativity.
- lonesome
- lonely; desolate
- intimidate
- to threaten; to intimidate
- if we did not require our Adder to obey the associativity law, mathematicians would call it a Magma. Add associativity to it and you get a Semigroup. Further down in the text we are going to add a function to get a zero value from our Adder, effectively extending it into something called a Monoid.
- redefine it to aggregate an arbitrary number of values.
- (xs: Iterable[A])
- xs.fold(???)(Semigroup[A].add)
- A Semigroup with a zero value is called a Monoid:
- the companion object in order to omit the call to implicitly in our implementation:
- Monoid[A].zero
- Monoid[Int]
- override val zero = 0
- Monoid[String]
- override val zero = ""
- Why Type Class Pattern?
- providing a blueprint for constructing values (instances) of the class
- defining a data type.
- As a data type a class describes a collection of properties an object must have in order to belong to this specific type
- A type class lifts this same concept to a higher level, applying it to types. It describes a collection of properties a type must have in order to belong to this specific type class.
- if it is known that a type belongs to a Semigroup type class, then it is known that instances of that type can be combined according to an associative binary operation, producing another instance of the same type (for example addition for integers).
- In our case, that gap is filled by the type class pattern: the information that a type belongs to a certain type class is expressed implicitly by providing an implementation of a trait defining the properties of that type class, sometimes called evidence.
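A consolidated sketch of the walkthrough above, merging the Adder/Semigroup/Monoid progression into one runnable piece (the instances are the usual Int and String ones):

```scala
// A type class with an associative binary operation (a Semigroup).
trait Semigroup[A] {
  def add(x: A, y: A): A
}

// A Semigroup with a zero (identity) value is a Monoid.
trait Monoid[A] extends Semigroup[A] {
  def zero: A
}

object Monoid {
  // Summoner: lets call sites write Monoid[A] instead of implicitly[Monoid[A]].
  def apply[A](implicit monoid: Monoid[A]): Monoid[A] = monoid

  implicit object IntMonoid extends Monoid[Int] {
    override def add(x: Int, y: Int): Int = x + y
    override val zero: Int = 0
  }

  implicit object StringMonoid extends Monoid[String] {
    override def add(x: String, y: String): String = x + y
    override val zero: String = ""
  }
}

object TypeClassDemo extends App {
  // The context bound [A: Monoid] desugars into an extra implicit parameter (implicit m: Monoid[A]).
  def combine[A: Monoid](x: A, y: A): A = Monoid[A].add(x, y)

  // Aggregating an arbitrary number of values: zero supplies the initial element for fold.
  def combineAll[A: Monoid](xs: Iterable[A]): A = xs.fold(Monoid[A].zero)(Monoid[A].add)

  println(combine(1, 2))                  // 3
  println(combineAll(Seq("a", "b", "c"))) // abc
}
```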
-