Annotated Notes and Summaries Of The TensorFlow White Paper

TensorFlow White Paper Notes

Features

  • Notes broken down section by section, as well as subsection by subsection
  • Relevant links to documentation, resources, and references throughout
  • SVG versions of figures/graphs
  • So many bullet points!

To-do list

  • Create and utilize anchor tags throughout notes for self-navigating

White Paper available at this link


TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Abstract


1 Introduction

  • Google Brain started in 2011, and DistBelief was its first-generation scalable, distributed machine learning system
  • DistBelief was used for a large number of research and commercial tasks
  • TensorFlow, Google's second-generation machine learning system, was designed from lessons learned in the process of engineering and using DistBelief
  • The TensorFlow API is used to describe a dataflow-like model, and the implementation then maps those models onto the underlying machine hardware
  • This allows users to have a single system that runs on a broad spectrum of machines, reducing the overhead caused by rewriting code for different hardware
  • Focus of development was to maintain flexibility for research purposes while attaining enough performance to be used in production
  • Can express various types of parallelism by replicating the dataflow model across multiple machines and running them in parallel
    • Some functions within TensorFlow allow for less consistency in parallelism if desired
      • Larger, multiple machine uses of TensorFlow can take advantage of less-strict synchronization requirements
  • TensorFlow is more flexible, faster, and supports more machine learning models than DistBelief

2 Programming Model and Basic Concepts

  • TensorFlow computations are represented by directed graphs, which are composed of nodes
  • Some nodes are able to maintain and update a persistent state and/or have some sort of branching and looping structures
    • This branching/looping is modeled similarly to MSR's Naiad
  • Graphs are constructed using supported front-end languages (C++/Python as of writing)
  • A Node has zero or more inputs/outputs, and it represents an operation
  • Values of 'normal' edges (the connection between one node's output to another node's input) are tensors, n-dimensional arrays.
    • The type of each element in the tensor is inferred while the graph is being constructed, prior to execution
  • There are 'special' edges, called control dependencies: no model data is transferred on these edges; rather, they indicate that the source node must finish execution before the destination node begins execution (a small example follows this list)
    • Can be thought of as a baton in a relay race. Attaching a control dependency means that the next node can't begin running until the previous node 'hands off' the baton.
    • Used by client to enforce happens-before relations
    • Used in reference implementation to manage memory usage
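
As a concrete illustration, here is a minimal sketch (assuming the Python front-end and the 0.x-era API used elsewhere in these notes) of a control dependency forcing a stateful update to finish before a dependent read runs; no tensor data flows along the special edge:

    import tensorflow as tf

    counter = tf.Variable(0, name="counter")
    increment = tf.assign_add(counter, 1)  # stateful update of the Variable

    # The control dependency is the relay-race baton: 'read' may not start
    # executing until 'increment' has finished running.
    with tf.control_dependencies([increment]):
        read = counter + 0  # this Add op now happens-after the increment

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        print(sess.run(read))  # 1 -- the increment is guaranteed to run first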

Operations and Kernels

  • Operations have names and represent an abstract computation, such as "matrix multiply" or "add"
  • Operations can optionally require attributes. Attributes must be explicitly provided or be possible to infer prior to running the graph
    • A common use of attributes is to declare which data type the operation is being performed with (e.g. float tensors vs. int32 tensors); a small example follows the kernel list below
  • A kernel is an implementation of an operation designed for specific types of devices, such as CPU or GPU
  • The TensorFlow library includes several built-in operations/kernels. Some of them, grouped by category, are listed below:
    • Element-wise mathematical operations: Add, Sub, Mul, Div, Exp, Log, Greater, Less, Equal
    • Array operations: Concat, Slice, Split, Constant, Rank, Shape, Shuffle
    • Matrix operations: MatMul, MatrixInverse, MatrixDeterminant
    • Stateful operations: Variable, Assign, AssignAdd
    • Neural-net building blocks: SoftMax, Sigmoid, ReLU, Convolution2D, MaxPool
    • Checkpointing operations: Save, Restore
    • Queue and synchronization operations: Enqueue, Dequeue, MutexAcquire, MutexRelease
    • Control flow operations: Merge, Switch, Enter, Leave, NextIteration

Check out this directory in the TensorFlow repository for kernel implementations
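
To make the attribute mechanism above concrete, here is a small sketch (with hypothetical example values) showing how the dtype attribute of the generic Add operation is inferred from its inputs at graph-construction time, selecting either the int32 or float32 kernel:

    import tensorflow as tf

    # dtype attribute inferred as int32; the int32 Add kernel will be used
    int_sum = tf.add(tf.constant([1, 2, 3]), tf.constant([4, 5, 6]))

    # same operation name, but the float32 kernel is selected instead
    float_sum = tf.add(tf.constant([1.0, 2.0, 3.0]), tf.constant([4.0, 5.0, 6.0]))

    print(int_sum.dtype, float_sum.dtype)  # <dtype: 'int32'> <dtype: 'float32'>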

Sessions

  • Clients interact with TensorFlow by creating a Session, which supports two main functions: Extend and Run
    • The Extend method adds additional nodes and edges to the existing dataflow model
      • Note: Extend is called automatically by TensorFlow, not directly by the client
    • Run takes as arguments a set of named nodes to be computed as well as an optional set of tensors to be used in place of certain node outputs. It then uses the graph to determine all nodes required to compute the requested outputs and executes them in an order that respects their dependencies.
  • Most TensorFlow programs set up a graph within a Session once, and then run the full graph or subsets of it multiple times.

Variables

  • A Variable is a handle to a persistent and mutable tensor which survives each execution of a graph
  • For ML tasks, learned parameters are usually held in TensorFlow Variables

See the official How-To to learn more about TensorFlow Variables
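
A minimal sketch (again assuming the era-appropriate Python API) of a Variable whose value persists across separate Run calls:

    import tensorflow as tf

    # A Variable holds a mutable tensor that survives individual graph executions
    weights = tf.Variable(tf.random_uniform([784, 100], -1.0, 1.0), name="weights")
    decay = tf.assign(weights, weights * 0.9)  # a stand-in "learning" update

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())  # gives the Variable its first value
        for _ in range(3):
            sess.run(decay)  # the updated value carries over between Run calls
        print(sess.run(weights).shape)  # (784, 100)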


3 Implementation

  • There are three primary components in a TensorFlow system: the client, the master, and worker processes
    • The client uses a Session interface to communicate with the master
    • The master schedules and coordinates worker processes and relays results back to the client
    • Worker processes are responsible for maintaining access to devices such as CPU/GPU cores and for executing graph nodes on their respective devices
  • There are both local and distributed implementations of TensorFlow, but only the local version has been open-sourced as of writing
    • Update as of February 2016: The initial open-source implementation of the TensorFlow distributed runtime is available on the TensorFlow GitHub repository. However, using it at this time requires building TensorFlow from source, and full API documentation is not yet available.

Devices

  • Each device has both a device type and a name
    • Names are composed of the device's type, its index in a worker process, and (when used in a distributed setting) an identification of the job and task of the worker process
    • Example device names:
      Local: /job:localhost/device:cpu:0
      Distributed: /job:worker/task:17/device:gpu:3
  • A device object manages its device's memory and executes kernels as requested
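
The same name strings are what a client passes when pinning nodes to a device; a hedged sketch using the Python front-end:

    import tensorflow as tf

    # "/cpu:0" is shorthand for the fully qualified local name shown above;
    # "/gpu:0" would similarly target the first GPU, if one is present.
    with tf.device("/cpu:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
        product = tf.matmul(a, b)

    # log_device_placement prints the device each node was actually assigned to
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        print(sess.run(product))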

Tensors

  • Typed, multi-dimensional array
  • Memory management of tensors is handled automatically
  • Available types (from the TensorFlow documentation):
    • DT_FLOAT (tf.float32): 32-bit floating point
    • DT_DOUBLE (tf.float64): 64-bit floating point
    • DT_INT8 (tf.int8): 8-bit signed integer
    • DT_INT16 (tf.int16): 16-bit signed integer
    • DT_INT32 (tf.int32): 32-bit signed integer
    • DT_INT64 (tf.int64): 64-bit signed integer
    • DT_UINT8 (tf.uint8): 8-bit unsigned integer
    • DT_STRING (tf.string): variable-length byte arrays; each element of a Tensor is a byte array
    • DT_BOOL (tf.bool): boolean
    • DT_COMPLEX64 (tf.complex64): complex number made of two 32-bit floating point values (real and imaginary parts)
    • DT_QINT8 (tf.qint8): 8-bit signed integer used in quantized ops
    • DT_QINT32 (tf.qint32): 32-bit signed integer used in quantized ops
    • DT_QUINT8 (tf.quint8): 8-bit unsigned integer used in quantized ops

3.1 Single-Device Execution

NOTE: To reiterate: in this context, "single device" means a single CPU core or a single GPU, not a single machine. Similarly, "multi-device" refers to multiple CPU cores and/or GPUs, not to multiple machines. See "3.3 Distributed Execution" for the discussion of multiple machines.

  • Overview of the execution of a single-worker process, single-device job:
    1. All nodes required to compute the desired output node(s) are determined
    2. Each node is given a count of dependencies that need to be completed before it can begin execution
    3. When a node's dependency count is zero, it is added to a ready queue
    4. The ready queue delegates node kernel execution to device objects
    5. When a node completes execution, the counts of all dependent nodes are decremented
    6. Repeat steps 3-5 until the desired output is computed
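
The dependency-counting loop above can be sketched in plain Python (an illustrative reconstruction of the steps, not TensorFlow's actual C++ executor):

    from collections import deque

    def execute(graph, outputs, run_node=print):
        """graph maps node -> list of input nodes; run_node stands in for a kernel."""
        # Step 1: walk backwards from the requested outputs to find required nodes
        needed, stack = set(), list(outputs)
        while stack:
            node = stack.pop()
            if node not in needed:
                needed.add(node)
                stack.extend(graph[node])
        # Step 2: give each node a count of unfinished dependencies
        pending = {n: len(graph[n]) for n in needed}
        consumers = {n: [m for m in needed if n in graph[m]] for n in needed}
        # Step 3: nodes with a count of zero go on the ready queue
        ready = deque(n for n in needed if pending[n] == 0)
        while ready:
            node = ready.popleft()
            run_node(node)                      # Step 4: delegate to a device object
            for c in consumers[node]:           # Step 5: decrement dependent counts
                pending[c] -= 1
                if pending[c] == 0:
                    ready.append(c)

    # Example: MatMul depends on W and x; Add depends on MatMul and b
    execute({"W": [], "x": [], "b": [], "MatMul": ["W", "x"], "Add": ["MatMul", "b"]},
            outputs=["Add"])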

3.2 Multi-Device Execution

  • There are two main challenges introduced when using multiple devices:
    • Deciding which device should process each node
    • Managing communication between devices as necessary after assigning nodes

Node Placement

  • One of the main responsibilities of the TensorFlow implementation is to map computation onto available devices
  • The following is a simplified version of this mapping algorithm:
    1. A cost model is input into the algorithm
      • The cost model contains estimates of the sizes of the input/output tensors (in bytes) and the estimated computation time for each node in the graph
    2. Using the cost model, the algorithm simulates an execution of the graph to make node-placement decisions as described below:
      1. Starting with the source nodes, a set of feasible devices is considered for each node ready to be executed
        • A "feasible" device is one that has a kernel implementation for the given operation
        • A node is ready for execution once its dependencies have finished running
      2. If a node has multiple feasible devices, the computation time of the node is examined with respect to placing the node on each possible device
        • This examination takes into account the execution time of the operation (given the device type), as well as the costs of possibly introducing communication between devices.
      3. The device that would finish the operation the soonest is selected as the node's device.
      4. Repeat steps 1-3 for each node in the graph execution until all nodes have been allocated to devices
    3. After the simulation, the real execution runs using the node-placement decisions made during the simulation
  • Section 4.3 will describe some extensions to help guide the placement algorithm
  • Improving the placement algorithm is an ongoing area of development as of writing

NOTE: At the moment, node placement is done by a simple_placer class which only considers explicit placement requirements provided by the user and implicit colocation constraints based on node type (see documentation comments for details)
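
For intuition, here is a toy version of the greedy simulation (purely illustrative; the real placer models per-edge transfer sizes and, as the note above says, the current open-source simple_placer is simpler than this):

    def place_nodes(order, compute_cost, transfer_cost, feasible):
        """order: (node, inputs) pairs in dependency order; compute_cost[node][device]
        and the flat transfer_cost penalty are hypothetical cost-model estimates."""
        placement, finish = {}, {}
        for node, inputs in order:
            best_device, best_time = None, float("inf")
            for device in feasible[node]:              # feasible = has a kernel
                start = 0.0
                for inp in inputs:                     # add cross-device transfer cost
                    ready = finish[inp] + (transfer_cost if placement[inp] != device else 0.0)
                    start = max(start, ready)
                t = start + compute_cost[node][device]
                if t < best_time:                      # pick the earliest finish time
                    best_device, best_time = device, t
            placement[node], finish[node] = best_device, best_time
        return placement

    # Tiny example: an expensive MatMul that is worth moving to the GPU
    order = [("x", []), ("W", []), ("MatMul", ["W", "x"])]
    costs = {"x": {"cpu": 1, "gpu": 1}, "W": {"cpu": 1, "gpu": 1},
             "MatMul": {"cpu": 10, "gpu": 2}}
    print(place_nodes(order, costs, transfer_cost=1,
                      feasible={n: ["cpu", "gpu"] for n in costs}))
    # {'x': 'cpu', 'W': 'cpu', 'MatMul': 'gpu'}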

Cross-Device Communication

  • After the nodes have been placed onto their respective devices, the execution graph is split into subgraphs, one per device
  • Any edge between nodes on different devices is replaced by two new edges:
    • The outputting node will have an edge between it and a new Send node, placed within the subgraph of its device
    • The receiving node will have an edge between it and a new Receive node, placed within the subgraph of its device
    • See Figure 4 for illustration of adding Send/Receive nodes
  • The Send and Receive nodes coordinate data transfer across devices
    • This isolates cross-device communication to the implementation of the Send and Receive nodes
  • All users of a particular tensor on a particular device use a single Receive node, as opposed to having one Receive node per user per device. This minimizes data transmission between devices as well as memory allocated on the receiving device
    • This means that any given device should receive the output of any given operation only once, and should store that output only once in memory
  • This method of communication also allows individual node scheduling to be handled by the worker processes as opposed to the master
    • The Send and Receive nodes provide synchronization between worker processes and devices, which enables the master to only issue a single Run request per graph execution per worker process
    • This improves scalability and fine-grain control over node execution

3.3 Distributed Execution

  • Similar to multi-device execution. Send/Receive nodes that communicate across worker processes use mechanisms such as TCP or RDMA to transmit data from machine to machine

Fault Tolerance

  • There are two main ways failures are detected:
    • Errors between Send and Receive nodes
    • Periodic health-checks from the master process to the worker processes
  • When a failure is detected, the entire graph execution is aborted and restarted
  • Because TensorFlow Variables persist across each graph execution, there are mechanisms to save and restore their state
    • Each Variable node is connected to a Save node, which periodically writes the contents of the Variable to persistent storage
    • Additionally, each Variable is connected to a Restore node, which is enabled when the graph execution is restarted. The Restore node reads state data from persistent storage and applies it to the Variable node

4 Extensions

The following subsections describe advanced features and extensions of the programming model introduced in Section 2

4.1 Gradient Computation

  • TensorFlow has built-in gradient computation
  • If a tensor, C, depends on a set of previous nodes, the gradient of C with respect to those previous nodes can be automatically computed with a built-in function, even if there are many layers in between them
  • Gradients are computed by creating additional nodes and edges in the graph as described below (see Figure 5):
    • When computing the gradient of a tensor, C, with respect to some dependency, I, TensorFlow first finds the forward path from I to C in the model graph. This is shown as the left-hand side of Figure 5
    • Once the path between the two is found, TensorFlow starts at C and moves backward toward I. For every operation on this backward path, a node is added to the graph, composing the partial gradients of each added node via the chain rule. This is shown as the right-hand side of Figure 5
      • Partial gradients are computed via a "gradient function", which corresponds to an operation on the forward path. These gradient functions are provided alongside operation kernels
      • The gradient function takes as input the partial derivatives already computed along the backwards path and, optionally, the inputs and/or outputs of the corresponding operation on the forward path
      • For example, in Figure 5, the dReLU operation (gradient function for the Rectified Linear Unit operation) takes in the previously computed partial derivatives (indicated by arrows coming from "..."), as well as the inputs from the ReLU operation (indicated by arrows coming from Add, as the outputs of Add are the inputs of ReLU). dReLU does not, however, take in the outputs from the ReLU function (indicated by the grayed out arrows coming from _ReLU). Once the partial gradients are computed, dReLU outputs them to the next gradient function, in this case dAdd
    • The partial gradients for any node outputs that are not dependencies of C are set to zero. This can occur when a node has multiple outputs, but only some of them connect to C
    • This process continues until the partial derivatives of C with respect to I are found
  • Memory management is an active area of improvement for the automatic gradient computation algorithm.
    • Tensors early in the computation graph may be needed at the end of gradient calculation, causing them to stick around in GPU memory
    • Current options for memory management improvements include improved heuristics to determine graph execution order, recomputing tensors as opposed to storing them in memory, and utilizing host CPU memory instead of leaving long-lived tensors in GPU memory
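
In the Python front-end this graph extension is exposed through tf.gradients; a hedged sketch mirroring the C and I of the description above:

    import tensorflow as tf
    import numpy as np

    x = tf.placeholder(tf.float32, shape=[None, 784], name="x")   # plays the role of I
    W = tf.Variable(tf.random_uniform([784, 100], -1.0, 1.0))
    b = tf.Variable(tf.zeros([100]))
    relu = tf.nn.relu(tf.matmul(x, W) + b)
    C = tf.reduce_sum(relu)                                        # a scalar cost C

    # tf.gradients walks backward from C, adding dReLU, dAdd, dMatMul, ... nodes
    # to the graph and composing their partial gradients via the chain rule
    dW, db, dx = tf.gradients(C, [W, b, x])

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        grad_W, grad_b = sess.run([dW, db],
                                  feed_dict={x: np.ones((5, 784), dtype=np.float32)})
        print(grad_W.shape, grad_b.shape)   # (784, 100) (100,)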

4.2 Partial Execution

  • TensorFlow has built-in functionality to run smaller chunks of a defined execution graph, as well as the ability to insert pre-defined data as a replacement for any edge in the graph
  • Each node in the graph is given a name upon instantiation, and each output of a node is referred to by number starting from zero
    • e.g. "bar:0" is the first output of node "bar"
  • The Session's run method takes two arguments, fetches and feed_dict, which define the subgraph to be executed:
    • fetches is a list of desired operation nodes and/or output tensors to be executed. If outputs are requested, the Run function will return the calculated tensors to the client (assuming the Run function is successful)
    • feed_dict is a dictionary of optional inputs, which map named node outputs (name:port) to defined tensor values. This allows a user to effectively define the 'start' of a subgraph. Additionally, feed_dict is used to define data in Placeholder objects
  • The execution graph is then transformed based on the values fed to fetches and feed_dict
    • Each output tensor specified in fetches is connected to a fetch node, which stores and returns its value to the client once Run is successfully completed
      • Note: no fetch nodes are created for operation nodes named in fetches, as TensorFlow makes a distinction between operations and the outputs of those operations
      • An example of an operation a user may specify as a fetches parameter to a Run command is the operation returned by tf.initialize_all_variables. The operation doesn't provide an output, but it is run in the execution graph
    • Each named node output (node:port) specified in feed_dict is replaced with a feed node, which takes on the value of the tensor mapped to the named output. Each node in the execution graph that depends on the named output will take in data from the feed node in its place
      • Because a feed node has no dependencies, it is the start of its own execution chain
  • Once the fetch and feed nodes have been inserted, TensorFlow determines which nodes need to be executed. It moves backwards, starting at the fetch nodes, and uses the dependencies of the graph to determine all nodes that must be executed in the modified graph in order to compute the desired outputs
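
A minimal sketch of fetches and feed_dict driving a partial execution: feeding a replacement value for an intermediate tensor and fetching only the part of the graph downstream of it:

    import tensorflow as tf
    import numpy as np

    a = tf.constant(np.arange(6.0).reshape(2, 3), dtype=tf.float32, name="a")
    double = tf.add(a, a, name="double")        # output tensor is named "double:0"
    total = tf.reduce_sum(double, name="total")

    with tf.Session() as sess:
        # Fetching "total" normally runs the whole chain a -> double -> total
        print(sess.run(total))                                     # 30.0

        # Feeding a value for "double:0" inserts a feed node in its place, so the
        # upstream nodes 'a' and 'double' are pruned from this particular execution
        print(sess.run(total, feed_dict={double: np.ones((2, 3), np.float32)}))  # 6.0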

4.3 Device Constraints

  • Users can provide partial constraints on nodes about which devices they can run on
    • Examples: only allowing a node to run on GPUs, specifying a specific worker process/task, or ensuring that it is grouped with specific nodes
    • Note: By default, GPUs are given priority for device placement if the given operation has both a CPU and a GPU kernel implementation
  • These constraints require modifications to the placement algorithm described in Section 3.2:
    • After finding the feasible set of devices for each node, TensorFlow uses the constraints to determine which nodes must be grouped on the same device
    • For each of these groups, TensorFlow computes the intersection of feasible devices for each node in the group
    • Final selection of devices is determined using similar heuristics as described in Section 3.2, ensuring fast execution while taking device restrictions, such as memory, into account

Aside: I'm not sure if this functionality is available in the open source implementation of TensorFlow yet. As of now I can only find information regarding placing nodes on specific devices. Read more about manual device placement here. Let me know if you can find the documentation for this feature! It is possible to provide partial constraints (https://www.tensorflow.org/versions/r0.11/how_tos/variables/index.html#device-placement) e.g. with tf.device("/job:ps/task:7") or with tf.device("/gpu:0").

4.4 Control Flow

  • TensorFlow incorporates a few primitive control flow operators which allow for the skipping of subgraph execution and the expression of iteration. Using these primitive operators, higher-level constructs such as if and while statements can be compiled into TensorFlow graphs
  • Each iteration in a loop is identified with a unique tag, and the present value of that execution state is represented by a frame
  • Inputs can enter an iteration whenever they are available
    • This allows multiple iterations to be executed concurrently
  • Because a loop may contain nodes placed on separate devices, managing the state of loops becomes a problem of distributed termination detection. That is, there needs to be a way to detect the termination of nodes across devices.
  • TensorFlow solves this by adding additional control nodes to the graph. These nodes manage the beginning and end of each loop iteration and determine when the loop should end.
    • At every iteration, the device that owns the control node for a loop sends out control messages to the other devices used in that loop
  • The implementation of TensorFlow also takes if statements into account when computing gradients, as it must include or omit nodes as necessary to properly step backwards through the execution graph
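
These primitives surface in the Python front-end as higher-level constructs such as tf.cond and tf.while_loop (hedged sketch; the Switch/Merge/Enter/Leave/NextIteration nodes from the operations table are inserted under the hood):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, name="x")

    # tf.cond compiles an if-statement into Switch/Merge nodes; only one branch runs
    y = tf.cond(x > 0.0, lambda: x * 2.0, lambda: -x)

    # tf.while_loop compiles iteration into Enter/Leave/NextIteration frames;
    # this loop sums the integers 1..5
    cond = lambda i, total: i <= 5
    body = lambda i, total: (i + 1, total + i)
    _, total = tf.while_loop(cond, body, [tf.constant(1), tf.constant(0)])

    with tf.Session() as sess:
        print(sess.run(y, feed_dict={x: -3.0}))   # 3.0
        print(sess.run(total))                    # 15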

4.5 Input Operations

4.6 Queues

  • TensorFlow includes Queues, which allow for the enqueuing and dequeuing of tensor objects. This enables asynchronous graph execution and the handing off of data between concurrent nodes
    • Enqueue operations can block until there is space available in the queue
    • Dequeue operations can block until a minimum number of elements are placed in the queue
  • FIFOQueue is a standard 'first-in, first-out' queue
  • RandomShuffleQueue is a queue that randomly shuffles its elements periodically, which can be useful for machine learning algorithms that want to randomize training data
  • An example use of queues is to allow input data to be prefetched from the storage system while previous data is still being processed
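
A small sketch of a FIFOQueue handing tensors from a producer to a consumer (assuming the queue API of the era):

    import tensorflow as tf

    queue = tf.FIFOQueue(capacity=10, dtypes=[tf.float32])  # up to 10 scalar floats

    value = tf.placeholder(tf.float32)
    enqueue = queue.enqueue([value])   # blocks if the queue is full
    dequeue = queue.dequeue()          # blocks until an element is available

    with tf.Session() as sess:
        for v in [1.0, 2.0, 3.0]:
            sess.run(enqueue, feed_dict={value: v})
        # Elements come back out in first-in, first-out order
        print([sess.run(dequeue) for _ in range(3)])   # [1.0, 2.0, 3.0]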

4.7 Containers

  • A Container is the mechanism that manages long-lived (i.e. survives multiple executions of a graph) mutable state
    • Container objects hold the values of Variable objects
  • There is a default container that persists until the process terminates
  • Other named containers may be initialized
  • Containers allow for the sharing of state between otherwise unrelated computation graphs defined on separate Sessions

5 Optimizations

This section describes certain performance/resource usage optimizations used in the implementation of TensorFlow

5.1 Common Subexpression Elimination

  • Before execution, TensorFlow does a pass over the computation graph and reduces nodes with identical inputs and operations down to a single node.
  • This prevents redundant execution of the same computation

5.2 Controlling Data Communication and Memory Usage

  • Proper scheduling of operations can dramatically improve data transfer and memory usage by reducing the amount of time intermediate data needs to be stored in memory
    • GPUs benefit from this a great deal, as they have scarce memory
    • Can also reduce competition amongst operations to use network resources
  • One particular example used in the implementation of TensorFlow is the scheduling of Receive nodes (see "Cross-Device Communication" under 3.2)
    • Receive nodes, without thoughtful scheduling, may execute much earlier than necessary
      • This could cause data to be stored in the memory of devices for much longer
    • TensorFlow attempts to delay the execution of Receive nodes until just before their results are needed.

5.3 Asynchronous Kernels

  • TensorFlow supports non-blocking kernels, which are useful in environments that cannot afford to have many active threads; they allow nodes to wait for I/O or other events without blocking an execution thread
    • Normal, synchronous kernels complete their execution at the end of the Compute method
    • Asynchronous kernels use a slightly different interface: the Compute method is passed a lambda/callback that should be invoked at the end of the kernel's execution
  • Examples of asynchronous kernels built into TensorFlow: Receive, Enqueue, and Dequeue

5.4 Optimized Libraries for Kernel Implementations

  • TensorFlow makes use of several existing, optimized libraries for many kernel implementations
  • Library for linear algebra: Eigen
  • Libraries for matrix multiplication on different devices: BLAS and cuBLAS
  • Libraries for convolutional kernels for deep neural networks: cuda-convnet and cuDNN

5.5 Lossy Compression

  • Because some machine learning algorithms still work well with less precise arithmetic, TensorFlow often uses lossy compression on higher-precision numbers when sending data between devices
    • This most often happens when sending data between devices on different machines, but it sometimes compresses data sent on the same machine
  • For example, a 32-bit floating point number may be converted into a (for all intents and purposes) 16-bit floating point number before being sent to another device, where it is expanded back into a lossy version of the original 32-bit number
    • The 16-bit number is really just the 32-bit floating point value with the lower 16 bits of mantissa precision dropped
    • The conversion back to 32-bit floating point simply fills in zeros to replace the lost precision bits
      • 'Filling in with zeros' is used as the 16-bit -> 32-bit conversion method because it is faster than stochastic rounding, even though the latter is more mathematically correct
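
The truncation can be illustrated with a few lines of NumPy bit manipulation (an illustrative reconstruction of the idea, not TensorFlow's actual wire format):

    import numpy as np

    def compress(x):
        """Sender: keep only the high 16 bits (sign, exponent, top mantissa bits)."""
        bits = np.asarray(x, dtype=np.float32).view(np.uint32)
        return (bits >> np.uint32(16)).astype(np.uint16)

    def decompress(halves):
        """Receiver: shift back up and fill the dropped mantissa bits with zeros."""
        return (halves.astype(np.uint32) << np.uint32(16)).view(np.float32)

    original = np.array([3.14159265, 0.1, 12345.678], dtype=np.float32)
    print(original)                         # [3.1415927e+00 1.0e-01 1.2345678e+04]
    print(decompress(compress(original)))   # same magnitudes, reduced precision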

6 Status and Experience

  • The system, documentation, and examples for TensorFlow can be found at tensorflow.org
  • Currently, there are front-ends for Python and C++, and it's expected that more will be added over time (created both from internal Google users and the open-source community)

Advice and Lessons Learned

The following are "words of wisdom" coming from the experience of porting Google's Inception neural network into TensorFlow. After successfully doing so, the team was rewarded with a 6-fold improvement on training time over DistBelief's implementation. This advice will hopefully be useful to others as they build their own models.

  1. Build tools to gain insight into the exact number of parameters in a given model
    • This can help you catch subtle flaws in a complex network architecture, such as operations and variables instantiated incorrectly
  2. Start small and scale up
    • The TensorFlow team started by importing a small network used by DistBelief
    • Debugging a small network gave insight into the edge cases of certain operations; doing the same on a larger network would have been nearly impossible to untangle
  3. Always ensure that the objective (loss function) matches between machine learning systems when learning is turned off
    • By setting the learning rate to zero (i.e. turning off learning), the TensorFlow team was able to identify unexpected behavior stemming from randomly initialized variables in the model
  4. Make a single machine implementation match before debugging a distributed implementation
    • This helped the TensorFlow team separate and debug differences in training performance between DistBelief and TensorFlow
    • Once the single machine implementation worked, they were able to find bugs related to race conditions and non-atomic operations in the distributed model
  5. Guard against numerical errors
    • Different numerical libraries handle non-finite floating point numbers differently
    • Checking for non-finite floating point values can allow one to detect errors in real time, guarding against numerical instability
  6. Analyze pieces of a network and understand the magnitude of numerical error
    • By running subsections of the neural network on both DistBelief and TensorFlow in parallel, the team was able to ensure that the implemented algorithms were indeed identical
    • Note that because the networks used floating point numbers, there is a given amount of numerical error that should be expected and taken into account when comparing the two systems

7 Common Programming Idioms

This section describes how TensorFlow's basic dataflow graphs can be used to speed up training neural network models on large datasets using techniques developed by the TensorFlow team.

The techniques presented here assume that the model is using stochastic gradient descent with mini-batches of around 100-1000 examples

Data Parallel Training

  • Users can parallelize the computation of the gradient, separating mini-batch elements onto different devices
    • For example, a mini-batch size of 1000 elements can be split into 10 smaller, parallel computations of 100 elements. After they all finish running, their results can be combined to achieve the same result as if the entire calculation was performed in a single, sequential computation
  • This translates to having many replicas of a computational subgraph and using a single client thread to coordinate the training loop of those replicas
  • The above approach can be modified further by making it asynchronous. Instead of having a single client thread, there are multiple clients (one for each subgraph replica), and each replica updates the trained parameters asynchronously
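
A sketch of the synchronous variant in the Python front-end (hedged: the device names, the 0.x-era tf.split signature, and the two-way split are illustrative; the asynchronous variant would instead have each replica apply its gradients independently):

    import tensorflow as tf

    W = tf.Variable(tf.random_uniform([784, 100], -1.0, 1.0))
    x = tf.placeholder(tf.float32, [128, 784])     # one mini-batch of 128 examples
    halves = tf.split(0, 2, x)                     # 0.x signature: split axis 0 into 2

    tower_grads = []
    for i, x_i in enumerate(halves):
        with tf.device("/cpu:0"):                  # would be "/gpu:%d" % i with 2 GPUs
            loss_i = tf.reduce_sum(tf.nn.relu(tf.matmul(x_i, W)))   # replica subgraph
            tower_grads.append(tf.gradients(loss_i, [W])[0])

    # Summing the per-replica gradients reproduces the gradient of the loss over the
    # full mini-batch, as if it had been processed in a single sequential computation
    full_grad = tf.add_n(tower_grads)
    train_op = tf.train.GradientDescentOptimizer(0.01).apply_gradients([(full_grad, W)])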

Model Parallel Training

  • Can run separate portions of the computation graph on different devices simultaneously on the same batch of examples
    • See Figure 8 for a visual example of sequence-to-sequence learning parallelized across three devices

Concurrent Steps for Model Computation Pipelining

  • Can also run a small set of concurrent training steps on a single device
    • Similar concept to asynchronous data parallelism, but the parallelism is only on a single device
    • This can "fill in the gaps" of device utilization, when parallel execution on all devices might not make full use of computation cycles
    • See Figure 9 for a visual example

8 Performance

Stay tuned for future versions of the TensorFlow white paper, which will include performance evaluations for single machine and distributed implementations of TensorFlow


9 Tools

This section discusses additional tools, developed by the TensorFlow team, that work alongside the graph modeling and execution features described above.

9.1 TensorBoard: Visualization of Graph Structures and Summary Statistics

TensorBoard was designed to help users visualize the structure of their graphs, as well as understand the behavior of their models

Visualization of Computation Graphs

  • TensorBoard includes features that allow for more digestible visualizations
    • TensorBoard can take models with tens of thousands of nodes and collapse them into high-level blocks, highlighting subgraphs with identical structures
    • It separates out "high-degree" nodes from the rest of the graph to further reduce visual clutter
      • Note: I haven't found a proper definition of "high-degree" nodes in TensorFlow. The paper says they "often serve book-keeping functions". I imagine they are operations similar to tf.initialize_all_variables, which are necessary to run the execution graph in TensorFlow but aren't really part of the mathematical model
  • The TensorBoard visualization is interactive
    • Users can pan, zoom, and expand the collapsed blocks in order to get a lower-level view of the model

Visualization of Summary Data

  • TensorBoard supports Summary operations that can be inserted into the execution graph to examine and track various values over time
    • Scalar summaries: e.g. average loss function over set of examples, total execution time of the model
    • Histogram based summaries: e.g. distribution of parameter values in a neural network layer
    • Image-based summaries: e.g. visualization of filter weights learned in a convolutional neural network
  • Typically, Summary nodes are setup to monitor specific interesting values and are executed periodically during normal graph execution
  • After Summary nodes are executed, the client writes the summary data to a log file. TensorBoard monitors this log to display summary information over time
    • The "time" used in the visualization of summary data can be: wall-clock time; absoute time; or "steps", the number of graph executions that have occurred since the first execution in the TensorFlow program

9.2 Performance tracing

  • A tool called EEG is used to examine fine-grained information about the ordering/performance of TensorFlow graphs
    • Works for both single machine and distributed implementations of TensorFlow
    • Note: EEG is not open sourced as of writing
  • Helps users understand bottlenecks in a TensorFlow program

The following is a brief overview of what EEG does under the hood

  • Traces are collected via a number of sources, including Linux kernel ftrace, internal CPU profiling tools, and the CUDA Profiling Tools Interface (CUPTI)
  • The trace logs enable EEG to recreate the execution of the graph with microsecond-level precision. Events from the traces, within a given time range, are extracted and visualized according to the resolution of the client's user interface
    • The user can zoom in on portions of the graph, and EEG will update the visualization to show finer details
  • Any significant delays due to communication, synchronization, or direct memory access issues are identified and highlighted in the UI

Please see pages 14 and 15 of the November 2015 white paper to see a specific example of EEG visualization along with descriptions of the current UI


10 Future Work

This section lists areas of improvement and extension for TensorFlow identified for consideration by the TensorFlow team

Extensions:

  • A "function" mechanism, where users can specify subgraphs of TensorFlow execution to be reusable
    • In the TensorFlow team's design of this mechanism (not yet implemented), these kinds of functions can be reusable across front-end languages. That is, a user could define a subgraph function in Python and use it in C++ without redefining it

Improvements:

  • Continue work on a just-in-time compiler that can take in an execution subgraph and output an optimized routine for that subgraph
    • Such a compiler might also take in some runtime profiling information as input
    • The compiler should be able to perform loop fusion, block and tile for locality, and specialize routines for particular shapes and sizes of tensors, along with other optimizations
  • Improving the node placement and node scheduling algorithms
    • Instead of using man-made heuristics, have the system learn how to make good placement/scheduling decisions

11 Related Work

Open source, single machine systems with portions of similar functionality

Systems designed primarily for neural networks:

Systems that support symbolic differentiation:

Systems with a core written in C++:

Comparisons with DistBelief and Project Adam

Similarities shared with DistBelief and Project Adam:

  • They allow computations to be distributed across many devices on many machines
  • Users can specify models using relatively high-level descriptions

Differences between TensorFlow and DistBelief/Project Adam:

  • TensorFlow's dataflow graph model is more flexible and better at expressing more types of machine learning models and algorithms
  • Allows for the expression of stateful Parameter nodes as variables
  • Variable update operations are simply additional nodes in the graph
    • Both DistBelief and Project Adam have subsystems dedicated to handling parameters

Comparison with the Halide image processing system

  • Both use a similar intermediate data representation for their dataflow graphs
  • However, Halide has additional high-level knowledge of its operations, which it uses to generate optimized code that combines multiple operations
  • Halide is single-machine only
    • The TensorFlow team hopes to extend TensorFlow's capabilities to incorporate Halide's techniques in a distributed setting

Related distributed dataflow graph systems

Systems that represent complex workflows as dataflow graphs

Systems that support data-dependent control flow

  • CIEL
    • Iteration is implemented as a directed acyclic graph (DAG) that dynamically unfolds
  • Naiad
    • Iteration is implemented as a static graph with cycles

Systems optimized for accessing the same data repeatedly

  • Spark
    • Uses resilient distributed datasets (RDDs) to cache outputs of earlier operations, in case they are needed again

Systems that execute dataflow graphs across heterogeneous devices, including GPUs

Features that TensorFlow incorporates from the above distributed systems

Feature implementations that are most similar to TensorFlow are listed after the feature

  • The dataflow scheduler (i.e. the module that selects the next node to run)
    • CIEL, Dryad, Flume, Spark
  • Distributed architecture: using a single, optimized dataflow graph and caching information about the graph to lower coordination overhead
    • Naiad
  • Works best when there is enough RAM in the cluster to hold all working variables/computations
    • Naiad, Spark
  • Iteration: multiple replicas of the same graph can execute at once while sharing the same set of persistent variables. Replicas can share the variables asynchronously or use mechanisms such as queues to access them synchronously
    • Hybrid of many approaches
  • Iteration: a node only executes when all of its dependencies have completed
    • CIEL
  • Graph iteration is represented as a static, cyclic graph

12 Conclusions

  • TensorFlow is a flexible dataflow graph programming model
  • There are both single machine and distributed implementations of TensorFlow
  • TensorFlow was developed using prior experience in Google, as well as methods used in other previous systems
  • An open source implementation of TensorFlow is available
    • As of writing, only the local (single-machine) implementation has been released by Google

Figures

Figure 1: Example TensorFlow code fragment

    import tensorflow as tf
    
    # 100-d vector, init to zeros
    b = tf.Variable(tf.zeros([100]))
    
    # 784x100 matrix with random values
    W = tf.Variable(tf.random_uniform([784,100], -1, 1))
    
    # Placeholder for input
    x = tf.placeholder(name="x")
    
    # Rectified linear unit of (W*x +b)
    relu = tf.nn.relu(tf.matmul(W, x) + b)
    
    # Cost computed as a function of relu
    C = [...]
    
    # Instantiate a Session
    s = tf.Session()
    
    for step in xrange(0, 10):
        # Create a 100-d vector for input
        input = ...construct 100-D input array ...
        
        # Find the cost, using the constructed vector as the placeholder input
        result = s.run(C, feed_dict = {x: input})
        print step, result

Figure 2: Corresponding computation graph for Figure 1

Figure 3: Single machine (left) and distributed system (right) structure

Figure 4: Before and after insertion of Send/Receive nodes

Figure 5: Gradients computed for graph in figure 2

Figure 6: Before and after graph transformation for partial execution

Figure 7: Synchronous and asynchronous data parallel training

Figure 8: Model parallel training

Figure 9: Concurrent steps


Author: samjabrahams
Source code: https://github.com/samjabrahams/tensorflow-white-paper-notes
License: MIT license

#tensorflow 

What is GEEK

Buddha Community

Annotated Notes and Summaries Of The TensorFlow White Paper
james right

james right

1606540809

ICO White Paper Writing Services, Crypto White Paper Development

An initial coin offering (ICO) is a vehicle that can materialize the vision of any business. It reaches every prospective investor out there. However, this instrument has to be made that effective and it is possible only through a white paper. With this document, your business gets the power to get established with flying colors. If you have reliable white paper writing services, you can get things done very easily. It is very easy for you to come out with a perfect pitch and allure the target audience to make a move that gives you benefits.

ICO White Paper

How can I get a befitting white paper for my business project?

To do that, you need to work with writers who are familiar with the working of your domain. Also, they need to have great writing skills and content creation skills so your proposal could get all the attention it requires. When you introduce your idea with this document, everything gets very easy and there is absolutely no need to find add-ons. It works as a panacea for every single entrepreneur and gives them the ability to do something great. Once you have a perfect whitepaper, you don’t need extreme marketing moves as well.

That’s correct, a perfect whitepaper can give you all the eyeballs that you need, it helps you build strong traction among the investors. The criteria, however, is to stick with the basics of your industry and focus on giving value to the readers. When you do that, it is easier for you to delve deeper into many areas and to amaze everyone. The facts written in this doc have to be very precise and you have to sure about absolutely everything. As soon as you have a particular way to go, you don’t have to waste time on one topic.

You can switch from one niche to another and involve examples that could explain a whole concept very clearly. When you are ready with such a perfect whitepaper, you get to make things fluid for the traders. No matter how you want to execute your business, you get a more effective structure that helps in optimizing the entire mechanism. It is quite possible that you belong to a domain that is very flexible and ductile. Even with these conditions, you need to have precise about the changes you want to bring. This approach keeps you ahead in various ways and it gives you time to strategize too.

What is the right way to approach ICO whitepaper development?

When it comes to drafting a document that explains your startup in an impeccable manner, you have to be very choosy. Whether you want to come up with a certain plan or not, you get to bring the changes in your plan. As soon as you are clear about the project, you must start thinking about the content. It is very important that you remain one step ahead on the different fronts so there is absolutely no need for a backup structure and you can begin the process.

At the time of finalizing the prospect of your enterprise, it is very important that things get more descriptive with time. Also, you get to think of some additional measures that could expedite the creation of such tactics. Whether you understand the significance of this tool or not, you cannot simply underestimate it entirely. The focal point of your company gets clearer to every single entity and you get to work on things with better control. Also, you get to protect the entire thing with a foolproof system that covers all the risks with absolutely no repercussions.

Once you have made up your mood about this solution and ready to hire a writer, you must come up with a reliable team. That is important because you have to share many ideas and insights about your operations with them. You have to ensure a good scope of sharing ideas so anyone could add value to your project. It is vital that you keep every single member stuck to their goals, this way, you get a more appropriate response from your audience. Besides that, you get a more protective layer of information that keeps all the data secure sans any loopholes.

What should I keep in mind while choosing ICO whitepaper writers?

For choosing the most efficient writer for your project, you need to have a more planned approach. Also, you have to come up with something that could help in the ceaseless growth of your company. The pain points of your customers have to be understood, so you don’t mistake in any phase of making the whitepaper. Whether you like it nor not, you can always give a more reasonable answer to the questions asked on the forums. The open-source framework gives you better fixes and it also keeps you ahead in terms of your objectives.

Just by selecting the right people, it is possible for you to manage the expansive work at every stage. The creation of such aspects gives you insights about everything, it also helps you in giving a proper shape to the proximal attitudes. By optimizing every attribute, you get to make all the factors sublime and the readers get impressed by your efforts. It does not matter how you minimize the cost and increase the effect, you get prolific results. It makes you a better planner so you could pave the way to permanent success.

The selection of writers has to start with the thorough checking of profiles and every time you do it, you extend the chances of success. Regardless of the size and nature of your startup, you get to check a large number of solutions in a very minimum duration. Through this elaborate document, it is possible for you to induct pioneering solutions that protect your enterprise against any risk or volatility. The whole point of appointing writers is to ensure that you present your proposal in an unmatched matter. By doing it strategically, you make certain that there are no flaws.

Conclusion

With Coin Developer India, it is possible for any enterprise to come up with revolutionary ideas every time it is going to do something important. Our experts make certain that you can do something really exceptional to obtain the attention of the investors. When it comes to making an ICO successful, our entire team collaborates to give you the best results. Our writers come from all walks of life and they realize the power of content. We help your startup get nothing but the best so it could be on the frontline of its niche.

The solutions given by us are very direct in nature, they always strike the chord with people you want to affect. At the time of making this document, we give a proper treatment that makes your enterprise a strong contender on every front. No matter what you want to achieve, we make it possible through a broad spectrum of services. We make whitepaper so powerful that investors cannot overlook and your idea gets materialized in the best possible way. Our writers give your business what it truly deserves, we perpetuate your business’ position.
Want your business to be successful? Make it possible with us!

Get matchless ICO whitepaper writing services and make your project an absolute success. The expert writers of Coin Developer India make this possible easily.

Contact Details:

Call and Whatsapp : +91-7014607737
Email: cryptodeveloperjaipur@gmail.com
Telegram : @vipinshar

#cryptocurrency white paper writing #white paper writing #cryptocurrency white paper #ico white paper #white paper development #hire white paper writer

Akshara Singh

Akshara Singh

1616751555

White Label Crypto Exchange | Cryptocurrency Exchange Software Solutions

Hai folks, I hope you all are doing good.

Recently any entrepreneurs interested in starting their bitcoin exchange business, because of the last year 2020, covid -19 totally reshaped the business industry. At that time only businesses were successfully run without any interpretation like “Crypto Exchange”. So many entrepreneurs & startups were interested to launch their own exchange platform. But they all have one question “How much does it cost to build a crypto exchange with very good security?”.

Generally, there are 2 possible way to launch your exchange platform

  • Developed from scratch
  • Launch your exchange by using white label crypto exchange software
    In those 2 methods, white label crypto exchange software is a cost-effective & secured way to launch your exchange platform. Let me explain.

If you develop your exchange from scratch, you might face some technical & non-technical issues like crypto wallet integration, partnering with the bank, no beta test, high-cost ($ 50k or more than that), need more time, & a lot of security risk [Sometimes dark world peoples may hack your exchange platform] so most of the crypto enthusiast doesn’t prefer this method.

If you go with white label crypto exchange software, you have a lot of benefits.

Benefits of White Label Crypto Exchange:

  • Customizing options - They will help you to build your cryptocurrency exchange platform based on your business needs.
  • Monitor and Engage - You can easily monitor the work process
  • **Beta module **- You can test your exchange in the Beta module
  • Cost-effective - White label crypto exchange cost of development will be around $ 8k - 15k (It may vary based on the requirements.)
  • Time-Period - You can launch your exchange within 1 week
  • Fully secured, bug-free & adv trading features.

Best Trading & Security Features:

  • Multi-language
  • IEO launchpad,
  • Crypto wallet,
  • Instant buying/selling cryptocurrencies
  • Staking and lending
  • Live trading charts with margin trading API and futures 125x trading
  • Stop limit order and stop-loss orders
  • Limit maker orders
  • Multi-cryptocurrencies Support
  • Referral options
  • Admin panel
  • Perpetual swaps
  • Advanced UI/UX
  • Security Features [HTTPs authentication, Biometric authentication, Jail login, Data encryption, Two-factor authentication, SQL injection prevention, Anti Denial of Service(DoS), Cross-Site Request Forgery(CSRF) protection, Server-Side Request Forgery(SSRF) protection, Escrow services, Anti Distributed Denial of Service]

It’s very impressive right!!! But all white label crypto exchange software providers are not offering all these services. Only noted crypto exchange software providers like Coinsclone, offering all these services at a budget-friendly price, to know the exact price check here >>> cost to build a cryptocurrency exchange

Or else, you can speak with their business experts directly. Get in touch via,

Whatsapp/Telegram: +919500575285

Mail: hello@coinsclone.com

Skype: live:hello_20214

For a Free Live Demo, visit @ White Label Crypto Exchange Software

#white label crypto exchange software #white label bitcoin exchange software #white label bitcoin exchange #white label crypto exchange #white label cryptocurrency exchange software

Emilie  Okumu

Emilie Okumu

1658518440

Annotated Notes and Summaries Of The TensorFlow White Paper

TensorFlow White Paper Notes

Features

  • Notes broken down section by section, as well as subsection by subsection
  • Relevant links to documentation, resources, and references throughout
  • SVG versions of figures/graphs
  • So many bullet points!

To-do list

  • Create and utilize anchor tags throughout notes for self-navigating

White Paper available at this link


TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

Abstract


1 Introduction

  • Google Brain started in 2011, and DistBelief was its first-generation scalable, distributed machine learning system
  • DistBelief was used for a large number of research and commercial tasks
  • TensorFlow, Google's second-generation machine learning system, was designed from lessons learned in the process of engineering and using DistBelief
  • The TensorFlow API is used to describe a dataflow-like model, and the implementation then maps those models onto the underlying machine hardware
  • This allows users to have a single system that runs on a broad spectrum of machines, reducing overhead caused from rewriting code for different hardware
  • Focus of development was to maintain flexibility for research purposes while attaining enough performance to be used in production
  • Can express various types of parallelism by replicating the dataflow model across multiple machines and running them in parallel
    • Some functions within TensorFlow allow for less consistency in parallelism if desired
      • Larger, multiple machine uses of TensorFlow can take advantage of less-strict synchronization requirements
  • TensorFlow is more flexible, faster, and supports more machine learning models than DistBelief

2 Programming Model and Basic Concepts

  • TensorFlow computations are represented by directed graphs, which are composed of nodes
  • Some nodes are able to maintain and update a persistent state and/or have some sort of branching and looping structures
    • This branching/looping is modeled similarly to MSR's Naiad
  • Graphs are constructed using supported front-end languages (C++/Python as of writing)
  • A Node has zero or more inputs/outputs, and it represents an operation
  • Values of 'normal' edges (the connection between one node's output to another node's input) are tensors, n-dimensional arrays.
    • The type of each element in the tensor is inferred while the graph is being constructed, prior to execution
  • There are 'special' edges, called control dependencies: no model data is transferred on these edges, rather they indicate that the source node must finish execution before the destination node begins execution
    • Can be thought of as a baton in a relay race. Attaching a control dependency means that the next node can't begin running until the previous node 'hands off' the baton.
    • Used by client to enforce happens-before relations
    • Used in reference implementation to manage memory usage

Operations and Kernels

  • Operations have names and represent an abstract computation, such as "matrix multiply" or "add"
  • Operations can optionally require attributes. Attributes must be explicitly provided or be possible to infer prior to running the graph
    • A common use of attributes is to declare which data type the operation is being performed with (i.e. float tensors vs. int32 tensors)
  • A kernel is an implementation of an operation designed for specific types of devices, such as CPU or GPU
  • The TensorFlow library includes several built-in operations/kernels. The table below lists some of them:
CategoryExamples
Element-wise mathematical operationsAdd, Sub, Mul, Div, Exp, Log, Greater, Less, Equal
Array operationsConcat, Slice, Split, Constant, Rank, Shape, Shuffle
Matrix operationsMatMul, MatrixInverse, MatrixDeterminant
Stateful operationsVariable, Assign, AssignAdd
Neural-net building blocksSoftMax, Sigmoid, ReLU, Convolution2D, MaxPool
Checkpointing operationsSave, Restore
Queue and synchronization operationsEnqueue, Dequeue, MutexAcquire, MutexRelease
Control flow operationsMerge, Switch, Enter, Leave, NextIteration

Check out this directory in the TensorFlow repository for kernel implementations

Sessions

  • Clients interact with TensorFlow by creating a Session, which supports two main functions: Extend and Run
    • The Extend method adds additional nodes and edges to the existing dataflow model
      • Note: Extend is called automatically by TensorFlow, not directly by the client
    • Run takes as argument a set of named nodes to be computed as well as an optional set of tensors to be used in place of certain node outputs. It then uses the graph to figure all nodes required to compute the requested outputs, and performs them in a order that respects their dependencies.
  • Most TensorFlow programs setup a graph within a Session once, and then run the full graph or subsets of the graph multiple times.

Variables

  • A Variable is a handle to a persistent and mutable tensor which survives each execution of a graph
  • For ML tasks, learned parameters are usually held in TensorFlow Variables

See the official How-To to learn more about TensorFlow Variables


3 Implementation

  • There are three primary components in a TensorFlow system: the client, the master, and worker processes
    • The client uses a Session interface to communicate with the master
    • The master schedules and coordinates worker processes and relays results back to the client
    • Worker processes are responsible for maintaining access to devices such as CPU/GPU cores and execute graph nodes on their respective devices
  • There are both local and distributed implementations of TensorFlow, but only the local version has been open-sourced as of writing
    • Update as of February 2016: The initial open-source implementation of the TensorFlow distributed runtime is available on the TensorFlow GitHub repository. However, using it at this time requires building TensorFlow from source, and full API documentation is not yet available.

Devices

  • Each device has both a device type and a name
    • Names are composed of the device's type, its index in a worker process, and (when used in a distributed setting) an identification of the job and task of the worker process
    • Example device names:
      Local: /job:localhost/device:cpu:0
      Distributed: /job:worker/task:17/device:gpu:3
  • A device object manages its device's memory and executes kernels as requested

Tensors

  • Typed, multi-dimensional array
  • Memory management of tensors is handled automatically
  • Available types (from the TensorFlow documentation):
| Data type | Python type | Description |
| --- | --- | --- |
| DT_FLOAT | tf.float32 | 32-bit floating point |
| DT_DOUBLE | tf.float64 | 64-bit floating point |
| DT_INT8 | tf.int8 | 8-bit signed integer |
| DT_INT16 | tf.int16 | 16-bit signed integer |
| DT_INT32 | tf.int32 | 32-bit signed integer |
| DT_INT64 | tf.int64 | 64-bit signed integer |
| DT_UINT8 | tf.uint8 | 8-bit unsigned integer |
| DT_STRING | tf.string | Variable-length byte arrays; each element of a Tensor is a byte array |
| DT_BOOL | tf.bool | Boolean |
| DT_COMPLEX64 | tf.complex64 | Complex number made of two 32-bit floating point numbers: real and imaginary parts |
| DT_QINT8 | tf.qint8 | 8-bit signed integer used in quantized ops |
| DT_QINT32 | tf.qint32 | 32-bit signed integer used in quantized ops |
| DT_QUINT8 | tf.quint8 | 8-bit unsigned integer used in quantized ops |

3.1 Single-Device Execution

NOTE: To reiterate: in this context, "single device" means using a single CPU core or single GPU, not a single machine. Similarly, "multi-device" does not refer to multiple machines, but to multiple CPU cores and/or GPUs. See "3.3 Distributed Execution" for discussion of multiple machines.

  • Overview of the execution of a single-worker process, single-device job:
    1. All nodes required to compute the desired output node(s) are determined
    2. Each node is given a count of dependencies that need to be completed before it can begin execution
    3. When a node's dependency count is zero, it is added to a ready queue
    4. The ready queue delegates node kernel execution to device objects
    5. When a node completes execution, the counts of all dependent nodes are decremented
    6. Repeat steps 3-5 until the desired output is computed
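The following is a toy sketch of this dependency-count scheduler in plain Python; it is not TensorFlow's actual implementation, just an illustration of steps 1-6:

    from collections import deque

    def execute(nodes, deps, kernels):
        """nodes: node names; deps: node -> set of prerequisite nodes;
        kernels: node -> callable that 'executes' the node's kernel."""
        pending = {n: len(deps[n]) for n in nodes}            # step 2: dependency counts
        consumers = {n: [m for m in nodes if n in deps[m]] for n in nodes}
        ready = deque(n for n in nodes if pending[n] == 0)    # step 3: ready queue

        while ready:
            node = ready.popleft()
            kernels[node]()                                   # step 4: device runs the kernel
            for m in consumers[node]:                         # step 5: decrement dependents
                pending[m] -= 1
                if pending[m] == 0:
                    ready.append(m)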

3.2 Multi-Device Execution

  • There are two main challenges introduced when using multiple devices:
    • Deciding which device should process each node
    • Managing communication between devices as necessary after assigning nodes

Node Placement

  • One of the main responsibilities of the TensorFlow implementation is to map computation onto available devices
  • The following is a simplified version of this mapping algorithm:
    1. A cost model is input into the algorithm
      • The cost model contains estimates of the sizes of the input/output tensors (in bytes) and the estimated computation time for each node in the graph
    2. Using the cost model, the algorithm simulates an execution of the graph to make node-placement decisions as described below:
      1. Starting with the source nodes, a set of feasible devices is considered for each node ready to be executed
        • A "feasible" device is one that has a kernel implementation for the given operation
        • A node is ready for execution once its dependencies have finished running
      2. If a node has multiple feasible devices, the computation time of the node is examined with respect to placing the node on each possible device
        • This examination takes into account the execution time of the operation (given the device type), as well as the costs of possibly introducing communication between devices.
      3. The device that would finish the operation the soonest is selected as the node's device.
      4. Repeat steps 1-3 for each node in the graph execution until all nodes have been allocated to devices
    3. After the simulation, the real execution runs using the node-placement decisions made during the simulation
  • Section 4.3 will describe some extensions to help guide the placement algorithm
  • Improving the placement algorithm is an ongoing area of development as of writing (a toy sketch of the greedy simulation appears below)
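Below is a toy sketch of the greedy simulation, assuming hypothetical cost functions supplied by the cost model; the real placer is considerably more involved:

    def place_nodes(order, feasible, compute_cost, comm_cost):
        """order: nodes in a dependency-respecting order; feasible: node -> candidate devices;
        compute_cost(node, device) and comm_cost(node, device, placement) are cost-model estimates."""
        placement, finish_time = {}, {}
        for node in order:
            best_device, best_finish = None, float("inf")
            for device in feasible[node]:
                start = finish_time.get(device, 0.0)
                finish = start + compute_cost(node, device) + comm_cost(node, device, placement)
                if finish < best_finish:                # pick the device that finishes soonest
                    best_device, best_finish = device, finish
            placement[node] = best_device
            finish_time[best_device] = best_finish
        return placement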

NOTE: At the moment, node placement is done by a simple_placer class which only considers explicit placement requirements provided by the user and implicit colocation constraints based on node type (see documentation comments for details)

Cross-Device Communication

  • After the nodes have been placed onto their respective devices, the execution graph is split into subgraphs, one per device
  • Any edge between nodes on different devices is replaced by two new edges:
    • The outputting node will have an edge between it and a new Send node, placed within the subgraph of its device
    • The receiving node will have an edge between it and a new Receive node, placed within the subgraph of its device
    • See Figure 4 for illustration of adding Send/Receive nodes
  • The Send and Receive nodes coordinate data transfer across devices
    • This isolates cross-device communication to the implementation of the Send and Receive nodes
  • All users of a particular tensor on a particular device use a single Receive node, as opposed to having one Receive node per user per device. This minimizes data transmission between devices as well as memory allocated on the receiving device
    • This means that any given device should receive the output of any given operation only once, and should store that output only once in memory
  • This method of communication also allows individual node scheduling to be handled by the worker processes as opposed to the master
    • The Send and Receive nodes provide synchronization between worker processes and devices, which enables the master to only issue a single Run request per graph execution per worker process
    • This improves scalability and fine-grain control over node execution

3.3 Distributed Execution

  • Similar to multi-device execution. Send/Receive nodes that communicate across worker processes use mechanisms such as TCP or RDMA to transmit data from machine to machine

Fault Tolerance

  • There are two main ways failures are detected:
    • Errors between Send and Receive nodes
    • Periodic health-checks from the master process to the worker processes
  • When a failure is detected, the entire graph execution is aborted and restarted
  • Because TensorFlow Variables persist across each graph execution, there are mechanisms to save and restore their state
    • Each Variable node is connected to a Save node, which periodically writes the contents of the Variable to persistent storage
    • Additionally, each Variable is connected to a Restore node, which is enabled when the graph execution is restarted. The Restore node reads state data from persistent storage and applies it to the Variable node
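At the API level this roughly corresponds to tf.train.Saver, which wires Save/Restore ops for a graph's Variables (a minimal sketch; the checkpoint path is arbitrary):

    import tensorflow as tf

    w = tf.Variable(tf.zeros([10]))
    saver = tf.train.Saver()                      # adds Save/Restore ops for the Variables

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, "/tmp/model.ckpt")       # periodic checkpoint of Variable state
        # ...after a failure and restart, a fresh execution can recover the state:
        saver.restore(sess, "/tmp/model.ckpt")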

4 Extensions

The following subsections describe advanced features and extensions of the programming model introduced in Section 2

4.1 Gradient Computation

  • TensorFlow has built-in gradient computation
  • If a tensor, C, depends on a set of previous nodes, the gradient of C with respect to those previous nodes can be automatically computed with a built-in function, even if there are many layers in between them
  • Gradients are computed by creating additional nodes and edges in the graph as described below (see Figure 5):
    • When computing the gradient of a tensor, C, with respect to some dependency, I, TensorFlow first finds the forward path from I to C in the model graph. This is shown as the left-hand side of Figure 5
    • Once the path between the two is found, TensorFlow starts at C and moves backward toward I. For every operation on this backward path, a node is added to the graph, composing the partial gradients of each added node via the chain rule. This is shown as the right-hand side of Figure 5
      • Partial gradients are computed via a "gradient function", which corresponds to an operation on the forward path. These gradient functions are provided alongside operation kernels
      • The gradient function takes as input the partial derivatives already computed along the backwards path and, optionally, the inputs and/or outputs of the corresponding operation on the forward path
      • For example, in Figure 5, the dReLU operation (gradient function for the Rectified Linear Unit operation) takes in the previously computed partial derivatives (indicated by arrows coming from "..."), as well as the inputs from the ReLU operation (indicated by arrows coming from Add, as the outputs of Add are the inputs of ReLU). dReLU does not, however, take in the outputs from the ReLU function (indicated by the grayed out arrows coming from ReLU). Once the partial gradients are computed, dReLU outputs them to the next gradient function, in this case dAdd
    • The partial gradients for any node outputs that are not dependencies of C are set to zero. This can occur when a node has multiple outputs, but only some of them connect to C
    • This process continues until the partial derivatives of C with respect to I are found
  • Memory management is an active area of improvement for the automatic gradient computation algorithm
    • Tensors early in the computation graph may be needed at the end of gradient calculation, causing them to stick around in GPU memory
    • Current options for memory management improvements include improved heuristics to determine graph execution order, recomputing tensors as opposed to storing them in memory, and utilizing host CPU memory instead of leaving long-lived tensors in GPU memory
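In the Python front-end, automatic gradient computation is exposed as tf.gradients; a minimal sketch:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 4])
    W = tf.Variable(tf.random_uniform([4, 3], -1.0, 1.0))
    b = tf.Variable(tf.zeros([3]))
    C = tf.reduce_sum(tf.nn.relu(tf.matmul(x, W) + b))   # some scalar cost

    # Adds gradient nodes to the graph: dC/dW and dC/db
    grad_W, grad_b = tf.gradients(C, [W, b])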

4.2 Partial Execution

  • TensorFlow has built-in functionality to run smaller chunks of a defined execution graph, as well as the ability to insert pre-defined data as a replacement for any edge in the graph
  • Each node in the graph is given a name upon instantiation, and each output of a node is referred to by number starting from zero
    • e.g. "bar:0" is the first output of node "bar"
  • The Session's run method takes two arguments, fetches and feed_dict, which define the subgraph to be executed:
    • fetches is a list of desired operation nodes and/or output tensors to be executed. If outputs are requested, the Run function will return the calculated tensors to the client (assuming the Run function is successful)
    • feed_dict is a dictionary of optional inputs, which map named node outputs (name:port) to defined tensor values. This allows a user to effectively define the 'start' of a subgraph. Additionally, feed_dict is used to define data in Placeholder objects
  • The execution graph is then transformed based on the values fed to fetches and feed_dict
    • Each output tensor specified in fetches is connected to a fetch node, which stores and returns its value to the client once Run is successfully completed
      • Note: no fetch nodes are created for operation nodes named in fetches, as TensorFlow makes a distinction between operations and the outputs of those operations
      • An example of an operation a user may specify as a fetches parameter to a Run command is the operation returned by tf.initialize_all_variables. The operation doesn't provide an output, but it is run in the execution graph
    • Each named node output (node:port) specified in feed_dict is replaced with a feed node, which takes on the value of the tensor mapped to the named output. Each node in the execution graph that depends on the named output will take in data from the feed node in its place
      • Because a feed node has no dependencies, it is the start of its own execution chain
  • Once the fetch and feed nodes have been inserted, TensorFlow determines which nodes need to be executed. It moves backwards, starting at the fetch nodes, and uses the dependencies of the graph to determine all nodes that must be executed in the modified graph in order to compute the desired outputs
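A small sketch of fetching and feeding by name:port (the node names here are arbitrary):

    import tensorflow as tf

    a = tf.constant(1.0, name="a")
    b = tf.add(a, 2.0, name="b")
    c = tf.add(b, b, name="c")

    sess = tf.Session()
    # Feed a replacement value for b's first output ("b:0") and fetch c's output ("c:0").
    # Node a is never executed, because a feed node stands in for b's output.
    print(sess.run("c:0", feed_dict={"b:0": 10.0}))   # -> 20.0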

4.3 Device Constraints

  • Users can provide partial constraints on nodes about which devices they can run on
    • Examples: only allowing a node to run on GPUs, specifying a specific worker process/task, or ensuring that it is grouped with specific nodes
    • Note: By default, GPUs are given priority for device placement if the given operation has both a CPU and a GPU kernel implementation
  • These constraints require modifications to the placement algorithm described in Section 3.2:
    • After finding the feasible set of devices for each node, TensorFlow uses the constraints to determine which nodes must be grouped on the same device
    • For each of these groups, TensorFlow computes the intersection of feasible devices for each node in the group
    • Final selection of devices is determined using similar heuristics as described in Section 3.2, ensuring fast execution while taking device restrictions, such as memory, into account

Aside: Originally I wasn't sure whether this functionality was available in the open source implementation of TensorFlow, as I could only find information about placing nodes on specific devices (read more about manual device placement here). Update: it is possible to provide partial constraints (https://www.tensorflow.org/versions/r0.11/how_tos/variables/index.html#device-placement), e.g. with tf.device("/job:ps/task:7") or with tf.device("/gpu:0").
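The two examples from the link above look roughly like this in the Python front-end:

    import tensorflow as tf

    with tf.device("/job:ps/task:7"):          # constrain to a particular job/task; device left to the placer
        weights = tf.Variable(tf.zeros([784, 100]))

    with tf.device("/gpu:0"):                  # constrain to the first GPU
        logits = tf.matmul(tf.ones([1, 784]), weights)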

4.4 Control Flow

  • TensorFlow incorporates a few primitive control flow operators which allow for the skipping of subgraph execution and the expression of iteration. Using these primitive operators, higher-level constructs such as if and while statements can be compiled into TensorFlow graphs
  • Each iteration in a loop is identified with a unique tag, and the present value of that execution state is represented by a frame
  • Inputs can enter an iteration whenever they are available
    • This allows multiple iterations to be executed concurrently
  • Because a loop may contain nodes placed on separate devices, managing the state of loops becomes a problem of distributed termination detection. That is, there needs to be a way to detect the termination of nodes across devices.
  • TensorFlow solves this by adding additional control nodes to the graph. These nodes manage the beginning and end of each loop iteration and determine when the loop should end.
    • At every iteration, the device that owns the control node for a loop sends out control messages to the other devices used in that loop
  • The implementation of TensorFlow also takes if statements into account when computing gradients, as it must include or omit nodes as necessary to properly step backwards through the execution graph
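In the Python front-end the higher-level constructs are exposed as tf.cond and tf.while_loop (available in later 0.x/1.x releases), which compile down to the primitive control-flow operators; a small sketch:

    import tensorflow as tf

    # while-style iteration: increment i until it reaches 10
    i = tf.constant(0)
    loop = tf.while_loop(lambda i: tf.less(i, 10), lambda i: tf.add(i, 1), [i])

    # if-style branching
    x = tf.constant(3.0)
    y = tf.cond(tf.less(x, 5.0), lambda: x * 2.0, lambda: x + 1.0)

    sess = tf.Session()
    print(sess.run([loop, y]))   # -> [10, 6.0]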

4.5 Input Operations

4.6 Queues

  • TensorFlow includes Queues, which allow for the enqueuing and dequeuing of tensor objects. This enables asynchronous graph execution and the handing off of data between concurrent nodes
    • Enqueue operations can block until there is space available in the queue
    • Dequeue operations can block until a minimum number of elements are placed in the queue
  • FIFOQueue is a standard 'first-in, first-out' queue
  • RandomShuffleQueue is a queue that randomly shuffles its elements periodically, which can be useful for machine learning algorithms that want to randomize training data
  • An example use of queues is to allow input data to be prefetched from the storage system while previous data is still being processed
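A minimal FIFOQueue sketch (RandomShuffleQueue is constructed similarly, with an additional minimum-capacity argument):

    import tensorflow as tf

    q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32])
    enqueue = q.enqueue([tf.constant(1.0)])
    dequeue = q.dequeue()

    sess = tf.Session()
    sess.run(enqueue)            # blocks if the queue is full
    print(sess.run(dequeue))     # blocks until an element is available -> 1.0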

4.7 Containers

  • A Container is the mechanism that manages long-lived (i.e. survives multiple executions of a graph) mutable state
    • Container objects hold the values of Variable objects
  • There is a default container that persists until the process terminates
  • Other named containers may be initialized
  • Containers allow for the sharing of state between otherwise unrelated computation graphs defined on separate Sessions

5 Optimizations

This section describes certain performance/resource usage optimizations used in the implementation of TensorFlow

5.1 Common Subexpression Elimination

  • Before execution, TensorFlow does a pass over the computation graph and reduces nodes with identical inputs and operations down to a single node.
  • This prevents redundant execution of the same computation

5.2 Controlling Data Communication and Memory Usage

  • Proper scheduling of operations can dramatically improve data transfer rates and memory usage by reducing the amount of time intermediate data needs to be stored in memory
    • GPUs benefit from this a great deal, as they have scarce memory
    • Can also reduce competition amongst operations to use network resources
  • One particular example used in the implementation of TensorFlow is the scheduling of Receive nodes (see "Cross-Device Communication" under Section 3.2)
    • Receive nodes, without thoughtful scheduling, may execute much earlier than necessary
      • This could cause data to be stored in the memory of devices for much longer
    • TensorFlow attempts to delay the execution of Receive nodes until just before their results are needed.

5.3 Asynchronous Kernels

  • TensorFlow supports non-blocking kernels, which are useful in environments that cannot afford to keep many active threads; they allow nodes to wait for I/O or other events without blocking an execution thread
    • Normal, synchronous kernels complete their execution at the end of the Compute method
    • Asynchronous kernels use a slightly different interface: the Compute method is passed a lambda/callback that should be invoked at the end of the kernel's execution
  • Examples of asynchronous kernels built into TensorFlow: Receive, Enqueue, and Dequeue

5.4 Optimized Libraries for Kernel Implementations

  • TensorFlow makes use of several existing, optimized libraries for many kernel implementations
  • Library for linear algebra: Eigen
  • Libraries for matrix multiplication on different devices: BLAS and cuBLAS
  • Libraries for convolutional kernels for deep neural networks: cuda-convnet and cuDNN

5.5 Lossy Compression

  • Because some machine learning algorithms still work well with less precise arithmetic, TensorFlow often uses lossy compression on higher-precision numbers when sending data between devices
    • This most often happens when sending data between devices on different machines, but it sometimes compresses data sent on the same machine
  • For example, a 32-bit floating point number may be converted into a (for all intents and purposes) 16-bit floating point number before being sent to another device, where it is converted into a lossy version of the original 32-bit number
    • The 16-bit floating point number is really just a 32-bit floating point number with the lower 16 bits of precision dropped
    • The conversion back to 32-bit floating point simply fills in zeros to replace the lost precision
      • 'Filling in with zeros' is used as the 16-bit -> 32-bit conversion method because it is faster than using stochastic rounding, even though the latter is more mathematically correct
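A toy NumPy sketch of the idea (not TensorFlow's internal code): keep only the high 16 bits of each float32 and fill the rest with zeros on the receiving side:

    import numpy as np

    def compress(x):
        """Keep the sign, exponent, and top mantissa bits (the high 16 bits) of each float32."""
        return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

    def decompress(h):
        """Fill the discarded low 16 bits with zeros, yielding a lossy float32."""
        return (h.astype(np.uint32) << 16).view(np.float32)

    x = np.array([3.14159265], dtype=np.float32)
    print(decompress(compress(x)))   # -> [3.140625], a slightly less precise value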

6 Status and Experience

  • The system, documentation, and examples for TensorFlow can be found at tensorflow.org
  • Currently, there are front-ends for Python and C++, and it's expected that more will be added over time (created by both internal Google users and the open-source community)

Advice and Lessons Learned

The following are "words of wisdom" from the experience of porting Google's Inception neural network into TensorFlow. After successfully doing so, the team was rewarded with a 6-fold improvement in training speed over DistBelief's implementation. This advice will hopefully be useful to others as they build their own models.

  1. Build tools to gain insight into the exact number of parameters in a given model
    • This can help you catch subtle flaws in a complex network architecture, such as operations and variables instantiated incorrectly
  2. Start small and scale up
    • The TensorFlow team started by importing a small network used by DistBelief
    • Debugging a small network gave insight into the edge cases of certain operations, whereas doing the same on a larger network would have been nearly impossible
  3. Always ensure that the objective (loss function) matches between machine learning systems when learning is turned off
    • By setting the learning rate to zero (i.e. turning off learning), the TensorFlow team was able to identify unexpected behavior stemming from randomly initialized variables in the model
  4. Make a single machine implementation match before debugging a distributed implementation
    • This helped the TensorFlow team separate and debug differences in training performance between DistBelief and TensorFlow
    • Once the single machine implementation worked, they were able to find bugs related to race conditions and non-atomic operations in the distributed model
  5. Guard against numerical errors
    • Different numerical libraries handle non-finite floating point numbers differently
    • Checking for non-finite floating point values can allow one to detect errors in real time, guarding against numerical instability
  6. Analyze pieces of a network and understand the magnitude of numerical error
    • By running subsections of the neural network on both DistBelief and TensorFlow in parallel, the team was able to ensure that the implemented algorithms were indeed identical
    • Note that because the networks used floating point numbers, there is a given amount of numerical error that should be expected and taken into account when comparing the two systems

7 Common Programming Idioms

This section describes how TensorFlow's basic dataflow graphs can be used to speed up training neural network models on large datasets using techniques developed by the TensorFlow team.

The techniques presented here assume that the model is using stochastic gradient descent with mini-batches of around 100-1000 examples

Data Parallel Training

  • Users can parallelize the computation of the gradient, separating mini-batch elements onto different devices
    • For example, a mini-batch size of 1000 elements can be split into 10 smaller, parallel computations of 100 elements. After they all finish running, their results can be combined to achieve the same result as if the entire calculation was performed in a single, sequential computation
  • This translates to having many replicas of a computational subgraph and using a single client thread to coordinate the training loop of those replicas
  • The above approach can be modified further by making it asynchronous. Instead of having a single client thread, there are multiple clients (one for each subgraph replica), and each replica updates the trained parameters asynchronously
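A rough sketch of the synchronous variant with two replicas (the device names and the tiny linear model are made up for illustration):

    import tensorflow as tf

    W = tf.Variable(tf.zeros([4, 1]))
    replica_grads = []

    # Each replica processes its share of the mini-batch on its own device
    for i in range(2):
        with tf.device("/gpu:%d" % i):
            x_i = tf.placeholder(tf.float32, shape=[None, 4], name="x_%d" % i)
            y_i = tf.placeholder(tf.float32, shape=[None, 1], name="y_%d" % i)
            loss_i = tf.reduce_mean(tf.square(tf.matmul(x_i, W) - y_i))
            replica_grads.append(tf.gradients(loss_i, [W])[0])

    # Combine the per-replica gradients and apply a single synchronous SGD update
    avg_grad = tf.add_n(replica_grads) / 2.0
    train_op = tf.assign_sub(W, 0.01 * avg_grad)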

Model Parallel Training

  • Can run separate portions of the computation graph on different devices simultaneously on the same batch of examples
    • See Figure 8 for a visual example of sequence-to-sequence learning parallelized across three devices

Concurrent Steps for Model Computation Pipelining

  • Can also run a small set of concurrent training steps on a single device
    • Similar concept to asynchronous data parallelism, but the parallelism is only on a single device
    • This can "fill in the gaps" of device utilization, when parallel execution on all devices might not make full use of computation cycles
    • See Figure 9 for a visual example

8 Performance

Stay tuned for future versions of the TensorFlow white paper, which will include performance evaluations for single machine and distributed implementations of TensorFlow


9 Tools

This section discusses additional tools, developed by the TensorFlow team, that work alongside the graph modeling and execution features described above.

9.1 TensorBoard: Visualization of Graph Structures and Summary Statistics

TensorBoard was designed to help users visualize the structure of their graphs, as well as understand the behavior of their models

Visualization of Computation Graphs

  • TensorBoard includes features that allow for more digestible visualizations
    • TensorBoard can take models with tens of thousands of nodes and collapse them into high-level blocks, highlighting subgraphs with identical structures
    • It separates out "high-degree" nodes from the rest of the graph to further reduce visual clutter
      • Note: I haven't found a proper definition of "high-degree" nodes in TensorFlow. The paper says they "often serve book-keeping functions". I imagine they are operations similar to tf.initialize_all_variables, which are necessary to run the execution graph in TensorFlow but aren't really part of the mathematical model
  • The TensorBoard visualization is interactive
    • Users can pan, zoom, and expand the collapsed blocks in order to get a lower-level view of the model

Visualization of Summary Data

  • TensorBoard supports Summary operations that can be inserted into the execution graph to examine and track various values over time
    • Scalar summaries: e.g. average loss function over set of examples, total execution time of the model
    • Histogram based summaries: e.g. distribution of parameter values in a neural network layer
    • Image-based summaries: e.g. visualization of filter weights learned in a convolutional neural network
  • Typically, Summary nodes are setup to monitor specific interesting values and are executed periodically during normal graph execution
  • After Summary nodes are executed, the client writes the summary data to a log file. TensorBoard monitors this log to display summary information over time
    • The "time" used in the visualization of summary data can be: wall-clock time; absoute time; or "steps", the number of graph executions that have occurred since the first execution in the TensorFlow program

9.2 Performance tracing

  • A tool called EEG is used to examine fine-grained information about the ordering/performance of TensorFlow graphs
    • Works for both single machine and distributed implementations of TensorFlow
    • Note: EEG is not open sourced as of writing
  • Helps users understand bottlenecks in a TensorFlow program

The following is a brief overview of what EEG does under the hood

  • Traces are collected via a number of sources, including the Linux kernel's ftrace facility, internal Google tracing tools, and the CUDA Profiling Tools Interface (CUPTI)
  • The trace logs enable EEG to recreate the execution of the graph with microsecond-level precision. Events from the traces, within a timerange, are extracted and visualized according to the resolution of the client's user interface
    • The user can zoom in on portions of the graph, and EEG will update the visualization to show finer details
  • Any significant delays due to communication, synchronization, or direct memory access issues are identified and highlighted in the UI

Please see pages 14 and 15 of the November 2015 white paper to see a specific example of EEG visualization along with descriptions of the current UI


10 Future Work

This section lists areas of improvement and extension for TensorFlow identified for consideration by the TensorFlow team

Extensions:

  • A "function" mechanism, where users can specify subgraphs of TensorFlow execution to be reusable
    • In the TensorFlow team's design of this mechanism (not yet implemented), these kinds of functions can be reusable across front-end languages. That is, a user could define a subgraph function in Python and use it in C++ without redefining it

Improvements:

  • Continue work on a just-in-time compiler that can take in an execution subgraph and output an optimized routine for that subgraph
    • Such a compiler might also take in some runtime profiling information as input
    • The compiler should be able to perform loop fusion, block and tile for locality, and specialize routines for particular shapes and sizes of tensors, along with other optimizations
  • Improving the node placement and node scheduling algorithms
    • Instead of using man-made heuristics, have the system learn how to make good placement/scheduling decisions

11 Related Work

Open source, single machine systems with portions of similar functionality

Systems designed primarily for neural networks:

Systems that support symbolic differentiation:

Systems with a core written in C++:

Comparisons with DistBelief and Project Adam

Similarities shared with DistBelief and Project Adam:

  • They allow computations to be distributed across many devices on many machines
  • Users can specify models using relatively high-level descriptions

Differences between TensorFlow and DistBelief/Project Adam:

  • TensorFlow's dataflow graph model is more flexible and better at expressing more types of machine learning models and algorithms
  • Allows for the expression of stateful Parameter nodes as variables
  • Variable update operations are simply additional nodes in the graph
    • Both DistBelief and Project Adam have subsystems dedicated to handling parameters

Comparison with the Halide image processing system

  • Both use a similar intermediate data representation for their dataflow graphs
  • However, Halide has additional high-level knowledge of its operations, which it uses to generate optimized code that combines multiple operations
  • Halide is single-machine only
    • The TensorFlow team hopes to extend TensorFlow's capabilities to incorporate Halide's techniques in a distributed setting

Related distributed dataflow graph systems

Systems that represent complex workflows as dataflow graphs

Systems that support data-dependent control flow

  • CIEL
    • Iteration is implemented as a directed acyclic graph (DAG) that dynamically unfolds
  • Naiad
    • Iteration is implemented as a static graph with cycles

Systems optimized for accessing the same data repeatedly

  • Spark
    • Uses resilient distributed datasets (RDDs) to cache outputs of earlier operations, in case they are needed again

Systems that execute dataflow graphs across heterogeneous devices, including GPUs

Features that TensorFlow incorporates from the above distributed systems

Feature implementations that are most similar to TensorFlow are listed after the feature

  • The dataflow scheduler (i.e. the module that selects the next node to run)
    • CIEL, Dryad, Flume, Spark
  • Distributed architecture: using a single, optimized dataflow graph and caching information about the graph to lower coordination overhead
    • Naiad
  • Works best when there is enough RAM in the cluster to hold all working variables/computations
    • Naiad, Spark
  • Iteration: multiple replicas of the same graph can execute at once while sharing the same set of persistent variables. Replicas can share the variables asynchronously or use mechanisms such as queues to access them synchronously
    • Hybrid of many approaches
  • Iteration: a node only executes when all of its dependencies have completed
    • CIEL
  • Graph iteration is represented as a static, cyclic graph

12 Conclusions

  • TensorFlow is a flexible dataflow graph programming model
  • There are both single machine and distributed implementations of TensorFlow
  • TensorFlow was developed using prior experience in Google, as well as methods used in other previous systems
  • An open source implementation of TensorFlow is available
    • As of writing, only a single-machine implementation has been released by Google

Figures

Figure 1: Example TensorFlow code fragment

    import tensorflow as tf
    
    # 100-d vector, init to zeros
    b = tf.Variable(tf.zeros([100]))
    
    # 784x100 matrix with random values
    W = tf.Variable(tf.random_uniform([784,100], -1, 1))
    
    # Placeholder for input
    x = tf.placehoder(name="x")
    
    # Rectified linear unit of (W*x +b)
    relu = tf.nn.relu(tf.matmul(W, x) + b)
    
    # Cost computed as a function of relu
    C = [...]
    
    # Instantiate a Session
    s = tf.Session()
    
    for step in xrange(0, 10):
        # Create a 100-d vector for input
        input = ...construct 100-D input array ...
        
        # Find the cost, using the constructed vector as the placeholder input
        result = s.run(C, feed_dict = {x: input})
        print step, result

Figure 2: Corresponding computation graph for Figure 1

Figure 3: Single machine (left) and distributed system (right) structure

Figure 4: Before and after insertion of Send/Receive nodes

Figure 5: Gradients computed for graph in figure 2

Figure 6: Before and after graph transformation for partial execution

Figure 7: Synchronous and asynchronous data parallel training

Figure 8: Model parallel training

Figure 9: Concurrent steps


Author: samjabrahams
Source code: https://github.com/samjabrahams/tensorflow-white-paper-notes
License: MIT license

#tensorflow 
