Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations.
If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.
Bolt also has theoretical guarantees bounding the errors in its approximations.
EDIT: this repo now also features the source code for MADDNESS, our shiny new algorithm for approximate matrix multiplication. MADDNESS has no Python wrapper yet, and is referred to as "mithral" in the source code. The name changed because apparently I'm the only one who gets Lord of the Rings references. MADDNESS runs ridiculously fast and, under reasonable assumptions, requires zero multiply-adds. Realistically, it'll be most useful for speeding up neural net inference on CPUs, but it'll take another couple of papers to get it there; we need to generalize it to convolution and write the CUDA kernels to allow GPU training. See also the poster and slides.
EDIT2: Looking for a research project? See our list of ideas.
EDIT3: See Build.md for a working dockerfile that builds and runs Bolt, contributed by @mneilly.
NOTE: All code below refers to the Python wrapper for Bolt and has nothing to do with MADDNESS. The wrapper also seems to no longer build for many people. If you want to use MADDNESS, see the Python implementation driven by amm_main.py or the C++ implementation. All the code is ugly, but the Python code should be easy to extend with new AMM methods/variations.
$ brew install swig # for wrapping C++; use apt-get, yum, etc, if not OS X
$ pip install numpy # bolt installation needs numpy already present
$ git clone https://github.com/dblalock/bolt.git
$ cd bolt && python setup.py install
$ pytest tests/ # optionally, run the tests
If you run into any problems, please don't hesitate to mention it in the Python build problems issue.
Install Bazel, Google's open-source build system. Then
$ git clone https://github.com/dblalock/bolt.git
$ cd bolt/cpp && bazel run :main
The bazel run command will build the project and run the tests and benchmarks.
If you want to integrate Bolt with another C++ project, include cpp/src/include/public.hpp and add the remaining files under cpp/src to your builds. If you're interested in doing such an integration, please let me know; I'm hoping to see Bolt become part of many libraries, so I'd be happy to help you.
Bolt currently only supports machines with AVX2 instructions, which basically means x86 machines from fall 2013 or later. Contributions for ARM support are welcome. Also note that the Bolt Python wrapper is currently configured to require Clang, since GCC apparently runs into issues.
Bolt is based on vector quantization. For details, see the Bolt paper or slides.
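For intuition, below is a minimal product-quantization sketch in plain numpy, since product quantization is the family of vector-quantization methods Bolt builds on. This is an illustration only, not Bolt's implementation (Bolt additionally quantizes the lookup tables to 8 bits and scans them with vectorized instructions); the function names here are made up for the example, and it assumes the number of subvectors evenly divides the dimensionality D.

import numpy as np

def fit_codebooks(X, n_subvecs=4, n_centroids=16, n_iters=10, seed=0):
    # learn one small codebook per subspace with plain k-means (Lloyd iterations)
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    codebooks = []
    for S in np.split(X, n_subvecs, axis=1):  # assumes n_subvecs divides D
        C = S[rng.choice(len(S), n_centroids, replace=False)].copy()
        for _ in range(n_iters):
            assign = np.argmin(((S[:, None] - C[None]) ** 2).sum(-1), axis=1)
            for k in range(n_centroids):
                if np.any(assign == k):
                    C[k] = S[assign == k].mean(axis=0)
        codebooks.append(C)
    return codebooks

def encode(X, codebooks):
    # lossy compression: replace each subvector with the index of its nearest centroid
    X = np.asarray(X, dtype=np.float64)
    codes = [np.argmin(((S[:, None] - C[None]) ** 2).sum(-1), axis=1)
             for S, C in zip(np.split(X, len(codebooks), axis=1), codebooks)]
    return np.stack(codes, axis=1)  # shape: N x n_subvecs

def approx_dots(q, codes, codebooks):
    # approximate X @ q using only table lookups: precompute each centroid's
    # dot product with the matching subvector of q, then sum the looked-up values
    q_subs = np.split(np.asarray(q, dtype=np.float64), len(codebooks))
    luts = [C @ qs for C, qs in zip(codebooks, q_subs)]  # one LUT per subspace
    return sum(lut[codes[:, j]] for j, lut in enumerate(luts))

# quick sanity check on random data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 64))
q_demo = rng.normal(size=64)
books = fit_codebooks(X_demo)
approx = approx_dots(q_demo, encode(X_demo, books), books)
# correlation between exact and approximate dot products
# (higher with more subvectors/centroids)
print(np.corrcoef(X_demo @ q_demo, approx)[0, 1])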
Bolt includes a thorough set of speed and accuracy benchmarks; see the experiments/ directory. This is also where to look if you want to reproduce the results in the paper.
Note that all of the timing results use the raw C++ implementation. At present, the Python wrapper is slightly slower due to Python overhead. If you're interested in having a full-speed wrapper, let me know and I'll allocate time to making this happen.
# X = some N x D numpy array; queries = some iterable of length-D arrays
# these are approximately equal (though the latter are shifted and scaled)
enc = bolt.Encoder(reduction='dot').fit(X)
[np.dot(X, q) for q in queries]
[enc.transform(q) for q in queries]
# same for these
enc = bolt.Encoder(reduction='l2').fit(X)
[np.sum((X - q) * (X - q), axis=1) for q in queries]
[enc.transform(q) for q in queries]
# but enc.transform() is 10x faster or more
import bolt
import numpy as np
from scipy.stats import pearsonr as corr
from sklearn.datasets import load_digits
import timeit
# for simplicity, use the sklearn digits dataset; we'll split
# it into a matrix X and a set of queries Q
X, _ = load_digits(return_X_y=True)
nqueries = 20
X, Q = X[:-nqueries], X[-nqueries:]
enc = bolt.Encoder(reduction='dot', accuracy='lowest') # can tweak acc vs speed
enc.fit(X)
dot_corrs = np.empty(nqueries)
for i, q in enumerate(Q):
    dots_true = np.dot(X, q)
    dots_bolt = enc.transform(q)
    dot_corrs[i] = corr(dots_true, dots_bolt)[0]

# dot products closely preserved despite compression
print("dot product correlation: {} +/- {}".format(
    np.mean(dot_corrs), np.std(dot_corrs)))  # > .97
# massive space savings
print(X.nbytes) # 1777 rows * 64 cols * 8B = 909KB
print(enc.nbytes) # 1777 * 2B = 3.55KB
# massive time savings (~10x here, but often >100x on larger
# datasets with less Python overhead; see the paper)
t_np = timeit.Timer(
    lambda: [np.dot(X, q) for q in Q]).timeit(5)  # ~9ms
t_bolt = timeit.Timer(
    lambda: [enc.transform(q) for q in Q]).timeit(5)  # ~800us
print("Numpy / BLAS time, Bolt time: {:.3f}ms, {:.3f}ms".format(
    t_np * 1000, t_bolt * 1000))
# can get output without offset/scaling if needed
dots_bolt = [enc.transform(q, unquantize=True) for q in Q]
# search using squared Euclidean distances
# (still using the Digits dataset from above)
enc = bolt.Encoder('l2', accuracy='high').fit(X)
k_bolt = 10  # number of neighbors to return (value assumed; not set in the original snippet)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query
# search using dot product (maximum inner product search)
enc = bolt.Encoder('dot', accuracy='medium').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q] # knn for each query
Bolt stands for "Based On Lookup Tables". Feel free to use this exciting fact at parties.
Author: dblalock
Source Code: https://github.com/dblalock/bolt
License: MPL-2.0 license
#machinelearning #datamining #compress #database
Matrix is an ambitious new ecosystem for open federated Instant Messaging and VoIP. The basics you need to know to get up and running are:
- Everything in Matrix happens in a room. Rooms are distributed and do not exist on any single server. Rooms can be located using convenience aliases like #matrix:matrix.org or #test:localhost:8448.
- Matrix user IDs look like @matthew:matrix.org (although in the future you will normally refer to yourself and others using a third party identifier (3PID): email address, phone number, etc rather than manipulating Matrix user IDs).

The overall architecture is:

client <----> homeserver <=====================> homeserver <----> client
        https://somewhere.org/_matrix         https://elsewhere.net/_matrix

#matrix:matrix.org is the official support room for Matrix, and can be accessed by any client from https://matrix.org/docs/projects/try-matrix-now.html or via IRC bridge at irc://irc.libera.chat/matrix.
Synapse is currently in rapid development, but as of version 0.5 we believe it is sufficiently stable to be run as an internet-facing service for real usage!
Matrix specifies a set of pragmatic RESTful HTTP JSON APIs as an open standard, which handle:
- Creating and managing fully distributed chat rooms with no single points of control or failure
- Eventually-consistent cryptographically secure synchronisation of room state across a global open network of federated servers and services
- Sending and receiving extensible messages in a room with (optional) end-to-end encryption
- Inviting, joining, leaving, kicking, banning room members
- Managing user accounts (registration, login, logout)
- Using 3rd Party IDs (3PIDs) such as email addresses, phone numbers, Facebook accounts to authenticate, identify and discover users on Matrix
- Placing 1:1 VoIP and Video calls
These APIs are intended to be implemented on a wide range of servers, services and clients, letting developers build messaging and VoIP functionality on top of the entirely open Matrix ecosystem rather than using closed or proprietary solutions. The hope is for Matrix to act as the building blocks for a new generation of fully open and interoperable messaging and VoIP apps for the internet.
Synapse is a Matrix "homeserver" implementation developed by the matrix.org core team, written in Python 3/Twisted.
In Matrix, every user runs one or more Matrix clients, which connect through to a Matrix homeserver. The homeserver stores all their personal chat history and user account information - much as a mail client connects through to an IMAP/SMTP server. Just like email, you can either run your own Matrix homeserver and control and own your own communications and history or use one hosted by someone else (e.g. matrix.org) - there is no single point of control or mandatory service provider in Matrix, unlike WhatsApp, Facebook, Hangouts, etc.
We'd like to invite you to join #matrix:matrix.org (via https://matrix.org/docs/projects/try-matrix-now.html), run a homeserver, take a look at the Matrix spec, and experiment with the APIs and Client SDKs.
Thanks for using Matrix!
For support installing or managing Synapse, please join #synapse:matrix.org (from a matrix.org account if necessary) and ask questions there. We do not use GitHub issues for support requests, only for bug reports and feature requests.
Synapse's documentation is nicely rendered on GitHub Pages, with its source available in docs.
The easiest way to try out your new Synapse installation is by connecting to it from a web client.
Unless you are running a test instance of Synapse on your local machine, in general, you will need to enable TLS support before you can successfully connect from a client: see TLS certificates.
An easy way to get started is to login or register via Element at https://app.element.io/#/login or https://app.element.io/#/register respectively. You will need to change the server you are logging into from matrix.org and instead specify a Homeserver URL of https://<server_name>:8448 (or just https://<server_name> if you are using a reverse proxy). If you prefer to use another client, refer to our client breakdown.
If all goes well you should at least be able to log in, create a room, and start sending messages.
By default, registration of new users via Matrix clients is disabled. To enable it, specify enable_registration: true in homeserver.yaml. (It is then recommended to also set up CAPTCHA - see docs/CAPTCHA_SETUP.md.)
Once enable_registration is set to true, it is possible to register a user via a Matrix client.
Your new user name will be formed partly from the server_name, and partly from a localpart you specify when you create the account. Your name will take the form of @localpart:my.domain.name (pronounced "at localpart on my dot domain dot name").
As when logging in, you will need to specify a "Custom server". Specify your desired localpart in the 'User name' box.
Matrix serves raw, user-supplied data in some APIs -- specifically the content repository endpoints.
Whilst we make a reasonable effort to mitigate against XSS attacks (for instance, by using CSP), a Matrix homeserver should not be hosted on a domain hosting other web applications. This especially applies to sharing the domain with Matrix web clients and other sensitive applications like webmail. See https://developer.github.com/changes/2014-04-25-user-content-security for more information.
Ideally, the homeserver should not simply be on a different subdomain, but on a completely different registered domain (also known as top-level site or eTLD+1). This is because some attacks are still possible as long as the two applications share the same registered domain.
To illustrate this with an example, if your Element Web or other sensitive web application is hosted on A.example1.com, you should ideally host Synapse on example2.com. Some amount of protection is offered by hosting on B.example1.com instead, so this is also acceptable in some scenarios. However, you should not host your Synapse on A.example1.com.
Note that all of the above refers exclusively to the domain used in Synapse's public_baseurl setting. In particular, it has no bearing on the domain mentioned in MXIDs hosted on that server.
Following this advice ensures that even if an XSS is found in Synapse, the impact to other applications will be minimal.
The instructions for upgrading Synapse are in the upgrade notes. Please check these instructions as upgrading may require extra steps for some versions of Synapse.
It is recommended to put a reverse proxy such as nginx, Apache, Caddy, HAProxy or relayd in front of Synapse. One advantage of doing so is that it means that you can expose the default https port (443) to Matrix clients without needing to run Synapse with root privileges.
For information on configuring one, see docs/reverse_proxy.md.
Identity servers have the job of mapping email addresses and other 3rd Party IDs (3PIDs) to Matrix user IDs, as well as verifying the ownership of 3PIDs before creating that mapping.
They are not where accounts or credentials are stored - these live on home servers. Identity Servers are just for mapping 3rd party IDs to matrix IDs.
This process is very security-sensitive, as there is obvious risk of spam if it is too easy to sign up for Matrix accounts or harvest 3PID data. In the longer term, we hope to create a decentralised system to manage it (matrix-doc #712), but in the meantime, the role of managing trusted identity in the Matrix ecosystem is farmed out to a cluster of known trusted ecosystem partners, who run 'Matrix Identity Servers' such as Sydent, whose role is purely to authenticate and track 3PID logins and publish end-user public keys.
You can host your own copy of Sydent, but this will prevent you reaching other users in the Matrix ecosystem via their email address, and prevent them finding you. We therefore recommend that you use one of the centralised identity servers at https://matrix.org or https://vector.im for now.
To reiterate: the Identity server will only be used if you choose to associate an email address with your account, or send an invite to another user via their email address.
Users can reset their password through their client. Alternatively, a server admin can reset a user's password using the admin API or by directly editing the database as shown below.
First calculate the hash of the new password:
$ ~/synapse/env/bin/hash_password
Password:
Confirm password:
$2a$12$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Then update the users table in the database:
UPDATE users SET password_hash='$2a$12$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
WHERE name='@test:test.com';
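Alternatively, via the admin API, the request looks roughly like the following (a sketch assuming Synapse's v1 reset_password admin endpoint and an access token belonging to a server admin; check the admin API docs for the exact contract):

curl -X POST \
  -H "Authorization: Bearer <admin_access_token>" \
  -H "Content-Type: application/json" \
  -d '{"new_password": "<new_password>", "logout_devices": true}' \
  "https://<server_name>/_synapse/admin/v1/reset_password/@test:test.com"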
The best place to get started is our guide for contributors. This is part of our larger documentation, which includes information for synapse developers as well as synapse administrators.
Developers might be particularly interested in:
Alongside all that, join our developer community on Matrix: #synapse-dev:matrix.org, featuring real humans!
Before setting up a development environment for synapse, make sure you have the system dependencies (such as the python header files) installed - see Platform-specific prerequisites.
To check out Synapse for development, clone the git repo into a working directory of your choice:
git clone https://github.com/matrix-org/synapse.git
cd synapse
Synapse has a number of external dependencies. We maintain a fixed development environment using Poetry. First, install poetry. We recommend:
pip install --user pipx
pipx install poetry
as described here. (See poetry's installation docs for other installation methods.) Then ask poetry to create a virtual environment from the project and install Synapse's dependencies:
poetry install --extras "all test"
This will run a process of downloading and installing all the needed dependencies into a virtual env.
We recommend using the demo which starts 3 federated instances running on ports 8080 - 8082:
poetry run ./demo/start.sh
(to stop, you can use poetry run ./demo/stop.sh)
See the demo documentation for more information.
If you just want to start a single instance of the app and run it directly:
# Create the homeserver.yaml config once
poetry run synapse_homeserver \
    --server-name my.domain.name \
    --config-path homeserver.yaml \
    --generate-config \
    --report-stats=[yes|no]
# Start the app
poetry run synapse_homeserver --config-path homeserver.yaml
After getting up and running, you may wish to run Synapse's unit tests to check that everything is installed correctly:
poetry run trial tests
This should end with a 'PASSED' result (note that exact numbers will differ):
Ran 1337 tests in 716.064s
PASSED (skips=15, successes=1322)
For more tips on running the unit tests, like running a specific test or to see the logging output, see the CONTRIBUTING doc.
Synapse is accompanied by SyTest, a Matrix homeserver integration testing suite, which uses HTTP requests to access the API as a Matrix client would. It is able to run Synapse directly from the source tree, so installation of the server is not required.
Testing with SyTest is recommended for verifying that changes related to the Client-Server API are functioning correctly. See the SyTest installation instructions for details.
Synapse uses a number of platform dependencies such as Python and PostgreSQL, and aims to follow supported upstream versions. See the docs/deprecation_policy.md document for more details.
Need help? Join our community support room on Matrix: #synapse:matrix.org
If synapse runs out of file handles, it typically fails badly - live-locking at 100% CPU, and/or failing to accept new TCP connections (blocking the connecting client). Matrix currently can legitimately use a lot of file handles, thanks to busy rooms like #matrix:matrix.org containing hundreds of participating servers. The first time a server talks in a room it will try to connect simultaneously to all participating servers, which could exhaust the available file descriptors between DNS queries & HTTPS sockets, especially if DNS is slow to respond. (We need to improve the routing algorithm used to be better than full mesh, but as of March 2019 this hasn't happened yet).
If you hit this failure mode, we recommend increasing the maximum number of open file handles to be at least 4096 (assuming a default of 1024 or 256). This is typically done by editing /etc/security/limits.conf.
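For example, lines like the following raise the limit to 4096 (a sketch, not from the original docs; adjust the first field to the account Synapse runs as, or leave * to apply to all users):

*       soft    nofile  4096
*       hard    nofile  4096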
Separately, Synapse may leak file handles if inbound HTTP requests get stuck during processing - e.g. blocked behind a lock or talking to a remote server etc. This is best diagnosed by matching up the 'Received request' and 'Processed request' log lines and looking for any 'Processed request' lines which take more than a few seconds to execute. Please let us know at #synapse:matrix.org if you see this failure mode so we can help debug it, however.
First, ensure you are running the latest version of Synapse, using Python 3 with a PostgreSQL database.
Synapse's architecture is quite RAM hungry currently - we deliberately cache a lot of recent room data and metadata in RAM in order to speed up common requests. We'll improve this in the future, but for now the easiest way to reduce the RAM usage (at the risk of slowing things down) is to set the almost-undocumented SYNAPSE_CACHE_FACTOR environment variable. The default is 0.5, which can be decreased to reduce RAM usage in memory-constrained environments, or increased if performance starts to degrade.
However, degraded performance due to a low cache factor, common on machines with slow disks, often leads to explosions in memory use due to backlogged requests. In this case, reducing the cache factor will make things worse. Instead, try increasing it drastically; 2.0 is a good starting value.
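For example, using the /etc/default/matrix-synapse mechanism described below for LD_PRELOAD (or any other way of setting environment variables for the Synapse process), a raised cache factor would look like:

SYNAPSE_CACHE_FACTOR=2.0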
Using libjemalloc can also yield a significant improvement in overall memory use, and especially in terms of giving back RAM to the OS. To use it, the library must simply be put in the LD_PRELOAD environment variable when launching Synapse. On Debian, this can be done by installing the libjemalloc1 package and adding this line to /etc/default/matrix-synapse:
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
This can make a significant difference on Python 2.7 - it's unclear how much of an improvement it provides on Python 3.x.
If you're encountering high CPU use by the Synapse process itself, you may be affected by a bug with presence tracking that leads to a massive excess of outgoing federation requests (see discussion). If metrics indicate that your server is also issuing far more outgoing federation requests than can be accounted for by your users' activity, this is a likely cause. The misbehavior can be worked around by setting the following in the Synapse config file:
presence:
enabled: false
The typical failure mode here is that you send an invitation to someone to join a room or direct chat, but when they go to accept it, they get an error (typically along the lines of "Invalid signature"). They might see something like the following in their logs:
2019-09-11 19:32:04,271 - synapse.federation.transport.server - 288 - WARNING - GET-11752 - authenticate_request failed: 401: Invalid signature for server <server> with key ed25519:a_EqML: Unable to verify signature for <server>
This is normally caused by a misconfiguration in your reverse-proxy. See docs/reverse_proxy.md and double-check that your settings are correct.
Download Details:
Author: matrix-org
Source Code: https://github.com/matrix-org/synapse
License: Apache-2.0 license
#python
What is a ternary operator: The ternary operator is a conditional expression: it evaluates a condition and yields one of two values depending on whether that condition is true or false. It is the shortest way to write an if-else statement, replacing multi-line if-else code with a single line.
Generic syntax: condition ? value_if_true : value_if_false (Python writes this as value_if_true if condition else value_if_false)
condition: a boolean expression that evaluates to true or false
value_if_true: the value the expression yields if the condition evaluates to true
value_if_false: the value the expression yields if the condition evaluates to false
How to use the ternary operator in Python: here are some examples of the Python ternary if-else.
Brief description of the examples: we take two variables a and b, where a is 10 and b is 20, and find the minimum in one line of code using a ternary operator ( **min_val = a if a < b else b** ): if a is less than b, print a; otherwise print b. The second example is the same as the first, and the third example checks whether a number is even or odd, as shown below.
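A minimal sketch of the three examples described above (variable names are my own):

a, b = 10, 20

# Example 1: minimum of two numbers in one line
min_val = a if a < b else b
print(min_val)  # 10

# Example 2: same pattern, printed directly
print(a if a < b else b)  # 10

# Example 3: check whether a number is even or odd
n = 7
print("even" if n % 2 == 0 else "odd")  # odd

# Variants hinted at by the tags below: dict- and lambda-based "ternaries".
# The dict version evaluates both branches eagerly; the lambda version is lazy.
min_via_dict = {True: a, False: b}[a < b]
min_via_lambda = (lambda: b, lambda: a)[a < b]()
print(min_via_dict, min_via_lambda)  # 10 10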
#python #python ternary operator #ternary operator #ternary operator in if-else #ternary operator in python #ternary operator with dict #ternary operator with lambda
**Introduction of BOLT**
Brand New, Lightning Fast All-In-One Video Hosting, Video Player & Video Marketing Software That Boosts Engagements, Gets You FREE Traffic, And More Views For A Low, 1-Time Payment.
What’s The Power of BOLT
-Video Marketing & Traffic App that helps anyone generate sales quickly.
-Software Based On Real Life Problem Solving.
-High Converting Funnel. Every upgrade Compliments the previous one.
-Generate Profits Without Selling Anything.
-LIVE Proofs and Real Time Case Studies.
-Thousands in Prizes Paid Instantly.
-Newbies can Drive Traffic at Zero Cost.
**IT’S A COMPLETE VIDEO MARKETING SOLUTION**
That Will Save You Time, Get You A Tidal Wave Of FREE Traffic, AND We’re Able To Use The Included Commercial License To Make MASSIVE Profits Without Any Hard Work Or Video Creation Required…
Most people are fumbling along and struggling to get results from their video marketing… That’s because most people are using expensive video marketing platforms that cost an arm and a leg, and because of their built-in ad systems, they actually make it really hard for you to make money…
Inside, You Get Everything You Need To Finally Get HUGE Results From Video Marketing… Even If You’re Just Starting Out…
SEE WHY I STRONGLY DO NOT RECOMMEND BOLT SOFTWARE!!
Read Honest BOLT Review Here >>
#bolt review #bolt #bolt review art flair #bolt reviews
In this article, we will discuss the unformatted Input/Output operations in C++. Using the objects cin and cout for the input and output of data of various types is possible because of the overloading of the operators >> and << to recognize all the basic C++ types. The operator >> is overloaded in the istream class, and the operator << is overloaded in the ostream class.
The general format for reading data from the keyboard:
cin >> var1 >> var2 >> ... >> var_n;
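A minimal, self-contained program using this format (my own illustration, not code from the article; variable names are arbitrary):

#include <iostream>

int main() {
    int count;
    double price;
    // One statement reads both values; the overloaded operator>>
    // dispatches on the type of each variable in turn.
    std::cin >> count >> price;
    std::cout << "count: " << count << ", price: " << price << '\n';
    return 0;
}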
#c++ #c++ programs #c++-operator overloading #cpp-input-output #cpp-operator #cpp-operator-overloading #operators
Pandas is one of the most commonly used data analysis and manipulation libraries in data science ecosystem. It offers plenty of functions and methods to perform efficient operations.
What I like most about Pandas is that there are almost always multiple ways to accomplish a given task. However, we should consider time and computational complexity when selecting a method from the available options.
It is not enough just to complete a given task. We should make it as efficient as possible. Thus, having a comprehensive understanding of how functions and methods work is of crucial importance.
In this article, we will do examples to compare the apply and applymap functions of pandas to vectorized operations. The apply and applymap functions come in handy for many tasks. However, as the size of the data increases, time becomes an issue.
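As a quick illustration of the comparison (my own toy sketch, not code from the article; exact timings vary by machine):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100_000, 2), columns=["a", "b"])

# apply calls a Python function once per row -- flexible but slow
row_sums_apply = df.apply(lambda row: row["a"] + row["b"], axis=1)

# the vectorized equivalent runs in optimized C code and is
# typically orders of magnitude faster at this size
row_sums_vec = df["a"] + df["b"]
assert np.allclose(row_sums_apply, row_sums_vec)

# applymap applies an element-wise Python function to every cell...
squared_applymap = df.applymap(lambda x: x ** 2)
# ...whereas whole-frame arithmetic is vectorized
squared_vec = df ** 2
assert np.allclose(squared_applymap, squared_vec)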
#programming #data-science #machine-learning #artificial-intelligence #efficient pandas: apply vs vectorized operations #apply vs vectorized operations