1640695338
Import, visualize, and analyze SpiderFoot OSINT data in Neo4j, a graph database
NOTE: This installs the sfgraph
command-line utility
$ pip install spiderfoot-neo4j
NOTE: Docker must first be installed
$ docker run --rm --name sfgraph -v "$(pwd)/neo4j_database:/data" -e 'NEO4J_AUTH=neo4j/CHANGETHISIFYOURENOTZUCK' -e 'NEO4JLABS_PLUGINS=["apoc", "graph-data-science"]' -e 'NEO4J_dbms_security_procedures_unrestricted=apoc.*,gds.*' -p "7474:7474" -p "7687:7687" neo4j
$ sfgraph path_to/spiderfoot.db -s <SCANID_1> <SCANID_2> ...
Visit http://127.0.0.1:7474 and log in with neo4j/CHANGETHISIFYOURENOTZUCK
The --suggest
option will rank nodes based on their connectedness in the graph. This is perfect for finding closely-related affiliates (child companies, etc.) to scan and add to the graph. By default, Harmonic Centrality is used, but others such as PageRank can be specified with --closeness-algorithm
$ sfgraph --suggest DOMAIN_NAME
# match all INTERNET_NAMEs
MATCH (n:INTERNET_NAME) RETURN n
# match multiple event types
MATCH (n) WHERE n:INTERNET_NAME OR n:DOMAIN_NAME OR n:EMAILADDR RETURN n
# match by attribute
MATCH (n {data: "evilcorp.com"}) RETURN n
# match by spiderfoot module (relationship)
MATCH p=()-[r:WHOIS]->() RETURN p
# shortest path to all INTERNET_NAMEs from seed domain
MATCH p=shortestPath((d:DOMAIN_NAME {data:"evilcorp.com"})-[*]-(n:INTERNET_NAME)) RETURN p
# match only primary targets (non-affiliates)
MATCH (n {scanned: true}) return n
# match only affiliates
MATCH (n {affiliate: true}) return n
sfgraph [-h] [-db SQLITEDB] [-s SCANS [SCANS ...]] [--uri URI] [-u USERNAME] [-p PASSWORD] [--clear] [--suggest SUGGEST]
[--closeness-algorithm {pageRank,articleRank,closenessCentrality,harmonicCentrality,betweennessCentrality,eigenvectorCentrality}] [-v]
optional arguments:
-h, --help show this help message and exit
-db SQLITEDB, --sqlitedb SQLITEDB
Spiderfoot sqlite database
-s SCANS [SCANS ...], --scans SCANS [SCANS ...]
scan IDs to import
--uri URI Neo4j database URI (default: bolt://127.0.0.1:7687)
-u USERNAME, --username USERNAME
Neo4j username (default: neo4j)
-p PASSWORD, --password PASSWORD
Neo4j password
--clear Wipe the Neo4j database
--suggest SUGGEST Suggest targets of this type (e.g. DOMAIN_NAME) based on their connectedness in the graph
--closeness-algorithm {pageRank,articleRank,closenessCentrality,harmonicCentrality,betweennessCentrality,eigenvectorCentrality}
Algorithm to use when suggesting targets
-v, -d, --debug Verbose / debug
Download Details:
Author: blacklanternsecurity
Source Code: https://github.com/blacklanternsecurity/spiderfoot-neo4j
License: GPL-3.0 License
1621506661
Even if I prefer using SSIS for data transfer operations, Neo4j doesn’t have any official or stable (free) SSIS component. While searching, I found a third-party component that is [still in the beta version].
Another approach for migrating SQL Server graphs to Neo4j is to export data into flat files and then [import them into Neo4j].
The third approach is to develop a small application using C## to migrate Nodes and Edges created in SQL Server to a Neo4j database. This approach is explained in detail in this section.
#graph database #database #neo4j #neo4j database
1620663480
At the very beginning of most development endeavors lies an important question: What database do I choose? There is such an abundance of database technologies at this moment, it’s no wonder many developers don’t have the time or energy to research new ones. If you are one of those developers and you aren’t very familiar with graph databases in general, you’ve come to the right place!
In this article, you will learn about the main differences between a graph database and a relational database, what kind of use-cases are best suited for each database type, and what are their strengths and weaknesses.
#graph-database #relational-database #graph-theory #graph-analysis #data-analytics #networks #data #database
1603659600
The deep learning and knowledge graph technologies have been developing rapidly in recent years. Compared with the “black box” of deep learning, knowledge graphs are highly interpretable, thus are widely adopted in such scenarios as search recommendations, intelligent customer support, and financial risk management.
Meituan has been digging deep in the connections buried in the huge amount of business data over the past few years and has gradually developed the knowledge graphs in nearly ten areas, including cuisine graphs, tourism graphs, and commodity graphs. The ultimate goal is to enhance the smart local life.
Compared with the traditional RDBMS, graph databases can store and query knowledge graphs more efficiently. It gains obvious performance advantage in multi-hop queries to select graph databases as the storage engine. Currently, there are dozens of graph database solutions out there on the market.
It is imperative for the Meituan team to select a graph database solution that can meet the business requirements and to use the solution as the basis of Meituan’s graph storage and graph learning platform. The team has outlined the basic requirements as below per our business status quo:
By having control over the source code, the Meituan team can ensure data security and service availability.
The knowledge graph data size in Meituan can reach hundreds of billions of vertices and edges in total and the throughput can reach tens of thousands of QPS. With that being said, the single-node deployment cannot meet Meituan’s storage requirements.
To ensure the best search experience for Meituan users, the team has strictly restricted the timeout value within all chains of paths. Therefore, it is unacceptable to respond to a query at the second level.
The knowledge graph data is usually stored in data warehouses like Hive. The graph database should be equipped with the capability to quickly import data from such warehouses to the graph storage to ensure service effectiveness.
The Meituan team has tried the top 30 graph databases on DB-Engines and found that most well-known graph databases only support single-node deployment with their open-source edition, for example, Neo4j, ArangoDB, Virtuoso, TigerGraph, RedisGraph. This means that the storage service cannot scale horizontally and the requirement to store large-scale knowledge graph data cannot be met.
After thorough research and comparison, the team has selected the following graph databases for the final round: Nebula Graph (developed by a startup team who originally came from Alibaba), Dgraph (developed by a startup team who originally came from Google), and HugeGraph (developed by Baidu).
#database #tutorial #graph database #database performance #nebula graph #dgraph #graph database adoption
1599897600
Speaking of graph data processing, we have had experience in using various graph databases. In the beginning, we used the stand-alone edition of AgensGraph. Later, due to its performance limitations, we switched to JanusGraph, a distributed graph database. I introduced details on how to migrate data in my article “Migrate tens of billions of graph data into JanusGraph (only in Chinese)”. As the data size and the number of business calls grew, a new problem appeared: Each query consumed too much time. In some business scenarios, a single query took up to 10 seconds, and with increase of the data size, a more complicated single query needed two or three seconds. These problems had seriously affected the performance of the entire business process and the development of related businesses.
The architecture design of JanusGraph determines that a single query is time-consuming. The core reason is that its storage depends on the external storage, and JanusGraph cannot control the external storage well. In our production environment, an HBase cluster is used, which makes it impossible for all queries to be pushed down to the storage layer for processing. Instead, data can only be queried from HBase to the JanusGraph Server memory and then filtered accordingly.
#database #tutorial #graph database #database performance #nebula graph #graph database adoption
1595924640
Parts of the world are still in lockdown, while others are returning to some semblance of normalcy. Either way, while the last few months have given some things pause, they have boosted others. It seems like developments in the world of Graphs are among those that have been boosted.
An abundance of educational material on all things graph has been prepared and delivered online, and is now freely accessible, with more on the way.
Graph databases have been making progress and announcements, repositioning themselves by a combination of releasing new features, securing additional funds, and entering strategic partnerships.
A key graph database technology, RDF*, which enables compatibility between RDF and property graph databases, is gaining momentum and tool support.
And more cutting edge research combining graph AI and knowledge graphs is seeing the light, too. Buckle up and enjoy some graph therapy.
Stanford’s series of online seminars featured some of the world’s leading experts on all things graph. If you missed it, or if you’d like to have an overview of what was said, you can find summaries for each lecture in this series of posts by Bob Kasenchak and Ahren Lehnert. Videos from the lectures are available here.
Stanford University’s computer science department is offering a free class on Knowledge Graphs available to the public. Stanford is also making recordings of the class available via the class website.
Another opportunity to get up to speed with educational material: The entire program of the course “Information Service Engineering” at KIT - Karlsruhe Institute of Technology, is delivered online and made freely available on YouTube. It includes topics such as ontology design, knowledge graph programming, basic graph theory, and more.
Knowledge representation as a prerequisite for knowledge graphs. Learn about knowledge representation, ontologies, RDF(S), OWL, SPARQL, etc.
Ontology may sound like a formal term, while knowledge graph is a more approachable one. But the 2 are related, and so is ontology and AI. Without a consistent, thoughtful approach to developing, applying, evolving an ontology, AI systems lack underpinning that would allow them to be smart enough to make an impact.
The ontology is an investment that will continue to pay off, argue Seth Earley and Josh Bernoff in Harvard Business Review, making the case for how businesses may benefit from a knowldge-centric approach
Even after multiple generations of investments and billions of dollars of digital transformations, organizations struggle to use data to improve customer service, reduce costs, and speed the core processes that provide competitive advantage. AI was supposed to help with that.
Besides AI, knowledge graphs have a part to play in the Cloud, too. State is good, and lack of support for Stateful Cloud-native applications is a roadblock for many enterprise use-cases, writes Dave Duggal.
Graph knowledge bases are an old idea now being revisited to model complex, distributed domains. Combining high-level abstraction with Cloud-native design principles offers efficient “Context-as-a-Service” for hydrating stateless services. Graph knowledge-based systems can enable composition of Cloud-native services into event-driven dataflow processes.
Kubernetes also touches upon Organizational Knowledge, and that may be modeled as a Knowledge Graph.
Extending graph knowledge bases to model distributed systems creates a new kind of information system, one intentionally designed for today’s IT challenges.
The Enterprise Knowledge Graph Foundation was recently established to define best practices and mature the marketplace for EKG adoption, with a launch webinar on June the 23rd.
The Foundation defines its mission as including adopting semantic standards, developing best practices for accelerated EKG deployment, curating a repository of reusable models and resources, building a mechanism for engagement and shared knowledge, and advancing the business cases for EKG adoption.
The Enterprise Knowledge Graph Maturity Model (EKG/MM) is the industry-standard definition of the capabilities required for an enterprise knowledge graph. It establishes standard criteria for measuring progress and sets out the practical questions that all involved stakeholders ask to ensure trust, confidence and usage flexibility of data. Each capability area provides a business summary denoting its importance, a definition of the added value from semantic standards and scoring criteria based on five levels of defined maturity.
Enterprise Knowledge Graphs is what the Semantic Web Company (SWC) and Ontotext have been about for a long time, too. Two of the vendors in this space that have been around for the longer time just announced a strategic partnership: Ontotext, a graph database and platform provider, meets SWC, a management and added value layer that sits on top.
SWC and Ontotext CEOs emphasize how their portfolios are complementary, while the press release states that the companies have implemented a seamless integration of the PoolParty Semantic Suite™ v.8 with the GraphDB™ and Ontotext Platform, which offers benefits for many use cases.
#database #artificial intelligence #graph databases #rdf #graph analytics #knowledge graph #graph technology