1665211740
Parsr, is a minimal-footprint document (image, pdf, docx, eml) cleaning, parsing and extraction toolchain which generates readily available, organized and usable data in JSON, Markdown (MD), CSV/Pandas DF or TXT formats.
It provides analysis, data scientists and developers with clean structured and label-enriched information set for ready-to-use applications ranging from data entry and document analysts automation, archival, and many others.
Currently, Parsr can perform: document cleaning, hierarchy regeneration (words, lines, paragraphs), detection of headings, tables, lists, table of contents, page numbers, headers/footers, links, and others. Check out all the features.
-- The advanced installation guide is available here --
The quickest way to install and run the Parsr API is through the docker image:
docker pull axarev/parsr
If you also wish to install the GUI for sending documents and visualising results:
docker pull axarev/parsr-ui-localhost
Note: Parsr can also be installed bare-metal (not via Docker containers), the procedure for which is documented in the installation guide.
-- The advanced usage guide is available here --
To run the API, issue:
docker run -p 3001:3001 axarev/parsr
which will launch it on http://localhost:3001.
Consult the documentation on the usage of the API.
To access the python client to Parsr API, issue:
pip install parsr-client
To sample the Jupyter Notebook, using the python client, head over to the jupyter demo.
To use the GUI tool (the API needs to already be running), issue:
docker run -t -p 8080:80 axarev/parsr-ui-localhost:latest
Then, access it through http://localhost:8080.
Refer to the Configuration documentation to interpret the configurable options in the GUI viewer.
The API based usage and the command line usage are documented in the advanced usage guide.
Documentation
All documentation files can be found here.
Contribute
Please refer to the contribution guidelines.
Third Party Licenses
Third Party Libraries licenses for its dependencies:
Author: axa-group
Source Code: https://github.com/axa-group/Parsr
License: Apache-2.0 license
#javascript #pdf #data #typescript
1653123600
This repository is a fork of SimpleMDE, made by Sparksuite. Go to the dedicated section for more information.
A drop-in JavaScript text area replacement for writing beautiful and understandable Markdown. EasyMDE allows users who may be less experienced with Markdown to use familiar toolbar buttons and shortcuts.
In addition, the syntax is rendered while editing to clearly show the expected result. Headings are larger, emphasized words are italicized, links are underlined, etc.
EasyMDE also features both built-in auto saving and spell checking. The editor is entirely customizable, from theming to toolbar buttons and javascript hooks.
Via npm:
npm install easymde
Via the UNPKG CDN:
<link rel="stylesheet" href="https://unpkg.com/easymde/dist/easymde.min.css">
<script src="https://unpkg.com/easymde/dist/easymde.min.js"></script>
Or jsDelivr:
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/easymde/dist/easymde.min.css">
<script src="https://cdn.jsdelivr.net/npm/easymde/dist/easymde.min.js"></script>
After installing and/or importing the module, you can load EasyMDE onto the first textarea
element on the web page:
<textarea></textarea>
<script>
const easyMDE = new EasyMDE();
</script>
Alternatively you can select a specific textarea
, via JavaScript:
<textarea id="my-text-area"></textarea>
<script>
const easyMDE = new EasyMDE({element: document.getElementById('my-text-area')});
</script>
Use easyMDE.value()
to get the content of the editor:
<script>
easyMDE.value();
</script>
Use easyMDE.value(val)
to set the content of the editor:
<script>
easyMDE.value('New input for **EasyMDE**');
</script>
true
, force downloads Font Awesome (used for icons). If set to false
, prevents downloading. Defaults to undefined
, which will intelligently check whether Font Awesome has already been included, then download accordingly.true
, focuses the editor automatically. Defaults to false
.true
, saves the text automatically. Defaults to false
.10000
(10 seconds).autosave.delay
or 10000
(10 seconds).locale: en-US, format: hour:minute
.{ delay: 300 }
, it will check every 300 ms if the editor is visible and if positive, call CodeMirror's refresh()
.**
or __
. Defaults to **
.```
or ~~~
. Defaults to ```
.*
or _
. Defaults to *
.*
, -
or +
. Defaults to *
.textarea
element to use. Defaults to the first textarea
element on the page.true
, force text changes made in EasyMDE to be immediately stored in original text area. Defaults to false
.false
, indent using spaces instead of tabs. Defaults to true
.false
by default, preview for images will appear only for images on separate lines.
as argument and returns a string that serves as the src
attribute of the <img>
tag in the preview. Enables dynamic previewing of images in the frontend without having to upload them to a server, allows copy-pasting of images to the editor with preview.["[", "](http://)"]
.true
, enables line numbers in the editor.false
, disable line wrapping. Defaults to true
."500px"
. Defaults to "300px"
.minHeight
option will be ignored. Should be a string containing a valid CSS value like "500px"
. Defaults to undefined
.true
when the editor is currently going into full screen mode, or false
.true
, will render headers without a space after the #
. Defaults to false
.false
, will not process GFM strikethrough syntax. Defaults to true
.true
, let underscores be a delimiter for separating words. Defaults to false
.false
, will replace CSS classes returned by the default Markdown mode. Otherwise the classes returned by the custom mode will be combined with the classes returned by the default mode. Defaults to true
."editor-preview"
.true
, a JS alert window appears asking for the link or image URL. Defaults to false
.URL of the image:
.URL for the link:
.true
, enables the image upload functionality, which can be triggered by drag and drop, copy-paste and through the browse-file window (opened when the user click on the upload-image icon). Defaults to false
.1024 * 1024 * 2
(2 MB).image/png, image/jpeg
.imageMaxSize
, imageAccept
, imageUploadEndpoint
and imageCSRFToken
ineffective.onSuccess
and onError
callback functions as parameters. onSuccess(imageUrl: string)
and onError(errorMessage: string)
{"data": {"filePath": "<filePath>"}}
where filePath is the path of the image (absolute if imagePathAbsolute
is set to true, relative if otherwise);{"error": "<errorCode>"}
, where errorCode can be noFileGiven
(HTTP 400 Bad Request), typeNotAllowed
(HTTP 415 Unsupported Media Type), fileTooLarge
(HTTP 413 Payload Too Large) or importError
(see errorMessages below). If errorCode is not one of the errorMessages, it is alerted unchanged to the user. This allows for server-side error messages. No default value.true
, will treat imageUrl
from imageUploadFunction
and filePath returned from imageUploadEndpoint
as an absolute rather than relative path, i.e. not prepend window.location.origin
to it.imageCSRFToken
has value, defaults to csrfmiddlewaretoken
.true
, passing CSRF token via header. Defaults to false
, which pass CSRF through request body.#image_name#
, #image_size#
and #image_max_size#
will replaced by their respective values, that can be used for customization or internationalization:uploadImage
is set to true
. Defaults to Attach files by drag and dropping or pasting from clipboard.
.Drop image to upload it.
.Uploading images #images_names#
.Uploading #file_name#: #progress#%
.Uploaded #image_name#
.B, KB, MB
(example: 218 KB
). You can use B,KB,MB
instead if you prefer without whitespaces (218KB
).errorCallback
option, where #image_name#
, #image_size#
and #image_max_size#
will replaced by their respective values, that can be used for customization or internationalization:You must select a file.
.imageAccept
list, or the server returned this error code. Defaults to This image type is not allowed.
.imageMaxSize
, or if the server returned this error code. Defaults to Image #image_name# is too big (#image_size#).\nMaximum file size is #image_max_size#.
.Something went wrong when uploading the image #image_name#.
.(errorMessage) => alert(errorMessage)
.true
, will highlight using highlight.js. Defaults to false
. To use this feature you must include highlight.js on your page or pass in using the hljs
option. For example, include the script and the CSS files like:<script src="https://cdn.jsdelivr.net/highlight.js/latest/highlight.min.js"></script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/highlight.js/latest/styles/github.min.css">
window.hljs
), you can provide an instance here. Defaults to undefined
.renderingConfig
options will take precedence.false
, disable parsing GitHub Flavored Markdown (GFM) single line breaks. Defaults to true
.false
, disable the spell checker. Defaults to true
. Optionally pass a CodeMirrorSpellChecker-compliant function.textarea
or contenteditable
. Defaults to textarea
for desktop and contenteditable
for mobile. contenteditable
option is necessary to enable nativeSpellcheck.false
, disable native spell checker. Defaults to true
.false
, allows side-by-side editing without going into fullscreen. Defaults to true
.false
, hide the status bar. Defaults to the array of built-in status bar items.false
, remove the CodeMirror-selectedtext
class from selected lines. Defaults to true
.false
, disable syncing scroll in side by side mode. Defaults to true
.2
.easymde
.false
, hide the toolbar. Defaults to the array of icons.false
, disable toolbar button tips. Defaults to true
.rtl
or ltr
. Changes text direction to support right-to-left languages. Defaults to ltr
.Most options demonstrate the non-default behavior:
const editor = new EasyMDE({
autofocus: true,
autosave: {
enabled: true,
uniqueId: "MyUniqueID",
delay: 1000,
submit_delay: 5000,
timeFormat: {
locale: 'en-US',
format: {
year: 'numeric',
month: 'long',
day: '2-digit',
hour: '2-digit',
minute: '2-digit',
},
},
text: "Autosaved: "
},
blockStyles: {
bold: "__",
italic: "_",
},
unorderedListStyle: "-",
element: document.getElementById("MyID"),
forceSync: true,
hideIcons: ["guide", "heading"],
indentWithTabs: false,
initialValue: "Hello world!",
insertTexts: {
horizontalRule: ["", "\n\n-----\n\n"],
image: [""],
link: ["[", "](https://)"],
table: ["", "\n\n| Column 1 | Column 2 | Column 3 |\n| -------- | -------- | -------- |\n| Text | Text | Text |\n\n"],
},
lineWrapping: false,
minHeight: "500px",
parsingConfig: {
allowAtxHeaderWithoutSpace: true,
strikethrough: false,
underscoresBreakWords: true,
},
placeholder: "Type here...",
previewClass: "my-custom-styling",
previewClass: ["my-custom-styling", "more-custom-styling"],
previewRender: (plainText) => customMarkdownParser(plainText), // Returns HTML from a custom parser
previewRender: (plainText, preview) => { // Async method
setTimeout(() => {
preview.innerHTML = customMarkdownParser(plainText);
}, 250);
return "Loading...";
},
promptURLs: true,
promptTexts: {
image: "Custom prompt for URL:",
link: "Custom prompt for URL:",
},
renderingConfig: {
singleLineBreaks: false,
codeSyntaxHighlighting: true,
sanitizerFunction: (renderedHTML) => {
// Using DOMPurify and only allowing <b> tags
return DOMPurify.sanitize(renderedHTML, {ALLOWED_TAGS: ['b']})
},
},
shortcuts: {
drawTable: "Cmd-Alt-T"
},
showIcons: ["code", "table"],
spellChecker: false,
status: false,
status: ["autosave", "lines", "words", "cursor"], // Optional usage
status: ["autosave", "lines", "words", "cursor", {
className: "keystrokes",
defaultValue: (el) => {
el.setAttribute('data-keystrokes', 0);
},
onUpdate: (el) => {
const keystrokes = Number(el.getAttribute('data-keystrokes')) + 1;
el.innerHTML = `${keystrokes} Keystrokes`;
el.setAttribute('data-keystrokes', keystrokes);
},
}], // Another optional usage, with a custom status bar item that counts keystrokes
styleSelectedText: false,
sideBySideFullscreen: false,
syncSideBySidePreviewScroll: false,
tabSize: 4,
toolbar: false,
toolbarTips: false,
});
Below are the built-in toolbar icons (only some of which are enabled by default), which can be reorganized however you like. "Name" is the name of the icon, referenced in the JavaScript. "Action" is either a function or a URL to open. "Class" is the class given to the icon. "Tooltip" is the small tooltip that appears via the title=""
attribute. Note that shortcut hints are added automatically and reflect the specified action if it has a key bind assigned to it (i.e. with the value of action
set to bold
and that of tooltip
set to Bold
, the final text the user will see would be "Bold (Ctrl-B)").
Additionally, you can add a separator between any icons by adding "|"
to the toolbar array.
Name | Action | Tooltip Class |
---|---|---|
bold | toggleBold | Bold fa fa-bold |
italic | toggleItalic | Italic fa fa-italic |
strikethrough | toggleStrikethrough | Strikethrough fa fa-strikethrough |
heading | toggleHeadingSmaller | Heading fa fa-header |
heading-smaller | toggleHeadingSmaller | Smaller Heading fa fa-header |
heading-bigger | toggleHeadingBigger | Bigger Heading fa fa-lg fa-header |
heading-1 | toggleHeading1 | Big Heading fa fa-header header-1 |
heading-2 | toggleHeading2 | Medium Heading fa fa-header header-2 |
heading-3 | toggleHeading3 | Small Heading fa fa-header header-3 |
code | toggleCodeBlock | Code fa fa-code |
quote | toggleBlockquote | Quote fa fa-quote-left |
unordered-list | toggleUnorderedList | Generic List fa fa-list-ul |
ordered-list | toggleOrderedList | Numbered List fa fa-list-ol |
clean-block | cleanBlock | Clean block fa fa-eraser |
link | drawLink | Create Link fa fa-link |
image | drawImage | Insert Image fa fa-picture-o |
table | drawTable | Insert Table fa fa-table |
horizontal-rule | drawHorizontalRule | Insert Horizontal Line fa fa-minus |
preview | togglePreview | Toggle Preview fa fa-eye no-disable |
side-by-side | toggleSideBySide | Toggle Side by Side fa fa-columns no-disable no-mobile |
fullscreen | toggleFullScreen | Toggle Fullscreen fa fa-arrows-alt no-disable no-mobile |
guide | This link | Markdown Guide fa fa-question-circle |
undo | undo | Undo fa fa-undo |
redo | redo | Redo fa fa-redo |
Customize the toolbar using the toolbar
option.
Only the order of existing buttons:
const easyMDE = new EasyMDE({
toolbar: ["bold", "italic", "heading", "|", "quote"]
});
All information and/or add your own icons
const easyMDE = new EasyMDE({
toolbar: [
{
name: "bold",
action: EasyMDE.toggleBold,
className: "fa fa-bold",
title: "Bold",
},
"italics", // shortcut to pre-made button
{
name: "custom",
action: (editor) => {
// Add your own code
},
className: "fa fa-star",
title: "Custom Button",
attributes: { // for custom attributes
id: "custom-id",
"data-value": "custom value" // HTML5 data-* attributes need to be enclosed in quotation marks ("") because of the dash (-) in its name.
}
},
"|" // Separator
// [, ...]
]
});
Put some buttons on dropdown menu
const easyMDE = new EasyMDE({
toolbar: [{
name: "heading",
action: EasyMDE.toggleHeadingSmaller,
className: "fa fa-header",
title: "Headers",
},
"|",
{
name: "others",
className: "fa fa-blind",
title: "others buttons",
children: [
{
name: "image",
action: EasyMDE.drawImage,
className: "fa fa-picture-o",
title: "Image",
},
{
name: "quote",
action: EasyMDE.toggleBlockquote,
className: "fa fa-percent",
title: "Quote",
},
{
name: "link",
action: EasyMDE.drawLink,
className: "fa fa-link",
title: "Link",
}
]
},
// [, ...]
]
});
EasyMDE comes with an array of predefined keyboard shortcuts, but they can be altered with a configuration option. The list of default ones is as follows:
Shortcut (Windows / Linux) | Shortcut (macOS) | Action |
---|---|---|
Ctrl-' | Cmd-' | "toggleBlockquote" |
Ctrl-B | Cmd-B | "toggleBold" |
Ctrl-E | Cmd-E | "cleanBlock" |
Ctrl-H | Cmd-H | "toggleHeadingSmaller" |
Ctrl-I | Cmd-I | "toggleItalic" |
Ctrl-K | Cmd-K | "drawLink" |
Ctrl-L | Cmd-L | "toggleUnorderedList" |
Ctrl-P | Cmd-P | "togglePreview" |
Ctrl-Alt-C | Cmd-Alt-C | "toggleCodeBlock" |
Ctrl-Alt-I | Cmd-Alt-I | "drawImage" |
Ctrl-Alt-L | Cmd-Alt-L | "toggleOrderedList" |
Shift-Ctrl-H | Shift-Cmd-H | "toggleHeadingBigger" |
F9 | F9 | "toggleSideBySide" |
F11 | F11 | "toggleFullScreen" |
Here is how you can change a few, while leaving others untouched:
const editor = new EasyMDE({
shortcuts: {
"toggleOrderedList": "Ctrl-Alt-K", // alter the shortcut for toggleOrderedList
"toggleCodeBlock": null, // unbind Ctrl-Alt-C
"drawTable": "Cmd-Alt-T", // bind Cmd-Alt-T to drawTable action, which doesn't come with a default shortcut
}
});
Shortcuts are automatically converted between platforms. If you define a shortcut as "Cmd-B", on PC that shortcut will be changed to "Ctrl-B". Conversely, a shortcut defined as "Ctrl-B" will become "Cmd-B" for Mac users.
The list of actions that can be bound is the same as the list of built-in actions available for toolbar buttons.
You can catch the following list of events: https://codemirror.net/doc/manual.html#events
const easyMDE = new EasyMDE();
easyMDE.codemirror.on("change", () => {
console.log(easyMDE.value());
});
You can revert to the initial text area by calling the toTextArea
method. Note that this clears up the autosave (if enabled) associated with it. The text area will retain any text from the destroyed EasyMDE instance.
const easyMDE = new EasyMDE();
// ...
easyMDE.toTextArea();
easyMDE = null;
If you need to remove registered event listeners (when the editor is not needed anymore), call easyMDE.cleanup()
.
The following self-explanatory methods may be of use while developing with EasyMDE.
const easyMDE = new EasyMDE();
easyMDE.isPreviewActive(); // returns boolean
easyMDE.isSideBySideActive(); // returns boolean
easyMDE.isFullscreenActive(); // returns boolean
easyMDE.clearAutosavedValue(); // no returned value
EasyMDE is a continuation of SimpleMDE.
SimpleMDE began as an improvement of lepture's Editor project, but has now taken on an identity of its own. It is bundled with CodeMirror and depends on Font Awesome.
CodeMirror is the backbone of the project and parses much of the Markdown syntax as it's being written. This allows us to add styles to the Markdown that's being written. Additionally, a toolbar and status bar have been added to the top and bottom, respectively. Previews are rendered by Marked using GitHub Flavored Markdown (GFM).
I originally made this fork to implement FontAwesome 5 compatibility into SimpleMDE. When that was done I submitted a pull request, which has not been accepted yet. This, and the project being inactive since May 2017, triggered me to make more changes and try to put new life into the project.
Changes include:
https://
by defaultMy intention is to continue development on this project, improving it and keeping it alive.
You may want to edit this library to adapt its behavior to your needs. This can be done in some quick steps:
gulp
command, which will generate files: dist/easymde.min.css
and dist/easymde.min.js
;Want to contribute to EasyMDE? Thank you! We have a contribution guide just for you!
Author: Ionaru
Source Code: https://github.com/Ionaru/easy-markdown-editor
License: MIT license
1620466520
If you accumulate data on which you base your decision-making as an organization, you should probably think about your data architecture and possible best practices.
If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.
In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points from our checklist for what we perceive to be an anticipatory analytics ecosystem.
#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition
1594369800
SQL stands for Structured Query Language. SQL is a scripting language expected to store, control, and inquiry information put away in social databases. The main manifestation of SQL showed up in 1974, when a gathering in IBM built up the principal model of a social database. The primary business social database was discharged by Relational Software later turning out to be Oracle.
Models for SQL exist. In any case, the SQL that can be utilized on every last one of the major RDBMS today is in various flavors. This is because of two reasons:
1. The SQL order standard is genuinely intricate, and it isn’t handy to actualize the whole standard.
2. Every database seller needs an approach to separate its item from others.
Right now, contrasts are noted where fitting.
#programming books #beginning sql pdf #commands sql #download free sql full book pdf #introduction to sql pdf #introduction to sql ppt #introduction to sql #practical sql pdf #sql commands pdf with examples free download #sql commands #sql free bool download #sql guide #sql language #sql pdf #sql ppt #sql programming language #sql tutorial for beginners #sql tutorial pdf #sql #structured query language pdf #structured query language ppt #structured query language
1620629020
The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.
This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.
As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).
This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.
#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management
1617959340
Companies across every industry rely on big data to make strategic decisions about their business, which is why data analyst roles are constantly in demand. Even as we transition to more automated data collection systems, data analysts remain a crucial piece in the data puzzle. Not only do they build the systems that extract and organize data, but they also make sense of it –– identifying patterns, trends, and formulating actionable insights.
If you think that an entry-level data analyst role might be right for you, you might be wondering what to focus on in the first 90 days on the job. What skills should you have going in and what should you focus on developing in order to advance in this career path?
Let’s take a look at the most important things you need to know.
#data #data-analytics #data-science #data-analysis #big-data-analytics #data-privacy #data-structures #good-company