
Nat Grady

1677734665

Amsi-Killer: Lifetime AMSI bypass

Lifetime AMSI bypass


Opcode Scan

We get the exact address of the jump instruction by searching for the first byte of each instruction. Because only the stable opcode bytes are matched and the variable operands are wildcarded, this technique remains effective even in the face of updates or modifications to the target module.

For example:

| Bytes          | Instruction                  |
| 48 85 D2       | test rdx, rdx                |
| 74 3F          | je amsi.7FFAE957C694         |
| 48 85 C9       | test rcx, rcx                |
| 74 3A          | je amsi.7FFAE957C694         |
| 48 83 79 08 00 | cmp qword ptr ds:[rcx+8], 0  |
| 74 33          | je amsi.7FFAE957C694         |

The search pattern will look like this:

{ 0x48, '?', '?', 0x74, '?', 0x48, '?', '?', 0x74, '?', 0x48, '?', '?', '?', '?', 0x74, 0x33 }
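
As a rough illustration of how such a wildcard scan can work (a minimal Python sketch, not the tool's actual code; the buffer below just reuses the bytes from the listing), '?' entries are treated as single-byte wildcards and every other entry must match exactly:

# Minimal sketch of a wildcard byte scan; None stands in for the '?' wildcards.
PATTERN = [0x48, None, None, 0x74, None, 0x48, None, None, 0x74, None,
           0x48, None, None, None, None, 0x74, 0x33]

def find_pattern(data: bytes, pattern) -> int:
    """Return the offset of the first match, or -1 if nothing matches."""
    for i in range(len(data) - len(pattern) + 1):
        if all(p is None or data[i + j] == p for j, p in enumerate(pattern)):
            return i
    return -1

# Example over the bytes from the listing above
buf = bytes.fromhex("4885D2743F4885C9743A48837908007433")
print(find_pattern(buf, PATTERN))   # 0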


Patch

Before Patch

The code tests RDX against itself. If RDX is zero, the JE jumps straight to the return; otherwise execution falls through to the next check.


We can't execute "Invoke-Mimikatz"; AMSI blocks the script.


After Patch

We patch the first byte of the jump, changing it from JE (0x74) to JMP (0xEB), so execution returns directly.
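
As a hedged sketch of that one-byte change (illustrative only; it operates on a copy of the bytes from the listing rather than on amsi.dll in a live process), overwriting the opcode of the first JE makes the early-return branch unconditional:

# Illustration only: flip the first JE (0x74) into a short JMP (0xEB) in a copy
# of the bytes from the listing above. The tool applies the same one-byte write
# to amsi.dll's code in memory.
code = bytearray.fromhex("4885D2743F4885C9743A48837908007433")
je_offset = 3                 # the first 'je' follows the 3-byte 'test rdx, rdx'
assert code[je_offset] == 0x74
code[je_offset] = 0xEB        # JE -> JMP: the early-return path is always taken
print(code.hex())             # 4885d2eb3f...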


Now we can execute "Invoke-Mimikatz".



Download Details:

Author: ZeroMemoryEx
Source Code: https://github.com/ZeroMemoryEx/Amsi-Killer 

#cplusplus #win 

Amsi-Killer: Lifetime AMSI bypass

How to Fix Python [WinError 3] The System Cannot Find The Path Specified

Python error [WinError 3] is a variation of the [WinError 2] error. The complete error message is as follows:

FileNotFoundError: [WinError 3] The system cannot find the path specified

This error usually occurs when you use the Python os module to interact with the Windows filesystem.

While [WinError 2] means that a file can’t be found, [WinError 3] means that the path you specified doesn’t exist.

This article will show you several examples that can cause this error and how to fix it.

1. Typing the wrong name when calling the os.listdir() method

Suppose you have the following directory structure on your computer:

.
├── assets
│   ├── image.png
│   └── photo.png
└── main.py

Next, suppose you try to get the names of the files inside the assets/ directory using the os.listdir() method.

But you specified the wrong directory name when calling the method as follows:

import os

files = os.listdir("asset")
print(files)

Because you have an assets folder and not asset, the os module can’t find the directory:

Traceback (most recent call last):
  File "main.py", line 3, in <module>
    files = os.listdir("asset")
            ^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'asset'

To fix this error, make sure that the path you passed as the parameter to the listdir() method exists.

For our example, replacing asset with assets should fix the error.
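
If you're not sure which part of the path is misspelled, a small sketch like the one below (standard library only; the misspelled name is taken from the example above) prints what actually exists before calling listdir():

import os

path = "asset"   # the misspelled name from the example above
if os.path.isdir(path):
    print(os.listdir(path))
else:
    # Show the contents of the current directory to spot the typo ('assets')
    print(f"{path!r} not found; current directory contains: {os.listdir('.')}")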

2. Specifying a non-existent path in the os.rename() method

The error also appears when you specify the wrong path when calling the os.rename() method.

Suppose you have a different directory structure on your computer as follows:

.
├── docs
│   └── file.txt  
└── main.py

Now, you want to rename the file.txt file to article.txt.

You called the os.rename() method as follows:

import os

os.rename("doc/file.txt", "article.txt")

The code above incorrectly specified the path docs/ as doc/, so the error is triggered:

Traceback (most recent call last):
  File "main.py", line 3, in <module>
    os.rename("doc/file.txt", "article.txt")
FileNotFoundError: [WinError 3] The system cannot find the path specified: 
'doc/file.txt' -> 'article.txt'

To fix this error, you need to specify the correct path to the file, which is docs/file.txt.

Please note that the extension of the file must be specified in the arguments. If you type file.txt as file, then you’ll get the same error.

This is because Python will think you're instructing it to rename a directory or folder, not a file.

When renaming a file, always include the file extension.

Also, keep in mind that directory and file names can be case-sensitive depending on the filesystem, so it's safest to match the exact capitalization.
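
Putting it together, here's a small guarded version of the corrected call (a sketch; the paths follow the example directory structure above):

import os

src = "docs/file.txt"     # correct directory name, full file name with extension
dst = "article.txt"
if os.path.exists(src):
    os.rename(src, dst)
else:
    print(f"Cannot rename: {src!r} does not exist")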

3. Use absolute paths instead of relative paths

At times, you might want to access a folder or file that’s a bit difficult to reach from the location of your script.

Suppose you have a directory structure as follows on your computer:

. C:
├── assets
│   └── text
│       └── file.txt
└── scripts
    └── test
        └──  main.py

In this structure, the path to main.py is C:/scripts/test/main.py, while file.txt is at C:/assets/text/file.txt.

Suppose you want to rename file.txt to article.txt. This is how you specify the names using relative paths:

import os

os.rename("../../assets/text/file.txt", "../../assets/text/article.txt")

It’s easy for you to specify the wrong path when using relative paths, so it’s recommended to use absolute paths when the path is complex.

Here’s what the arguments look like using absolute paths:

import os

os.rename("C:/assets/text/file.txt", "C:/assets/text/article.txt")

As you can see, absolute paths are easier to read and understand. In Windows, the absolute path usually starts with the drive letter you have in your system like C: or D:.

To find the absolute path of a file, right-click on the file and select Properties from the context menu.

You’ll see the location of the file as follows:

File location in Windows

Add the file name at the end of the location path, and you get the absolute path.
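
You can also let Python compute the absolute path for you; a quick sketch using os.path.abspath() (the printed drive and folders depend on where the script runs):

import os

relative = "../../assets/text/file.txt"
print(os.path.abspath(relative))   # e.g. C:\assets\text\file.txt when run from C:\scripts\test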

I hope this tutorial is helpful. See you in other articles! 👋

Original article source at: https://sebhastian.com/

#python #system #win #error 

How to Fix Python [WinError 3] The System Cannot Find The Path Specified

Rupert Beatty

1673932097

Swift-win32: A Windows Application Framework for Swift

Swift/Win32 - A Swift Application Framework for Windows

Swift/Win32 aims to provide an MVC model for writing applications on Windows. It provides Swift-friendly wrapping of the Win32 APIs, much like MFC did for C++.

Swift/Win32 Screenshot

Build Requirements

  • Swift 5.4 or newer
  • Windows SDK 10.0.17763 or newer
  • CMake 3.16 or newer

Building

This project requires Swift 5.4 or newer. You can use the snapshot binaries from swift.org, download the nightly build from Azure, or build the Swift compiler from source.

Recommended (CMake)

The following example session shows how to build with CMake 3.16 or newer.

cmake -B build -D BUILD_SHARED_LIBS=YES -D CMAKE_BUILD_TYPE=Release -D CMAKE_Swift_FLAGS="-sdk %SDKROOT%" -G Ninja -S .
ninja -C build SwiftWin32 UICatalog

%CD%\build\bin\UICatalog.exe

Swift Package Manager

Building this project with the Swift Package Manager is supported, although CMake is recommended for ease of use. The Swift Package Manager based build is required for code completion via SourceKit-LSP, and it also allows Swift/Win32 to be used in other applications that build with SPM. When building this project with SPM, additional post-build steps are required to run the demo applications.

The following limitations are known:

  1. It is not possible to deploy auxiliary files which are required for Swift/Win32 based applications to function to the correct location.
  2. It is not possible to build and run multiple demo projects as the auxiliary files collide.
swift build --product UICatalog
mt -nologo -manifest Examples\UICatalog\UICatalog.exe.manifest -outputresource:.build\x86_64-unknown-windows-msvc\debug\UICatalog.exe
copy Examples\UICatalog\Info.plist .build\x86_64-unknown-windows-msvc\debug\
.build\x86_64-unknown-windows-msvc\debug\UICatalog.exe

In order to get access to the manifest tool (mt), the build and testing should occur in an x64 Native Tools Command Prompt for VS2019.

Testing

The current implementation is still in flux, and many of the interfaces we expect to be present are not yet implemented. Because clearly indicating the missing surface makes it easier to focus on what needs to be accomplished, there are many instances of interfaces being declared but not implemented; most of these sites will abort if they are reached. In order to enable testing for scenarios which may interact with these cases, a special ENABLE_TESTING condition has been added to allow us to bypass the missing functionality.

You can run the tests by passing that flag when invoking the SPM test command:

swift test -Xswiftc -DENABLE_TESTING

Download Details:

Author: Compnerd
Source Code: https://github.com/compnerd/swift-win32 
License: BSD-3-Clause license

#swift #window  #ui #system #win

Swift-win32: A Windows Application Framework for Swift

Rylan Becker

1663769400

Libraries for Working with Human Languages in Popular Python

In this Python article, let's learn about Natural Language Processing: Libraries for Working with Human Languages in Popular Python

Table of contents:

  • General
    • gensim - Topic Modeling for Humans.
    • langid.py - Stand-alone language identification system.
    • nltk - A leading platform for building Python programs to work with human language data.
    • pattern - A web mining module.
    • polyglot - Natural language pipeline supporting hundreds of languages.
    • pytext - A natural language modeling framework based on PyTorch.
    • PyTorch-NLP - A toolkit enabling rapid deep learning NLP prototyping for research.
    • spacy - A library for industrial-strength natural language processing in Python and Cython.
    • Stanza - The Stanford NLP Group's official Python library, supporting 60+ languages.
  • Chinese
    • funNLP - A collection of tools and datasets for Chinese NLP.
    • jieba - The most popular Chinese text segmentation library.
    • pkuseg-python - A toolkit for Chinese word segmentation in various domains.
    • snownlp - A library for processing Chinese text.

What is Python?

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.

What is Natural Language Processing in Python?

Natural language processing (NLP) is a field that focuses on making natural human language usable by computer programs. NLTK, or Natural Language Toolkit, is a Python package that you can use for NLP. A lot of the data that you could be analyzing is unstructured data and contains human-readable text.


Libraries for Working with Human Languages in Popular Python

  1. gensim

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Installation

This software depends on NumPy and SciPy, two Python packages for scientific computing. You must have them installed prior to installing gensim.

It is also recommended you install a fast BLAS library before installing NumPy. This is optional, but using an optimized BLAS such as MKL, ATLAS or OpenBLAS is known to improve performance by as much as an order of magnitude. On OSX, NumPy picks up its vecLib BLAS automatically, so you don’t need to do anything special.

Install the latest version of gensim:

    pip install --upgrade gensim

Or, if you have instead downloaded and unzipped the source tar.gz package:

    python setup.py install

For alternative modes of installation, see the documentation.

Gensim is being continuously tested under all supported Python versions. Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.
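
As a quick, hedged sketch of what topic modeling with gensim looks like (a toy three-document corpus invented for illustration, not taken from the gensim docs):

from gensim import corpora, models

# Toy corpus: each document is already tokenized
docs = [["human", "interface", "computer"],
        ["survey", "user", "computer", "system"],
        ["graph", "trees", "minors", "survey"]]

dictionary = corpora.Dictionary(docs)               # token -> integer id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words vectors
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary)
print(lda.print_topics())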

View on GitHub


2.  Langid.py

langid.py is a standalone Language Identification (LangID) tool.

The design principles are as follows:

  1. Fast
  2. Pre-trained over a large number of languages (currently 97)
  3. Not sensitive to domain-specific features (e.g. HTML/XML markup)
  4. Single .py file with minimal dependencies
  5. Deployable as a web service

All that is required to run langid.py is Python >= 2.7 and NumPy. The main script langid/langid.py is cross-compatible with both Python2 and Python3, but the accompanying training tools are still Python2-only.

langid.py is WSGI-compliant. langid.py will use fapws3 as a web server if available, and default to wsgiref.simple_server otherwise.

langid.py comes pre-trained on 97 languages (ISO 639-1 codes given):

af, am, an, ar, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, gu, he, hi, hr, ht, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lb, lo, lt, lv, mg, mk, ml, mn, mr, ms, mt, nb, ne, nl, nn, no, oc, or, pa, pl, ps, pt, qu, ro, ru, rw, se, si, sk, sl, sq, sr, sv, sw, ta, te, th, tl, tr, ug, uk, ur, vi, vo, wa, xh, zh, zu

The training data was drawn from 5 different sources:

  • JRC-Acquis
  • ClueWeb 09
  • Wikipedia
  • Reuters RCV2
  • Debian i18n
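
A minimal usage sketch (assuming langid.py has been installed, e.g. with pip install langid; the scores in the comments are illustrative):

import langid

# classify() returns a (language code, score) tuple
print(langid.classify("This is a test"))        # ('en', ...)
print(langid.classify("Questa è una prova"))    # ('it', ...)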

View on GitHub


3.  NLTK

NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. NLTK requires Python version 3.7, 3.8, 3.9 or 3.10.

For documentation, please visit nltk.org.
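
A minimal tokenization sketch (assumes NLTK is installed and that the punkt tokenizer data can be downloaded on first run):

import nltk

nltk.download("punkt", quiet=True)   # one-time download of the tokenizer models
from nltk.tokenize import word_tokenize

print(word_tokenize("NLTK is a leading platform for building Python programs."))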

Contributing

Do you want to contribute to NLTK development? Great! Please read CONTRIBUTING.md for more details.

See also how to contribute to NLTK.

Donate

Have you found the toolkit helpful? Please support NLTK development by donating to the project via PayPal, using the link on the NLTK homepage.

Citing

If you publish work that uses NLTK, please cite the NLTK book, as follows:

Bird, Steven, Edward Loper and Ewan Klein (2009).
Natural Language Processing with Python.  O'Reilly Media Inc.

View on GitHub


4.  Pattern

Pattern is a web mining module for Python. It has tools for:

  • Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
  • Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
  • Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
  • Network Analysis: graph centrality and visualization.

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD.

Example workflow

Example

This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: "$20 tip off a sweet little old lady today #win". The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled WIN or FAIL. The classifier uses the vectors to learn which other tweets look more like WIN or more like FAIL.

from pattern.web import Twitter
from pattern.en import tag
from pattern.vector import KNN, count

twitter, knn = Twitter(), KNN()

for i in range(1, 3):
    for tweet in twitter.search('#win OR #fail', start=i, count=100):
        s = tweet.text.lower()
        p = '#win' in s and 'WIN' or 'FAIL'
        v = tag(s)
        v = [word for word, pos in v if pos == 'JJ'] # JJ = adjective
        v = count(v) # {'sweet': 1}
        if v:
            knn.train(v, type=p)

print(knn.classify('sweet potato burger'))
print(knn.classify('stupid autocorrect'))

View on GitHub


5.  polyglot

Polyglot is a natural language pipeline that supports massive multilingual applications.

Features

  • Tokenization (165 Languages)
  • Language detection (196 Languages)
  • Named Entity Recognition (40 Languages)
  • Part of Speech Tagging (16 Languages)
  • Sentiment Analysis (136 Languages)
  • Word Embeddings (137 Languages)
  • Morphological analysis (135 Languages)
  • Transliteration (69 Languages)

Developer

  • Rami Al-Rfou @ rmyeid gmail com

Quick Tutorial

import polyglot
from polyglot.text import Text, Word

Language Detection

text = Text("Bonjour, Mesdames.")
print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))
Language Detected: Code=fr, Name=French

Tokenization

zen = Text("Beautiful is better than ugly. "
           "Explicit is better than implicit. "
           "Simple is better than complex.")
print(zen.words)
[u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']
print(zen.sentences)
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple i

View on GitHub


6.  PyText

PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapid experimentation and of serving models at scale. It achieves this by providing simple and extensible interfaces and abstractions for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We are using PyText in Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale.


Installing PyText

PyText requires Python 3.6.1 or above.

To get started on a Cloud VM, check out our guide.

Get the source code:

  $ git clone https://github.com/facebookresearch/pytext
  $ cd pytext

Create a virtualenv and install PyText:

  $ python3 -m venv pytext_venv
  $ source pytext_venv/bin/activate
  (pytext_venv) $ pip install pytext-nlp

Detailed instructions and more installation options can be found in our Documentation. If you encounter issues with missing dependencies during installation, please refer to OS Dependencies.

View on GitHub


7.  PyTorch-NLP

Basic Utilities for PyTorch Natural Language Processing (NLP)
PyTorch-NLP, or torchnlp for short, is a library of basic utilities for PyTorch NLP. torchnlp extends PyTorch to provide you with basic text data processing functions.

Installation 🐾

Make sure you have Python 3.6+ and PyTorch 1.0+. You can then install pytorch-nlp using pip:

pip install pytorch-nlp

Or to install the latest code via:

pip install git+https://github.com/PetrochukM/PyTorch-NLP.git

Docs

The complete documentation for PyTorch-NLP is available via our ReadTheDocs website.

Get Started

Within an NLP data pipeline, you'll want to implement these basic steps:

1. Load your Data 🐿

Load the IMDB dataset, for example:

from torchnlp.datasets import imdb_dataset

# Load the imdb training dataset
train = imdb_dataset(train=True)
train[0]  # RETURNS: {'text': 'For a movie that gets..', 'sentiment': 'pos'}

Load a custom dataset, for example:

from pathlib import Path

from torchnlp.download import download_file_maybe_extract

directory_path = Path('data/')
train_file_path = Path('trees/train.txt')

download_file_maybe_extract(
    url='http://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip',
    directory=directory_path,
    check_files=[train_file_path])

open(directory_path / train_file_path)

Don't worry, we'll handle caching for you!

View on GitHub


8.  spaCy

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products.

spaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management. spaCy is commercial open-source software, released under the MIT license.

Features

  • Support for 60+ languages
  • Trained pipelines for different languages and tasks
  • Multi-task learning with pretrained transformers like BERT
  • Support for pretrained word vectors and embeddings
  • State-of-the-art speed
  • Production-ready training system
  • Linguistically-motivated tokenization
  • Components for named entity recognition, part-of-speech-tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
  • Easily extensible with custom components and attributes
  • Support for custom models in PyTorch, TensorFlow and other frameworks
  • Built in visualizers for syntax and NER
  • Easy model packaging, deployment and workflow management
  • Robust, rigorously evaluated accuracy

📖 For more details, see the facts, figures and benchmarks.
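
A minimal usage sketch (assumes the small English pipeline has been installed with python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")
for token in doc:
    print(token.text, token.pos_, token.dep_)   # per-token POS tag and dependency label
for ent in doc.ents:
    print(ent.text, ent.label_)                 # named entities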

View on GitHub | View on source


9.  Stanza

Official Stanford NLP Python Library for Many Human Languages


The Stanford NLP Group's official Python NLP library. It contains support for running various accurate natural language processing tools on 60+ languages and for accessing the Java Stanford CoreNLP software from Python. For detailed information please visit our official website.

🔥  A new collection of biomedical and clinical English model packages is now available, offering a seamless experience for syntactic analysis and named entity recognition (NER) on biomedical literature text and clinical notes. For more information, check out our Biomedical models documentation page.
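
A minimal pipeline sketch (the English models are downloaded on first use; the processors listed here are an arbitrary subset chosen for illustration):

import stanza

stanza.download("en")                                   # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos")
doc = nlp("Stanza supports accurate NLP tools for many human languages.")
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.upos)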

References

If you use this library in your research, please kindly cite our ACL2020 Stanza system demo paper:

@inproceedings{qi2020stanza,
    title={Stanza: A {Python} Natural Language Processing Toolkit for Many Human Languages},
    author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    year={2020}
}

If you use our biomedical and clinical models, please also cite our Stanza Biomedical Models description paper:

@article{zhang2021biomedical,
    author = {Zhang, Yuhao and Zhang, Yuhui and Qi, Peng and Manning, Christopher D and Langlotz, Curtis P},
    title = {Biomedical and clinical {E}nglish model packages for the {S}tanza {P}ython {NLP} library},
    journal = {Journal of the American Medical Informatics Association},
    year = {2021},
    month = {06},
    issn = {1527-974X}
}

The PyTorch implementation of the neural pipeline in this repository is due to Peng Qi (@qipeng), Yuhao Zhang (@yuhaozhang), and Yuhui Zhang (@yuhui-zh15), with help from Jason Bolton (@j38), Tim Dozat (@tdozat) and John Bauer (@AngledLuffa). Maintenance of this repo is currently led by John Bauer.

If you use the CoreNLP software through Stanza, please cite the CoreNLP software package and the respective modules as described here ("Citing Stanford CoreNLP in papers"). The CoreNLP client is mostly written by Arun Chaganty, and Jason Bolton spearheaded merging the two projects together.

View on GitHub


10.  funNLP

The Most Powerful NLP-Weapon Arsenal

An NLP worker's paradise: one of the most complete collections of Chinese NLP resources.

While going from beginner to familiar with NLP, I used many packages from GitHub, so I organized them and am sharing them here.

Many of the packages are very interesting and worth bookmarking; they should satisfy everyone's collecting habit! If you find this useful, please share and star ⭐, thank you!

Updated irregularly over the long term; watching and forking are welcome! ❤️❤️❤️❤️❤️

View on GitHub


11.  jieba

"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.

Features

  • Supports four segmentation modes:
    • Accurate mode: tries to cut the sentence into the most precise segmentation; suitable for text analysis.
    • Full mode: scans out all the words in the sentence that could form a word; very fast, but cannot resolve ambiguity.
    • Search-engine mode: on top of accurate mode, long words are segmented again to improve recall; suitable for search-engine indexing.
    • Paddle mode: uses the PaddlePaddle deep learning framework to train a sequence-labeling (bidirectional GRU) model for segmentation, with part-of-speech tagging also supported. Paddle mode requires paddlepaddle-tiny (pip install paddlepaddle-tiny==1.6.1) and jieba v0.40 or later; for older versions, upgrade with pip install jieba --upgrade. See the PaddlePaddle website.
  • Supports Traditional Chinese segmentation
  • Supports custom dictionaries
  • MIT license

Installation

The code is compatible with both Python 2 and 3.

  • Fully automatic: easy_install jieba, or pip install jieba / pip3 install jieba
  • Semi-automatic: download http://pypi.python.org/pypi/jieba/, unpack it, then run python setup.py install
  • Manual: place the jieba directory in the current directory or in site-packages
  • Use it via import jieba
  • To use segmentation and part-of-speech tagging in Paddle mode, first install paddlepaddle-tiny: pip install paddlepaddle-tiny==1.6.1
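
A minimal segmentation sketch showing the three classic modes on a short example sentence (assumes pip install jieba):

import jieba

text = "我来到北京清华大学"
print("/".join(jieba.cut(text)))                 # accurate mode (default)
print("/".join(jieba.cut(text, cut_all=True)))   # full mode
print("/".join(jieba.cut_for_search(text)))      # search-engine mode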

View on GitHub


12.  pkuseg-python

pkuseg is a toolkit based on the paper [Luo et al., 2019]. It is easy to use, supports domain-specific word segmentation, and effectively improves segmentation accuracy.

Highlights

pkuseg has the following features:

  1. Multi-domain segmentation. Unlike earlier general-purpose Chinese segmentation tools, this toolkit also aims to provide pre-trained models tailored to data from different domains. Users can freely choose a model according to the domain of the text to be segmented. Pre-trained models are currently available for the news, web, medicine, and tourism domains, plus a mixed-domain model. If the domain of the text is known, load the corresponding model; otherwise the general model trained on mixed-domain data is recommended. Segmentation examples for each domain can be found in example.txt.
  2. Higher segmentation accuracy. Compared with other segmentation toolkits, pkuseg achieves higher accuracy when the same training and test data are used.
  3. Support for user-trained models. Users can train on entirely new annotated data.
  4. Support for part-of-speech tagging.
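
A minimal usage sketch with the default mixed-domain model (assumes pip install pkuseg; the model is downloaded on first use):

import pkuseg

seg = pkuseg.pkuseg()             # load the default (mixed-domain) model
print(seg.cut("我爱北京天安门"))    # e.g. ['我', '爱', '北京', '天安门']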

View on GitHub


13.  SnowNLP

Python library for processing Chinese text


SnowNLP is a library written in Python for conveniently processing Chinese text. It was inspired by TextBlob; since most existing natural language processing libraries target English, this library was written to make working with Chinese easier. Unlike TextBlob, it does not use NLTK: all algorithms are implemented from scratch, and it ships with some pre-trained dictionaries. Note that the library operates on Unicode strings, so decode your input to Unicode before use.

from snownlp import SnowNLP

s = SnowNLP(u'这个东西真心很赞')

s.words         # [u'这个', u'东西', u'真心',
                #  u'很', u'赞']

s.tags          # [(u'这个', u'r'), (u'东西', u'n'),
                #  (u'真心', u'd'), (u'很', u'd'),
                #  (u'赞', u'Vg')]

s.sentiments    # 0.9769663402895832, probability that the sentiment is positive

s.pinyin        # [u'zhe', u'ge', u'dong', u'xi',
                #  u'zhen', u'xin', u'hen', u'zan']

s = SnowNLP(u'「繁體字」「繁體中文」的叫法在臺灣亦很常見。')

s.han           # u'「繁体字」「繁体中文」的叫法
                # 在台湾亦很常见。'

text = u'''
自然语言处理是计算机科学领域与人工智能领域中的一个重要方向。
它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。
自然语言处理是一门融语言学、计算机科学、数学于一体的科学。
因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,
所以它与语言学的研究有着密切的联系,但又有重要的区别。
自然语言处理并不是一般地研究自然语言,
而在于研制能有效地实现自然语言通信的计算机系统,
特别是其中的软件系统。因而它是计算机科学的一部分。
'''

s = SnowNLP(text)

s.keywords(3)	# [u'语言', u'自然', u'计算机']

s.summary(3)	# [u'因而它是计算机科学的一部分',
                #  u'自然语言处理是一门融语言学、计算机科学、
				#	 数学于一体的科学',
				#  u'自然语言处理是计算机科学领域与人工智能
				#	 领域中的一个重要方向']
s.sentences

s = SnowNLP([[u'这篇', u'文章'],
             [u'那篇', u'论文'],
             [u'这个']])
s.tf
s.idf
s.sim([u'文章'])# [0.3756070762985226, 0, 0]

View on GitHub


FAQ About Natural Language Processing in Python

  • What is natural language processing with example?

Natural language processing (NLP) describes the interaction between human language and computers. It's a technology that many people use daily and has been around for years, but is often taken for granted. A few examples of NLP that people use every day are spell check and autocomplete.

  • What is NLTK used for in Python?

The Natural Language Toolkit (NLTK) is a platform used for building Python programs that work with human language data for applying in statistical natural language processing (NLP). It contains text processing libraries for tokenization, parsing, classification, stemming, tagging and semantic reasoning.

  • Which Python library is used for natural language processing?

NLTK is an essential library that supports tasks such as classification, stemming, tagging, parsing, semantic reasoning, and tokenization in Python. It's basically your main tool for natural language processing and machine learning.

  • Why Python is used for NLP?

Natural language processing (NLP) is a field that focuses on making natural human language usable by computer programs. NLTK, or Natural Language Toolkit, is a Python package that you can use for NLP.


Related videos:

Natural Language Processing in Python



#python 

Libraries for Working with Human Languages in Popular Python

Pattern: A Web Mining Module for Python

Pattern

Pattern is a web mining module for Python. It has tools for:

  • Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
  • Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
  • Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
  • Network Analysis: graph centrality and visualization.

It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD.

Example workflow

Example

This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: "$20 tip off a sweet little old lady today #win". The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled WIN or FAIL. The classifier uses the vectors to learn which other tweets look more like WIN or more like FAIL.

from pattern.web import Twitter
from pattern.en import tag
from pattern.vector import KNN, count

twitter, knn = Twitter(), KNN()

for i in range(1, 3):
    for tweet in twitter.search('#win OR #fail', start=i, count=100):
        s = tweet.text.lower()
        p = '#win' in s and 'WIN' or 'FAIL'
        v = tag(s)
        v = [word for word, pos in v if pos == 'JJ'] # JJ = adjective
        v = count(v) # {'sweet': 1}
        if v:
            knn.train(v, type=p)

print(knn.classify('sweet potato burger'))
print(knn.classify('stupid autocorrect'))

Installation

Pattern supports Python 2.7 and Python 3.6. To install Pattern so that it is available in all your scripts, unzip the download and from the command line do:

cd pattern-3.6
python setup.py install

If you have pip, you can automatically download and install from the PyPI repository:

pip install pattern

If none of the above works, you can make Python aware of the module in three ways:

  • Put the pattern folder in the same folder as your script.
  • Put the pattern folder in the standard location for modules so it is available to all scripts:
    • c:\python36\Lib\site-packages\ (Windows),
    • /Library/Python/3.6/site-packages/ (Mac OS X),
    • /usr/lib/python3.6/site-packages/ (Unix).
  • Add the location of the module to sys.path in your script, before importing it:
MODULE = '/users/tom/desktop/pattern'
import sys
if MODULE not in sys.path:
    sys.path.append(MODULE)
from pattern.en import parsetree

Documentation

For documentation and examples see the user documentation.

Version

3.6

License

BSD, see LICENSE.txt for further details.

Reference

De Smedt, T., Daelemans, W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2031–2035.

Contribute

The source code is hosted on GitHub and contributions or donations are welcomed.

Bundled dependencies

Pattern is bundled with the following data sets, algorithms and Python packages:

  • Brill tagger, Eric Brill
  • Brill tagger for Dutch, Jeroen Geertzen
  • Brill tagger for German, Gerold Schneider & Martin Volk
  • Brill tagger for Spanish, trained on Wikicorpus (Samuel Reese & Gemma Boleda et al.)
  • Brill tagger for French, trained on Lefff (Benoît Sagot & Lionel Clément et al.)
  • Brill tagger for Italian, mined from Wiktionary
  • English pluralization, Damian Conway
  • Spanish verb inflection, Fred Jehle
  • French verb inflection, Bob Salita
  • Graph JavaScript framework, Aslak Hellesoy & Dave Hoover
  • LIBSVM, Chih-Chung Chang & Chih-Jen Lin
  • LIBLINEAR, Rong-En Fan et al.
  • NetworkX centrality, Aric Hagberg, Dan Schult & Pieter Swart
  • spelling corrector, Peter Norvig

Acknowledgements

Authors:

Contributors (chronological):

  • Frederik De Bleser
  • Jason Wiener
  • Daniel Friesen
  • Jeroen Geertzen
  • Thomas Crombez
  • Ken Williams
  • Peteris Erins
  • Rajesh Nair
  • F. De Smedt
  • Radim Řehůřek
  • Tom Loredo
  • John DeBovis
  • Thomas Sileo
  • Gerold Schneider
  • Martin Volk
  • Samuel Joseph
  • Shubhanshu Mishra
  • Robert Elwell
  • Fred Jehle
  • Antoine Mazières + fabelier.org
  • Rémi de Zoeten + closealert.nl
  • Kenneth Koch
  • Jens Grivolla
  • Fabio Marfia
  • Steven Loria
  • Colin Molter + tevizz.com
  • Peter Bull
  • Maurizio Sambati
  • Dan Fu
  • Salvatore Di Dio
  • Vincent Van Asch
  • Frederik Elwert

Author: clips
Source Code: https://github.com/clips/pattern
License: BSD-3-Clause License

#python #pattern 

Pattern: A Web Mining Module for Python

曾 俊

1649352540

Does Windows 11 Degrade SSD Performance?

Microsoft has recently acknowledged a new problem with Windows 11: the system can significantly affect the speed of storage drives (SSDs and others). Not everyone is affected, but Windows 11 does slow PCs down in various ways, which makes for a poor experience.

Some users have noticed that storage drives show slower I/O performance on Windows 11, and a bug further lowers drive read and write speeds. In most cases there is no significant speed difference between Windows 11 and Windows 10, but performance problems show up when installing large applications or transferring large files.

Performance comparison of Windows 10 (left) and Windows 11

In some cases, Windows 11 can reduce a drive's write performance by as much as 45%, so applications may fail to load immediately and folders may open slowly. The problem is documented in detail in the Feedback Hub and on Twitter, Reddit, and Microsoft's forums.

One user reported benchmarking the same SSD on both Windows 10 and Windows 11: on Windows 10 it measured 3500 MB/s read and 3200 MB/s write, but on Windows 11 the write speed was only 900 MB/s.

Another user said that a newly purchased 1 TB 980 SSD wrote at less than 900 MB/s and read at roughly 1700-2300 MB/s, which is clearly abnormal, while an older Plextor SSD showed better, acceptable speeds.

In a support document published on November 22, Microsoft acknowledged the problem. It was first identified in August; Microsoft has started testing a fix, which will be delivered as part of an optional operating system update (KB5007262).

In another statement, Microsoft also confirmed that Windows 11 can affect the performance of all drives (SSD, HDD, and so on). The issue occurs when the operating system performs unnecessary steps while handling write operations. However, the bug does not affect every partition and drive.

If the "NTFS USN journal" feature is enabled, it can trigger the Windows 11 performance problem. Fortunately, that feature is enabled by default only on the C: drive; other partitions and drives do not have an NTFS USN journal.

According to sources familiar with the development process, Microsoft is actively investigating several Windows 11 performance issues, such as a system bug that slows down File Explorer and broken, sluggish right-click menu animations.

Some fixes have already been delivered through Windows Update, but users may need to join the Windows Insider Program to get them first.

According to reports, the Windows 11 SSD problem does not appear to be widespread, possibly because Windows 11 adoption has been slow. Microsoft is not forcing users to upgrade to Windows 11 the way it did with Windows 10 in 2015; Windows 11 remains an optional update, offered only to some users and running on a small share of PCs.

Ordinary users will likely get the fix in the Patch Tuesday update released on December 14. If you are noticing performance problems, you can install the optional update (KB5007262). Optional updates are generally safe and are tested and validated by Microsoft.

#win #windows #ssd 

Does Windows 11 Degrade SSD Performance?


郝 玉华

1643034720

Deploying a WebSite or Project to IIS

[ASP.NET] Deploying a WebSite or Project to IIS
VS2015 + IIS 10 (Win10)
The steps are very simple; the default settings are enough to get it running. ASP.NET beginners don't need to worry about this "before" learning ASP.NET and programming: learn the programming and build something first, then worry about deployment. The video's content is explained in a dedicated chapter of my book "ASP.NET專題實務(I)".

#aspdotnet #win 

Deploying a WebSite or Project to IIS

Elyna Ezza

1631683690

Announcing The Game Changer #GameJet $JET Token Presale Event!

#100x NFT token $JET presents #GameJet initial #presale to start on 18 September 2021.
This token will boom 🔥🔥

☑ Listing: #Justswap and more big #exchanges

☑ 100X #JET Token

Joining Link:

Register https://gamejet.network/register/JET1625857428

Twitter : https://twitter.com/gamejetpro

Telegram: https://t.me/GamejetNetwork

#NFT #TRX #Tron #NFTGiveaway #win #Presale

  Announcing The Game Changer #GameJet $JET Token Presale Event!

Personal Injury Claims in Scotland | No Win No Fee 100% Compensation

We deal with all kinds of personal injury claims nationwide. Please contact us for more information about personal injury claims in Scotland.

#no #win

Personal Injury Claims in Scotland | No Win No Fee 100% Compensation