Now, Transformers are being applied for keyword spotting

The Transformer architecture has been successfully applied across various domains, including language processing, computer vision, and time series analysis. Researchers have now found yet another domain that can benefit from the Transformer: keyword spotting.


Edward Jackson


PySpark Cheat Sheet: Spark in Python

This PySpark cheat sheet with code samples covers the basics like initializing Spark in Python, loading data, sorting, and repartitioning.

Apache Spark is generally known as a fast, general-purpose, open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. It allows you to speed up analytic applications by up to 100 times compared to other technologies on the market today. You can interface Spark with Python through "PySpark", the Spark Python API that exposes the Spark programming model to Python.

Even though working with Spark will remind you in many ways of working with Pandas DataFrames, you'll also see that it can be tough getting familiar with all the functions that you can use to query, transform, and inspect your data. What's more, if you've never worked with any other programming language or if you're new to the field, it might be hard to distinguish between the different RDD operations.

Let's face it, map() and flatMap() are different enough, but it might still come as a challenge to decide which one you really need when you're faced with them in your analysis. Or what about other functions, like reduce() and reduceByKey()?
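As a rough sketch of the difference, here is what these four operations look like in plain Python, using ordinary lists instead of distributed RDDs (the Spark methods behave analogously; this is a local illustration, not actual Spark code):

```python
from functools import reduce
from collections import defaultdict

pairs = [('a', 7), ('a', 2), ('b', 2)]

# map-like: exactly one output element per input element
mapped = [x + (x[1], x[0]) for x in pairs]

# flatMap-like: each input element may yield several output elements,
# and the results are flattened into a single sequence
flat = [y for x in pairs for y in x + (x[1], x[0])]

# reduce-like: merge ALL values into one result, ignoring keys
total = reduce(lambda a, b: a + b, [v for _, v in pairs])  # 7 + 2 + 2 = 11

# reduceByKey-like: merge values independently within each key
by_key = defaultdict(int)
for k, v in pairs:
    by_key[k] += v  # {'a': 9, 'b': 2}
```

Note how `mapped` keeps three nested tuples while `flat` spills their contents into one twelve-element list; that flattening step is the whole difference between map() and flatMap().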

PySpark cheat sheet

Even though the documentation is very elaborate, it never hurts to have a cheat sheet by your side, especially when you're just getting into it.

This PySpark cheat sheet covers the basics, from initializing Spark and loading your data, to retrieving RDD information, sorting, filtering and sampling your data. But that's not all. You'll also see that topics such as repartitioning, iterating, merging, saving your data and stopping the SparkContext are included in the cheat sheet. 

Note that the examples in the document use small data sets to illustrate the effect of specific functions on your data. In real-life data analysis, you'll be using Spark to analyze big data.

PySpark is the Spark Python API that exposes the Spark programming model to Python.

Initializing Spark 


>>> from pyspark import SparkContext
>>> sc = SparkContext(master = 'local[2]')

Inspect SparkContext 

>>> sc.version #Retrieve SparkContext version
>>> sc.pythonVer #Retrieve Python version
>>> sc.master #Master URL to connect to
>>> str(sc.sparkHome) #Path where Spark is installed on worker nodes
>>> str(sc.sparkUser()) #Retrieve name of the Spark User running SparkContext
>>> sc.appName #Return application name
>>> sc.applicationId #Retrieve application ID
>>> sc.defaultParallelism #Return default level of parallelism
>>> sc.defaultMinPartitions #Default minimum number of partitions for RDDs


>>> from pyspark import SparkConf, SparkContext
>>> conf = (SparkConf()
         .setAppName("My app")
         .set("spark.executor.memory", "1g"))
>>> sc = SparkContext(conf=conf)

Using the Shell 

In the PySpark shell, a special interpreter-aware SparkContext is already created in the variable called sc.

$ ./bin/spark-shell --master local[2]
$ ./bin/pyspark --master local[4] --py-files

Set which master the context connects to with the --master argument, and add Python .zip, .egg or .py files to the runtime path by passing a comma-separated list to --py-files.

Loading Data 

Parallelized Collections 

>>> rdd = sc.parallelize([('a',7),('a',2),('b',2)])
>>> rdd2 = sc.parallelize([('a',2),('d',1),('b',1)])
>>> rdd3 = sc.parallelize(range(100))
>>> rdd4 = sc.parallelize([("a",["x","y","z"]),
               ("b",["p","r"])])

External Data 

Read either one text file from HDFS, a local file system or any Hadoop-supported file system URI with textFile(), or read in a directory of text files with wholeTextFiles(). 

>>> textFile = sc.textFile("/my/directory/*.txt")
>>> textFile2 = sc.wholeTextFiles("/my/directory/")

Retrieving RDD Information 

Basic Information 

>>> rdd.getNumPartitions() #List the number of partitions
>>> rdd.count() #Count RDD instances
3
>>> rdd.countByKey() #Count RDD instances by key
defaultdict(<type 'int'>,{'a':2,'b':1})
>>> rdd.countByValue() #Count RDD instances by value
defaultdict(<type 'int'>,{('b',2):1,('a',2):1,('a',7):1})
>>> rdd.collectAsMap() #Return (key,value) pairs as a dictionary
   {'a': 2, 'b': 2}
>>> rdd3.sum() #Sum of RDD elements
4950
>>> sc.parallelize([]).isEmpty() #Check whether RDD is empty
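To see why collectAsMap() keeps ('a', 2) but drops ('a', 7), it helps to mimic these counting actions in plain Python (a local sketch, not Spark):

```python
from collections import Counter

pairs = [('a', 7), ('a', 2), ('b', 2)]  # same contents as rdd

# countByKey: how many elements share each key
count_by_key = Counter(k for k, _ in pairs)  # {'a': 2, 'b': 1}

# countByValue: how many times each complete element occurs
count_by_value = Counter(pairs)  # each pair occurs exactly once here

# collectAsMap: building a dict lets a later (key, value) pair overwrite
# an earlier one with the same key, so ('a', 7) is replaced by ('a', 2)
as_map = dict(pairs)  # {'a': 2, 'b': 2}
```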


>>> rdd3.max() #Maximum value of RDD elements 
>>> rdd3.min() #Minimum value of RDD elements
>>> rdd3.mean() #Mean value of RDD elements 
>>> rdd3.stdev() #Standard deviation of RDD elements 
>>> rdd3.variance() #Compute variance of RDD elements 
>>> rdd3.histogram(3) #Compute histogram by bins
>>> rdd3.stats() #Summary statistics (count, mean, stdev, max & min)
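Since rdd3 holds range(100), these statistics are easy to verify by hand in plain Python. Note that Spark's variance() and stdev() are the population versions, dividing by n rather than n - 1:

```python
data = list(range(100))  # the same values as rdd3
n = len(data)

mean = sum(data) / n                               # 49.5
variance = sum((x - mean) ** 2 for x in data) / n  # population variance: 833.25
stdev = variance ** 0.5                            # population standard deviation
```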

Applying Functions 

#Apply a function to each RDD element
>>> rdd.map(lambda x: x+(x[1],x[0])).collect()
[('a',7,7,'a'), ('a',2,2,'a'), ('b',2,2,'b')]
#Apply a function to each RDD element and flatten the result
>>> rdd5 = rdd.flatMap(lambda x: x+(x[1],x[0]))
>>> rdd5.collect()
['a',7,7,'a','a',2,2,'a','b',2,2,'b']
#Apply a flatMap function to each (key,value) pair of rdd4 without changing the keys
>>> rdd4.flatMapValues(lambda x: x).collect()
[('a', 'x'), ('a', 'y'), ('a', 'z'),('b', 'p'),('b', 'r')]

Selecting Data


>>> rdd.collect() #Return a list with all RDD elements 
[('a', 7), ('a', 2), ('b', 2)]
>>> rdd.take(2) #Take first 2 RDD elements 
[('a', 7),  ('a', 2)]
>>> rdd.first() #Take first RDD element
('a', 7)
>>> rdd.top(2) #Take top 2 RDD elements
[('b', 2), ('a', 7)]


>>> rdd3.sample(False, 0.15, 81).collect() #Return sampled subset of rdd3


>>> rdd.filter(lambda x: "a" in x).collect() #Filter the RDD
>>> rdd5.distinct().collect() #Return distinct RDD values
['a',2,'b',7]
>>> rdd.keys().collect() #Return (key,value) RDD's keys
['a',  'a',  'b']


>>> def g(x): print(x)
>>> rdd.foreach(g) #Apply a function to all RDD elements
('a', 7)
('b', 2)
('a', 2)

Reshaping Data 


>>> rdd.reduceByKey(lambda x,y: x+y).collect() #Merge the rdd values for each key
[('a',9), ('b',2)]
>>> rdd.reduce(lambda a, b: a+ b) #Merge the rdd values
('a', 7, 'a' , 2 , 'b' , 2)


Grouping by

>>> rdd3.groupBy(lambda x: x % 2) #Return RDD of grouped values
>>> rdd.groupByKey() #Group rdd by key


>>> seqOp = (lambda x,y: (x[0]+y,x[1]+1))
>>> combOp = (lambda x,y:(x[0]+y[0],x[1]+y[1]))
#Aggregate RDD elements of each partition and then the results
>>> rdd3.aggregate((0,0),seqOp,combOp) 
#Aggregate values of each RDD key
>>> rdd.aggregateByKey((0,0),seqOp,combOp).collect()
[('a',(9,2)), ('b',(2,1))]
#Aggregate the elements of each partition, and then the results
>>> from operator import add
>>> rdd3.fold(0,add)
#Merge the values for each key
>>> rdd.foldByKey(0, add).collect()
[('a' ,9), ('b' ,2)]
#Create tuples of RDD elements by applying a function
>>> rdd3.keyBy(lambda x: x+x).collect()
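What aggregate() computes can be sketched in plain Python by pretending rdd3 has two partitions: seqOp folds each element into a per-partition accumulator, then combOp merges the per-partition results. This is a local illustration of the semantics, not Spark code:

```python
from functools import reduce

seqOp = lambda acc, x: (acc[0] + x, acc[1] + 1)    # fold one element into (sum, count)
combOp = lambda a, b: (a[0] + b[0], a[1] + b[1])   # merge two partial (sum, count) results

data = list(range(100))
partitions = [data[:50], data[50:]]  # pretend the RDD has 2 partitions

# run seqOp within each partition, starting from the zero value (0, 0)
partials = [reduce(seqOp, part, (0, 0)) for part in partitions]

# then merge the partial results across partitions with combOp
result = reduce(combOp, partials)  # (4950, 100): sum and count of rdd3
```

The same two-level pattern explains the aggregateByKey result above: (9, 2) for key 'a' is the sum and count of its values 7 and 2.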

Mathematical Operations 

>>> rdd.subtract(rdd2).collect() #Return each rdd value not contained in rdd2
[('b' ,2), ('a' ,7)]
#Return each (key,value) pair of rdd2 with no matching key in rdd
>>> rdd2.subtractByKey(rdd).collect()
[('d', 1)]
>>> rdd.cartesian(rdd2).collect() #Return the Cartesian product of rdd and rdd2
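These set-like pair operations can also be mimicked with plain Python lists (a local sketch of the semantics, not Spark):

```python
rdd = [('a', 7), ('a', 2), ('b', 2)]
rdd2 = [('a', 2), ('d', 1), ('b', 1)]

# subtract: keep elements of rdd that do not appear in rdd2 at all;
# ('a', 2) drops out because the identical pair exists in rdd2
subtracted = [x for x in rdd if x not in rdd2]  # [('a', 7), ('b', 2)]

# subtractByKey: keep pairs of rdd2 whose KEY never occurs in rdd
rdd_keys = {k for k, _ in rdd}
by_key = [(k, v) for k, v in rdd2 if k not in rdd_keys]  # [('d', 1)]

# cartesian: every pairing of an rdd element with an rdd2 element
product = [(x, y) for x in rdd for y in rdd2]  # 3 * 3 = 9 pairs
```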


>>> rdd2.sortBy(lambda x: x[1]).collect() #Sort RDD by given function
>>> rdd2.sortByKey().collect() #Sort (key, value) RDD by key
[('a' ,2), ('b' ,1), ('d' ,1)]


>>> rdd.repartition(4) #New RDD with 4 partitions
>>> rdd.coalesce(1) #Decrease the number of partitions in the RDD to 1


>>> rdd.saveAsTextFile("rdd.txt")
>>> rdd.saveAsHadoopFile("hdfs://namenodehost/parent/child",
                         'org.apache.hadoop.mapred.TextOutputFormat')

Stopping SparkContext 

>>> sc.stop()


$ ./bin/spark-submit examples/src/main/python/


#pyspark #cheatsheet #spark #python

Brandon Adams


Using the This Keyword in Java | What is the This Keyword?

In this video, we use the this keyword in Java to access an object’s instance variables, invoke an object’s instance methods, and return the current object. Thank you for watching and happy coding!


#keyword #java #using the this keyword in java #what is the this keyword

Ajay Kapoor


Digital Transformation Consulting Services & solutions

Compete in this digital-first world with PixelCrayons' advanced-level digital transformation consulting services. With 16+ years of domain expertise, we have transformed thousands of companies digitally. Our insight-led, unique, and mindful thinking process helps organizations realize digital capital from business outcomes.

Let our expert digital transformation consultants partner with you to solve even the most complex business problems at speed and at scale.

Digital transformation company in India

#digital transformation agency #top digital transformation companies in india #digital transformation companies in india #digital transformation services india #digital transformation consulting firms

Chelsie Towne


A Deep Dive Into the Transformer Architecture – The Transformer Models

Transformers for Natural Language Processing

It may seem like a long time since the world of natural language processing (NLP) was transformed by the seminal “Attention is All You Need” paper by Vaswani et al., but in fact that was less than 3 years ago. The relative recency of the introduction of transformer architectures and the ubiquity with which they have upended language tasks speaks to the rapid rate of progress in machine learning and artificial intelligence. There’s no better time than now to gain a deep understanding of the inner workings of transformer architectures, especially with transformer models making big inroads into diverse new applications like predicting chemical reactions and reinforcement learning.

Whether you’re an old hand or you’re only paying attention to transformer-style architectures for the first time, this article should offer something for you. First, we’ll dive deep into the fundamental concepts used to build the original 2017 Transformer. Then we’ll touch on some of the developments implemented in subsequent transformer models. Where appropriate, we’ll point out some limitations and how modern models inheriting ideas from the original Transformer are trying to overcome various shortcomings or improve performance.

What Do Transformers Do?

Transformers are the current state-of-the-art type of model for dealing with sequences. Perhaps the most prominent application of these models is in text processing tasks, and the most prominent of these is machine translation. In fact, transformers and their conceptual progeny have infiltrated just about every benchmark leaderboard in natural language processing (NLP), from question answering to grammar correction. In many ways transformer architectures are undergoing a surge in development similar to what we saw with convolutional neural networks following the 2012 ImageNet competition, for better and for worse.

#natural language processing #ai artificial intelligence #transformers #transformer architecture #transformer models

En Momin


A Simple Flutter API to Manage REST API Requests Easily

api_manager: a simple Flutter API to manage REST API requests easily, with the help of flutter dio.

Get started


Add dependency

  api_manager: $latest_version

Super simple to use

import 'package:api_manager/api_manager.dart';

void main() async {
  ApiResponse response = await ApiManager().request(
    requestType: RequestType.GET,
    route: "your route",
  );
  print(response);
}

Config in a base manager

class ApiRepository {
  static final ApiRepository _instance = ApiRepository._internal(); /// singleton api repository
  ApiManager _apiManager;

  factory ApiRepository() {
    return _instance;
  }

  /// base configuration for api manager
  ApiRepository._internal() {
    _apiManager = ApiManager();
    _apiManager.options.baseUrl = BASE_URL; /// EX: BASE_URL = 
    _apiManager.options.connectTimeout = 100000;
    _apiManager.options.receiveTimeout = 100000;
    _apiManager.enableLogging(responseBody: true, requestBody: false); /// enable api logging EX: response, request, headers etc
    _apiManager.enableAuthTokenCheck(() => "access_token"); /// EX: JWT/PASSPORT auth token stored in cache
  }
}


Suppose we have a response model like this:

class SampleResponse {
  String name;
  int id;

  SampleResponse.fromJson(jsonMap)
      : name = jsonMap['name'],
        id = jsonMap['id'];
}

and the actual API response JSON structure is:

    {
        "data": {
            "name": "md afratul kaoser taohid",
            "id": "id"
        }
    }

Now we perform a GET request:

 Future<ApiResponse<SampleResponse>> getRequestSample() async =>
      await _apiManager.request<SampleResponse>(
        requestType: RequestType.GET,
        route: 'api_route',
        requestParams: {"userId": 12}, /// add params if required
        isAuthRequired: true, /// when true, adds an authorization header via enableAuthTokenCheck()
        responseBodySerializer: (jsonMap) {
          return SampleResponse.fromJson(jsonMap); /// parse the json response into a dart model class
        },
      );

Now we perform a POST request:

 Future<ApiResponse<SampleResponse>> postRequestSample() async =>
      await _apiManager.request<SampleResponse>(
        requestType: RequestType.POST,
        route: 'api_route',
        requestBody: {"userId": 12}, /// add POST request body
        isAuthRequired: true, /// when true, adds an authorization header via enableAuthTokenCheck()
        responseBodySerializer: (jsonMap) {
          return SampleResponse.fromJson(jsonMap); /// parse the json response into a dart model class
        },
      );

Now we perform a multipart file upload request:

  Future<ApiResponse<void>> updateProfilePicture(
    String filePath,
  ) async {
    MultipartFile multipartFile =
        await _apiManager.getMultipartFileData(filePath);
    FormData formData = FormData.fromMap({'picture': multipartFile});

    return await _apiManager.request(
      requestType: RequestType.POST,
      isAuthRequired: true,
      requestBody: formData,
      route: 'api_route',
    );
  }

Use this package as a library

Depend on it

Run this command:

With Flutter:

 $ flutter pub add api_manager

This will add a line like this to your package's pubspec.yaml (and run an implicit dart pub get):

  api_manager: ^0.1.29

Alternatively, your editor might support flutter pub get. Check the docs for your editor to learn more.

Import it

Now in your Dart code, you can use:

import 'package:api_manager/api_manager.dart';


//void main() async {
//  ApiManager _apiManager = ApiManager();
//  _apiManager.options.baseUrl = $base_url;
//  _apiManager.responseBodyWrapper("data");
//  ApiResponse<List<dynamic>> response = await _apiManager.request(
//    requestType: RequestType.GET,
//    route: $route,
//    responseBodySerializer: (jsonMap) {
//      return jsonMap as List;
//    },
//  );
//  print(response);
//}

GitHub :

#flutter  #restapi