Earning this certification is not easy. Candidates need to prepare thoroughly, with the necessary study material and resources, to pass the exam.
In the written examination segment of CCP:DS, applicants are tested on their knowledge of essential data science topics. Applicants must pass DS-200 (Data Science Essentials). The following topics are covered in the examination.
Data Acquisition
· Deploy a variety of acquisition methods for obtaining data, including database integration and working with APIs
· Use Hadoop tools such as Flume and Sqoop
· Use command-line tools such as curl and wget
Data Evaluation
· Knowledge of the file types commonly used for input and output, and the advantages and disadvantages of each
· Familiarity with Hadoop SequenceFiles and serialization using Avro
· An understanding of filtering and sampling techniques
· Tools, utilities and techniques for evaluating data from the command line and at scale
· Methods for working with various file formats, including binary files, XML, JSON and CSV
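To make the sampling item more concrete: reservoir sampling is one widely used way to draw a uniform random sample from a data set too large to hold in memory, in a single pass. The sketch below is only an illustration and not something the exam prescribes; the function name `reservoir_sample` and the `seed` parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k records chosen uniformly at random from a stream, in one pass."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # The first k records fill the reservoir directly.
            reservoir.append(record)
        else:
            # Record i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir
```

Because it accepts any iterable, the same function works on a list, a generator or an open file (each line becomes one record). If the stream holds fewer than k records, all of them are returned.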
Data Transformation
· Write records into a new format such as AvroOutputFormat or SequenceFileOutputFormat
· Write a custom subclass of FileOutputFormat
· Write a Mapper in Python and invoke it via Hadoop Streaming
· Write scripts to anonymize data sets
· Join data sets
· Invoke Unix tools to convert file formats
· Write a script that receives records on stdin and writes them to stdout
· Write a map-only Hadoop Streaming job
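Several of these items (a Python Mapper, anonymizing data, reading stdin and writing stdout, a map-only job) can be combined in one small script. The sketch below assumes tab-separated records whose first field is a user ID; the salt value and the names `anonymize_record`, `anonymize_lines` and `main` are hypothetical. Salted hashing is one simple pseudonymization approach, not the only way to anonymize data.

```python
import hashlib
import sys
from typing import Iterable, Iterator, TextIO

# Hypothetical salt for illustration; a real job should not hard-code secrets.
SALT = b"example-salt"


def anonymize_record(line: str) -> str:
    """Replace the first tab-separated field (assumed to be a user ID) with a salted hash."""
    fields = line.rstrip("\n").split("\t")
    digest = hashlib.sha256(SALT + fields[0].encode("utf-8")).hexdigest()
    fields[0] = digest[:16]  # truncated only to keep the output short
    return "\t".join(fields)


def anonymize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Anonymize each non-blank record; blank lines are skipped."""
    for line in lines:
        if line.strip():
            yield anonymize_record(line)


def main(in_stream: TextIO = sys.stdin, out_stream: TextIO = sys.stdout) -> None:
    """Streaming entry point: records come in on stdin and go out on stdout."""
    for record in anonymize_lines(in_stream):
        out_stream.write(record + "\n")
```

Saved as a script, the file would end by calling `main()` (typically under an `if __name__ == "__main__":` guard). It could then be passed to Hadoop Streaming with the `-mapper` option, and setting `-numReduceTasks 0` makes the job map-only, so mapper output is written straight to the output directory. Because the same ID always hashes to the same token, anonymized data sets can still be joined on that field.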
Machine Learning Basics
· Understand how to use Mappers and Reducers to create predictive models
· Identify appropriate uses of the following: parametric/non-parametric algorithms, kernels, support vector machines, clustering, neural networks, recommender systems and dimensionality reduction
· Understand the different kinds of machine learning, including supervised and unsupervised learning
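One common pattern for building a predictive model with Mappers and Reducers is to have each mapper compute partial sums over its input split and let a reducer combine them and solve for the model. The sketch below simulates this in plain Python for simple linear regression (y = a + b·x); it does not use Hadoop, and the names `map_partition` and `reduce_stats` are hypothetical.

```python
from typing import Iterable, Tuple

# Partial sums per split: (n, sum_x, sum_y, sum_xy, sum_xx)
Stats = Tuple[float, float, float, float, float]


def map_partition(rows: Iterable[Tuple[float, float]]) -> Stats:
    """Mapper: accumulate sufficient statistics for one input split."""
    n = sx = sy = sxy = sxx = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return (n, sx, sy, sxy, sxx)


def reduce_stats(partials: Iterable[Stats]) -> Tuple[float, float]:
    """Reducer: add up the partial sums and solve least squares for y = a + b*x."""
    n, sx, sy, sxy, sxx = (sum(col) for col in zip(*partials))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b
```

The design point is that the partial sums are additive, so splits can be processed independently and in any order; only five numbers per split travel to the reducer.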
Clustering
· Identify appropriate uses of various models, including distribution, centroid, group, density and graph models
· Identify the algorithms applicable to each model
· Describe clustering and identify appropriate use cases
· Explain the value and use of similarity metrics, including Euclidean distance, Pearson correlation and city block (Manhattan) distance
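The three similarity metrics named above can each be written in a few lines. The sketch below is a plain-Python illustration with hypothetical function names. Note that Euclidean and city block are distances (smaller means more similar), while Pearson correlation is a similarity between -1 and 1.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: square root of the summed squared differences."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance: sum of absolute differences per dimension."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Covariance divided by the product of standard deviations."""
    _check(a, b)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_a * sd_b)
```

Pearson correlation ignores scale and offset (ratings of 1, 2, 3 and 2, 4, 6 correlate perfectly), which is why it is often preferred when users rate on different personal scales.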
Classification
· Explain the steps for training a model on a set of known data in order to classify new data
· Describe classification formulas and techniques
· Identify the use cases for logistic regression and Bayes' theorem
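For reference, Bayes' theorem gives P(H | E) = P(E | H) · P(H) / P(E), and a logistic regression model turns a weighted sum of features into a probability with the sigmoid function 1 / (1 + e^(-z)). The sketch below shows both; the function names are hypothetical, and the weights in any real model would come from training rather than being set by hand.

```python
import math
from typing import Sequence


def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) by Bayes' theorem, expanding P(E) with the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


def logistic_predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Probability of the positive class under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

With made-up numbers (a 1% prior, a test that detects 90% of true cases, and a 5% false positive rate), `bayes_posterior(0.01, 0.9, 0.05)` is about 0.15, a useful reminder of how strongly a low base rate pulls the posterior down.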
Collaborative Filtering
· Explain the strengths and limitations of collaborative filtering techniques
· Identify the uses of item-based and user-based collaborative filtering techniques
· Determine the metrics to use to evaluate the accuracy of a recommender system
· Determine the appropriate collaborative filtering implementation
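Two common metrics for the accuracy of predicted ratings are root mean squared error (RMSE) and mean absolute error (MAE). The sketch below is a minimal illustration with hypothetical function names; ranking-oriented metrics such as precision at k are another option when the recommender produces top-N lists rather than rating predictions.

```python
import math
from typing import Sequence


def _check(predicted: Sequence[float], actual: Sequence[float]) -> None:
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error: penalizes large misses more heavily."""
    _check(predicted, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error: the average size of a miss, in rating units."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

For predictions 4, 3, 5 against true ratings 5, 3, 3, MAE is 1.0 while RMSE is about 1.29, because RMSE squares the single two-point miss.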
Model/Feature Selection
· Examine a scenario and determine the appropriate attributes and features to select
· Explain the role and function of feature selection
· Examine a scenario and determine the methods to use for optimal feature selection
Probability
· Calculate sample percentiles
· Examine a scenario and determine the likelihood of a particular outcome
· Summarize a distribution of sample numbers
· Determine a range of values based on a sample probability density function
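Sample percentiles can be computed by sorting the data and interpolating linearly between the two closest ranks, which is the convention NumPy's `percentile` uses by default. The sketch below applies it to a five-number-style summary; the names `percentile` and `summarize` are hypothetical, and other percentile conventions give slightly different answers on small samples.

```python
import statistics
from typing import Dict, Sequence


def percentile(data: Sequence[float], p: float) -> float:
    """p-th percentile (0 to 100) by linear interpolation between closest ranks."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    xs = sorted(data)
    rank = (p / 100) * (len(xs) - 1)
    lo = int(rank)  # floor, since rank is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def summarize(data: Sequence[float]) -> Dict[str, float]:
    """Summarize a sample with quartiles, extremes, mean and sample standard deviation."""
    return {
        "min": min(data),
        "p25": percentile(data, 25),
        "median": percentile(data, 50),
        "p75": percentile(data, 75),
        "max": max(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data) if len(data) > 1 else 0.0,
    }
```

For the values 1 through 10, this gives a median of 5.5 and quartiles of 3.25 and 7.75.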
Visualization
· Examine a data visualization and interpret its meaning
· Determine the most effective visualization for a given problem
Optimization
· Distinguish between first-order and second-order optimization techniques
· Understand optimization methods
· Identify the sources of error in a model
· Determine the learning rate for a particular algorithm
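The difference between first-order and second-order methods, and the role of the learning rate, shows up even on a one-dimensional function. The sketch below minimizes the example function f(x) = (x - 3)^2 + 1 (chosen for illustration) with gradient descent, which uses only the first derivative and a learning rate, and with Newton's method, which also uses the second derivative. All function names are hypothetical.

```python
from typing import Callable


def objective_grad(x: float) -> float:
    """First derivative of f(x) = (x - 3)**2 + 1."""
    return 2 * (x - 3)


def objective_hess(x: float) -> float:
    """Second derivative of the same f (constant for a quadratic)."""
    return 2.0


def gradient_descent(grad: Callable[[float], float], x0: float, learning_rate: float, steps: int) -> float:
    """First-order method: step against the gradient, scaled by the learning rate."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def newton(grad: Callable[[float], float], hess: Callable[[float], float], x0: float, steps: int) -> float:
    """Second-order method: divide the gradient by the curvature instead of using a learning rate."""
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)
    return x
```

On this quadratic, Newton's method reaches the minimum at x = 3 in a single step. Gradient descent with a learning rate of 0.1 shrinks the error by a factor of 0.8 per step, while any rate above 1.0 makes each step overshoot by more than it corrects, so the iterates diverge. Choosing the learning rate is therefore a trade-off between speed and stability.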