This section contains carefully selected MCQs and Previous Year Questions with explanations to help students understand concepts and prepare effectively for examinations, interviews, and competitive tests.
Q: 1 Which of the following is not a data classification technique?
Option D
Data classification is a type of supervised learning where the goal is to assign data points to predefined classes based on training data.
| TECHNIQUE NAME | DESCRIPTION |
|---|---|
| Bayesian Belief Networks | Probabilistic models that classify data based on conditional dependencies between variables; useful for uncertain or probabilistic scenarios. |
| Support Vector Machine (SVM) | Finds the optimal hyperplane that separates data points into different classes, works well for high-dimensional data. |
| K-Nearest Neighbors (KNN) | Classifies a data point based on the majority class of its nearest neighbors in the feature space, simple and intuitive. |
| Decision Trees | Builds a tree-like model of decisions and their possible consequences to classify data, interpretable and widely used. |
| Random Forest | An ensemble of decision trees that improves classification accuracy by aggregating multiple trees' predictions. |
| Neural Networks | Model complex relationships using layers of interconnected nodes; suitable for large and complex datasets. |
Principal Component Analysis (PCA) is not a classification technique. PCA is an unsupervised dimensionality reduction technique that reduces the number of features while preserving as much variance as possible; it does not assign data points to classes.
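To make the distinction concrete, here is a minimal sketch of one technique from the table, K-Nearest Neighbors, written in plain Python. The training points, the labels "A" and "B", and the function name `knn_predict` are all hypothetical and chosen only for illustration. Note that the classifier needs labelled training data, which is exactly what PCA does not use.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of (feature_vector, label) pairs
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training data with two predefined classes
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(train, (1.8, 1.8)))  # the three nearest points are all labelled "A"
print(knn_predict(train, (6.2, 6.4)))  # the three nearest points are all labelled "B"
```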
Q: 2 What is the total number of non-empty subsets of a 100-item frequent itemset?
Option C
For a set containing $n$ items, the total number of subsets is $2^{n}$. This includes the empty set. To find the number of non-empty subsets, we subtract the empty set, giving $2^{n} - 1$.
Here, the itemset has 100 items, so the total number of non-empty subsets is $2^{100} - 1$.
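The counting argument can be checked with a short sketch: the closed form (two to the power n, minus one) is compared against brute-force enumeration on a small set, then evaluated for n = 100. The helper names and the three-item sample set are illustrative assumptions.

```python
from itertools import combinations

def count_nonempty_subsets(n):
    """Closed form: all 2**n subsets minus the single empty set."""
    return 2 ** n - 1

def enumerate_nonempty_subsets(items):
    """Brute-force listing, practical only for small sets."""
    items = list(items)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

small = ["a", "b", "c"]  # hypothetical 3-item itemset
assert len(enumerate_nonempty_subsets(small)) == count_nonempty_subsets(3)

# Python integers have arbitrary precision, so the 100-item case is exact
print(count_nonempty_subsets(100))
```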
Q: 3 Which of the following is not a data mining technique?
Option D
Data Mining is the process of extracting useful patterns, relationships, and knowledge from large amounts of data. Different techniques are used in data mining to analyze and predict information.
Evaluation is not considered a standard data mining technique. It is generally a process used to measure or assess the performance of a model or system.
Q: 4 Which of the following statements is incorrect?
Option C
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) serve different purposes in database systems:
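OLTP: handles day-to-day transactional operations, typically many short inserts, updates, and deletes on current data.
OLAP: supports analysis and decision making through complex, mostly read-only queries over large volumes of historical and aggregated data.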
Q: 5 Which of the following is not a data mining language?
Option D
Data Mining languages and tools are used to analyze large amounts of data, discover patterns, and perform machine learning or statistical operations.
However, BASIC (Beginner's All-purpose Symbolic Instruction Code) is a general-purpose programming language and is not commonly used as a data mining language or tool.
Q: 6 In the context of data warehousing, the semantic heterogeneity and structure of data are challenges in which of the following?
Option B
In a Data Warehouse, data comes from multiple heterogeneous sources, such as different databases, formats, and structures. The process of combining this data into a single, consistent view is known as Data Integration.
Q: 7 In the context of mining descriptive statistical measures of data, which of the following sets represents measures of central tendency and measures of dispersion of data, respectively?
Option B
In descriptive statistics, data is analyzed using two main types of measures: measures of central tendency and measures of dispersion.
Measures of Central Tendency describe the center or average of a data set. They indicate where most of the data values lie. Common examples include the mean, median, and mode.
Measures of Dispersion describe the spread or variability of data. They show how much the data values differ from the central value. Common examples include the range, quartiles, interquartile range, variance, and standard deviation.
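As a rough illustration, both kinds of measure can be computed with Python's standard `statistics` module. The sample values below are the ones that appear in the bin-boundaries question later in this section. The split into two dictionaries is only a presentational choice.

```python
import statistics

data = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # sample values, used purely for illustration

# Measures of central tendency: where the data is centered
central = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
}

# Measures of dispersion: how widely the data is spread
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default "exclusive" method)
dispersion = {
    "range": max(data) - min(data),
    "interquartile_range": q3 - q1,
    "variance": statistics.variance(data),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(data),      # sample standard deviation
}

print(central)
print(dispersion)
```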
Q: 8 In the context of multidimensional data models for a data warehouse, the fact table contains:
Option A
In a multidimensional data model used in a data warehouse, data is organized into fact tables and dimension tables.
The Fact Table is the central table that contains quantitative data, also called measures or facts, such as sales amount, quantity, or profit.
In addition to the facts, the fact table contains foreign keys that link it to the associated dimension tables.
Dimension Tables provide descriptive context for the facts, such as time, product, customer, or location.
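The relationship between a fact table and its dimension tables can be sketched with an in-memory SQLite database. All table names, column names, and rows below (`fact_sales`, `dim_product`, `dim_time`, and their contents) are hypothetical and exist only to show the shape of a star schema: numeric measures plus foreign keys in the fact table, descriptive attributes in the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);

-- Fact table: measures plus foreign keys to each dimension
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    sales_amount REAL,
    quantity INTEGER
);

INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 1000.0, 2), (1, 2, 1500.0, 3), (2, 1, 600.0, 4);
""")

# Join facts to a dimension through the foreign key and aggregate the measures
rows = conn.execute("""
    SELECT p.name, SUM(f.sales_amount), SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)
```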
Q: 9 If a decision tree classifier keeps expanding until every training instance is correctly classified, but the test error rate begins to increase, what is the most likely outcome?
Option C
When a decision tree is grown to perfectly classify every training instance, it may start capturing noise and random fluctuations in the training data rather than just the underlying patterns. This results in a phenomenon called overfitting, where the model performs exceptionally well on the training set but fails to generalize to new, unseen data.
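The effect can be reproduced on a tiny hand-made example. The sketch below builds a one-feature decision tree using Gini impurity (an assumed splitting criterion; the text does not name one) and compares a fully grown tree with a depth-limited one. The data is hypothetical: the intended rule is "class 1 when x is at least 5", and the training point at x = 2 is deliberately mislabelled to play the role of noise.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(points, max_depth=None, depth=0):
    """Return a leaf label (0 or 1) or a (threshold, left, right) tuple."""
    labels = [y for _, y in points]
    majority = int(2 * sum(labels) >= len(labels))
    if len(set(labels)) == 1 or (max_depth is not None and depth >= max_depth):
        return majority
    xs = sorted({x for x, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):  # candidate thresholds: midpoints between distinct x values
        t = (lo + hi) / 2
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # all x values identical, cannot split further
        return majority
    t = best[1]
    return (t,
            build_tree([p for p in points if p[0] < t], max_depth, depth + 1),
            build_tree([p for p in points if p[0] >= t], max_depth, depth + 1))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

def error_rate(tree, points):
    return sum(predict(tree, x) != y for x, y in points) / len(points)

# Hypothetical data; (2, 1) is a noisy label
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(1.2, 0), (1.8, 0), (2.2, 0), (3.5, 0), (5.5, 1), (6.5, 1), (7.5, 1), (8.5, 1)]

full_tree = build_tree(train)             # grown until every training point is correct
stump = build_tree(train, max_depth=1)    # a single split, a crude form of pre-pruning

for name, tree in [("full", full_tree), ("stump", stump)]:
    print(name, "train error:", error_rate(tree, train), "test error:", error_rate(tree, test))
```

On this data the fully grown tree reaches zero training error by carving out a small region around the noisy point, and that region is where it misclassifies test points. The depth-limited tree tolerates one training error and generalizes better.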
Q: 10 In data warehouse technology, a multidimensional view can be implemented using different OLAP storage models. Which of the following correctly distinguishes between ROLAP, MOLAP, and HOLAP?
Option B
In Data Warehouse technology, OLAP (Online Analytical Processing) systems provide multidimensional views of data for fast analysis. There are three main OLAP storage models.
| OLAP TYPE | STORAGE | DATA REPRESENTATION | ADVANTAGES | REMARK |
|---|---|---|---|---|
| ROLAP (Relational OLAP) | Relational or extended-relational databases. | Data stored in tables; multidimensional views generated using SQL. | Scales well for large datasets, supports detailed data. | Slower query performance on aggregated data. |
| MOLAP (Multidimensional OLAP) | Specialized multidimensional storage engines. | Data stored in arrays / cubes. | Fast query performance, efficient aggregation, and summary. | Handles sparse data well using compression. |
| HOLAP (Hybrid OLAP) | Combines relational tables and multidimensional cubes. | Detailed data in ROLAP, aggregated data in MOLAP. | Balances storage efficiency and query performance. | Provides both scalability and speed. |
Q: 11 Which of the following statement(s) is/are true about OLAP?
I. These systems have a much larger number of users than database systems.
II. Accesses to these systems are mostly read-only operations.
Option B
OLAP (Online Analytical Processing) systems are designed for complex analysis of large volumes of data. They are optimized for query performance and analytical operations, rather than for handling large numbers of concurrent users.
OLAP systems primarily involve read-only operations, such as slicing, dicing, and aggregating data, rather than frequent updates or inserts.
Q: 12 Which of the following techniques cannot be used for removal of noise from data?
Option C
Noise removal in data preprocessing aims to reduce errors or random variations in datasets.
SMOOTHING BY BIN MEANS: Replaces each value in a bin with the mean of the bin to reduce variability.
SMOOTHING BY BIN MEDIANS: Replaces each value with the median of the bin, which is robust to outliers.
SMOOTHING BY BIN BOUNDARIES: Replaces values with the closest boundary (min or max) of the bin to limit extreme values.
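The three binning methods above can be sketched as follows. The sample values are the ones used in the bin-boundaries question later in this section. The function names are illustrative, and sending ties to the lower boundary is an assumption, since the text does not say how ties are resolved.

```python
import statistics

def equal_frequency_bins(values, bin_size):
    """Sort the values and cut them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_means(bins):
    return [[statistics.mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    return [[statistics.median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        # Each value moves to the closer boundary; ties go to the lower one (assumption)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

data = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(data, 3)

print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(smooth_by_medians(bins))     # [[8, 8, 8], [21, 21, 21], [28, 28, 28]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```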
Q: 13 Match the clustering approach (Column 1) with its correct description (Column 2):
| Column 1 (Clustering Approach) | Column 2 (Description) |
|---|---|
| 1. Agglomerative Method | A. Begins with each data object as its own cluster and merges them iteratively. |
| 2. Divisive Method | B. Uses density rather than distance to form clusters, enabling discovery of arbitrary shapes. |
| 3. Density Based Method | C. Starts with all data in one cluster and then recursively splits into smaller clusters. |
Option A
Clustering approaches can be categorized based on how they form groups of data.
| CLUSTERING APPROACH | DESCRIPTION |
|---|---|
| Agglomerative Method | Bottom-up hierarchical approach, begins with each data object as its own cluster and merges them iteratively. |
| Divisive Method | Top-down hierarchical approach, starts with all data in one cluster and recursively splits into smaller clusters. |
| Density-Based Method | Forms clusters based on density rather than distance, allowing discovery of arbitrarily shaped clusters and handling noise. |
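The bottom-up behaviour of the agglomerative method can be sketched in a few lines. This version uses single linkage (distance between the two closest members) on one-dimensional points and stops at a requested number of clusters. The linkage choice, the stopping rule, the function name, and the sample points are all assumptions made for illustration.

```python
def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering on 1-D points."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest to each other
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

points = [1.0, 1.2, 1.1, 8.0, 8.3, 15.0]  # hypothetical data
print(agglomerative(points, 3))
```

A divisive method would run the other way, starting from one cluster holding all six points and splitting it repeatedly.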
Q: 14 The 0-D cuboid, which holds the highest level of summarization, is also known as:
Option B
In data warehousing and OLAP, a cuboid represents a specific level of aggregation in a multidimensional cube.
The 0-D cuboid is the highest level of summarization, meaning it aggregates data across all dimensions, providing only a single summarized value for the entire dataset. This cuboid is also called the Apex Cuboid because it sits at the top of the aggregation lattice.
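One way to picture the lattice is to list every cuboid as a subset of the cube's dimensions, grouped by how many dimensions it keeps. The three dimension names below are hypothetical. The empty grouping at level 0 corresponds to the apex cuboid, and the grouping that keeps every dimension is the least summarized level.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Map each level k to the list of k-dimensional cuboids (subsets of dimensions)."""
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}

dims = ["time", "item", "location"]  # hypothetical dimensions
lattice = cuboid_lattice(dims)

apex = lattice[0]                    # [()] : group by nothing, one grand total
least_summarized = lattice[len(dims)]  # keeps all dimensions
total_cuboids = sum(len(level) for level in lattice.values())  # 2 ** len(dims)

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", cuboids)
print("total cuboids:", total_cuboids)
```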
Q: 15 In the context of data warehousing, suppose 'smoothing by bin boundaries' is applied for data cleaning on the data [4, 8, 15, 21, 21, 24, 25, 28, 34] with equal-frequency bins of size 3 (namely bin1, bin2 and bin3). After smoothing, the bin2 data is given by:
Option A
Smoothing by bin boundaries is a data cleaning technique used to reduce the effect of noise or outliers in a dataset. The process involves dividing the data into bins and then replacing each value in a bin with the closest bin boundary, that is, either the minimum or the maximum value of that bin.
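Given the data [4, 8, 15, 21, 21, 24, 25, 28, 34] and equal-frequency bins of size 3:
Bin1: [4, 8, 15]
Bin2: [21, 21, 24]
Bin3: [25, 28, 34]
Smoothing by bin boundaries for Bin2:
Bin boundaries: Min = 21, Max = 24
Replace each value in Bin2 with the nearest boundary:
21 → 21 (already the minimum boundary)
21 → 21 (already the minimum boundary)
24 → 24 (already the maximum boundary)
Finally, the smoothed Bin2 is [21, 21, 24].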