And
Implementation of Information Privacy and Security
Aim and Objectives
Growing awareness of networking skills has increased interest among some individuals in obtaining private information through password phishing and through viruses and worms designed to imitate legitimate users. These methods have been used by advanced, experienced programmers to infiltrate systems and obtain data, usually with malicious intent. Recent research on data storage and privacy has shown that these infiltration methods have been used successfully and are well documented. This information is shared around the globe through the connectivity provided by the internet, which has made it easier to decrypt user passwords using open-source decryption software. As a result, individuals are opting for cloud computing, where their data is kept on online servers and accessed only when needed, with only the required data retrieved. This keeps the window of opportunity for hacking very small, because the data is available on the user's own hardware for only a limited period of time. However, there is still a need to provide adequate security for information and data in the cloud through efficient user verification techniques.
The implementation of biometrics as a solution to the vulnerabilities facing information privacy and security has been a preferred method because of its effectiveness and its resistance to cracking. This is attributed to the dedicated channel it follows to facilitate the authentication process. It is important to note that the cloud has expanded the amount of information that can be held; consequently, it is possible to incorporate data from a large number of individuals. The project aims to find, explore and improve a biometric system that is universally accepted and usable by all users for verification in cloud computing. The results are therefore expected to have an impact on overall information privacy and data security.
The voice biometric system is the system identified for user verification. The project will provide a feature that is not present in the voice biometric systems currently available. The biometric systems in use today for cloud user verification provide only one security layer: once the user name is provided, the only other thing required to gain access is a sample of the user's voice signature. To improve security, the project will provide two layers of security, and the user must pass both to be allowed access to the cloud storage. If the user fails the first layer, access to the second layer is not granted. Failing the second layer likewise denies access to the information in the cloud.
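The two-layer rule described above can be sketched as a short gate function. This is a minimal illustration only: the text does not say what each layer checks, so `first_layer` and `second_layer` are hypothetical callables standing in for whatever the two security measures turn out to be.

```python
from typing import Callable


def verify_access(first_layer: Callable[[], bool],
                  second_layer: Callable[[], bool]) -> bool:
    """Grant access only if both layers pass, evaluated in order.

    Both callables are hypothetical placeholders for the project's
    two security measures; each returns True on success.
    """
    if not first_layer():
        # Failing the first layer stops here: the second check never runs.
        return False
    # The second layer is reached only after the first has passed,
    # and its result decides whether cloud access is granted.
    return second_layer()
```

For example, `verify_access(lambda: True, lambda: False)` denies access because the second layer fails, while a failing first layer means the second callable is never invoked.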
Algorithms
User verification using speech samples relies on the fact that speech differs between speakers in tone, quality, pitch dynamics and loudness. These characteristics can be identified and isolated in the frequency domain. The isolated properties are then compared to the ones stored in the database to authenticate the person. These are two different processes requiring two distinct algorithms: the first is known as feature extraction and the second as pattern matching.
The feature extraction process involves retaining the necessary data and discarding redundant or useless information from the speech signal. The desirable properties for user recognition include high discrimination between sub-word classes, low speaker variability and invariance to degradation caused by noise. The main aim of this process is to find acoustic correlates of the utterances that can be computed by processing the signal waveform. The algorithm chosen for this task is Mel Frequency Cepstrum Coefficients.
Mel Frequency Cepstrum Coefficients (MFCC)
This algorithm is based on human auditory perception, which does not follow a linear frequency scale and becomes less sensitive to frequency differences above 1000 Hz; this is why it was chosen for voice verification. For each tone with an actual frequency, a subjective pitch is measured on the Mel scale. The scale is approximately linear up to 1000 Hz and logarithmically spaced above 1000 Hz, and it is used to find the log power of the scaled signal. The cepstrum is the inverse discrete Fourier transform of the logarithm of the Fourier transform magnitude. The cepstrum is used to determine the pitch period of a speech signal. The algorithm typically follows a sequence of steps, which are discussed below.
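The Mel scale can be illustrated with a small conversion helper. The text does not state which formula the project uses; the sketch below assumes the widely used closed form m = 2595 log10(1 + f / 700), and the function names `hz_to_mel` and `mel_to_hz` are illustrative.

```python
import math


def hz_to_mel(frequency_hz: float) -> float:
    """Convert a frequency in Hz to Mels (assumed 2595 / 700 formula)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert Mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this formula 1000 Hz maps to roughly 1000 Mels, and equal steps in Hz above that point produce progressively smaller steps in Mels, which matches the compression of higher frequencies described above.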
First, the input signal is converted from analog to digital format and divided into overlapping frames. This is done because speech is not a stationary signal; frame sizes are typically 10-25 ms, with frame shifts (the period of time between successive frames) of 5-10 ms. The next step reduces the signal discontinuity at both ends of each block through a process called windowing; the algorithm employs the Hamming window technique. The third step performs a Discrete Fourier Transform, computed with the Fast Fourier Transform, on the Hamming-windowed signals. The fourth step involves designing triangular filter banks on the Mel scale. This is because the Mel scale is based on human perception, and the human ear acts as a bank of overlapping band-pass filters. The approach is therefore to build a filter bank using the bandwidths given by the Mel scale and to pass the spectral magnitudes through these filters to obtain the Mel-frequency spectrum. This is followed by computing the logarithm of the squared magnitude of the Mel filter bank output, which compresses the dynamic range of the values. Human response to signal level is logarithmic, with less sensitivity to small differences at high amplitudes, so taking the logarithm makes the feature estimates less sensitive to input variations. The final stage converts the output back to the time domain to obtain the Mel Frequency Cepstrum Coefficients, that is, the spectrum of a spectrum. The spectrum gives information on the frequency content of a signal, while the cepstrum gives information on how that content changes across frequency bands. An inverse discrete Fourier transform is applied to the log Mel spectrum, and the output is passed on to the testing algorithm, the Hidden Markov Model.
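The steps above can be sketched in plain Python. This is a minimal illustration under several assumptions, not the project's implementation: it uses a naive DFT in place of an FFT, applies the filters to the power spectrum, assumes the 2595 / 700 Mel formula, and uses a type-II discrete cosine transform for the last step, which is a common stand-in for the inverse DFT in MFCC code. All function names and default parameters are illustrative.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # assumed Mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(samples, frame_len, frame_shift):
    # Step 1: overlapping frames; a trailing partial frame is dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_shift)]


def hamming(frame):
    # Step 2: Hamming window to reduce discontinuities at the frame edges.
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def power_spectrum(frame):
    # Step 3: naive O(n^2) DFT for clarity; real code would use an FFT.
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, frame_len, sample_rate):
    # Step 4: triangular filters with edges evenly spaced on the Mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [int(frame_len * f / sample_rate) for f in edges_hz]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (frame_len // 2 + 1)
        for k in range(left, centre):       # rising edge of the triangle
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge of the triangle
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


def mfcc(samples, sample_rate, frame_ms=25, shift_ms=10,
         num_filters=10, num_coeffs=5):
    """Return one list of cepstral coefficients per frame."""
    frame_len = sample_rate * frame_ms // 1000
    frame_shift = sample_rate * shift_ms // 1000
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(samples, frame_len, frame_shift):
        spectrum = power_spectrum(hamming(frame))
        # Step 5: log of each filter's energy, floored to avoid log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(weights, spectrum)),
                              1e-10))
                 for weights in bank]
        # Step 6: DCT-II of the log energies gives the cepstral coefficients.
        features.append([sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                             for j, e in enumerate(log_e))
                         for c in range(num_coeffs)])
    return features
```

Called on 0.1 s of audio sampled at 8000 Hz with the default 25 ms frames and 10 ms shift, `mfcc` yields eight feature vectors of five coefficients each. A production system would more likely rely on an established signal-processing library than on this hand-written version.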
Hidden Markov Model (HMM)
The algorithm describes a stochastic process comprising two stages. The first stage is a Markov chain, and the second stage generates an output for every point in time. This output sequence is the only thing that can be observed in the behavior of the model, since the state sequence followed during data generation is hidden. The algorithm is used due to its high classification rates with minimal errors.
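The two-stage process can be made concrete by sampling from a small discrete HMM. This is a sketch under assumptions: the `initial` start distribution is not mentioned in the text and is introduced here only so that the chain has a first state, and the function name and argument layout are illustrative.

```python
import random


def sample_hmm(initial, transition, emission, length, rng):
    """Draw (hidden_states, observations) from a discrete HMM.

    initial[i]       : probability of starting in state i (assumed input)
    transition[i][j] : probability of moving from state i to state j
    emission[i][k]   : probability of emitting symbol k while in state i
    rng              : a random.Random instance, for reproducibility
    """
    states, observations = [], []
    state = rng.choices(range(len(initial)), weights=initial)[0]
    for _ in range(length):
        states.append(state)
        # Second stage: emit an observable symbol from the current state.
        symbol = rng.choices(range(len(emission[state])),
                             weights=emission[state])[0]
        observations.append(symbol)
        # First stage: advance the hidden Markov chain by one step.
        state = rng.choices(range(len(transition[state])),
                            weights=transition[state])[0]
    return states, observations
```

Only the `observations` list would be visible to a verifier; the `states` list is returned here purely to show what remains hidden in practice.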
An HMM is characterized by N, the number of hidden states in a given model; M, the number of distinct observation symbols corresponding to the physical output of the model; A, the state transition probability distribution; and B, the observation symbol probability distribution matrix.
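One way to hold these four quantities together is a small container that derives N and M from the shapes of A and B and checks that every row is a probability distribution. The class name `DiscreteHMM` and its `validate` method are illustrative, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscreteHMM:
    A: List[List[float]]  # N x N state transition probabilities
    B: List[List[float]]  # N x M observation symbol probabilities

    @property
    def N(self) -> int:
        """Number of hidden states."""
        return len(self.A)

    @property
    def M(self) -> int:
        """Number of distinct observation symbols."""
        return len(self.B[0])

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if shapes or row sums are inconsistent."""
        if any(len(row) != self.N for row in self.A):
            raise ValueError("A must be N x N")
        if len(self.B) != self.N or any(len(row) != self.M for row in self.B):
            raise ValueError("B must be N x M")
        for row in self.A + self.B:
            if any(p < 0.0 for p in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError("each row must be a probability distribution")
```

For instance, a model with two hidden states and three observation symbols has a 2 x 2 matrix A and a 2 x 3 matrix B, and each row of both must sum to one.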
For the HMM to perform voice authentication, a set of HMM parameters must be built for each user in the database. The feature vectors obtained from MFCC are quantized using the k-means clustering algorithm. This method maps each continuous observation vector to an entry in a discrete codebook, and the resulting symbol sequences are used to estimate the HMM parameters with the forward-backward algorithm. The algorithm employs the dynamic programming principle to compute the values needed to obtain the posterior marginal distributions in two passes. One pass goes forward in time and the other goes backward in time, hence the name. During verification, the likelihood of the observed sequence is computed for every model, and the model with the highest likelihood is selected.
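The quantization and scoring steps can be sketched as follows. This is a simplified illustration under assumptions: the codebook is taken as already trained (the k-means step itself is omitted), only the forward pass is shown because it is sufficient for computing a likelihood, each model is a hypothetical `(initial, A, B)` tuple where the `initial` start distribution is an added assumption, and no scaling is applied, so it is suitable only for short sequences.

```python
import math


def quantize(vector, codebook):
    """Map a feature vector to the index of its nearest codeword."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))


def forward_likelihood(obs, initial, A, B):
    """Forward pass: probability of the symbol sequence given one model."""
    n = len(initial)
    # alpha[i] = P(symbols so far, current state = i)
    alpha = [initial[i] * B[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                 for j in range(n)]
    return sum(alpha)


def best_model(obs, models):
    """Return the name of the model with the highest likelihood.

    models maps a user name to a hypothetical (initial, A, B) tuple.
    """
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))
```

In use, each MFCC vector of an utterance would be passed through `quantize` to produce the symbol sequence, and `best_model` would then compare that sequence against every enrolled user's model. Longer sequences would need log-domain or scaled arithmetic to avoid numerical underflow.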
Bibliography
Raj, B. & Singh, R., 2011. Design and Implementation of Speech Recognition Systems. Machine Learning for Signal Processing.
Rashmi, C. R., 2014. Review of Algorithms and Applications in Speech Recognition Systems. International Journal of Computer Science and Information Technologies, 5(4).