FAQ

Frequently Asked Questions on Kaizen Secure Voiz Product

  • Programming Languages Used in Developing the System

    Kaizen Voice Authentication is built on open standards, using Java and Python. It works with most popular databases, such as DB2, MS SQL and Oracle.

  • What is the Single Largest Drawback of a Voice Authentication System?

    System results vary with noise and network disturbances. Too much noise is the only enemy. Users must be in a reasonably quiet environment to allow successful enrollment and authentication of a human voice. In a noisy work environment, we recommend using other factors in addition to voiceprints.

  • What are Some of the Arguments Against Iris/Fingerprint/Facial Recognition Systems?

    These are very well established biometrics options. They offer good results, but they need specific readers/scanners at all access points. Much of the access equipment wears out within a few months, which reduces its effectiveness score. Dust, moisture, wear and tear, and poor maintenance all affect the results of a scan. Voice biometrics works through any mobile phone or landline and does not need any specific equipment.

    It is highly aware of “liveness” [an actual human participating in the authentication], is less intrusive and inexpensive, and scores very well on EER [equal error rate, where lower is better]. Go for it.

  • Since Voice Authentication Takes Approx 10 secs for Successful Authentication, is it not a Slow Process?

    The usual customer authentication by a live help desk agent needs nearly 90 seconds. The first 40 secs are spent choosing the IVRS options or describing the problem, and the next 40-50 secs on quoting the relevant PIN/date of birth/answers to secret questions.

    This is the case when a customer calls into a contact center and seeks help through the contact center executive. If we deploy Voice Authentication instead, the authentication is completed in 10 secs and allows for faster resolution. This increases customer satisfaction rates and reduces AHT (Average Handling Time) and contact center billing costs.

  • What is Active and Passive Voice Authentication?

    - Active authentication is where the customer is aware of the process, voluntarily participates in the authentication and provides time for the same. This happens before he/she is connected to a live agent/contact center axis.

    - Passive authentication is where the customer has contacted the service help desk and is describing his/her problem while the biometric server runs an authentication check in the background. The help desk may encourage the customer to speak for more than 10 secs, to allow for correct authentication/recognition against voiceprints. This passive system needs more ports and E1/PRI lines on the CTI to allow for real-time processing, so it is resource hungry.

  • What are Some of the Guiding Standards for Voice Authentication?

    National Institute of Standards and Technology (NIST) Special Publication 800-63-2 discusses various forms of two-factor authentication and gives guidance on using them in business processes requiring different levels of assurance. We adhere to all the norms and comply with NIST standards. Customers can seek 3rd party validation of the same.

  • Integration Touch-points with CTI/IVRS

    KSV has published web services for integration touch-points with 3rd party apps such as CRM, IVRS and ERP. During deployment, the user's voice stream [a sample of at least 9 secs] is routed from the IVRS port to the CTI cache and then to the biometric server. This is an established best practice. All other streams [VoIP] are managed by a media gateway over the SIP protocol, which sends them to the voice biometric server.

  • Can a Channel Partner Learn the Product Implementation?

    Do you want to deliver the project fully? Yes, that can be organized. Partners can initially focus on selling the product in a chosen territory. If they can invest in resources, the KSV team will train them on best practices and involve them in one project implementation, working alongside the KSV implementation team. Learn, get certified and become independent on implementation.

  • Details on GMM Algorithm used in Voice Authentication

    Gaussian mixture models are a probabilistic model for representing normally distributed subpopulations within an overall population. Mixture models in general don’t require knowing which subpopulation a data point belongs to, allowing the model to learn the subpopulations automatically. Since subpopulation assignment is not known, this constitutes a form of unsupervised learning.


    For example, in modeling human height data, height is typically modeled as a normal distribution for each gender with a mean of approximately 5’10” for males and 5’5” for females. Given only the height data and not the gender assignments for each data point, the distribution of all heights would follow the sum of two scaled (different variance) and shifted (different mean) normal distributions. A model making this assumption is an example of a Gaussian mixture model (GMM), though in general, a GMM may have more than two components. Estimating the parameters of the individual normal distribution components is a canonical problem in modeling data with GMMs.


    GMMs have been used for feature extraction from speech data, and have also been used extensively in object tracking of multiple objects, where the number of mixture components and their means predict object locations at each frame in a video sequence.
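
    As a rough illustration of the parameter-estimation problem described above, the sketch below fits a one-dimensional GMM with the standard expectation-maximization (EM) algorithm in plain Python. It is not the product's implementation. The function name fit_gmm_1d, the quantile-based initialization, the iteration count and the sample height data are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    data = sorted(data)
    n = len(data)
    # Assumed initialization: means at evenly spaced quantiles,
    # a shared variance and equal mixture weights.
    means = [data[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(data) / n
    shared_var = max(sum((x - overall_mean) ** 2 for x in data) / n, 1e-6)
    variances = [shared_var] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


if __name__ == "__main__":
    # Hypothetical height data in inches: two subpopulations, labels discarded.
    rng = random.Random(42)
    heights = [rng.gauss(70, 3) for _ in range(500)] + [rng.gauss(65, 3) for _ in range(500)]
    w, m, v = fit_gmm_1d(heights)
    print("weights:", w)
    print("means:", m)
    print("variances:", v)
```

    Because the two height subpopulations overlap heavily, the recovered means are only approximate. Well-separated subpopulations are recovered much more reliably.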

  • What are the 12-14 Parameters of the Human Voice that Determine the Voiceprint?

    Relatively stable characteristics:

    • Vocal tract length
    • Vocal tract shape
    • Vocal cord length (pitch)
    • Gender (Breathiness)
    • Nasal cavity size and shape
    • Speaking rate and prosody
    • Language, dialect, and accent

    Transient characteristics:

    • Health
    • Emotional state
    • Environment

  • Integration with Other Applications and APIs

    Done in a jiffy. KSV has published APIs for common web services and for data call-outs from other apps. Partner or customer IT teams can do the integration themselves, or our delivery team can do it for you.

  • What if a Customer Wants a Full Suite of Biometrics for Deployment?

    You mean the customer wants other biometrics? No worries, we can do that. Fingerprinting and facial recognition are two more options that can be provided directly by KSV. We can also configure other 3rd party solutions, if needed, to complement Voice Authentication and provide a comprehensive solution to end customers.

  • Platform Agnostic Deployment for Windows, Linux, Unix Servers

    Yes, it is platform agnostic. It works on Windows, Linux and Unix server operating systems.

  • What is Geo-tagging and How is it Used? Can the App be Tampered With?

    Geo-tagging is done through an app installed on the user’s mobile phone. The app picks up the latitude-longitude with a time-stamp while the voice authentication is done. In place of a fixed line, this app can serve as proof of the latitude-longitude coordinates at which a mobile phone-based voice authentication took place.
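
    To make the idea concrete, here is a minimal sketch of the kind of record such an app could attach to an authentication event. The GeoTag class, its field names and the tag_authentication helper are hypothetical and are not part of any published KSV API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class GeoTag:
    """Hypothetical record: where and when a voice authentication happened."""
    user_id: str
    latitude: float
    longitude: float
    captured_at: datetime

    def __post_init__(self):
        # Reject coordinates outside the valid geographic ranges.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be within [-90, 90]")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be within [-180, 180]")
        # Require an explicit time zone so time-stamps are unambiguous.
        if self.captured_at.tzinfo is None:
            raise ValueError("captured_at must be timezone-aware")


def tag_authentication(user_id: str, latitude: float, longitude: float,
                       now: Optional[datetime] = None) -> GeoTag:
    """Build a geo-tag for an authentication event, defaulting to the current UTC time."""
    return GeoTag(user_id, latitude, longitude, now or datetime.now(timezone.utc))
```

    Making the record immutable (frozen=True) is a design choice for the sketch: once captured, the coordinates and time-stamp should not be edited in place.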

  • Commercial Pricing of License and Services

    - The product license fee is based on blocks of customers/users.

    - One-time setup and implementation costs.

    - AMC (annual maintenance contract) for the product license at 15%.
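
    The arithmetic behind this pricing structure can be sketched as follows. The block size, fee per block and setup cost are placeholder inputs, not actual KSV prices, and the sketch assumes the 15% AMC is charged annually on the product license fee.

```python
def license_quote(users, block_size, fee_per_block, setup_cost, amc_rate=0.15):
    """Illustrative quote: user-block license fee, one-time costs and annual AMC."""
    # Users are licensed in whole blocks, so round up to the next block.
    blocks = -(-users // block_size)
    license_fee = blocks * fee_per_block
    return {
        "blocks": blocks,
        "license_fee": license_fee,
        "one_time_setup": setup_cost,
        # Assumption: AMC is a yearly percentage of the license fee.
        "annual_amc": license_fee * amc_rate,
    }


if __name__ == "__main__":
    # Placeholder figures only.
    print(license_quote(users=5000, block_size=1000, fee_per_block=2000.0, setup_cost=500.0))
```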

  • Do we have a POC/testbed for demos?

    Yes, we are seeking a Microsoft technology center in Bangalore as one such POC testbed. Customers can walk in for an immersion experience of the product and surrounding technologies like database, datacenter automation, IVRS, networking, etc.

  • Peak load tests and scale architecture

    This is being made available. Test reports can be shared with registered partners and customers under NDA.

  • Cloud-Based Instance for Customers

    Yes, this will be made available through Azure, AWS, and other service providers. Hosted/dedicated instance, cloud and SaaS options are available. Customers can pay in USD/INR per user per month for the complete product suite.

  • What is the Average Time Needed for Project Implementation of 5000 Users?

    A plain vanilla installation and configuration of the voice biometrics application will take less than 3 weeks. The time for integration with other applications/CTI/IVRS varies and will be based on our effort estimation. Web services based APIs can be used to shorten this integration.

  • Can one Enroll in English Language and Try Using Another Language for Authentication?

    Yes, you can do that. This is independent of language and text.

  • How Much Effect Do the Surrounding Environment and Ambient Noise Have on the Authentication?

    A noisy work environment will lead to a high rate of false rejection. Customers must be in a less noisy environment and use the mobile phone in normal mode, not in loudspeaker mode. Authentication depends on clean reception of the voice by the biometric engine.

  • Will the Database Size Balloon and will it Need Huge Storage Space?

    No, the database captures and retains only the parameters of a voice in the voiceprint, so it does not need huge storage space. There is no recorded voice or audio file in the biometrics engine that can be hacked. Hence no one can copy or re-create any audio file.

  • Is Reverse Engineering Possible for Re-creation of Voice Prints?

    No, the database captures and retains only the parameters of a voice in the voiceprint. There is no recorded voice or audio file in the biometrics engine that can be hacked. Hence no one can copy or re-create any audio file.

  • Can Someone use Pre-recorded Voice and Mimicry to Beat the System?

    No, they can’t. The human voice has 12 unique parameters like pitch, tone, breathiness, concatenation, etc. Mimicry may match only 2-3 parameters of another person’s voice, so it is rejected during authentication. A prerecorded voice will have different waveforms and is easily identified as a false attempt, leading to rejection.

  • How Much Time is Needed for Enrollment and Each Authentication?

    First Time Enrollment:

    This is a one-time exercise for any user/customer and will not take more than 30 seconds of speaking into the phone. They can speak in any language and use any text. This enrollment creates the customer's voiceprint.

    Successive Authentication: 

    Every time an enrolled customer calls into the specified IVRS/ contact center number, the biometrics engine needs only 7 seconds to authenticate against a database of voiceprints.

  • ROI on Voice Authentication in different use cases

    Banks:

    Savings on call center costs, bandwidth, manpower costs, and the cost of a data breach. GDPR guidelines or banking regulatory authority guidelines can be better met through multi-factor authentication.

    Telecom: 

    Retain high ARPU customers, save on contact center costs, ensure low ARPU clients use only the IVRS option and do not cut into call center time/AHT, and support GDPR compliance.

    Government Treasury: 

    Avoid unvalidated payouts to pensioners, facilitate better compliance with the proof-of-life process, save the time and cost of enrolling each new batch of retiring personnel, and seek monthly proof of life at no extra cost instead of annual proof.

    Not for Profit /NGO: 

    Control fund flow from trust/donor to field implementation agencies, track beneficiaries every month, track them as alumni and map them against the expected impact assessment, and ensure money spent on social development projects is validated in real time.

    Manufacturing/Logistics/Others: 

    Map field staff attendance in real time with Geo-tagging, ensure scheduled shifts don’t slip up, keep the customer’s customer happy and retain the contracted business, save manpower costs in manual monitoring of staff, and cut costs by aligning time-stamped field attendance with actual salary calculations.

  • Can we use Voice Authentication to segregate different segments of customers?

    Yes, it can be used as a privilege or differentiator for banks or other organizations.  Allow high net worth account holders or high-value customers to use this tool.

  • How many attempts can a validated user make, if he/she is rejected on the first attempt through IVRS/ biometrics?

    The system can be configured for the desired number of attempts. For example, a validated user who gets rejected can be allowed 2 more attempts, or be redirected to OTP generation or to a customer service desk. It can be configured to give 3-5 attempts for re-authentication.
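
    A configurable retry policy of this kind might look like the sketch below. The RetryPolicy class, the fallback labels "otp" and "agent", and the verify callback are hypothetical names chosen for illustration. They are not the product's configuration keys.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RetryPolicy:
    """Hypothetical policy: how many voice attempts to allow and where to go next."""
    max_attempts: int = 3      # first attempt plus 2 retries
    fallback: str = "otp"      # or "agent" to route to the customer service desk


def authenticate_with_retries(verify: Callable[[int], bool],
                              policy: RetryPolicy) -> Tuple[str, int]:
    """Call verify(attempt_number) until it succeeds or attempts run out.

    Returns ("authenticated", attempts_used) on success, otherwise
    (policy.fallback, attempts_used) so the caller can redirect the user.
    """
    for attempt in range(1, policy.max_attempts + 1):
        if verify(attempt):
            return ("authenticated", attempt)
    return (policy.fallback, policy.max_attempts)
```

    For instance, authenticate_with_retries(lambda n: False, RetryPolicy(max_attempts=3, fallback="agent")) exhausts all three attempts and returns ("agent", 3).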

  • Can the false acceptance rate be made zero?

    Yes, it is scientifically possible. Our product roadmap has a high vector algorithm in the beta test stage now, which brings the false acceptance rate closer to zero.

  • What if a validated person has cold/ill health and tries to log in using biometrics?

    A cold or nasal congestion may affect 2 of the 12 unique distinctions of the human voice. The scoring system will still allow the user to log in for that specific day, even if those 2 parameters are not acceptable. This tolerance is configured by the implementation team and can be set high or low, based on customer preferences. No initial investment in scanners/readers is needed, as the system is voice/telephone-based.
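
    The tolerance described above can be pictured with a very small sketch. It assumes, purely for illustration, that each of the 12 voice parameters yields a pass/fail result and that the configured tolerance is the number of failed parameters allowed. The real scoring system is not documented here and may work differently.

```python
def accept_voice(param_matches, max_failed=2):
    """Accept the attempt if at most max_failed parameters fail to match.

    param_matches: iterable of booleans, one per voice parameter (assumed pass/fail).
    max_failed: configurable tolerance; a lower value is stricter.
    """
    failed = sum(1 for ok in param_matches if not ok)
    return failed <= max_failed


if __name__ == "__main__":
    # A user with a cold: 2 of 12 parameters do not match.
    with_cold = [True] * 10 + [False] * 2
    print(accept_voice(with_cold))                 # tolerant setting
    print(accept_voice(with_cold, max_failed=0))   # strict setting
```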

  • Which is more important: false acceptance rate [FAR] or false rejection rate [FRR]?

    Each customer will have to determine and choose the ratio that suits their risk profile. FAR cannot be zero, and we are testing a high vector algorithm to bring it down to a 0.01% failure rate. Every customer needs to understand how Voice Authentication applies to their industry needs.

  • What is an equal error rate?

    The equal error rate (EER) is the error rate of a biometric system at the decision threshold where the false acceptance rate (FAR) and the false rejection rate (FRR) are equal. It summarizes the trade-off between FAR and FRR: lowering one generally raises the other. The false acceptance rate measures how often unrecognized/unwanted callers are allowed to log in, and the false rejection rate measures how often previously validated users are rejected and prevented from entering the system. Customers can choose to tune the balance in favor of either one. The KSV team will configure the system to suit customer needs and also monitor this.
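
    The relationship between FAR, FRR and EER can be shown with a short sketch. It assumes the biometric engine produces a similarity score per attempt and accepts when the score is at or above a threshold. The function names and the sample scores are made up for the example.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when attempts scoring >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors let in
    frr = sum(s < threshold for s in genuine) / len(genuine)     # valid users rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Scan candidate thresholds and return (threshold, EER estimate).

    The EER is estimated at the threshold where FAR and FRR are closest,
    as the average of the two rates at that point.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up similarity scores for enrolled users and for impostors.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

    Raising the threshold above the EER point lowers FAR at the cost of a higher FRR, and lowering it does the opposite. This is the choice each customer makes according to their risk profile.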

  • Voice Authentication references and customer names

    KSV (Kaizen Secure Voiz) is a new organization with unique IPRs. KSV has references at SRI, USA and Lebara, UK. We are striving towards achieving top references in govt, bank, defense, NGO trusts, and other sectors. We will share names as and when pilot implementations start.


  • Kaizen Secure Voiz is a new organization with unique IPRs. The Kaizen Secure has references in SRI, USA, Lebara UK. We are striving towards achieving top references in govt, bank, defense, NGO trusts and other sectors. We will share names as and when pilot implementation starts

    The percentage of validation attempts in a biometric system that are either Falsely Accepted or Falsely Rejected where the probability of acceptance and rejection are equal. EER is a trade off or a ratio between FAR and FRR. The false acceptance rate is allowing unrecognized/unwanted customers to log in and False rejection rate is rejecting previously validated users or preventing valid customers from entering the system. Customers can choose to fix their ratio in favor of any one of them. It Kaizen team will configure the system to suit customer needs and also monitor this

    Each user will have to determine and choose the ratio suiting the risk profile. FAR cannot be zero and we are testing a high vector algorithm to make it 0.01% failure rate. Every customer needs to understand the Voice Authentication application to their industry needs

    Cold, nasal congestion may affect 2 parameters out of total 12 unique distinctions of the human voice. The scoring system will allow the user to log-in, even if the 2 parameters are not acceptable, for that specific day. This setting is done by implementation team and can be set No initial investments on scanners/readers as it is voice/telephone based high or low, based on customer preferences.

    Yes, it is scientifically possible. Our product roadmap has a high vector algorithm in the beta test stage now. The acceptance rate is closer to zero.

    The system can be configured for the desired number of attempts. E.g. A validated user, who gets rejected, can
    Be allowed 2 more attempts or re-directed to the OTP pin generation or to speak to a customer service desk. It can be configured to give 3-5 attempts for re-authentication.

    Yes, it can be used as a privilege or differentiator for banks or other organizations. Allow high net worth account holders or high value customers to use this tool.

    Banks: Savings on call center costs, bandwidth, manpower costs and cost of a data breach. GDPR guidelines or banking regulatory authority guidelines can be better met through multi factor authentication.

    Telecom: Retain high ARPU customers, save on contact centers costs, ensure low ARPU clients use only IVRS option and not cut into call center time / AHT, GDPR compliance

    Government Treasury: Avoid unvalidated payouts to pensioners, facilitate better compliance with the proof of life process, save time and costs on enrollment of each new batch of retiring personnel, and seek monthly proof of life at no extra cost instead of annual proof.

    Not for Profit/NGO: Control fund flow from trust/donor to field implementation agencies, track beneficiaries every month, track them as alumni and map them against the expected impact assessment, and ensure money spent on social development projects is validated in real time.

    Manufacturing/Logistics/Others: Real time attendance mapping of field staff with Geo-tagging, ensure scheduled shifts don’t slip up, the customer’s customer is happy and retains the contracted business, save manpower costs in manual monitoring of staff, cut costs and align field attendance with time stamp to actual salary calculations

    First Time Enrollment: This is a one-time exercise for any user/customer and will not take more than 30 seconds of speaking into the phone. They can speak in any language and use any text. This enrollment creates the customer voiceprint.

    Successive Authentication: Every time an enrolled customer calls into the specified IVRS/ contact center number, the biometrics engine needs only 7 seconds to authenticate against a database of voiceprints.

    No, they can’t. The human voice has 12 unique parameters like pitch, tone, breathlessness, concatenation, etc. Mimicry may match 2-3 parameters of another person’s voice, and hence it is rejected during authentication. A prerecorded voice will have different waveforms and is easily identified as a false attempt, leading to rejection.

    No, the database captures and retains only the parameters of a voice in the voiceprint. There is no recorded voice or audio file in the biometrics engine that can be hacked. Hence no one can copy or re-create any audio file.

    A noisy work environment will lead to a high rate of false rejection. Customers must be in a less noisy environment and use the mobile phone in normal mode, not in loudspeaker mode. Authentication depends on clean reception of the voice by the biometric engine.

    Yes, you can do that. This is independent of language and text.

    A plain vanilla installation and configuration of the voice biometrics application will take less than 3 weeks. Integration with other applications/CTI/IVRS will vary and will be based on our effort estimation. Web services based APIs can be used to shorten this integration.

    Yes, this will be made available through Azure, AWS and other service providers. Hosted/dedicated instances on cloud or SaaS options are available. Customers can pay in USD/INR per user per month for a complete product suite.

    This is being made available. Test reports can be shared with registered partners and customers under NDA.

    Yes, we are seeking the Microsoft Technology Center in Bangalore as one such POC test bed. Customers can walk in for an immersion experience of the product and surrounding technologies like database, datacenter automation, IVRS, networking, etc.

    • The number of customers/user-based blocks shall be the base for the product license fee.
    • One-time setup cost and implementation costs.
    • AMC for the product license at 15%.

    Geo-tagging is an app that can be installed on the user's mobile phone. It picks up the latitude and longitude with a timestamp while the voice authentication is done. Instead of a fixed line, this app can serve as proof of fixed lat-long coordinates, via a mobile phone based voice authentication.
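    To show how a captured latitude-longitude with timestamp could be checked against an expected location, here is a small Python sketch. The record layout, the 200 m radius, the site coordinates and the function names are hypothetical; the haversine formula is a standard great-circle distance and is used here only as an illustration, not as the app's actual method.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class GeoTag:
    latitude: float       # degrees
    longitude: float      # degrees
    timestamp: datetime   # when the voice authentication was done


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat-long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_at_site(tag, site_lat, site_lon, radius_km=0.2):
    """True if the geo-tag lies within radius_km of the expected site."""
    return haversine_km(tag.latitude, tag.longitude, site_lat, site_lon) <= radius_km


# Hypothetical check-in captured during voice authentication.
tag = GeoTag(12.9716, 77.5946, datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc))
print(is_at_site(tag, 12.9716, 77.5946))  # expected site
print(is_at_site(tag, 13.0827, 80.2707))  # a site in another city
```

    The timestamp in the record is what would let field attendance be aligned with shift schedules.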

    Yes, it is platform agnostic. It will work on all server operating systems.

    You mean, the customer wants other biometrics? No worries, we can do that. Fingerprint and facial recognition are two more options that can be provided directly by KSV. We can configure other 3rd party solutions if need be, to complement Voice Authentication and provide a comprehensive solution to the end customer.

    Done in a jiffy. Kaizen has published APIs for common web services and data call out from other apps. Partners/ customer IT teams can do it themselves or our delivery team can do it for you. Easily available.

    Relatively stable characteristics

    • Vocal tract length
    • Vocal tract shape
    • Vocal cord length (pitch)
    • Gender (Breathiness)
    • Nasal cavity size and shape
    • Speaking rate and prosody
    • Language, dialect and accent

    Transient characteristics

    • Health
    • Emotional state
    • Environment

    A Gaussian mixture model is a probabilistic model for representing normally distributed subpopulations within an overall population. Mixture models in general don’t require knowing which subpopulation a data point belongs to, allowing the model to learn the subpopulations automatically. Since the subpopulation assignment is not known, this constitutes a form of unsupervised learning.
    For example, in modeling human height data, height is typically modeled as a normal distribution for each gender with a mean of approximately 5’10” for males and 5’5” for females. Given only the height data and not the gender assignments for each data point, the distribution of all heights would follow the sum of two scaled (different variance) and shifted (different mean) normal distributions. A model making this assumption is an example of a Gaussian mixture model (GMM), though in general a GMM may have more than two components. Estimating the parameters of the individual normal distribution components is a canonical problem in modeling data with GMMs.
    GMMs have been used for feature extraction from speech data, and have also been used extensively in object tracking of multiple objects, where the number of mixture components and their means predict object locations at each frame in a video sequence.
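    The height example above can be sketched in Python with a small expectation-maximization (EM) loop for a one-dimensional, two-component GMM. The data is synthetic: the means of 70 and 65 inches come from the text, while the 2-inch standard deviation, the sample sizes and all function names are assumptions made for illustration. This is a teaching sketch, not the algorithm used in the product.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(data, iterations=100):
    """Fit a 1-D two-component GMM with EM.

    Returns a list of (weight, mean, std) tuples sorted by mean.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    weights = [0.5, 0.5]
    means = [overall_mean - overall_std, overall_mean + overall_std]
    stds = [overall_std, overall_std]

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = (p[0] + p[1]) or 1e-300  # guard against underflow
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor to avoid collapse
    return sorted(zip(weights, means, stds), key=lambda c: c[1])


# Synthetic heights in inches; group labels are discarded before fitting.
random.seed(0)
heights = ([random.gauss(65.0, 2.0) for _ in range(1000)]
           + [random.gauss(70.0, 2.0) for _ in range(1000)])
random.shuffle(heights)

components = fit_two_component_gmm(heights)
for weight, mean, std in components:
    print(f"weight={weight:.2f} mean={mean:.1f} std={std:.1f}")
```

    The model never sees which group a height came from, yet the two fitted components should settle near the two underlying means, which is the unsupervised behavior described above.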

    You want to deliver the project fully? Yes, that can be organized. Partners can focus on selling the product initially in their chosen territory. If they can invest in resources, the Kaizen team will train them on best practices and involve them in one project implementation, working alongside the KSV implementation team. Learn, get certified and become independent on implementation.

    Kaizen Secure Voiz has published web services for integration touch points with other 3rd party apps like CRM, IVRS, ERP, etc. During the deployment, the voice stream [min 9 sec string] of users is routed from the IVRS port to the cache of the CTI, and later to the biometric server. This is an established best practice. A media gateway using the SIP protocol manages all other streams [VOIP] and sends them to the voice biometric server.

    National Institute of Standards and Technology (NIST) Special Publication 800-63-2 discusses various forms of two-factor authentication and provides guidance on using them in business processes requiring different levels of assurance. We adhere to all the norms and comply with NIST standards. Customers can seek 3rd party validation of the same.

    1. Active authentication is where the customer is aware of the process, voluntarily participates in the authentication and provides time for the same. This happens before the customer is connected to a live agent/contact center.
    2. Passive authentication is where the customer has contacted the service help desk and is describing his/her problem, while the biometric server runs an authentication check in the background as he/she speaks. The help desk may encourage the customer to speak for more than 10 secs, to allow for correct authentication/recognition against voiceprints. This passive system needs more ports and E1/PRI lines on the CTI to allow for real-time processing, so it is resource hungry.
