Zego + Mini Game ASR Integration (Android)
SudMGP provides interactive mini-games such as "Pictionary", "Guess the Word", and "Number Bomb" that support voice interaction, which makes them more playable and more social. This document outlines the steps to integrate the ASR (speech recognition) feature of the SudMGP SDK.
I. Background
Mini-games can offer voice interaction. To support it, the App must obtain PCM data in a specific format from Zego RTC and pass it to the SudMGP SDK in the specified way.
This document uses hello-sud-plus-android as the example. Its source code can be found at: https://github.com/SudTechnology/hello-sud-plus-android
hello-sud-plus-android wraps the integration of the SudMGP SDK in SudMGPWrapper. We recommend that clients integrate the SDK through SudMGPWrapper.
II. Integration Steps
SDK GitHub link: https://github.com/SudTechnology/sud-mgp-android
Please use the latest version for Maven integration. For example, using version V1.3.2.1154:
- Integrate SudMGP SDK:
  Standard version: implementation 'tech.sud.mgp:SudMGP:1.3.2.1154'
  Lite version: implementation 'tech.sud.mgp:SudMGP-lite:1.3.2.1154'
- Integrate SudASR SDK for speech recognition:
  implementation 'tech.sud.mgp:SudASR:1.3.2.1154'
Notes:
- SudASR is an extension library. The new version of the SDK will search for this extension library to enable ASR multi-language recognition capabilities.
- Library download link: https://github.com/SudTechnology/sud-mgp-android/releases
- Demo link: https://github.com/SudTechnology/hello-sud-plus-android
III. Starting ASR in the Mini Game
When the mini-game enters the ASR scene, it automatically starts the ASR capability. At that point it sends the MG_COMMON_GAME_ASR state to the App with isOpen == true, which the App receives in the SudFSMMGListener.onGameMGCommonGameASR callback.
IV. App Starts Listening to RTC Audio Streams
Once the App receives the MG_COMMON_GAME_ASR state with isOpen == true, it calls the Zego interface to start capturing audio PCM data, which makes Zego begin collecting local PCM data. To do this, implement an IZegoAudioDataHandler object and register it as the raw audio PCM data callback, using ZegoExpressEngine.startAudioDataObserver(int bitmask, ZegoAudioFrameParam param) and ZegoExpressEngine.setAudioDataHandler(IZegoAudioDataHandler handler).
- Calling ZegoExpressEngine.startAudioDataObserver and ZegoExpressEngine.setAudioDataHandler:
startAudioDataObserver() is used to set the PCM data format:
  - The audio slices passed to pushAudio are the PCM data obtained from RTC.
  - The PCM data format must be: sample rate 16000, sample bit depth 16, number of channels MONO.
  - The length of the PCM data slices can be adjusted based on the effect. Longer slices give better accuracy but more delay, while shorter slices reduce delay at the cost of accuracy.
  - Zego's audio slices default to 10ms, but the length can be adjusted for better results when passed to pushAudio.

@Override
public void startPCMCapture() {
    ZegoExpressEngine engine = getEngine();
    if (engine != null) {
        /* Enable PCM data capture */
        ZegoAudioFrameParam param = new ZegoAudioFrameParam();
        int bitmask = 0;
        param.channel = ZegoAudioChannel.MONO;
        param.sampleRate = ZegoAudioSampleRate.ZEGO_AUDIO_SAMPLE_RATE_16K;
        bitmask |= ZegoAudioDataCallbackBitMask.CAPTURED.value();
        engine.startAudioDataObserver(bitmask, param);

        /* Set the original audio data callback */
        engine.setAudioDataHandler(zegoAudioDataHandler);
    }
}
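As a quick check on the required format, the size of one slice follows from sample rate × bytes per sample × channels × duration. The sketch below is only illustrative arithmetic; PcmSliceSize and bytesPerSlice are hypothetical names and are not part of the Zego or SudMGP SDKs.

```java
/** Hypothetical helper (not an SDK class): size of a PCM slice in bytes. */
final class PcmSliceSize {
    private PcmSliceSize() {}

    /** Bytes in one PCM slice of the given duration. */
    static int bytesPerSlice(int sampleRateHz, int bitsPerSample, int channels, int durationMs) {
        int bytesPerSample = bitsPerSample / 8;
        return sampleRateHz * bytesPerSample * channels * durationMs / 1000;
    }

    public static void main(String[] args) {
        // Required ASR format: 16000 Hz, 16-bit, mono. Zego's default slice is 10 ms.
        System.out.println(bytesPerSlice(16000, 16, 1, 10));  // prints 320
        // A longer 100 ms slice, trading delay for accuracy.
        System.out.println(bytesPerSlice(16000, 16, 1, 100)); // prints 3200
    }
}
```

Under these assumptions a default 10ms slice is 320 bytes, which gives a baseline when deciding how much audio to group before calling pushAudio.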
- Implementing the IZegoAudioDataHandler interface object:
The onCapturedAudioData() callback method returns the local PCM data captured by RTC, with further processing explained in the next section.

private final IZegoAudioDataHandler zegoAudioDataHandler = new IZegoAudioDataHandler() {
    @Override
    public void onCapturedAudioData(ByteBuffer data, int dataLength, ZegoAudioFrameParam param) {
        super.onCapturedAudioData(data, dataLength, param);
        ISudAudioEventListener listener = mISudAudioEventListener;
        if (listener != null) {
            // Wrap the raw slice and forward it to the App-side listener
            AudioPCMData audioPCMData = new AudioPCMData();
            audioPCMData.data = data;
            audioPCMData.dataLength = dataLength;
            listener.onCapturedPCMData(audioPCMData);
        }
    }
};
- Passing the RTC Captured PCM Data to the SDK
The onCapturedAudioData() callback method returns local PCM data slices, and the following method is used to pass the PCM data to the SDK:
The pushAudio interface can be called from a worker thread. Zego's audio slices default to 10ms, but the length can be adjusted for better results when passed to pushAudio.

// Audio stream data
public void onCapturedAudioData(AudioPCMData audioPCMData) {
    sudFSTAPPDecorator.pushAudio(audioPCMData.data, audioPCMData.dataLength);
}
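One possible way to adjust the slice length is to group several 10ms slices into a larger chunk before forwarding it. The following is a minimal sketch under stated assumptions: PcmChunker is a hypothetical helper, not part of either SDK; the sink stands in for a call such as sudFSTAPPDecorator.pushAudio(data, dataLength); and the captured bytes are assumed to start at index 0 of the buffer. The helper copies the bytes, so the chunk can be handed to another thread without holding on to the callback's buffer.

```java
import java.nio.ByteBuffer;
import java.util.function.BiConsumer;

/** Hypothetical helper (not an SDK class): groups small PCM slices into fixed-size chunks. */
final class PcmChunker {
    private final ByteBuffer chunk;
    private final BiConsumer<ByteBuffer, Integer> sink;

    PcmChunker(int chunkBytes, BiConsumer<ByteBuffer, Integer> sink) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        this.chunk = ByteBuffer.allocate(chunkBytes);
        this.sink = sink;
    }

    /** Copies dataLength bytes from data and emits a chunk each time the buffer fills up. */
    synchronized void append(ByteBuffer data, int dataLength) {
        ByteBuffer src = data.duplicate(); // leave the caller's position and limit untouched
        src.position(0);                   // assumption: payload starts at index 0
        src.limit(dataLength);
        while (src.hasRemaining()) {
            int n = Math.min(src.remaining(), chunk.remaining());
            ByteBuffer part = src.duplicate();
            part.limit(part.position() + n);
            chunk.put(part);
            src.position(src.position() + n);
            if (!chunk.hasRemaining()) {
                emit();
            }
        }
    }

    /** Emits whatever is buffered, for example when ASR closes. */
    synchronized void flush() {
        if (chunk.position() > 0) {
            emit();
        }
    }

    private void emit() {
        chunk.flip();
        ByteBuffer out = ByteBuffer.allocate(chunk.remaining());
        out.put(chunk);
        out.flip();
        chunk.clear();
        sink.accept(out, out.remaining());
    }
}
```

With 16000 Hz, 16-bit, mono audio, a chunk size of 3200 bytes corresponds to 100ms. The right size is a trade-off between delay and accuracy and is best found by testing.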
V. App Stops Listening to RTC Audio Streams
When the mini-game exits the ASR scene due to a hit or timeout, it will send a state notification to the App to stop capturing PCM data. Once the App receives the MG_COMMON_GAME_ASR state with isOpen == false, call the Zego interface ZegoExpressEngine.setAudioDataHandler(null) and ZegoExpressEngine.stopAudioDataObserver() to stop local PCM data collection by Zego.
@Override
public void stopPCMCapture() {
    ZegoExpressEngine engine = getEngine();
    if (engine != null) {
        /* Set the audio data callback to null */
        engine.setAudioDataHandler(null);
        engine.stopAudioDataObserver();
    }
}
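The start and stop steps can be tied together in one place that reacts to the isOpen flag of MG_COMMON_GAME_ASR. The sketch below is an assumption about how an App might wire this, not code from the demo: PcmCapture and AsrCaptureSwitch are hypothetical names, with PcmCapture standing in for whatever object exposes startPCMCapture() and stopPCMCapture().

```java
/** Hypothetical stand-in for the App's RTC wrapper that exposes the two capture methods. */
interface PcmCapture {
    void startPCMCapture();
    void stopPCMCapture();
}

/** Hypothetical helper: turns MG_COMMON_GAME_ASR notifications into start/stop calls. */
final class AsrCaptureSwitch {
    private final PcmCapture capture;
    private boolean capturing;

    AsrCaptureSwitch(PcmCapture capture) {
        this.capture = capture;
    }

    /** Call with the isOpen value carried by the MG_COMMON_GAME_ASR state. */
    synchronized void onAsrState(boolean isOpen) {
        if (isOpen == capturing) {
            return; // repeated notification, nothing to change
        }
        capturing = isOpen;
        if (isOpen) {
            capture.startPCMCapture();
        } else {
            capture.stopPCMCapture();
        }
    }
}
```

Tracking the current state keeps repeated notifications with the same isOpen value from starting or stopping capture twice.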
VI. Playing Games with ASR Only
When playing games with ASR only, the App only needs to handle the MG_COMMON_GAME_ASR state to enable/disable local PCM data collection. There is no need to send mg_common_key_word_to_hit to the game as in text-based hits.
VII. Text-Based Hits for Mini Games with ASR
Mini-games with ASR scenes usually allow text input for hits alongside voice interaction. The game will notify the App of the hit scene using the mg_common_key_word_to_hit state. The App will receive this through the callback interface:
SudFSMMGListener.onGameMGCommonKeyWordToHit(ISudFSMStateHandle handle, SudMGPMGState.MGCommonKeyWordToHit model)
Text-based hit scenes in mini-games can be categorized into two types:
- Games where the App holds the keyword, like "Pictionary" and "Guess the Word". If model.word is not empty, the App needs to locally compare and determine the hit. After determining the hit, the App notifies the game through the sudFSTAPPDecorator.notifyAPPCommonSelfTextHitState method.
- Games where the App does not hold the keyword, like "Number Bomb". If model.word is empty, the App needs to send the text to the game each time for the game to determine the hit.
public void sendMsgCompleted(String msg) {
    if (msg == null || msg.isEmpty()) {
        return;
    }
    // Number Bomb
    if (sudFSMMGDecorator.isHitBomb() && HSTextUtils.isInteger(msg)) {
        sudFSTAPPDecorator.notifyAPPCommonSelfTextHitState(false, null, msg, null, null, null);
        return;
    }
    String keyword = gameKeywordLiveData.getValue();
    if (keyword == null || keyword.isEmpty()) {
        return;
    }
    // Pictionary, check if the keyword is hit. Here, we use a contains check. Implement based on specific business needs.
    if (msg.contains(keyword)) {
        sudFSTAPPDecorator.notifyAPPCommonSelfTextHitState(true, keyword, msg, null, null, null);
        gameKeywordLiveData.setValue(null);
    }
}
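The contains check above is case-sensitive and treats surrounding whitespace literally. If the business rules call for a looser match, one option is to normalize both strings first. The sketch below is only an illustration of that idea; KeywordMatcher is a hypothetical class, and trimming plus lowercasing is an assumed policy rather than something the SDK requires.

```java
import java.util.Locale;

/** Hypothetical helper (not an SDK class): case-insensitive, whitespace-tolerant keyword check. */
final class KeywordMatcher {
    private KeywordMatcher() {}

    /** Returns true when the normalized message contains the normalized keyword. */
    static boolean isHit(String msg, String keyword) {
        if (msg == null || keyword == null) {
            return false;
        }
        String normalizedKeyword = normalize(keyword);
        return !normalizedKeyword.isEmpty() && normalize(msg).contains(normalizedKeyword);
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

In sendMsgCompleted, a call such as KeywordMatcher.isHit(msg, keyword) could then take the place of msg.contains(keyword), with the rest of the flow unchanged.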