AI Voice Prompts for 3CX: Make Greetings, Hold Music & IVR in Minutes

Recording phone audio that 3CX will actually accept is a surprisingly annoying job. Voice Studio generates the speech with AI, conforms any existing file to spec, and drops it straight onto the right slot — queue, IVR, music-on-hold, voicemail, or holiday.

The real pain isn't writing the script — it's the file

Ask any MSP tech who has set up a few 3CX systems and they'll tell you the script is the easy part. "Thank you for calling Acme, press 1 for sales" takes thirty seconds to write. The friction is everything that happens next: turning those words into an audio file the PBX will accept without complaint.

3CX is specific about that file. Prompts, IVR menus, hold music, and voicemail greetings all have to be a WAV that is mono (one channel), 8 kHz sample rate, and 16-bit signed PCM. That is the classic telephony format (narrow-band, single-channel audio), and almost nothing produces it by default. A voice memo off a phone is typically an m4a at 44.1 or 48 kHz. A track exported from a consumer editor is usually a 44.1 kHz stereo MP3. A "high quality" studio WAV is often 24-bit at 48 kHz. Feed any of those to 3CX and you get a rejected upload or, worse, audio that plays back distorted because the format is not what the PBX expects.
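
One way to catch a non-conforming file before 3CX does is to read its header. The sketch below uses Python's standard wave module to compare a file against the three properties listed above. The SPEC table and the check_wav name are illustrative only; they are not part of 3CX or Voice Studio.

```python
import wave

# Target format as stated in the text: mono, 8 kHz, 16-bit PCM WAV.
SPEC = {"channels": 1, "sample_rate": 8000, "sample_width_bytes": 2}


def check_wav(path):
    """Return a list of mismatches against SPEC; an empty list means the file conforms."""
    try:
        with wave.open(path, "rb") as wav:
            actual = {
                "channels": wav.getnchannels(),
                "sample_rate": wav.getframerate(),
                "sample_width_bytes": wav.getsampwidth(),
            }
    except (wave.Error, EOFError) as exc:
        # Not a RIFF/WAVE file, or a WAV encoding the wave module cannot read.
        return [f"not a readable PCM WAV: {exc}"]
    return [
        f"{name}: expected {want}, found {actual[name]}"
        for name, want in SPEC.items()
        if actual[name] != want
    ]
```

For a conforming file, check_wav returns an empty list. For a 44.1 kHz stereo 16-bit WAV it returns "channels: expected 1, found 2" and "sample_rate: expected 8000, found 44100". Compressed formats such as MP3 or m4a are reported as unreadable, because the wave module only parses WAV.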

So the job balloons. You either book a voice actor and wait for a deliverable, or you record it yourself and now you need a quiet room, a decent microphone, and an editor that can downmix to mono and resample to 8 kHz — which most people have to go and learn. Multiply that by every customer, every seasonal holiday message, every "can you change the greeting" ticket, and a thirty-second script becomes a half-hour chore. Across a fleet, it's death by a thousand small audio tasks.

AI voices remove the recording step entirely

The first half of the fix is to stop recording audio at all. Sikurd's Voice Studio includes text-to-speech built on lifelike AI voices: you type the script, choose a voice, and it synthesizes the greeting for you. There's no booth, no microphone, no talent booking, and no editing pass — the thing that used to require a studio is now a text box and a dropdown.

The voices are the modern kind, not the flat robotic text-to-speech of a decade ago. They handle punctuation, pacing, and emphasis on their own, so "For sales, press 1. For support, press 2." reads with natural cadence instead of a monotone. You can browse a catalogue spanning different genders and accents — American, British, and others — audition each one before you commit, and there's even a flow to email a set of samples straight to your client so they pick the voice that matches their brand. No more being the bottleneck on a subjective "which voice sounds right" decision.

Crucially, the synthesized output is already in 3CX's format — mono, 8 kHz, 16-bit PCM WAV. There's no post-processing step, no "now convert it" stage. What comes out of text-to-speech is a file 3CX will accept as-is.

Already have audio? The free WAV Converter conforms it

Generating speech covers the common case, but not every prompt is read aloud. Sometimes the customer hands you a branded jingle, a hold loop with their tagline mixed in, an old greeting recorded by their receptionist, or even a video whose voiceover is exactly what they want on the line. For that, Voice Studio includes a WAV Converter.

It accepts any audio or video file your browser can decode and turns it into a 3CX-spec WAV. Under the hood it decodes the source, mixes it down to a single mono channel, applies a low-pass filter at the telephone-bandwidth ceiling (an 8 kHz file cannot represent anything above 4 kHz) so nothing aliases when the audio is resampled, resamples to 8 kHz, and encodes a clean 16-bit PCM WAV. It also recognises a file that already conforms and leaves it untouched.
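
The same order of operations can be sketched outside the browser. The converter itself runs on the Web Audio API and its source is not shown here, so what follows is a stand-in written with Python's standard library, not its actual implementation. The 3.4 kHz cutoff, the 101-tap Hamming-windowed filter, and the linear-interpolation resampler are all assumptions chosen for illustration, and the sketch only reads 16-bit PCM WAV input (decoding MP3, m4a, or video is left out).

```python
import array
import math
import wave

TARGET_RATE = 8000   # Hz, per the 3CX spec
CUTOFF_HZ = 3400.0   # assumed cutoff, just under the 4 kHz limit of 8 kHz audio
TAPS = 101           # assumed FIR length (odd, so the kernel is symmetric)


def downmix(samples, channels):
    """Average interleaved frames into a single mono channel of floats."""
    return [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples), channels)]


def low_pass(mono, rate, cutoff=CUTOFF_HZ, taps=TAPS):
    """Windowed-sinc FIR low-pass (Hamming window), normalised to unity DC gain."""
    fc = cutoff / rate  # cutoff in cycles per sample
    mid = taps // 2
    kernel = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        kernel.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))))
    gain = sum(kernel)
    kernel = [c / gain for c in kernel]
    out = []
    for i in range(len(mono)):
        acc = 0.0
        for n, c in enumerate(kernel):
            j = i + n - mid
            if 0 <= j < len(mono):  # treat samples outside the file as silence
                acc += c * mono[j]
        out.append(acc)
    return out


def resample(mono, src_rate, dst_rate=TARGET_RATE):
    """Linear interpolation onto the destination sample grid."""
    n_out = int(len(mono) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(mono) - 1)
        frac = pos - left
        out.append(mono[left] * (1 - frac) + mono[right] * frac)
    return out


def convert(src_path, dst_path):
    """Decode, downmix, filter, resample, then encode as mono 8 kHz 16-bit PCM."""
    with wave.open(src_path, "rb") as src:
        channels, width, rate = src.getnchannels(), src.getsampwidth(), src.getframerate()
        if width != 2:
            raise ValueError("this sketch only reads 16-bit PCM input")
        samples = array.array("h", src.readframes(src.getnframes()))
    mono = downmix(samples, channels)
    if rate > TARGET_RATE:
        mono = low_pass(mono, rate)  # remove content the 8 kHz file cannot hold
    if rate != TARGET_RATE:
        mono = resample(mono, rate)
    pcm = array.array("h", (max(-32768, min(32767, round(s))) for s in mono))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(TARGET_RATE)
        dst.writeframes(pcm.tobytes())
```

The pure-Python filter loop is slow on long files. It is meant to show the sequence (downmix, filter, resample, encode), not to serve as a production resampler.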

Two things make it genuinely convenient. First, it's free — and available publicly, so it's useful even outside the platform. Second, it runs entirely in your browser. All of that decoding, filtering, and resampling happens client-side via the Web Audio API: no server round-trip, no native software to install, and the customer's audio never leaves the machine it's on. For an MSP that's a real privacy and simplicity win — you're not uploading a client's audio to a third-party site to get a phone-ready file.

Where the prompts actually go — the part most tools skip

Here's the difference that matters day to day. Plenty of tools can produce an audio file. The tedious part is getting that file onto the right place in 3CX — the correct slot, on the correct instance, without fumbling through the management console.

Voice Studio is built as a guided wizard: target → source → review. You start by choosing the destination, then provide the audio (generate it with text-to-speech or upload and convert a file), then confirm and push. Because you pick the slot first, the result isn't a loose download sitting in your Downloads folder waiting to be uploaded by hand — it lands exactly where it belongs. The slots it can target:

  • Queue intro & on-hold announcements
    The greeting callers hear when they enter a call queue, plus the periodic on-hold messaging while they wait.
  • IVR / digital receptionist prompt
    The menu that routes callers — "press 1 for sales, 2 for support" — read in a consistent, professional voice.
  • Music-on-hold
    The instance's hold audio. Generate a spoken loop or convert a licensed track to the right format.
  • Voicemail greeting
    A per-user mailbox greeting, so individual extensions get a polished message without each person recording their own.
  • Holiday greeting
    Seasonal and closure messages — the ones you otherwise scramble to record the day before a long weekend.
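
The target-first ordering can be pictured as a small data model. Everything in this sketch is hypothetical (the Slot and PromptJob names, the fields, the instance and voice strings). It does not call Sikurd or 3CX; it only illustrates why choosing the destination first means the audio always has somewhere to land.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Slot(Enum):
    """The five destinations listed above (member names are illustrative)."""
    QUEUE_ANNOUNCEMENT = "queue intro & on-hold announcements"
    IVR_PROMPT = "IVR / digital receptionist prompt"
    MUSIC_ON_HOLD = "music-on-hold"
    VOICEMAIL_GREETING = "voicemail greeting"
    HOLIDAY_GREETING = "holiday greeting"


@dataclass
class PromptJob:
    # Step 1 (target): both fields are required, so a job cannot exist
    # without a destination.
    instance: str
    slot: Slot
    # Step 2 (source): filled in by exactly one of the methods below.
    script: Optional[str] = None
    voice: Optional[str] = None
    upload_path: Optional[str] = None

    def use_text_to_speech(self, script: str, voice: str) -> None:
        self.script, self.voice, self.upload_path = script, voice, None

    def use_upload(self, path: str) -> None:
        self.script, self.voice, self.upload_path = None, None, path

    def review(self) -> str:
        """Step 3 (review): summarise the job, refusing if no source was given."""
        if self.script is None and self.upload_path is None:
            raise ValueError("no audio source chosen yet")
        if self.script is not None:
            source = f"text-to-speech ({self.voice})"
        else:
            source = f"converted upload ({self.upload_path})"
        return f"{self.slot.value} on {self.instance}: {source}"
```

For example, PromptJob("acme-pbx", Slot.HOLIDAY_GREETING) followed by use_text_to_speech("Our offices are closed.", "example-voice") makes review() return "holiday greeting on acme-pbx: text-to-speech (example-voice)". Calling review() before a source is set raises an error instead of producing a file with no destination.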

And because Sikurd is a fleet tool, you do this against any instance you manage from one place. Onboarding a new customer's IVR, refreshing a hold message, or rolling out a holiday greeting isn't a per-server expedition into individual 3CX consoles. It's the same three-step flow each time, pointed at whichever instance you choose.

A quick example, end to end

Say a customer is closing for a public holiday and wants callers to hear about it. The old way: write the message, record it (or wait on a voice actor), realise the file is a stereo 48 kHz export, find a converter, fix the format, log into their 3CX, hunt for the holiday prompt setting, upload, test. The Voice Studio way:

  1. Open Voice Studio and pick the customer's instance.
  2. Choose holiday greeting as the target slot.
  3. Type the message — "Thank you for calling Acme. Our offices are closed for the holiday and will reopen Monday at 9am." — and pick a voice. (Or upload an existing recording and let the converter conform it.)
  4. Review the generated, 3CX-spec audio and push it to the slot.

The script is still the thirty-second part. Everything that used to surround it — the recording, the format wrangling, the console spelunking — collapses into a guided flow that ends with the prompt live on the right system.

Why this matters at fleet scale

One prompt is a minor annoyance. A fleet of customers is where it compounds. Every IVR menu, every seasonal closure, every "can we update the greeting" request is a small audio production task, and small tasks repeated across dozens of PBXes are exactly the kind of work that quietly eats an MSP's hours. Voice Studio attacks the friction at every layer at once: it removes the recording (AI text-to-speech), removes the format problem (spec-correct output and a free converter), and removes the upload hunt (a wizard that assigns the slot for you).

And it's not a premium add-on. Sikurd's pricing is purely per managed instance — no tiers, no feature gating — so Voice Studio is included for every account, with your first three instances free. It's just part of managing the fleet.

Frequently asked questions

What audio format does 3CX require for prompts and greetings?

3CX expects a WAV file that is mono (1 channel), 8 kHz sample rate, and 16-bit signed PCM. Anything else (a 44.1 kHz stereo MP3 export, a 24-bit studio WAV, an m4a voice memo) will be rejected or play back distorted. Voice Studio's text-to-speech outputs that exact spec automatically, and the WAV Converter conforms any file you already have to it.

Do I need recording software or a voice actor to make 3CX prompts?

No. Voice Studio generates speech from text using lifelike AI voices, so you type the script and pick a voice: no microphone, booth, editing suite, or talent booking. If you do have existing audio (a jingle, a previously recorded greeting, a video with the right voiceover), the built-in WAV Converter turns it into a 3CX-spec file. Either way you skip the production step entirely.

Which 3CX audio slots can Voice Studio fill?

The guided wizard assigns audio to a specific slot on a specific instance: queue intro and on-hold announcements, the IVR / digital receptionist prompt, the instance's music-on-hold, a per-user voicemail greeting, or a holiday greeting. You pick the target first, then provide the audio, then review, so the file always lands in the right place rather than being a loose download you have to upload by hand.

Is the WAV Converter free, and does my audio get uploaded anywhere?

The WAV Converter is free and runs entirely in your browser. Decoding, downmixing to mono, low-pass filtering, resampling to 8 kHz, and encoding the 16-bit PCM WAV all happen client-side via the Web Audio API, so the file never leaves your machine and there's no server round-trip. It accepts any audio or video file the browser can decode and gives you back a 3CX-ready WAV.

Will the AI voices sound robotic on a phone line?

Modern AI voices are a long way from the flat text-to-speech of a decade ago: they handle punctuation, natural pauses, and emphasis on their own, so a greeting reads like a person rather than a synthesizer. You can audition voices before committing and even email a set of samples to your client so they choose the one that fits their brand. The output is then bandlimited to telephone quality, which is exactly what callers expect to hear.

Does Voice Studio cost extra, or is it locked to a higher plan?

Voice Studio (the wizard, the AI text-to-speech, and the WAV Converter) is included for every Sikurd account. Sikurd bills purely per managed instance with no tiers and no feature gating, so there's no "upgrade to unlock voice" wall. Your first three instances are free, and every feature is available from the start.

Stop fighting sample rates. Generate the prompt, assign the slot, done.

Voice Studio turns a script into a 3CX-spec greeting and pushes it to the exact slot you choose — across every instance you manage. AI text-to-speech and the free WAV Converter are included for every account.