As the deadline has been extended, we are working further on using deep learning to detect dog barking that may be harmful to pedestrians.

We train and deploy under Amazon SageMaker and connect our IoT device through AWS IoT, fine-tuning Google's open-source pre-trained AudioSet model, which ships as part of the TensorFlow models repository.

We fine-tune on top of this model so it outputs only two classes, angrydog and other, which achieves around 90% accuracy.


Amazon SageMaker

Amazon SageMaker is an AWS service for running machine learning. The platform provides easy-to-use tools for building machine learning models, such as SageMaker Autopilot running inside the new Amazon SageMaker Studio for general data analysis. On the other hand, it is also flexible enough to run custom tasks: you can specify the instance type and kernel, and install custom modules and Git repositories.

We first studied running under Amazon SageMaker Studio, which provides Autopilot to automate model building. However, it only accepts CSV input, which is good for business data analysis but not for this job, which requires analysing waveforms, so we train our model with a custom Python script inside SageMaker Studio.

In addition, SageMaker Studio does not allow access by external websocket, which we would like to use to trigger the recognition jobs automatically, so we shifted to a SageMaker Notebook instance, which lets us build the model, run predictions, and connect from outside via websocket inside a virtual machine.

We create a new Notebook instance and import the Git repositories.


Notebook instances, like Studio, run on top of JupyterLab.


We activate the Conda environment tensorflow_p36, which has TensorFlow pre-installed:

source activate tensorflow_p36


We also install additional libraries: resampy, pysoundfile, and libsndfile.

conda install -c conda-forge resampy
conda install -c conda-forge pysoundfile
conda install -c conda-forge libsndfile


Prepare the WAV and call SageMaker

In the last blog, our sound clips were sent to Transcribe for speech recognition; here we redirect the WAV files stored in S3 to trigger the SageMaker prediction job.
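For reference, a minimal sketch of how a handler could pull the WAV's S3 key out of the S3 notification event that fires on upload (the event shape is the standard S3 notification format; the function name is our own):

```python
def soundkey_from_s3_event(event):
    # Pull the uploaded object's key out of a standard S3 PUT event record.
    record = event["Records"][0]
    return record["s3"]["object"]["key"]

# example event, abbreviated to the fields used above
event = {"Records": [{"s3": {"object": {"key": "soundsample/clip1.wav"}}}]}
print(soundkey_from_s3_event(event))
```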


We change our voice_consumption function to invoke another Lambda function called test_sound_ai:


def run_ai(soundkey):
    event = '{"soundkey":"' + soundkey + '"}'
    client = boto3.client('lambda')
    # invoke test_sound_ai asynchronously; the invoke parameters were
    # missing from the original listing and are reconstructed here
    response = client.invoke(
        FunctionName='test_sound_ai',
        InvocationType='Event',
        Payload=event)
    print(response)


Inside the test_sound_ai function we open a websocket to the Notebook instance's terminal, start the TensorFlow environment, and pass the sample's S3 key as an argument for further processing.

ws = websocket.create_connection(
    ws_url,  # ws_url: the Notebook instance terminal websocket URL (assumed name)
    host=http_hn,
    origin=http_proto + "//" + http_hn)
ws.send("""[ "stdin", "source activate tensorflow_p36\\r" ]""")
ws.send("""[ "stdin", "cd /home/ec2-user/SageMaker/models/research/audioset/yamnet\\r" ]""")
ws.send("""[ "stdin", "python """+soundkey+"""\\r" ]""")



Using the default YAMNet model

YAMNet is a pretrained model that provides 521 audio classes based on AudioSet,

which include classes related to dog sounds:

67 Animal, 69 Dog, and 70 Bark are highly relevant to our requirement, so if we use the default YAMNet model directly, we can simply compute a score on top of those classes.
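The scoring loop below needs the top five classes for the clip; a minimal sketch of picking them from YAMNet's per-class scores (pure Python, the name top_k is our own):

```python
def top_k(prediction, k=5):
    # indices of the k highest per-class scores, best first
    # (prediction: YAMNet class scores averaged over the clip's frames)
    return sorted(range(len(prediction)), key=lambda i: prediction[i], reverse=True)[:k]

scores = [0.1, 0.9, 0.3, 0.05, 0.2, 0.6]
top5_i = top_k(scores)  # [1, 5, 2, 4, 0]
```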


dogscore = 0
for i in top5_i:
    # the per-class weights after the first branch were lost from the
    # original listing; the values below are illustrative
    if yamnet_id[i] == 67:        # Animal
        dogscore += prediction[i] * 0.25
    elif yamnet_id[i] == 68:      # Domestic animals, pets
        dogscore += prediction[i] * 0.25
    elif yamnet_id[i] == 69:      # Dog
        dogscore += prediction[i]
    elif yamnet_id[i] == 79:
        dogscore += prediction[i] * 0.5
    # we hate cat
    elif yamnet_id[i] == 76:      # Cat
        dogscore -= prediction[i]



We directly use the dogscore to trigger the device to play the alarm sound "Cat Meow", downloaded from S3 storage and sent through AWS IoT MQTT.

if dogscore > 0.5:
    s3 = boto3.resource('s3')
    obj = s3.Object('voicerecognise', 'alarm.pcm')
    alarm = obj.get()['Body'].read()
    total_alarm_section = int(len(alarm)/1536)
    alarm_section = total_alarm_section
    while alarm_section:
        alarm_section -= 1
        section_data = base64.b64encode(alarm[alarm_section*1536:(alarm_section+1)*1536]).decode("utf-8")
        message = "{ \"requests\":\"alarm\",\"section\":\""+str(alarm_section)+"\",\"totalsection\":\""+str(total_alarm_section)+"\",\"data\":\""+ section_data + "\"}"
        aitopic = things+'/ai/get'
        try:
            response = iotclient.publish(
                topic=aitopic,
                qos=1,
                payload=message)
        except Exception:
            print ("UnauthorizedException")
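On the receiving side, the device (or any MQTT subscriber) has to put the 1536-byte sections back together. A hedged sketch of that reassembly, assuming each message has already been parsed from JSON into a dict with the field names used above:

```python
import base64

def reassemble(messages):
    # order the alarm messages by their "section" index, then decode and
    # concatenate the base64 PCM payloads back into one clip
    ordered = sorted(messages, key=lambda m: int(m["section"]))
    return b"".join(base64.b64decode(m["data"]) for m in ordered)
```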


Fine-Tuning for Angry Dog

The default YAMNet only detects dog barking; it cannot tell a good dog from a bad dog whose barking is angry and likely to lead to an attack on the innocent, so we need fine-tuning.

After some research and evaluation, we chose the fine-tuning approach that gave the best result: extract the dense embedding before the default classifier and train on top of it, so the result has only two classes: angrydog and other.


The samples

We download and select some very angry dog samples and put them into the S3 bucket under "soundsample/angrydog".

We also download some samples for the other class, including cats, birds, and noise mostly from outside the house, and put them under "soundsample/other".

We also include some friendly dog sounds to help the machine learning find the difference between bad and good dogs; the overall accuracy is reduced compared with just checking for dog sounds, because this is harder to recognise.


We also download some samples from Free Sound Clips, separated into angrydog and other, for test (evaluation) purposes; these are used to measure the final accuracy of our model.
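Since the class of each sample is implied by where it lives in the bucket, labeling can be as simple as checking the key prefix; a small sketch (the 0/1 mapping is our assumption for illustration):

```python
def label_for_key(key):
    # class index from the S3 key layout described above:
    # "soundsample/angrydog/..." -> 0, "soundsample/other/..." -> 1
    return 0 if key.startswith("soundsample/angrydog/") else 1
```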



print("Loaded samples:", samples.shape, samples.dtype, classtypesindex.shape)

# two dense layers on top of the 1024-d YAMNet embedding, two output classes
input_layer = layers.Input(shape=(1024,))
output = layers.Dense(1024, activation=None)(input_layer)
output = layers.Dense(2, activation='softmax')(output)
model = Model(inputs=input_layer, outputs=output)
opt = SGD(lr=0.002, decay=1e-5, momentum=0.8, nesterov=True)
model.compile(optimizer=opt, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# first run with a validation split to watch for overfitting, then a short run
history =, classtypesindex, epochs=160, validation_split=0.200)
history =, classtypesindex, epochs=5)

# reset the arrays and reload them with the test samples before evaluating
samples = []
classtypesindex = []
classtypes = []

test_mse_score, test_mae_score = model.evaluate(samples, classtypesindex)"angrydog.h5", include_optimizer=False)

We load the dataset, shuffle it, and train the model on top of the default YAMNet embeddings.
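The shuffle has to keep each embedding paired with its label; a minimal sketch of shuffling both in unison (the function name is our own):

```python
import random

def shuffle_together(samples, labels, seed=42):
    # build one index permutation and apply it to both lists so each
    # sample stays aligned with its class label
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    return [samples[i] for i in idx], [labels[i] for i in idx]
```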

We also evaluate the accuracy on the test samples; as a result we get about 90% accuracy.

Because we mixed bad and good dog barking to challenge the deep learning process, the accuracy is not that high; even humans sometimes find it hard to tell a friendly from a dangerous dog sound.


At the end, we save the model for future use.


The terminal output looks like this:

400/400 [==============================] - 0s 57us/sample - loss: 0.5594 - acc: 0.8700
Test Mse: {}, Test Mae: {} 0.5594366431236267 0.87
Model: "model"
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 1024)]            0         
dense_1 (Dense)              (None, 1024)              1049600   
dense_2 (Dense)              (None, 2)                 2050      
Total params: 1,051,650
Trainable params: 1,051,650
Non-trainable params: 0

The accuracy on our test samples is 0.87, as we intentionally included challenges such as small-dog barking, which is very hard to identify.


We found some YouTube sounds to evaluate the result: for cute dogs the angrydog score is around 0.35, and for angry dog sounds it is mostly higher than 0.8,

so we set the alarm threshold to 0.8, at which the alarm plays the cat meow ten times.
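The alarm decision itself reduces to a threshold check on the angrydog score; a trivial sketch (names are our own):

```python
ALARM_THRESHOLD = 0.8

def should_alarm(angrydog_score, threshold=ALARM_THRESHOLD):
    # play the "Cat Meow" alarm only when the angrydog score clears the
    # threshold picked from the YouTube spot checks
    return angrydog_score >= threshold
```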


It takes about one minute to upload the sound clip, pass it to SageMaker, run the AI sound-recognition process, and send the result back through AWS IoT.


A demo is shown in the video above for reference.



By using Amazon SageMaker and Lambda functions, we successfully receive sound clips picked up by the Cypress Pioneer Kit's PDM mic, recognise angry dog sounds, and feed the result back to the Cypress Pioneer Kit's internal sound card to play the alarm, all through the AWS cloud.



Our final IoT platform is:


Source Code

We have updated the previous summary, and the final source code can be downloaded through