How does Scribd prevent downloads?

robo · May 11, 2018 · Viewed 7.7k times

When reading books on scribd.com, the download functionality is not enabled. Even browsing through the HTML source code, I was unable to download the actual book. Great stuff... but how did they do this? I am looking to implement something similar: displaying a PDF (or content converted from a PDF) in such a way that the visitor cannot download the file.

Most solutions I have seen are based on obfuscating the URL, but with a little effort people can find the URL and download the file. Scribd seems to have covered this quite well.

Any suggestions or ideas on how to implement such download protection?

Answer

Leandro Luque · Jan 27, 2020

It actually works by dynamically building the HTML from AJAX requests made as you flip through the pages. It is not image-based. That's why you're finding it difficult to download the content.
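To make the server side of this idea concrete, here is a minimal sketch in plain JavaScript (Node.js), assuming a reader that requests one page per AJAX call: the server returns a single page only to a session that has access to the book, so there is never one URL for the whole file. The names `books`, `sessions` and `handlePageRequest`, and the IDs, are hypothetical, and the in-memory maps stand in for a real database and session layer. This is not Scribd's actual implementation.

```javascript
// Hypothetical in-memory stores; a real service would use a database
// and a proper session mechanism (e.g. signed cookies).
const books = new Map([
  ['book-42', { pages: ['<p>Page one</p>', '<p>Page two</p>', '<p>Page three</p>'] }],
]);
const sessions = new Map([
  ['session-abc', { userId: 'u1', allowedBooks: new Set(['book-42']) }],
]);

// Return exactly one page of HTML, and only to a session allowed to read the book.
// The complete document is never sent in a single response.
function handlePageRequest(sessionId, bookId, pageNumber) {
  const session = sessions.get(sessionId);
  if (!session) {
    return { status: 401, body: 'Not signed in' };
  }
  if (!session.allowedBooks.has(bookId)) {
    return { status: 403, body: 'No access to this book' };
  }
  const book = books.get(bookId);
  // Pages are 1-indexed, as in the reader's page counter.
  if (!book || !Number.isInteger(pageNumber) || pageNumber < 1 || pageNumber > book.pages.length) {
    return { status: 404, body: 'No such page' };
  }
  return { status: 200, body: book.pages[pageNumber - 1] };
}

console.log(handlePageRequest('session-abc', 'book-42', 2));
```

Wired into an HTTP route (for example a hypothetical `GET /books/:bookId/pages/:n`), the reader would fetch pages one at a time as the user flips through. As the rest of this answer shows, this only hides the file; a page that has been rendered in the browser can still be captured.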

However, it is not that secure at the moment. Below I present a technique for downloading books that works as of today (27 Jan 2020), not to teach you how to do it (it is not legal), but to show what you should guard against (or at least make harder) if you're building something similar.
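One general way to make bulk capture harder, offered here as an assumption rather than anything Scribd is known to do, is to cap how many pages a single session may fetch within a time window. The `PageRateLimiter` class and the 30-pages-per-minute figure are hypothetical; a script can simply slow down to stay under the cap, so this raises the cost of scraping rather than preventing it.

```javascript
// Hypothetical sliding-window limiter for page requests, keyed by session ID.
class PageRateLimiter {
  constructor(maxPages, windowMs) {
    this.maxPages = maxPages;
    this.windowMs = windowMs;
    this.history = new Map(); // sessionId -> timestamps (ms) of accepted requests
  }

  // Returns true if this page request is allowed, false if the session is over the limit.
  allow(sessionId, nowMs = Date.now()) {
    const cutoff = nowMs - this.windowMs;
    // Keep only requests that are still inside the window.
    const recent = (this.history.get(sessionId) || []).filter((t) => t > cutoff);
    if (recent.length >= this.maxPages) {
      this.history.set(sessionId, recent);
      return false; // rejected requests are not recorded
    }
    recent.push(nowMs);
    this.history.set(sessionId, recent);
    return true;
  }
}

// Example: at most 30 pages per session per minute (arbitrary numbers).
const limiter = new PageRateLimiter(30, 60000);
```

A page handler would call `limiter.allow(sessionId)` before serving each page and refuse (or delay) the request when it returns `false`.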

If you have a paid account and open the book page (the one that opens when you click 'Start Reading'), you can save an image of each book page by loading a library such as dom-to-image, which renders a DOM node to an image.

For instance, you could load the library from the developer tools (all of the code below is entered in the page's console):

if (injectDomToImage == undefined) {
    var injectDomToImage = document.createElement('script');
    injectDomToImage.src = "https://cdnjs.cloudflare.com/ajax/libs/dom-to-image/2.6.0/dom-to-image.min.js";
    document.getElementsByTagName('head')[0].appendChild(injectDomToImage);
}

Then you could define functions such as these:

function downloadPage(page, prefix) {
    domtoimage.toJpeg(document.getElementsByClassName('reader_and_banner_container')[0], {
            quality: 1,
        })
        .then(function(dataUrl) {
            var link = document.createElement('a');
            link.download = `${prefix}_page_${page}.jpg`;
            link.href = dataUrl;
            link.click();
            nextPage(page, prefix);
        });
}

function checkPageChanged(page, oldPageCounter, prefix) {
    let newPageCounter = $('.page_counter').html();
    if (oldPageCounter === newPageCounter) {
        setTimeout(function() {
            checkPageChanged(page, oldPageCounter, prefix);
        }, 500);
    } else {
        setTimeout(function() {
            // nextPage() already passed page + 1, so use it as-is.
            downloadPage(page, prefix);
        }, 500);
    }
}

function nextPage(page, prefix) {
    let oldPageCounter = $('.page_counter').html();
    $('.next_btn').trigger('click');
    // Wait until page counter has changed (page loading has finished).
    checkPageChanged(page + 1, oldPageCounter, prefix);
}

function download(prefix) {
    downloadPage(1, prefix);
}

Finally, you could download each book page as a JPG image using:

download('test_');

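It will download each page as test__page_N.jpg, where N is the page counter passed to downloadPage (the prefix already ends in an underscore, and the file-name template adds another).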

To prevent this kind of 'robot', they could, for example, have used reCAPTCHA v3, which runs in the background and scores each request for robot-like behaviour.
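On the server, a reCAPTCHA v3 token sent with each page request could be checked against Google's `siteverify` endpoint before the page is returned. This sketch assumes the reader page obtains a token with `grecaptcha.execute(siteKey, { action: 'turn_page' })`; the action name, the function names and the 0.5 threshold (Google's suggested starting point) are assumptions, and `verifyRecaptchaToken` needs a runtime with a global `fetch` (for example Node.js 18+).

```javascript
// Send the client's token to Google and return the parsed JSON verdict,
// which includes `success`, `score` (0.0 to 1.0) and `action`.
async function verifyRecaptchaToken(token, secret) {
  // URLSearchParams is sent as application/x-www-form-urlencoded.
  const params = new URLSearchParams({ secret, response: token });
  const res = await fetch('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: params,
  });
  if (!res.ok) {
    throw new Error(`siteverify returned HTTP ${res.status}`);
  }
  return res.json();
}

// Treat the request as human only if the token is valid, was issued for the
// expected action, and scored at or above the threshold.
function isLikelyHuman(verifyResult, expectedAction, minScore = 0.5) {
  return (
    verifyResult.success === true &&
    verifyResult.action === expectedAction &&
    typeof verifyResult.score === 'number' &&
    verifyResult.score >= minScore
  );
}
```

A page handler would await `verifyRecaptchaToken(tokenFromClient, process.env.RECAPTCHA_SECRET)` (a hypothetical environment variable) and serve the page only when `isLikelyHuman(result, 'turn_page')` is true; a low score could trigger a visible challenge or slower delivery instead of an outright refusal.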