WebGrab+Plus

  • Whoa!
    WebGrab+Plus is really complicated to configure!
    I have removed the preview releases, as they overwrite the mdb ini files each time the service is started.
    The easiest option therefore remains to copy the mdb ini files manually.
    What a mess!
    I have updated the addon to integrate MDB post processing.

    Preview releases (for all projects and architectures)


    The corresponding code is here.

    Workflow

    • If before.sh script exists, it is called
    • WebGrab+Plus creates guide_wgp.xml
    • If MDB post-processing is enabled, the MDB post-processor creates guide_mdb.xml, which is then moved to guide_wgp.xml (replacing it)
    • xmltv_time_correct uses guide_wgp.xml to create guide_xtc.xml
    • guide_xtc.xml is copied to guide.xml
    • WG2MP uses guide.xml to create mediaportal.xml
    • If after.sh script exists, it is called


    Changing the file names will break the workflow, ie steps 4 to 6 (xmltv_time_correct, the copy to guide.xml and WG2MP) will fail.
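    To make the file hand-offs concrete, here is a minimal shell sketch of the order above. It is not the addon's actual script: grab, mdb_postprocess, time_correct and wg2mp are hypothetical stand-ins that only copy files, because the real command lines of WebGrab+Plus, xmltv_time_correct and WG2MP are not given in this thread. Each step reads a file under the exact name the previous step wrote, which is why renaming a file breaks every step after it.

```shell
#!/bin/sh
# Sketch of the workflow ORDER only; NOT the addon's real script.
# The four tool functions are hypothetical stand-ins that just copy files,
# so the hand-offs between steps can be followed.
grab()            { echo '<tv></tv>' > guide_wgp.xml; }  # stands in for WebGrab+Plus
mdb_postprocess() { cp guide_wgp.xml guide_mdb.xml; }    # stands in for MDB post-processing
time_correct()    { cp guide_wgp.xml guide_xtc.xml; }    # stands in for xmltv_time_correct
wg2mp()           { cp guide.xml mediaportal.xml; }      # stands in for WG2MP

# run_workflow DIR MDB_ENABLED(yes|no)
# The body is a subshell, so "cd" and "exit" do not affect the caller.
run_workflow() (
  cd "$1" || exit 1
  if [ -f before.sh ]; then sh before.sh; fi        # 1: optional user hook
  grab || exit 1                                    # 2: writes guide_wgp.xml
  if [ "$2" = yes ]; then                           # 3: optional MDB step
    mdb_postprocess && mv guide_mdb.xml guide_wgp.xml || exit 1
  fi
  time_correct || exit 1                            # 4: guide_wgp.xml -> guide_xtc.xml
  cp guide_xtc.xml guide.xml || exit 1              # 5: publish guide.xml
  wg2mp || exit 1                                   # 6: guide.xml -> mediaportal.xml
  if [ -f after.sh ]; then sh after.sh; fi          # 7: optional user hook
)
```

    Because guide_mdb.xml is moved over guide_wgp.xml, step 4 always reads guide_wgp.xml, whether or not MDB post-processing ran.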

    Schedule
    The workflow is executed when the service is (re)started and subsequently every six hours.
    Disabling and re-enabling the addon in Kodi restarts the service.

    To enable MDB post-processing

    • In the addon home directory, copy the files from the "siteini.pack/MDB postprocessor" folder into the "mdb" folder.
    • Configure xml and ini files in the "mdb" folder
    • Install the appropriate preview release
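    The copy step can be scripted. This is a minimal sketch, assuming both the "siteini.pack/MDB postprocessor" folder and the "mdb" folder sit directly under the addon home directory passed as the first argument; the function name copy_mdb_files is hypothetical. It deliberately skips files that already exist in "mdb", so running it again does not overwrite ini and xml files you have already configured.

```shell
#!/bin/sh
# copy_mdb_files HOME  (hypothetical helper)
# Copies the MDB post-processor files into HOME/mdb, assuming both folders
# live directly under the addon home directory HOME.
# Files already present in HOME/mdb are left untouched, so edited
# configuration is not overwritten when the helper is run again.
copy_mdb_files() {
  src="$1/siteini.pack/MDB postprocessor"
  dst="$1/mdb"
  if [ ! -d "$src" ]; then
    echo "copy_mdb_files: folder not found: $src" >&2
    return 1
  fi
  mkdir -p "$dst" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue          # regular files only (also skips an unmatched glob)
    name=${f##*/}
    if [ ! -e "$dst/$name" ]; then
      cp "$f" "$dst/$name" || return 1
    fi
  done
}
```

    For example, `copy_mdb_files /storage/.kodi/userdata/addon_data/service.webgrabplus`. That path is the folder holding before.sh in the addon's log output quoted later in this thread; treating it as the addon home directory is an assumption.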


    Please provide your feedback on this thread.
    Thank you for testing!

  • Hi awiouy,

    The MDB post-processing finished, so everything went fine.

    Could you please tell me what the right settings are for processing TV series?

    I get this result in webgrab++.log:

    [ Debug ] MDB Postprocessor result
    [ Debug ] -------------------------------
    [ Debug ] Mdb data found for 484 out of 865 Movies-candidates
    [ Debug ] Mdb data found for 0 out of 1761 Series-candidates
    [ Debug ] In 1413.51 seconds = 2.92 seconds/matched-show

    I think I missed some settings for <selectmovie> and <selectserie>, or something?

  • Thank you for confirming MDB post-processing works (with the preview release, right?)

    selectserie and matchserie in mdb.config.xml are the defaults provided with WebGrab+Plus.
    I do not know what the right settings are.

    Maybe someone on this thread can help you out.
    Otherwise, you will certainly find help on the WebGrab+Plus forum or in the WebGrab+Plus Configurator thread.

  • Is the work you are doing similar to the WebGrab+Plus Configurator? I already use that to do MDB post-processing.

    Also, thanks to MikeKL, I have messed around with the cron settings, because I too didn't want it to run automatically right after startup.

  • Hello Iain!

    The projects are similar, indeed!
    They even went live at about the same time.
    I trust they are complementary.

    In fact, this addon is a spin-off and use case of the Mono for LibreELEC addon.
    The Mono addon enables WebGrab+Plus, Emby and other Mono applications to run on LibreELEC.
    I am happy to say that the Mono addon is now available and stable for all LibreELEC projects and architectures.
    I am confident that WebGrab+Plus is on LibreELEC to stay, in one form or another.

    As for running (or not) WebGrab+Plus at startup:
    WebGrab+Plus does not interfere with system startup: top shows it using about 10% of one CPU core, and the RPi2 has four cores, ie roughly 2.5% of the total CPU capacity.
    Moreover, the service only updates guide.xml at the end of its processing, ie guide.xml does not change during the processing.
    There is therefore no objective reason not to run the service at system startup.
    On the contrary, this ensures that guide.xml will be updated as soon as possible if the system was shut down for a longer period of time.
    Incidentally, if I remember correctly, this feature was suggested by MikeKL when he came back from leave.

    Enjoy!

  • Sorry, I haven't tried the preview release yet; I will try it tomorrow.
    If I update to the new preview release, do I have to copy the mdb files to the mdb folder again?

    I will check the WebGrab+Plus forum, thanks!

  • Yeah. I implemented MikeKL's ideas. The reason I don't want it to start at boot is that I start OpenVPN automatically, and the start time in the WebGrab logs is way off, probably due to the VPN connection.

    So I have a valid reason, even if it is niche.

  • Thank you for testing.
    You do not have to copy the mdb files again if you already have.

    You could use a before.sh script to wait for the VPN to be online.
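    A minimal sketch of such a before.sh helper, assuming the VPN creates a network interface named tun0 (OpenVPN's usual name in tun mode; check `ip link` on your system) and that the Linux sysfs folder /sys/class/net lists the interfaces that exist. The function name wait_for_iface and the two-second polling interval are illustrative choices, not part of the addon.

```shell
#!/bin/sh
# wait_for_iface IFACE TIMEOUT  (hypothetical helper for before.sh)
# Polls until the network interface IFACE exists, giving up after about
# TIMEOUT seconds. Returns 0 once the interface appears, 1 on timeout.
wait_for_iface() {
  iface=$1
  timeout=$2
  waited=0
  while [ ! -e "/sys/class/net/$iface" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "wait_for_iface: $iface not up after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  return 0
}
```

    before.sh would then end with, eg, `wait_for_iface tun0 120`. The check is coarse: an existing interface does not prove the tunnel already routes traffic, and how the addon reacts to a non-zero exit from before.sh is not stated in this thread.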

  • @awiouy

    I believe the service job that runs immediately after a boot may have a minor issue, in the sense that the WebGrab+Plus log gets confused about when that job actually started and finished. I could not put my finger exactly on how this occurs, but I believe the recorded job time may span the previous and the latest Pi reboot.
    (As if the start and finish times of two boot-time service jobs were added together, if that makes any sense to you.)

    Note: I only noticed this for the service job that runs immediately after a boot; all the other service jobs, which run every six hours, appear to record the correct run time in the WebGrab+Plus log. I am running Milhouse nightly builds of LibreELEC Krypton.


  • Hello MikeKL!

    The webgrabplus service starts as soon as the network is up, which may be before LibreELEC has synchronized its clock with network time servers.
    This produces a time gap in the log, as in the example below.
    This does not, however, affect WebGrab+Plus, which does not rely on the system clock.

    Code
    -- Logs begin at Fri 2016-08-26 15:43:47 CEST, end at Tue 2016-08-30 19:51:59 CEST. --
    Aug 26 15:43:50 Salon systemd[1]: Started WebGrab+Plus.
    Aug 26 15:43:53 Salon sh[380]: User defined pre-processing
    Aug 26 15:43:53 Salon sh[380]: Calling user defined pre-processing script /storage/.kodi/userdata/addon_data/service.webgrabplus/before.sh
    Aug 26 15:43:53 Salon sh[380]: HELLO
    Aug 30 19:48:48 Salon sh[380]:              WebGrab+Plus/w MDB & REX Postprocess -- version  V1.57
    Aug 30 19:48:48 Salon sh[380]:                                 Jan van Straaten
    Aug 30 19:48:48 Salon sh[380]:                              Francis De Paemeleere
    Aug 30 19:48:48 Salon sh[380]:             thanks to Paul Weterings and all the contributing users

    If you need to delay starting WebGrab+Plus, eg to wait for the VPN to be available, simply create a corresponding before.sh script in the home folder of the addon.

    For example, the script below delays WebGrab+Plus by 15 seconds:

    Code
    sleep 15
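    Because before.sh is called at the start of every run, a plain `sleep 15` also delays the six-hourly runs. If only the run that follows a boot should wait, a minimal sketch can compare the system uptime from /proc/uptime (Linux-specific) with a minimum; the function names and the 600-second figure used below are hypothetical examples.

```shell
#!/bin/sh
# remaining_delay UPTIME MIN  (hypothetical helper)
# Prints how many seconds are still needed for the uptime to reach MIN,
# or 0 if the system has already been up for at least MIN seconds.
remaining_delay() {
  if [ "$1" -lt "$2" ]; then
    echo $(($2 - $1))
  else
    echo 0
  fi
}

# current_uptime: whole seconds since boot. /proc/uptime holds two numbers,
# eg "123.45 678.90"; the first is the uptime in seconds.
current_uptime() {
  cut -d. -f1 /proc/uptime
}
```

    In before.sh, `sleep "$(remaining_delay "$(current_uptime)" 600)"` would then hold the first run until about ten minutes after boot and return immediately on later runs.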
  • Thanks for the explanation. Yes, I had already realised it had no impact on the actual result of the job; I just thought it useful (even if late) to report this very minor issue in this thread.

    And thanks for the sleep idea using your recently added before script. I assume I can use it to start the first WebGrab+Plus run at "any time" after boot; I never thought about that.

    Edit: I looked into answering my own question about setting the length of the sleep time: Linux / UNIX: Bash Script Sleep or Delay a Specified Amount of Time – nixCraft

    Edited once, last by MikeKL (August 31, 2016 at 6:52 AM).


    awiouy, are the changes for MDB processing available in the current release version? I went to install the preview version, but the link is no longer valid.

    Thanks again for your hard work.

    Hello!

    The changes for MDB processing are available in addon revision x.0.103.

    Once this revision is installed:

    • in the addon home directory, copy the files from the "siteini.pack/MDB postprocessor" folder into the "mdb" folder and
    • configure xml and ini files in the "mdb" folder
  • @awiouy I have read most of this thread regarding the WebGrab+Plus timers for grabbing from sites. I can't work out why you need to run it every 6 hours (or twice in one day).
    Even if it only scrapes one day's worth of EPG data, isn't hitting the TV guide sites this often risking that they block web scraping?

    My old setup pulled the guide on Mon, Thu and Sat at most. Twice in one day, every day, is excessive. Thanks for the work on the addon, btw.

  • highkick05

    I know that the current scheduling is not optimal for everyone.
    However, for now, it runs and fits most use cases (eg LE permanently on, or LE on for the duration of a movie/TV show).
    Incidentally, unless specified otherwise, the guide is merely updated (as opposed to re-created) at every run, and it is not updated in place.

    I intend to move scheduling from systemd to Kodi/Python, but I have not yet found the time to do it, and the wishes are many and sometimes incompatible.

    Note that contributions to the addon in the LE GitHub repository are welcome!


  • I know it may not be optimal for LibreELEC, but it's not really optimal for WebGrab+Plus's purpose either.

    Most of the people at WebGrab+Plus who develop the siteinis probably wouldn't appreciate these grabs running twice a day, 24x7. It will end up getting the web scraping banned.

    In my opinion, making the cron settings more configurable is the best option for everyone's specific needs.