Channel: SCN : All Content - SAP CRM: Marketing

Harvesting Tweets into Social Intelligence tables using a Python Script


Disclaimer

 

This tutorial is intended as a guide for the creation of demo/test data only. The sample script provided is not intended for use in a productive system.


Purpose

The main purpose of this document is to show how social posts can be provided for the product SAP Customer Engagement Intelligence (from Release 1.1, SP02 onwards). The following tutorial explains one way of collecting tweets from Twitter based on a search term in order to use them in the Sentiment Engagement workset of the Social Contact Intelligence solution.

The pre-installed Python interpreter from the SAP HANA client is used to execute a Python script from SAP HANA studio (alternatively, an Eclipse-based IDE such as Aptana Studio can be used). The script inserts the collected tweets into the Business Suite Foundation database tables SOCIALDATA and SOCIALUSERINFO in a configured schema. If you run the script with the default settings, it harvests up to 100 records in one run for a given search term. You can, however, change the script settings to retrieve more data per run.

To run the script, you also need to make a few customizing and configuration settings in order to use the PyDev plugin in SAP HANA studio.

Prerequisites

Make sure that the following prerequisites are met before you start out:

  • Installation of SAP HANA studio and SAP HANA Client

Install SAP HANA studio and SAP HANA Client, and apply for a HANA user with Read, Write, and Update authorization for the foundation database tables SOCIALDATA and SOCIALUSERINFO in the respective schema.
Note down the following information; you will need it later (see section 'Customizing and Running the Script'):

    1. SAP HANA server name
    2. Port Number of SAP HANA server
    3. Database user and password
    4. Schema where the tables SOCIALDATA and SOCIALUSERINFO are located.

       

      Note:

       

      The tables SOCIALDATA and SOCIALUSERINFO are available from SP03 of their software component (they are included in software component SAP_BS_FND 747).

      For more information about installing SAP HANA studio, see the SAP help file https://help.sap.com/hana/SAP_HANA_Studio_Installation_Update_Guide_en.pdf.

      Tested with SAP HANA studio 64-bit (minimum version 1.0.7200) and SAP HANA client 64-bit on Microsoft Windows 7, 64-bit version.

       

  • Application Registration in Twitter
    If you don't already have them, register your application and get your consumer key and consumer secret as follows:
    1. Open the URL: https://apps.twitter.com/ and register your application.
    2. Note down the consumer key and the consumer secret.
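
The script at the end of this document uses the consumer key and consumer secret to request a bearer token: it joins them with a colon and Base64-encodes the result for a Basic authorization header. A minimal sketch of just that step follows; the function name build_basic_auth_header and the sample values "abc" and "xyz" are illustrative assumptions, not part of the original script.

```python
import base64


def build_basic_auth_header(consumer_key, consumer_secret):
    """Return the Authorization header value used for the token request."""
    # Same "key:secret" credential string that the tutorial script builds.
    credential = "%s:%s" % (consumer_key, consumer_secret)
    encoded = base64.b64encode(credential.encode("utf-8")).decode("ascii")
    return "Basic %s" % encoded


# Placeholder values only; never hard-code real credentials in shared files.
print(build_basic_auth_header("abc", "xyz"))
```

The script itself uses binascii.b2a_base64 and strips the trailing newline, which yields the same encoded string for this input.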

 

Setup
1) Configuring Python in SAP HANA Studio Client

Python version 2.6 is already embedded in the SAP HANA client, so you do not need to install Python from scratch. To configure the Python API to connect to SAP HANA, proceed as follows.

  1. Copy and paste the following files from C:\Program Files\SAP\hdbclient\hdbcli to C:\Program Files\SAP\hdbclient\Python\Lib
    1. __init__.py
    2. dbapi.py
    3. resultrow.py
  2. Copy and paste the following files from C:\Program Files\SAP\hdbclient to C:\Program Files\SAP\hdbclient\Python\Lib
    1. pyhdbcli.pdb
    2. pyhdbcli.pyd

Note:

On Windows, the default installation path is C:\Program Files\SAP\.. for a 64-bit installation of SAP HANA studio and the SAP HANA database client.

If you opted for a 32-bit installation, the default path is C:\Program Files (x86)\sap\..
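
If you prefer not to copy the files by hand, the copy steps can be sketched as a small helper. This is only an illustration under stated assumptions: the function name copy_hdbcli_files is hypothetical, the package marker file is assumed to be named __init__.py (two underscores on each side), and the client directory must be passed in because it depends on your installation.

```python
import os
import shutil

# File names from the copy steps of this tutorial.
HDBCLI_FILES = ["__init__.py", "dbapi.py", "resultrow.py"]
CLIENT_FILES = ["pyhdbcli.pdb", "pyhdbcli.pyd"]


def copy_hdbcli_files(client_dir):
    """Copy the Python DB API files into <client_dir>/Python/Lib.

    Returns the list of source files that could not be found.
    """
    lib_dir = os.path.join(client_dir, "Python", "Lib")
    if not os.path.isdir(lib_dir):
        raise IOError("Target folder not found: %s" % lib_dir)
    sources = [os.path.join(client_dir, "hdbcli", name) for name in HDBCLI_FILES]
    sources += [os.path.join(client_dir, name) for name in CLIENT_FILES]
    missing = []
    for source in sources:
        if os.path.isfile(source):
            shutil.copy(source, lib_dir)
        else:
            missing.append(source)
    return missing
```

For the default 64-bit installation you would call copy_hdbcli_files(r"C:\Program Files\SAP\hdbclient"), typically from a prompt with sufficient rights to write below Program Files.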

2) Setting up the Editor to Run the File


     Carry out the following steps:

 

  1. Install Pydev plugin to use Python IDE for Eclipse
    The preferred method is to use the Eclipse IDE from SAP HANA studio. To be able to run the Python script, you first need to install the PyDev plugin in SAP HANA studio:
    1. Open SAP HANA studio, click Help in the menu bar and select Install New Software...
    2. Click the button Add... and enter the following information:
      py.jpg
    3. Select the settings as shown in this screenshot:  

      py2.jpg
    4. Press Next twice
    5. Accept the license agreements, then press Finish.
    6. Restart SAP HANA studio.
  2. Configure the Python Interpreter
    In SAP HANA studio, carry out the following steps:
    1. Select the menu entries Window -> Preferences.
    2. Select PyDev -> Interpreters -> Python Interpreter.
    3. Click the New... button and type in an interpreter name. In the field Interpreter Executable, enter the executable file C:\Program Files\sap\hdbclient\Python\Python.exe. Press OK twice.
  3. Create a Python project
    In SAP HANA studio, carry out the following steps:
    1. Click File -> New -> Project..., then select Pydev project.
    2. Type in a project name (for example: TWITTER_DATA_COLLECTION), then press Finish.
    3. Right-click on your project. Click New -> File, then type your file name (for example: data_harvest_tweets.py). Press Finish.
  4. Create an external library path
    In SAP HANA studio, carry out the following steps:
    1. Right-click on your project, select Properties, then PyDev - PYTHONPATH.
    2. Select the tab External Libraries, then click Add source folder and select C:\Program Files\SAP\hdbclient\Python\Lib.
    3. Press Apply, then OK.
  5. Import the script
    1. Copy the Python script which is available at the end of this document.
    2. In SAP HANA studio, open your newly created Python file from your project and paste the script.
    3. Click Save.
    Alternatively, you can save the Python script with a .py extension in your file location and import this file into your project.
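
Before running the harvesting script, it can help to confirm that the configured interpreter actually finds the SAP HANA Python API. The following check is a minimal sketch, not part of the original script; the helper name can_import is an assumption. Run it as a separate file in the same PyDev project.

```python
import sys


def can_import(module_name):
    """Return True if the module can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


print("Interpreter: %s" % sys.executable)
print("Python version: %s" % sys.version.split()[0])
# dbapi is the module copied into Python\Lib during the setup steps.
print("dbapi importable: %s" % can_import("dbapi"))
```

If the last line reports False, recheck the copied files and the external library path of the project.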

           py5.jpg

Customizing and Running the Script

1) Customizing and User Input for Python Script

  1. In your Eclipse editor, open the file data_harvest_tweets.py and maintain the following parameters in the START OF CUSTOMIZING STEPS section of the script:

    input.png
  2. Save the file.
    Feel free to adapt the search query and its options to your needs. Now you are ready to run the script.
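
To see how the search options end up in the request, the sketch below rebuilds the search URL the same way the script does: each parameter value is percent-encoded and the pairs are joined with "&". The function name build_search_url and the sample search term are assumptions for illustration; the endpoint and the parameter names q, count, lang and result_type are taken from the script.

```python
try:
    from urllib import quote          # Python 2, as embedded in the SAP HANA client
except ImportError:
    from urllib.parse import quote    # Python 3

SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(search_term, count=100, lang="", result_type="recent"):
    """Build the search request URL from the query options."""
    params = [("q", search_term), ("count", count),
              ("lang", lang), ("result_type", result_type)]
    # Percent-encode every value; only "~" is kept as a safe character.
    query = "&".join("%s=%s" % (name, quote(str(value), safe="~"))
                     for (name, value) in params)
    return "%s?%s" % (SEARCH_URL, query)


print(build_search_url("#sap hana", lang="en"))
```

In the script, the corresponding values live in the params list, so restricting the language or changing the result type means editing that list.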

 

2) Executing the Script

  1. Run the script from your editor.

    execute.png
  2. Enter the search term(s).
    Now you will receive data from Twitter, as you can see in your Eclipse IDE.
    The script can be run multiple times: It can extract data based on a search term for up to the last 7 days.
    You have now set up harvesting.
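
The script as provided issues a single search request per run. One possible way to retrieve more data per run is to page backwards through older results using the max_id parameter of the Twitter search API. The sketch below shows only the paging logic and is not part of the original script: fetch_page is a hypothetical callable that you would implement around the request code of the script, and the function names are assumptions.

```python
def next_max_id(statuses):
    """Return the max_id for the next (older) page, or None for an empty page."""
    if not statuses:
        return None
    # max_id is inclusive, so subtract 1 to avoid fetching the oldest tweet twice.
    return min(tweet["id"] for tweet in statuses) - 1


def collect_pages(fetch_page, max_pages=3):
    """Collect tweets from up to max_pages consecutive result pages.

    fetch_page(max_id) is a hypothetical callable that returns the list of
    statuses for one request; max_id is None for the first request.
    """
    collected = []
    max_id = None
    for _ in range(max_pages):
        statuses = fetch_page(max_id)
        if not statuses:
            break
        collected.extend(statuses)
        max_id = next_max_id(statuses)
    return collected
```

Keep the rate limits of the Twitter API in mind when raising max_pages.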

3) Checking the Results in the Database Tables

  1. Execute a select statement on your corresponding database tables.
  2. Check your tweets.
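
As an alternative to the SQL console, the check can be sketched in Python. The function below counts the harvested posts for one search term; it assumes a DB API cursor like the one the script creates with dbapi.connect(...).cursor(), and that the returned row supports index access. The function name count_posts is an assumption; the table and column names come from the insert statement in the script.

```python
def count_posts(cursor, schema, search_term):
    """Return the number of rows in SOCIALDATA stored for a search term."""
    sql = ("select count(*) from " + schema + ".SOCIALDATA "
           "where SOCIALPOSTSEARCHTERMTEXT = ?")
    # The search term is passed as a bound parameter, as in the script.
    cursor.execute(sql, (search_term,))
    return cursor.fetchone()[0]
```

A call could look like count_posts(cursor_target, schema, search_term), using the variables of the harvesting script after it has run.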

 

REFERENCES:

https://help.sap.com/hana/SAP_HANA_Studio_Installation_Update_Guide_en.pdf

https://apps.twitter.com/

https://dev.twitter.com/docs/using-search

http://scn.sap.com/community/developer-center/hana/blog/2012/06/08/sap-hana-and-python-yes-sir

 

 

 

#Python Script starts here
import urllib2
import urllib
import sys
import json
import time
import uuid
from time import gmtime, strftime
import dbapi
import binascii
#This script harvests tweets from Twitter based on the customizing settings below and then inserts them into the Social foundation database tables (SOCIALDATA and SOCIALUSERINFO).
##########################################################START OF CUSTOMIZING STEPS##########################################################
#1. Create connection to database
#(Please enter the input parameters to connect to the HANA system)
server = ''            #HANA server
port = None            #HANA port: replace None with an integer value (do not enclose it in quotes)
username = ''          #Your database system user name
password = ''          #Password for your user
schema = ''            #HANA database table schema name
#2. Consumer access tokens (please supply your consumer key and consumer secret)
#https://apps.twitter.com/app/new (register your application and get access tokens)
c_key = ''             #CONSUMER_KEY
c_sec = ''             #CONSUMER_SECRET
#3. Proxy setting
proxy = ''             #If you have a proxy setting, specify it here; otherwise leave the empty string
#4. Additional data to fill in database tables
client = ''            #Client number for your database table
socialmediachannel = ''   #Social media channel name like 'TW'. Maximum 3 characters; this has to be in sync with SCI customizing
##########################################################END OF CUSTOMIZING##########################################################
#Please do not edit from here..
# Application based OAuth Authentication (OAuth2)
class OAuth2Helper(object):
    # variables
    c_key = ""          # Consumer key
    c_sec = ""          # Consumer secret
    proxy = ""
    api_url = ""
    # predefined
    TYPE_TWITTER = "TWITTER"
    # more
    APIURL_TWITTER = "https://api.twitter.com/oauth2/token"

    # api_url could be empty if type is given.
    def __init__(self, api_url, customer_key, customer_secret, proxy=""):
        self.api_url = api_url
        self.c_key = customer_key
        self.c_sec = customer_secret
        self.proxy = proxy

    # THIS FUNCTION SHOULD BE FINALLY CALLED
    # for twitter: header
    def generateHeader(self, atype=TYPE_TWITTER):
        if atype == self.TYPE_TWITTER:
            self.api_url = self.APIURL_TWITTER
            return "Bearer %s" % self.obtainBearerToken_twitter()

    def obtainBearerToken_twitter(self):
        # step 1. generate bearer token credential
        bearer_token_credential = "%s:%s" % (self.c_key, self.c_sec)
        # step 2. generate Base64 encoded credential (strip the trailing newline)
        base64_bearer_token_credential = binascii.b2a_base64(bearer_token_credential)[:-1]
        # step 3. connect
        handler = urllib2.BaseHandler()
        proxy = self.proxy
        if proxy is not None and proxy != '':
            #print ' Using proxy: ' + str(proxy)
            handler = urllib2.ProxyHandler({'http': proxy, 'https': proxy})
        try:
            opener = urllib2.build_opener(handler)
            opener.addheaders = [('Content-Type', "application/x-www-form-urlencoded;charset=UTF-8"),
                                 ('Authorization', "Basic %s" % base64_bearer_token_credential)]
            data = opener.open(self.api_url, data="grant_type=client_credentials").read()
            # step 4. parse json string
            json_data = json.loads(data, encoding="utf-8")
            return json_data["access_token"]
        except:
            print "[ERROR]\n%s" % ("\n".join("%s" % info for info in sys.exc_info()))
            return None

    def escapeParameter(self, text):
        return urllib.quote(str(text), safe="~")
##########################################################################################################################################
#checking mandatory customizing user settings: server, port, username, password, schema, c_key, c_sec
print 'checking the input customizing settings \n'
mandatory_settings = [
    ('HANA Server name', server),
    ('HANA port', port),
    ('HANA system username', username),
    ('HANA System password', password),
    ('HANA Database table schema', schema),
    ('Twitter Consumer Key', c_key),
    ('Twitter Consumer secret token', c_sec),
]
#collect every setting that is still empty and report each of them
missing_settings = [name for (name, value) in mandatory_settings if value in ('', None)]
for name in missing_settings:
    print name + ' is not filled.'
#stop unless all mandatory settings are filled
if missing_settings:
    sys.exit('Please fill in configuration steps in Python script and continue..')
print 'Configuration settings are filled.. \n'
#Searching tweets (Please input the search_term)
while True:
    resp = raw_input("Please enter your search term? \n")
    if resp == "":
        resp = raw_input('If you wish to exit enter stop (or) enter your search term \n')
    if resp == 'stop':
        sys.exit()
    if len(resp) > 0:
        break
search_term = resp
#HANA database connection
hdb_target = dbapi.connect(server, port, username, password)
cursor_target = hdb_target.cursor()
#Authenticating and connecting to search_url to harvest data
search_url = "https://api.twitter.com/1.1/search/tweets.json"
apiurl = "https://api.twitter.com/oauth2/token"
params = [("q", search_term), ("count", "100"), ("lang", ''), ("result_type", "recent")]
oauth2 = OAuth2Helper(apiurl, c_key, c_sec, proxy)
# connection
handler = urllib2.BaseHandler()
header = oauth2.generateHeader()
#handling proxy
if proxy is not None and proxy != '':
    print 'Twitter: Using proxy: ' + str(proxy)
    handler = urllib2.ProxyHandler({'http': proxy, 'https': proxy})
#mapping tweets to the fields and inserting them into the database tables.
record_count = 0
print ('Inserting records into table...')
try:
    opener = urllib2.build_opener(handler)
    opener.addheaders = [('Authorization', header)]
    http_url = ("%s?%s") % (search_url, "&".join(['%s=%s' % (param, oauth2.escapeParameter(value)) for (param, value) in params]))
    data = opener.open(http_url, data=None).read()
    json_data_tweets = json.loads(data)
    for tweet in json_data_tweets["statuses"]:
        tweet_id = tweet['id']
        from_user_lang = tweet['lang']
        id_str = tweet['id_str']
        user = tweet['user']
        from_user_id = user['id']
        from_screen_name = user['screen_name']
        from_user_name = user['name']
        userProfileLink = 'https://twitter.com/%s' % (from_screen_name)
        socialPostLink = 'https://twitter.com/%s/status/%s' % (from_user_id, id_str)
        profile_image_url = user['profile_image_url']
        replication_createdat = strftime("%Y%m%d%H%M%S", gmtime()) #get current time
        created_at = tweet['created_at']
        if created_at is None:
            continue
        created_at = time.strftime('%a, %d %b %Y %H:%M:%S +0000', time.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y'))
        text = tweet['text'].encode('utf-8')
        text = text.replace('\n', '')
        uuid_str = str(uuid.uuid1())
        socialdatauuid = uuid_str.replace("-", "")
        if len(text) == 0:
            continue
        inssql = "insert into" + " " + schema + ".SOCIALDATA (CLIENT, SOCIALDATAUUID, SOCIALPOST, LANGUAGE, SOCIALMEDIACHANNEL, CREATEDBYUSER, CREATIONDATETIME, SOCIALPOSTLINK, CREATIONUSERNAME, SOCIALPOSTSEARCHTERMTEXT, SOCIALPOSTTEXT, CREATEDAT) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
        cursor_target.execute(inssql, (client, socialdatauuid, tweet_id, from_user_lang, socialmediachannel, from_user_id, created_at, socialPostLink, from_user_name, search_term, text, replication_createdat))
        inssql = "upsert" + " " + schema + ".SOCIALUSERINFO (CLIENT, SOCIALMEDIACHANNEL, SOCIALUSER, SOCIALUSERPROFILELINK, SOCIALUSERACCOUNT, SOCIALUSERNAME, SOCIALUSERIMAGELINK, CREATEDAT) values (?, ?, ?, ?, ?, ?, ?, ?) with primary key"
        cursor_target.execute(inssql, (client, socialmediachannel, from_user_id, userProfileLink, from_screen_name, from_user_name, profile_image_url, replication_createdat))
        hdb_target.commit()
        record_count = record_count + 1
        print ('record number', record_count, created_at, from_user_name, tweet_id, text)
except:
    print "[ERROR]\n%s" % ("\n".join("%s" % info for info in sys.exc_info()))
print 'Total number of records inserted: ', record_count
#End of Python script
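
Two of the field mappings in the script are easy to try out in isolation: the conversion of the tweet's created_at value into the timestamp format written to CREATIONDATETIME, and the generation of the 32-character key for SOCIALDATAUUID. The following is a small sketch of both; the function names and the sample timestamp are assumptions, while the format strings are the ones used in the script.

```python
import time
import uuid

TWITTER_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"    # format delivered in tweet['created_at']
TARGET_FORMAT = "%a, %d %b %Y %H:%M:%S +0000"    # format written by the script


def convert_created_at(created_at):
    """Convert a tweet's created_at string into the script's target format."""
    return time.strftime(TARGET_FORMAT, time.strptime(created_at, TWITTER_FORMAT))


def new_socialdata_uuid():
    """Return a time-based UUID without hyphens (32 hexadecimal characters)."""
    return str(uuid.uuid1()).replace("-", "")


print(convert_created_at("Mon Sep 24 03:35:21 +0000 2012"))
print(len(new_socialdata_uuid()))
```

Both helpers rely on the default locale for day and month names, as the script does.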
