How can I convert these pandas columns containing strings into float while maintaining their meaning?











up vote
1
down vote

favorite












I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.



The csv file looks like this:



2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%


So far I have tried this:



df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']

df = df[~(df['Actual'].isin('|','<']))]
#df = df[~df.Actual.str.contains("|")]

df['Actual'] = df['Actual'].str.replace('%', '')
df['Forecast'] = df['Forecast'].str.replace('%', '')
df['Previous'] = df['Previous'].str.replace('%', '')

df['Actual'] = df['Actual'].str.replace('K', '000')
df['Forecast'] = df['Forecast'].str.replace('K', '000')
df['Previous'] = df['Previous'].str.replace('K', '000')

for i in df['Actual']: float(i)
for i in df['Forecast']: float(i)
for i in df['Previous']: float(i)


The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.



Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.



Hope someone can help. Thanks!










share|improve this question


























    up vote
    1
    down vote

    favorite












    I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.



    The csv file looks like this:



    2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
    2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
    2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
    2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%


    So far I have tried this:



    df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']

    df = df[~(df['Actual'].isin('|','<']))]
    #df = df[~df.Actual.str.contains("|")]

    df['Actual'] = df['Actual'].str.replace('%', '')
    df['Forecast'] = df['Forecast'].str.replace('%', '')
    df['Previous'] = df['Previous'].str.replace('%', '')

    df['Actual'] = df['Actual'].str.replace('K', '000')
    df['Forecast'] = df['Forecast'].str.replace('K', '000')
    df['Previous'] = df['Previous'].str.replace('K', '000')

    for i in df['Actual']: float(i)
    for i in df['Forecast']: float(i)
    for i in df['Previous']: float(i)


    The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.



    Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.



    Hope someone can help. Thanks!










    share|improve this question
























      up vote
      1
      down vote

      favorite









      up vote
      1
      down vote

      favorite











      I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.



      The csv file looks like this:



      2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
      2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
      2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
      2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%


      So far I have tried this:



      df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']

      df = df[~(df['Actual'].isin('|','<']))]
      #df = df[~df.Actual.str.contains("|")]

      df['Actual'] = df['Actual'].str.replace('%', '')
      df['Forecast'] = df['Forecast'].str.replace('%', '')
      df['Previous'] = df['Previous'].str.replace('%', '')

      df['Actual'] = df['Actual'].str.replace('K', '000')
      df['Forecast'] = df['Forecast'].str.replace('K', '000')
      df['Previous'] = df['Previous'].str.replace('K', '000')

      for i in df['Actual']: float(i)
      for i in df['Forecast']: float(i)
      for i in df['Previous']: float(i)


      The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.



      Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.



      Hope someone can help. Thanks!










      share|improve this question













      I want to convert the columns Actual, Forecast, Previous into float so I can perform calculations on them. The csv also contains some NaNs which should stay in place.



      The csv file looks like this:



      2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
      2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
      2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
      2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%


      So far I have tried this:



      df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']

      df = df[~(df['Actual'].isin('|','<']))]
      #df = df[~df.Actual.str.contains("|")]

      df['Actual'] = df['Actual'].str.replace('%', '')
      df['Forecast'] = df['Forecast'].str.replace('%', '')
      df['Previous'] = df['Previous'].str.replace('%', '')

      df['Actual'] = df['Actual'].str.replace('K', '000')
      df['Forecast'] = df['Forecast'].str.replace('K', '000')
      df['Previous'] = df['Previous'].str.replace('K', '000')

      for i in df['Actual']: float(i)
      for i in df['Forecast']: float(i)
      for i in df['Previous']: float(i)


      The functions for getting rid of the | and < do not work. Many suggestions on the internet seem not to work with NaN values in the file.



      Also I cannot figure out how to replace the % while at the same time move the decimal so the number representation is correct.



      Hope someone can help. Thanks!







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 11 at 0:36









      the painted cow

      62




      62
























          2 Answers
          2






          active

          oldest

          votes

















          up vote
          0
          down vote



          accepted










          Not the prettiest way to do this, but I believe this is what you want:



          from io import StringIO
          import pandas as pd
          df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
          2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
          2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
          2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')
          df = df.loc[~df['Actual'].str.contains('[|<]')]
          for col in ['Actual', 'Forecast', 'Previous']:
          df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
          df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000





          share|improve this answer




























            up vote
            0
            down vote













            Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.



            import pandas as pd

            # Importing economic calendar
            df = pd.read_csv('EconomicCalendar.csv')
            df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']


            # Remove no and low impact rows, remove votes beacuse of #format not convertable
            df = df[df.Impact != 'Non-Economic']

            event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']
            df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]

            for col in ['Actual', 'Forecast', 'Previous']:
            # Remove rows with certain formats not convertable
            df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]

            # Change %, K, M, B, T into numerics
            df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
            df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000
            df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000
            df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000
            df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000

            # Change all to numeric to perform calculation
            df[col] = pd.to_numeric(df[col])


            # Creating Surprise column which is Actual minus Forecast
            df['Surprise'] = df['Actual']-df['Forecast']





            share|improve this answer





















              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














               

              draft saved


              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244794%2fhow-can-i-convert-these-pandas-columns-containing-strings-into-float-while-maint%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              0
              down vote



              accepted










              Not the prettiest way to do this, but I believe this is what you want:



              from io import StringIO
              import pandas as pd
              df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
              2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
              2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
              2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')
              df = df.loc[~df['Actual'].str.contains('[|<]')]
              for col in ['Actual', 'Forecast', 'Previous']:
              df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
              df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000





              share|improve this answer

























                up vote
                0
                down vote



                accepted










                Not the prettiest way to do this, but I believe this is what you want:



                from io import StringIO
                import pandas as pd
                df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
                2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
                2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
                2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')
                df = df.loc[~df['Actual'].str.contains('[|<]')]
                for col in ['Actual', 'Forecast', 'Previous']:
                df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000





                share|improve this answer























                  up vote
                  0
                  down vote



                  accepted







                  up vote
                  0
                  down vote



                  accepted






                  Not the prettiest way to do this, but I believe this is what you want:



                  from io import StringIO
                  import pandas as pd
                  df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
                  2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
                  2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
                  2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')
                  df = df.loc[~df['Actual'].str.contains('[|<]')]
                  for col in ['Actual', 'Forecast', 'Previous']:
                  df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                  df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000





                  share|improve this answer












                  Not the prettiest way to do this, but I believe this is what you want:



                  from io import StringIO
                  import pandas as pd
                  df = pd.read_table(StringIO("""2018-01-04 04:30:00,GBP,Low Impact Expected,Mortgage Approvals,65K,64K,65K
                  2018-01-04 04:51:00,EUR,Low Impact Expected,Spanish 10-y Bond Auction,1.53|1.8,,1.49|2.0
                  2018-01-04 05:01:00,EUR,Low Impact Expected,French 10-y Bond Auction,0.79|1.4,,0.36|1.9
                  2018-01-04 07:30:00,USD,Low Impact Expected,Challenger Job Cuts y/y,-3.6%,,30.1%"""), names=['Date','Currency','Impact','Event','Actual','Forecast','Previous'], sep=',')
                  df = df.loc[~df['Actual'].str.contains('[|<]')]
                  for col in ['Actual', 'Forecast', 'Previous']:
                  df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                  df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 11 at 3:26









                  cosmic_inquiry

                  855210




                  855210
























                      up vote
                      0
                      down vote













                      Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.



                      import pandas as pd

                      # Importing economic calendar
                      df = pd.read_csv('EconomicCalendar.csv')
                      df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']


                      # Remove no and low impact rows, remove votes beacuse of #format not convertable
                      df = df[df.Impact != 'Non-Economic']

                      event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']
                      df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]

                      for col in ['Actual', 'Forecast', 'Previous']:
                      # Remove rows with certain formats not convertable
                      df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]

                      # Change %, K, M, B, T into numerics
                      df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                      df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000
                      df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000
                      df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000
                      df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000

                      # Change all to numeric to perform calculation
                      df[col] = pd.to_numeric(df[col])


                      # Creating Surprise column which is Actual minus Forecast
                      df['Surprise'] = df['Actual']-df['Forecast']





                      share|improve this answer

























                        up vote
                        0
                        down vote













                        Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.



                        import pandas as pd

                        # Importing economic calendar
                        df = pd.read_csv('EconomicCalendar.csv')
                        df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']


                        # Remove no and low impact rows, remove votes beacuse of #format not convertable
                        df = df[df.Impact != 'Non-Economic']

                        event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']
                        df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]

                        for col in ['Actual', 'Forecast', 'Previous']:
                        # Remove rows with certain formats not convertable
                        df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]

                        # Change %, K, M, B, T into numerics
                        df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                        df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000
                        df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000
                        df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000
                        df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000

                        # Change all to numeric to perform calculation
                        df[col] = pd.to_numeric(df[col])


                        # Creating Surprise column which is Actual minus Forecast
                        df['Surprise'] = df['Actual']-df['Forecast']





                        share|improve this answer























                          up vote
                          0
                          down vote










                          up vote
                          0
                          down vote









                          Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.



                          import pandas as pd

                          # Importing economic calendar
                          df = pd.read_csv('EconomicCalendar.csv')
                          df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']


                          # Remove no and low impact rows, remove votes beacuse of #format not convertable
                          df = df[df.Impact != 'Non-Economic']

                          event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']
                          df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]

                          for col in ['Actual', 'Forecast', 'Previous']:
                          # Remove rows with certain formats not convertable
                          df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]

                          # Change %, K, M, B, T into numerics
                          df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000

                          # Change all to numeric to perform calculation
                          df[col] = pd.to_numeric(df[col])


                          # Creating Surprise column which is Actual minus Forecast
                          df['Surprise'] = df['Actual']-df['Forecast']





                          share|improve this answer












                          Here is my current solution if anyone is interested, thanks to the help of cosmic_inquiry.



                          import pandas as pd

                          # Importing economic calendar
                          df = pd.read_csv('EconomicCalendar.csv')
                          df.columns = ['Date','Currency','Impact','Event','Actual','Forecast','Previous']


                          # Remove no and low impact rows, remove votes beacuse of #format not convertable
                          df = df[df.Impact != 'Non-Economic']

                          event_filter = ['Asset Purchase Facility Votes', 'Official Bank Rate Votes']
                          df = df.loc[~df['Event'].str.contains('|'.join(event_filter))]

                          for col in ['Actual', 'Forecast', 'Previous']:
                          # Remove rows with certain formats not convertable
                          df = df.loc[~df[col].str.contains('|'.join(['|','<']), na=False)]

                          # Change %, K, M, B, T into numerics
                          df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.contains('%'), col].str.replace('%', '')) / 100
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('K'), col].str.replace('K', '')) * 1000
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('M'), col].str.replace('M', '')) * 1000000
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('B'), col].str.replace('B', '')) * 1000000000
                          df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col] = pd.to_numeric(df.loc[pd.notnull(df[col]) & df[col].str.endswith('T'), col].str.replace('T', '')) * 1000000000000

                          # Change all to numeric to perform calculation
                          df[col] = pd.to_numeric(df[col])


                          # Creating Surprise column which is Actual minus Forecast
                          df['Surprise'] = df['Actual']-df['Forecast']






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Nov 11 at 10:44









                          the painted cow

                          62




                          62






























                               

                              draft saved


                              draft discarded



















































                               


                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244794%2fhow-can-i-convert-these-pandas-columns-containing-strings-into-float-while-maint%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Full-time equivalent

                              さくらももこ

                              13 indicted, 8 arrested in Calif. drug cartel investigation